https://arxiv.org/api/uGXyJ9rkEpzUtPLGgYKCYdOtFGw
2026-03-18T08:45:15Z

http://arxiv.org/abs/2603.16850v1
Unifying Optimization and Dynamics to Parallelize Sequential Computation: A Guide to Parallel Newton Methods for Breaking Sequential Bottlenecks
Updated: 2026-03-17T17:55:01Z
Massively parallel hardware (GPUs) and long sequence data have made parallel algorithms essential for machine learning at scale. Yet dynamical systems, like recurrent neural networks and Markov chain Monte Carlo, were thought to suffer from sequential bottlenecks. Recent work showed that dynamical systems can in fact be parallelized across the sequence length by reframing their evaluation as a system of nonlinear equations, which can be solved with Newton's method using a parallel associative scan. However, these parallel Newton methods struggled with limitations, primarily inefficiency, instability, and lack of convergence guarantees. This thesis addresses these limitations with methodological and theoretical contributions, drawing particularly from optimization. Methodologically, we develop scalable and stable parallel Newton methods, based on quasi-Newton and trust-region approaches. The quasi-Newton methods are faster and more memory efficient, while the trust-region approaches are significantly more stable. Theoretically, we unify many fixed-point methods into our parallel Newton framework, including Picard and Jacobi iterations. We establish a linear convergence rate for these techniques that depends on the method's approximation accuracy and stability. Moreover, we give a precise condition, rooted in dynamical stability, that characterizes when parallelization provably accelerates a dynamical system and when it cannot. Specifically, the sign of the Largest Lyapunov Exponent of a dynamical system determines whether or not parallel Newton methods converge quickly.
In sum, this thesis unlocks scalable and stable methods for parallelizing sequential computation, and provides a firm theoretical basis for when such techniques will and will not work. This thesis also serves as a guide to parallel Newton methods for researchers who want to write the next chapter in this ongoing story.
Published: 2026-03-17T17:55:01Z
PhD Dissertation; Stanford University
Authors: Xavier Gonzalez
DOI: 10.25740/vf943fc9855

http://arxiv.org/abs/2510.14759v2
On the convergence of stochastic variance reduced gradient for linear inverse problems
Updated: 2026-03-17T16:21:01Z
Stochastic variance reduced gradient (SVRG) is an accelerated version of stochastic gradient descent based on variance reduction, and is promising for solving large-scale inverse problems. In this work, we analyze SVRG and a regularized version that incorporates a priori knowledge of the problem, for solving linear inverse problems in Hilbert spaces. We prove that, with suitable constant step size schedules and regularity conditions, the regularized SVRG can achieve optimal convergence rates in terms of the noise level without any early stopping rules, provided that the truncation level is chosen suitably, and standard SVRG is also optimal for problems with nonsmooth solutions under a priori stopping rules. The analysis is based on an explicit error recursion and suitable a priori estimates on the inner loop updates with respect to the anchor point. Numerical experiments are provided to complement the theoretical analysis.
Published: 2025-10-16T14:59:11Z
29 pages, 2 figures
Authors: Bangti Jin, Zehui Zhou

http://arxiv.org/abs/2603.16644v1
Perturbation Analysis for Preconditioned Normal Equations in Mixed Precision
Updated: 2026-03-17T15:16:30Z
For real matrices of full column-rank, we analyze the conditioning of several types of normal equations that are preconditioned by a randomized preconditioner computed in lower precision.
These include symmetrically preconditioned normal equations, half-preconditioned normal equations, seminormal equations and not-normal equations. Our perturbation bounds are realistic and informative, and suggest that the conditioning depends only mildly on the quality of the preconditioner; however, it does depend on the size of the least squares residual -- even if the normal equations do not originate from a least squares problem. We illustrate that a randomized preconditioner can deliver a solution accuracy comparable to that of Matlab's mldivide command, is efficient in practice, and well-suited to GPU implementations. For the computation of the preconditioner, we propose an automatic selection of the precision, based on a fast condition number estimation in lower precision.
Published: 2026-03-17T15:16:30Z
Authors: James E. Garrison, Ilse C. F. Ipsen

http://arxiv.org/abs/2312.07762v2
Interpretable factorization of clinical questionnaires to identify latent factors of psychopathology
Updated: 2026-03-17T15:06:54Z
Psychiatry research seeks to understand the manifestations of psychopathology in behavior, as measured in questionnaire data, by identifying a small number of latent factors that explain them. While factor analysis is the traditional tool for this purpose, the resulting factors may not be interpretable, and may also be subject to confounding variables. Moreover, missing data are common, and explicit imputation is often required. To overcome these limitations, we introduce interpretability constrained questionnaire factorization (ICQF), a non-negative matrix factorization method with regularization tailored for questionnaire data. Our method aims to promote factor interpretability and solution stability. We provide an optimization procedure with theoretical convergence guarantees, and an automated procedure to detect latent dimensionality accurately. We validate these procedures using realistic synthetic data.
We demonstrate the effectiveness of our method in a widely used general-purpose questionnaire, in two independent datasets (the Healthy Brain Network and Adolescent Brain Cognitive Development studies). Specifically, we show that ICQF improves interpretability, as defined by domain experts, while preserving diagnostic information across a range of disorders, and outperforms competing methods for smaller dataset sizes. This suggests that the regularization in our method matches domain characteristics. The Python implementation for ICQF is available at https://github.com/jefferykclam/ICQF.
Published: 2023-12-12T22:10:38Z
Authors: Ka Chun Lam, Bridget W Mahony, Armin Raznahan, Francisco Pereira

http://arxiv.org/abs/2603.16618v1
A Jacobi Field Approach to Splitting Detection in Schrödinger Bridge
Updated: 2026-03-17T14:56:23Z
We study the problem of detecting the onset of path splitting in stochastic interpolation between probability distributions. This question is especially subtle when the target distribution is nonconvex or supported on disconnected components, where interpolating trajectories may separate into distinct branches. Motivated by the stochastic control and Schrödinger bridge viewpoint, we propose a Jacobi field based indicator for identifying candidate splitting times and locations. Our approach is based on the Jacobi field associated with the linearization of an induced interpolating flow. Starting from a stochastic interpolation ansatz, we construct an Eulerian velocity field by conditional averaging and derive its spatial Jacobian in terms of the local posterior geometry of the target sample cloud. This allows us to interpret the symmetric part of the Jacobian as a local strain tensor and to use its spectral structure to quantify the amplification of infinitesimal perturbations along reference trajectories.
Numerical experiments on non-convex and disconnected target distributions show that the proposed indicator consistently localizes the emergence of branching regions and captures the temporal development of splitting. These results suggest that Jacobi field analysis provides a natural mathematical framework for studying local instability and splitting phenomena in stochastic interpolation.
Published: 2026-03-17T14:56:23Z
Authors: Chunhai Jiao, Jin Guo, Haoyan Zhang, Jinqiao Duan, Ting Gao

http://arxiv.org/abs/2503.01057v2
Sparse Randomized Approximation of Normal Cycles
Updated: 2026-03-17T14:33:23Z
We extend our work for compression of currents and varifolds to a compression algorithm for the embedded normal cycles representation of shape, restricted to the constant normal kernel case, using the Nystrom approximation in Reproducing Kernel Hilbert Spaces (RKHS) and ridge leverage score (RLS) sampling. Our method comes with theoretical guarantees on the compression error decay, and the approximations are shown to be effective for downstream tasks such as nonlinear shape registration in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework, even for very high compression ratios. The performance of our algorithm is demonstrated on large-scale shape data from modern geometry processing datasets and is shown to accelerate downstream registration tasks significantly.
Published: 2025-03-02T23:34:30Z
Authors: Allen Paul, Neill Campbell, Tony Shardlow

http://arxiv.org/abs/2603.16571v1
Splitting horizontal and vertical polynomial order in a compatible finite element discretisation for numerical weather prediction
Updated: 2026-03-17T14:27:25Z
The accurate and efficient representation of atmospheric dynamics remains a central challenge in numerical weather prediction. A particular difficulty arises from the strong anisotropy of the atmosphere, in which horizontal and vertical motions occur on very different length scales, motivating numerical discretisations that can reflect this structure.
In this study, we introduce a compatible finite element discretisation of the compressible Boussinesq and compressible Euler equations in which the horizontal and vertical polynomial orders are treated independently.
The split-order discretisation is constructed using a tensor-product framework that preserves the discrete de Rham complex and associated mimetic properties. Its wave-propagation characteristics are examined through a discrete dispersion analysis that extends previous analyses to configurations with differing horizontal and vertical polynomial orders. The results show that increasing horizontal order improves the representation of gravity waves at low and intermediate wavenumbers, while increasing vertical order can degrade dispersion accuracy near the grid scale and introduce spectral gaps.
A series of idealised numerical experiments, including gravity-wave propagation, advective transport, mountain-wave flow, and a global baroclinic-wave test, is used to assess the scheme's accuracy and convergence properties. These experiments demonstrate that increasing the polynomial order in the dominant direction of motion improves convergence, and that increasing the horizontal order yields the greatest gain in accuracy under typical atmospheric conditions. The results indicate that split-order compatible finite element discretisations provide a viable alternative for controlling accuracy and numerical behaviour in atmospheric dynamical cores.
Published: 2026-03-17T14:27:25Z
Authors: Daniel Witt, Thomas Bendall, Jemma Shipton

http://arxiv.org/abs/2603.16539v1
Perturbation Analysis of the QT-Drazin Inverse of Quaternion Tensors via the QT-Product
Updated: 2026-03-17T14:01:12Z
The motivation of this paper is to investigate the perturbation theory for the QT-Drazin inverse of quaternion tensors under the QT-product via the associated $z$-block circulant representation. A fundamental relationship between the QT-Drazin inverse of $\mathtt{bcirc}_z(\mathcal A)$ and the $z$-block circulant form of $\mathcal A^D$ is established. Moreover, the QT-index of a quaternion tensor is characterized by the indices of the diagonal blocks in the corresponding block-diagonalized matrix. As a consequence, a representation of the QT-Drazin inverse in terms of the QT-Moore--Penrose inverse is derived, which offers a practical approach for its direct computation in MATLAB. Furthermore, a decomposition theory for the QT-Drazin inverse is developed by combining the structure of $z$-block circulant matrices with the Jordan decomposition of quaternion matrices.
Numerical examples are provided to demonstrate the theoretical results and computational feasibility.
Published: 2026-03-17T14:01:12Z
Authors: Yue Zhao, Daochang Zhang, Jingqian Li, Dijana Mosic

http://arxiv.org/abs/2503.14978v2
Inferring diffusivity from killed diffusion
Updated: 2026-03-17T14:00:23Z
We consider diffusion of independent molecules in an insulated Euclidean domain with unknown diffusivity parameter. At a random time and position, the molecules may bind and stop diffusing in dependence of a given `binding potential'. The binding process can be modeled by an additive random functional corresponding to the canonical construction of a `killed' diffusion Markov process. We study the problem of conducting inference on the infinite-dimensional diffusion parameter from a histogram plot of the `killing' positions of the process. We show first that these positions follow a Poisson point process whose intensity measure is determined by the solution of a certain Schrödinger equation. The inference problem can then be re-cast as a non-linear inverse problem for this PDE, which we show to be consistently solvable in a Bayesian way under natural conditions on the initial state of the diffusion, provided the binding potential is not too `aggressive'. In the course of our proofs we obtain novel posterior contraction rate results for high-dimensional Poisson count data that are of independent interest. A numerical illustration of the algorithm by standard MCMC methods is also provided.
Published: 2025-03-19T08:16:16Z
33 pages, to appear in the Annals of Statistics
Authors: Richard Nickl, Fanny Seizilles

http://arxiv.org/abs/2603.16516v1
Neural network parametrized level sets for image segmentation
Updated: 2026-03-17T13:41:42Z
The Chan-Vese functionals have proven to be a first-class method for segmentation and classification. Previously they have been implemented with level-set methods based on a pixel-wise representation of the level-sets. Later, parametrized level-set approximations, such as splines, have been studied.
In this paper we consider neural networks as parametrized approximations of level-set functions. We show, in particular, that parametrized two-layer networks are most efficient to approximate polyhedral segments and classes. We also prove the efficiency for segmentation and classification.
Published: 2026-03-17T13:41:42Z
Authors: Otmar Scherzer, Cong Shi, Thi Lan Nhi Vu

http://arxiv.org/abs/2602.13472v2
Non-Uniform Quantum Fourier Transform
Updated: 2026-03-17T12:52:00Z
The Discrete Fourier Transform (DFT) is central to the analysis of uniformly sampled signals, yet many practical applications involve non-uniform sampling, requiring the Non-Uniform Discrete Fourier Transform (NUDFT). While quantum algorithms for the standard DFT are well established, a corresponding framework for the non-uniform case remains underdeveloped. This work introduces a quantum algorithm for the Non-Uniform Quantum Fourier Transform (NUQFT) based on a low-rank factorization of the NUDFT matrix. The factorization is translated into an explicit quantum construction using block encodings, Quantum Signal Processing, and the Linear Combination of Unitaries framework, yielding an $ε$-accurate block encoding of the NUDFT matrix with controlled approximation error from both classical truncation and quantum implementation. Under standard oracle access assumptions for non-uniform sampling points, we derive explicit, non-asymptotic gate-level resource estimates. The resulting complexity scales polylogarithmically with target precision, quadratically with the number of qubits through the quantum Fourier transform, and logarithmically with a geometry-dependent conditioning parameter induced by the non-uniform grid.
This establishes a concrete and resource-efficient quantum analogue of the NUDFT and provides a foundation for quantum algorithms on irregularly sampled data.
Published: 2026-02-13T21:26:02Z
v2 includes numerical results in Section 6
Authors: Junaid Aftab, Yuehaw Khoo, Haizhao Yang

http://arxiv.org/abs/2603.16452v1
The peak heat flux conjecture for the first Dirichlet eigenmode of convex planar domains
Updated: 2026-03-17T12:35:30Z
In this paper, we study the scale-invariant quantity \[\mathcal{G}(Ω)=\frac{\|\partial_n u_1\|_{L^\infty(\partial Ω)}}{λ_1},\] where $u_1$ is the first $L^2$-normalized Dirichlet Laplace eigenfunction of a Euclidean domain $Ω$ and $λ_1$ is its eigenvalue. This is related to the peak boundary heat flux in the long time limit. For convex domains we prove that $\|\partial_n u_1\|_{L^\infty(\partial Ω)}$ is upper-bounded by a (domain-independent) constant multiple of $λ_1$. Using layer potentials, we derive shape-derivative formulae for efficient gradient computations. When combined with high-order Nyström discretization, a fast boundary integral equation solver, and eigenvalue rootfinding, this allows us to numerically optimize $\mathcal{G}$ over a class of rounded polygonal discretized domains. Based on extensive numerical experiments, we then conjecture that, over the set of convex domains, $\mathcal{G}$ is maximized by the semidisk, with the peak flux at the center of the diameter. To lend analytical support to this conjecture, we prove that the semidisk is a critical point of $\mathcal{G}$ under infinitesimal perturbations of its circular arc.
Published: 2026-03-17T12:35:30Z
Authors: Zijian Wang, Jeremy G. Hoskins, Manas Rachh, Alex H. Barnett

http://arxiv.org/abs/2603.16424v1
Early-Terminable Energy-Safe Iterative Coupling for Parallel Simulation of Port-Hamiltonian Systems
Updated: 2026-03-17T11:59:30Z
Parallel simulation and control of large-scale robotic systems often rely on partitioned time stepping, yet finite-iteration coupling can inject spurious energy by violating power consistency--even when each subsystem is passive. This letter proposes a novel energy-safe, early-terminable iterative coupling for port-Hamiltonian subsystems by embedding a Douglas--Rachford (DR) splitting scheme in scattering (wave) coordinates. The lossless interconnection is enforced as an orthogonal constraint in the wave domain, while each subsystem contributes a discrete-time scattering port map induced by its one-step integrator. Under a discrete passivity condition on the subsystem time steps and a mild impedance-tuning condition, we prove an augmented-storage inequality certifying discrete passivity of the coupled macro-step for any finite inner-iteration budget, with the remaining mismatch captured by an explicit residual. As the inner budget increases, the partitioned update converges to the monolithic discrete-time update induced by the same integrators, yielding a principled, adaptive accuracy--compute trade-off, supporting energy-consistent real-time parallel simulation under varying computational budgets.
Experiments on a coupled-oscillator benchmark validate the passivity certificates at numerical roundoff (on the order of 10e-14 in double precision) and show that the reported RMS state error decays monotonically with increasing inner-iteration budgets, consistent with the hard-coupling limit.
Published: 2026-03-17T11:59:30Z
Authors: Qi Wei, Jianfeng Tao, Hongyu Nie, Wangtao Tan

http://arxiv.org/abs/2406.09932v3
Compression of Currents and Varifolds
Updated: 2026-03-17T11:39:13Z
We derive an algorithm for compression of the currents and varifolds representations of shapes, using ridge leverage score (RLS) sampling, and the theory of Nystrom approximation in Reproducing Kernel Hilbert Spaces. Our method is faster than existing compression techniques and comes with theoretical guarantees on the rate of decay of the compression error as a function of the smoothness of the associated shape representation. The obtained compressions are shown to be useful for accelerating downstream tasks such as nonlinear shape registration in the Large Deformation Diffeomorphic Metric Mapping (LDDMM) framework without loss of quality, even for very high compression ratios. The performance of our algorithm is demonstrated on large-scale shape data from modern geometry processing datasets, and is shown to be fast and scalable with rapid error decay.
Published: 2024-06-14T11:26:04Z
SIAM Journal on Imaging Sciences (2026) 19, 327-363
Authors: Allen Paul, Neill Campbell, Tony Shardlow
DOI: 10.1137/24M1699656

http://arxiv.org/abs/2603.16393v1
Robust Physics-Guided Diffusion for Full-Waveform Inversion
Updated: 2026-03-17T11:27:37Z
We develop a robust physics-guided diffusion framework for full-waveform inversion that combines a score-based generative prior with likelihood guidance computed through wave-equation simulations. We adopt a transport-based data-consistency potential (Wasserstein-2), incorporating wavefield enhancement via bounded weighting and observation-dependent normalization, thereby improving robustness to amplitude imbalance and time/phase misalignment.
On the inference side, we introduce a preconditioned guided reverse-diffusion scheme that adapts the guidance strength and spatial scaling throughout the reverse-time dynamics, yielding a more stable and effective data-consistency guidance step than standard diffusion posterior sampling (DPS). Numerical experiments on OpenFWI datasets demonstrate improved reconstruction quality over deterministic optimization baselines and standard DPS under comparable computational budgets.
Published: 2026-03-17T11:27:37Z
Authors: Jishen Peng, Enze Jiang, Zheng Ma, Xiongbin Yan
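
The final entry above mentions a Wasserstein-2 data-consistency potential. As a rough illustration of what such a transport-based misfit measures, the sketch below approximates the squared Wasserstein-2 distance between two 1D traces from their quantile functions. It is not the authors' method: the function name `w2_squared_1d`, the small positive `floor`, the midpoint quantile rule, and the assumption that the traces are already nonnegative are all choices made for this example, and the paper's bounded weighting and observation-dependent normalization are not reproduced.

```python
import numpy as np


def w2_squared_1d(t, f, g, n_quantiles=512, floor=1e-12):
    """Approximate squared Wasserstein-2 distance between two 1D signals.

    Assumes f and g are nonnegative samples on the shared increasing grid t.
    """
    t = np.asarray(t, dtype=float)
    # Clip negatives and add a tiny floor (an assumption of this sketch) so
    # that both cumulative sums below are strictly increasing.
    f = np.maximum(np.asarray(f, dtype=float), 0.0) + floor
    g = np.maximum(np.asarray(g, dtype=float), 0.0) + floor
    # Normalize to unit mass and accumulate into discrete CDFs on the grid.
    cdf_f = np.cumsum(f / f.sum())
    cdf_g = np.cumsum(g / g.sum())
    # Evaluate both quantile functions at midpoints of n_quantiles equal bins.
    q = (np.arange(n_quantiles) + 0.5) / n_quantiles
    quant_f = np.interp(q, cdf_f, t)
    quant_g = np.interp(q, cdf_g, t)
    # In 1D, W2^2 is the integral over q of the squared quantile difference.
    return float(np.mean((quant_f - quant_g) ** 2))


# Two pulses of identical shape, offset by one time unit.
t = np.linspace(0.0, 10.0, 2001)
observed = np.exp(-((t - 4.0) ** 2) / 0.5)
simulated = np.exp(-((t - 5.0) ** 2) / 0.5)
misfit = w2_squared_1d(t, observed, simulated)
```

For two pulses of the same shape, the value should be close to the squared time shift (here about 1), so the misfit keeps growing smoothly as the shift grows. This is one common motivation for transport-based misfits when traces are misaligned in time.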