https://arxiv.org/api/GGZdKURXV9RIgNMcBk49xyotfU8 2026-06-21T10:23:05Z 49870 60 15 http://arxiv.org/abs/2606.18551v1 Scalar-Tracking SAV Schemes with Pullback Corrections for Gradient Flows 2026-06-17T00:00:55Z

The scalar auxiliary variable (SAV) method constructs linear, unconditionally energy-stable time discretizations of gradient flows. In a first-order SAV step, eliminating the auxiliary variable shows that the state equation is a semi-implicit update augmented by a rank-one positive semidefinite correction from the previous nonlinear force. The multiple-SAV (MSAV) method produces this correction componentwise, yielding a correction of rank up to the number of energy components. This separates two mechanisms usually coupled in MSAV: the number of scalar variables tracking the nonlinear energy and the rank of the correction applied to the state equation. We introduce a pullback-corrected SAV (PB-SAV) family that keeps a single scalar auxiliary variable but replaces the rank-one SAV correction by the pullback correction induced by an admissible component decomposition. The correction remains positive semidefinite, has rank at most the number of components, and may change from step to step without changing the scalar energy tracker. We prove modified-energy dissipation laws for fixed and step-dependent decompositions, derive a refinement identity whose gain is an explicit weighted variance, and give a Sherman-Morrison-Woodbury implementation of the low-rank perturbation of the standard semi-implicit solve. We also show, in finite dimensions, that the pullback correction is the Gauss-Newton matrix of a least-squares representation of the nonlinear energy. Numerical experiments on finite-dimensional gradient flows, Allen-Cahn dynamics, and nonlocal Cahn-Hilliard models illustrate regimes in which PB-SAV mainly changes the first-order error constant and regimes in which it substantially improves trajectory accuracy.
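As an illustration of the Sherman-Morrison-Woodbury idea mentioned in the abstract, here is a minimal NumPy sketch of solving a system with a low-rank positive semidefinite perturbation using only solves with the unperturbed operator. It is a generic sketch, not the paper's implementation: the names `smw_solve`, `solve_A`, `U`, and `C` are hypothetical, and the diagonal matrix `A` is only a stand-in for a semi-implicit operator.

```python
import numpy as np

def smw_solve(solve_A, U, C, b):
    """Solve (A + U C U^T) x = b using only solves with A.

    solve_A : callable returning A^{-1} y for a vector or matrix y (assumed)
    U       : (n, r) matrix spanning the low-rank correction
    C       : (r, r) symmetric positive semidefinite matrix (may be singular)
    """
    A_inv_b = solve_A(b)
    A_inv_U = solve_A(U)
    r = U.shape[1]
    # Small r x r system; this form of the Woodbury identity avoids inverting C.
    small = np.eye(r) + C @ (U.T @ A_inv_U)
    coeff = np.linalg.solve(small, C @ (U.T @ A_inv_b))
    return A_inv_b - A_inv_U @ coeff

rng = np.random.default_rng(0)
n, r = 50, 3
A = np.diag(rng.uniform(1.0, 2.0, size=n))  # stand-in for the base operator
U = rng.standard_normal((n, r))
W = rng.standard_normal((r, r))
C = W @ W.T                                  # positive semidefinite by construction
b = rng.standard_normal(n)

x = smw_solve(lambda y: np.linalg.solve(A, y), U, C, b)
residual = np.linalg.norm((A + U @ C @ U.T) @ x - b)
print("residual:", residual)
```

The extra cost beyond the base solve is `r` additional solves with `A` and one dense `r x r` solve, which is why a correction of small rank stays cheap.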

2026-06-17T00:00:55Z Shiheng Zhang Jie Shen http://arxiv.org/abs/2512.06166v2 A polynomial dimension-dependence analysis of Bramble--Pasciak--Xu preconditioners 2026-06-16T22:26:43Z

We investigate the dimension dependence of Bramble--Pasciak--Xu (BPX) preconditioners for high-dimensional partial differential equations and establish that the condition numbers of BPX-preconditioned systems grow only polynomially with the spatial dimension. Our analysis requires a careful derivation of the dimension dependence of several fundamental tools in the theory of finite element methods, including elliptic regularity, the Bramble--Hilbert lemma, trace inequalities, and inverse inequalities. We further analyze an averaged Scott--Zhang-type quasi-interpolation operator, and show that its associated constants scale polynomially with the dimension. Building on these ingredients, we prove a multilevel norm equivalence theorem and derive a BPX preconditioner with explicit polynomial bounds on its dimensional dependence. The analysis is motivated in part by recent tensor and quantum finite element methods, where dimension-explicit conditioning estimates for BPX preconditioners play an important role.

2025-12-05T21:24:30Z 33 pages, 0 figures Boou Jiang Jongho Park Jinchao Xu http://arxiv.org/abs/2504.03990v3 Parametric Operator Inference to Simulate the Purging Process in Semiconductor Manufacturing 2026-06-16T21:02:22Z

This work presents the application of parametric Operator Inference (OpInf) -- a nonintrusive reduced-order modeling (ROM) technique that learns a low-dimensional representation of a high-fidelity model -- to the numerical model of the purging process in semiconductor manufacturing. Leveraging the data-driven nature of the OpInf framework, we aim to forecast the flow field within a plasma-enhanced chemical vapor deposition (PECVD) chamber using computational fluid dynamics (CFD) simulation data. Our model simplifies the system by excluding plasma dynamics and chemical reactions, while still capturing the key features of the purging flow behavior. The parametric OpInf framework learns nine ROMs based on varying argon mass flow rates at the inlet and different outlet pressures. It then interpolates these ROMs to predict the system's behavior for 25 parameter combinations, including 16 scenarios that are not seen in training. The parametric OpInf ROMs, trained on 36\% of the data and tested on 64\%, demonstrate accuracy across the entire parameter domain, with a maximum error of 9.32\%. Furthermore, the ROM achieves an approximate 142-fold speedup in online computations compared to the full-order model CFD simulation. These OpInf ROMs may be used for fast and accurate predictions of the purging flow in the PECVD chamber, which could facilitate effective particle contamination control in semiconductor manufacturing.

2025-04-04T23:08:38Z 18 pages, 11 figures Seunghyon Kang Hyeonghun Kim Boris Kramer http://arxiv.org/abs/2509.22846v3 General Framework and Error Estimates for ROM-accelerated Fixed Point Iterations 2026-06-16T20:50:12Z

Whether it is for solving nonlinear equations, optimization problems, or autonomous dynamical systems, fixed-point-type iterations are widely used in numerical sciences. On-the-fly reduced-order modelling (ROM) enables the construction of a low-dimensional, self-correcting approximation of the solution to this system during the iterative process, while removing the need for an offline training phase and any dependence on a precomputed reduced basis (e.g., a fixed geometry or mesh). This technique has been used in specific fields before, including fluid-structure interactions and topology optimization, but, to the authors' knowledge, no general study of this method has been carried out. We present a general method for accelerating fixed-point schemes. We show that when the iteration mapping is contractive, the error of the approximate solution is guaranteed to be within the user-defined tolerance using inexact fixed-point theory. This methodology is then applied to the solution of systems of PDEs with a block Gauss-Seidel scheme. Errors due to the ROM are propagated through each iteration with respect to the computational graph of the system, which allows one to estimate whether the current iteration is still within the user-defined tolerance. Some working hypotheses necessary to observe a significant speedup and the limitations of the method are explored as well. As a numerical illustration, the methodology is applied to a multiphysics lid-driven cavity flow in two and three dimensions.

2025-09-26T18:56:45Z Philippe-André Luneau Jean Deteix http://arxiv.org/abs/2606.18463v1 Mixed-Precision Communication-Avoiding SGD for Generalized Linear Models on GPUs 2026-06-16T20:14:34Z

Distributed stochastic gradient descent (SGD) is limited by communication rather than computation, since each iteration requires an AllReduce across processes. Communication-avoiding SGD (CA-SGD) amortizes communication over $s$ iterations by replacing $s$ consecutive AllReduces with a single AllReduce of an $sb\times sb$ Gram matrix, trading more computation and bandwidth for fewer synchronization points. Modern GPUs with matrix hardware and reduced-precision formats offset this by accelerating the Gram GEMM and shrinking BF16 traffic. We study mixed-precision CA-SGD for generalized linear models on NVIDIA GPUs. Our finite-precision analysis decomposes the local rounding error of one CA-SGD outer iteration into nine independent precision choices, depending on the hardware only through its low-precision unit roundoffs, so the resulting recipes transfer in principle across GPU generations. The recipe stores the input matrix and margin vector in low precision, computes the Gram matrix from low-precision inputs with high-precision accumulation, communicates it in high precision, and performs the inner recurrence and weight updates in high precision. On NERSC Perlmutter A100 GPUs, mixed-precision CA-SGD matches FP32 SGD loss within $0.5\%$ on logistic, linear, and Poisson problems and reaches $5.1$--$6.8\times$ speedup over FP32 SGD on epsilon, SUSY, HIGGS, synth, and Poisson-synth. Our software is available at https://doi.org/10.5281/zenodo.20448273

2026-06-16T20:14:34Z Aditya Devarakonda Irene Simó Muñoz Giulia Guidi http://arxiv.org/abs/2510.24679v4 Kemeny's constant minimization for reversible Markov chains via structure-preserving perturbations 2026-06-16T19:09:28Z

Kemeny's constant measures the efficiency of a Markov chain in traversing its states. We investigate whether structure-preserving perturbations to the transition probabilities of a reversible Markov chain can improve its connectivity while maintaining a fixed stationary distribution. Although the minimum achievable value for Kemeny's constant can be estimated, the required perturbations may be infeasible. We reformulate the problem as an optimization task, focusing on solution existence and efficient algorithms, with an emphasis on the problem of minimizing Kemeny's constant under sparsity constraints.
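For concreteness, the following sketch computes Kemeny's constant from the spectrum of a transition matrix, using the convention $K = \sum_{i \ge 2} 1/(1-\lambda_i)$ (some authors add 1 to this value). The function name `kemeny_constant` and the three-state birth-death chain are illustrative assumptions, not taken from the paper. The lazy chain $\tfrac12 I + \tfrac12 P$ is used as a simple example of a perturbation that keeps both reversibility and the stationary distribution; it doubles the constant, so it makes the chain slower rather than faster.

```python
import numpy as np

def kemeny_constant(P):
    """Kemeny's constant of an irreducible chain via its eigenvalues.

    Convention assumed here: K = sum over eigenvalues lambda != 1 of 1 / (1 - lambda).
    """
    eigvals = np.linalg.eigvals(P)
    # Drop the single eigenvalue closest to 1 (the Perron eigenvalue).
    idx = np.argmin(np.abs(eigvals - 1.0))
    rest = np.delete(eigvals, idx)
    return float(np.real(np.sum(1.0 / (1.0 - rest))))

# Hypothetical reversible birth-death chain on three states.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Lazy version: same stationary distribution, still reversible.
P_lazy = 0.5 * np.eye(3) + 0.5 * P

K = kemeny_constant(P)
K_lazy = kemeny_constant(P_lazy)
print(K, K_lazy)
```

Since the eigenvalues of the lazy chain are $(1+\lambda_i)/2$, each gap $1-\lambda_i$ is halved, which explains the doubling.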

2025-10-28T17:44:33Z Fabio Durastante Miryam Gnazzo Beatrice Meini http://arxiv.org/abs/2606.18411v1 Numerically Stable Cholesky-QR on GPU via Mixed-Precision Randomized Preconditioning 2026-06-16T19:04:53Z

Cholesky-QR is among the fastest algorithms for computing the thin QR factorization of tall-and-skinny matrices on GPUs, relying entirely on BLAS-3 operations. However, it is numerically unstable: forming the Gram matrix squares the condition number, causing breakdown when $κ_2(\boldsymbol{A}) \gtrsim 10^8$. We present MRCQR (Mixed-Precision Randomized Cholesky-QR), a stable GPU algorithm that addresses this limitation. MRCQR uses a subsampled randomized trigonometric transform to construct a preconditioner $\boldsymbol{R}_s$ that reduces $κ_2(\boldsymbol{A}\boldsymbol{R}_s^{-1})$ to near unity with high probability, then applies Cholesky-QR in double precision to the preconditioned matrix. The key insight -- supported by perturbation analysis -- is that the preconditioner requires far less accuracy than the final result: single (FP32) precision suffices when $κ_2(\boldsymbol{A}) \lesssim 10^8$, and half (FP16) when $κ_2(\boldsymbol{A}) \lesssim 10^4$. MRCQR produces an explicit orthogonal factor $\widehat{\boldsymbol{Q}}$ satisfying $\|\boldsymbol{I} - \widehat{\boldsymbol{Q}}^\top\widehat{\boldsymbol{Q}}\|_2 = \mathcal{O}(\mathbf{u})$ ($\mathbf{u} \approx 10^{-16}$, double-precision unit roundoff) for condition numbers up to $10^{16}$, far beyond the $10^8$ limit of CholQR2. Experiments on an NVIDIA H100 GPU show that MRCQR (FP16) outperforms rand-cholQR by $1.4$--$1.8\times$ across all tested column counts and is $1.8$--$13.5\times$ faster than cuSOLVER geqrf, while the FP16 sketch (used when $κ_2(\boldsymbol{A}) \lesssim 10^4$) is $2\times$ cheaper than FP64 at no accuracy cost.
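The instability described above can be seen in a few lines of NumPy. The sketch below implements plain Cholesky-QR only (not MRCQR, and with no randomized preconditioning) on the CPU in double precision; the function name `cholesky_qr` and the test matrices are illustrative assumptions. It compares the loss of orthogonality for a well-conditioned matrix and for one with condition number about $10^5$.

```python
import numpy as np

def cholesky_qr(A):
    """Plain Cholesky-QR: factor the Gram matrix A^T A = L L^T, then Q = A L^{-T}."""
    G = A.T @ A                      # Gram matrix; its condition number is kappa(A)^2
    L = np.linalg.cholesky(G)        # lower-triangular factor
    R = L.T
    Q = np.linalg.solve(L, A.T).T    # Q^T = L^{-1} A^T, so Q = A R^{-1}
    return Q, R

def orth_error(Q):
    """Spectral norm of I - Q^T Q."""
    return np.linalg.norm(np.eye(Q.shape[1]) - Q.T @ Q, 2)

rng = np.random.default_rng(0)
m, n = 200, 5

A_well = rng.standard_normal((m, n))             # small condition number

# Ill-conditioned test matrix with prescribed singular values 1 ... 1e-5.
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.logspace(0, -5, n)
A_ill = (U * s) @ V.T

Q_well, R_well = cholesky_qr(A_well)
Q_ill, R_ill = cholesky_qr(A_ill)
err_well, err_ill = orth_error(Q_well), orth_error(Q_ill)
print(err_well, err_ill)
```

Both factorizations still reproduce the input to working accuracy; it is the orthogonality of the computed factor that degrades as the condition number grows, which is the motivation for preconditioning before the Gram matrix is formed.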

2026-06-16T19:04:53Z James E. Garrison Chao Chen Ilse C. F. Ipsen http://arxiv.org/abs/2606.18404v1 Two-level convergence of Algebraic Multigrid with Overlapping Smoothers and Spectral Coarse Grids 2026-06-16T18:52:29Z

We recently developed the least-squares algebraic-multigrid domain-decomposition (LS-AMG-DD) solver as an algebraic multilevel method for sparse symmetric positive definite matrices that admit a Gram representation $A=G^{\top}G$ \cite{southworth2026lsamgdd}. Many problem classes admit such structure, including many conforming finite-element discretizations. The solver constructs coarse spaces from local eigenproblems on nonoverlapping, algebraic aggregates and uses Schwarz-type smoothers on the induced overlapping subdomains. This paper develops a novel two-level convergence theory for this solver. Our theory shows that the solver's coarse space satisfies a weak approximation property in a norm induced by an aggregate-wise block-Jacobi smoother, and moreover, that the corresponding approximation constant is bounded by a user-controlled local spectral cutoff threshold. We combine this approximation property with standard sharp theory for multiplicative two-level cycles. The resulting two-level bound is cleanly factored by the cutoff threshold and a smoother norm-comparison constant; we derive explicit bounds for this constant for block Jacobi and overlapping additive Schwarz smoothers. We also develop a new convergence bound for additive Schwarz methods in terms of a trivially computable constant that is bounded above by the coloring constant. Numerical experiments on scalar $H^1$, vector $H(\operatorname{div})$, and vector $H(\operatorname{curl})$ finite-element problems provide supporting evidence for the theory, including evidence for the solver's insensitivity to mesh refinement and polynomial degree.

2026-06-16T18:52:29Z O. A. Krzysik B. S. Southworth H. Al Daas http://arxiv.org/abs/2606.18221v1 LGNO: A Local-Global Neural Operator for Hyperbolic Conservation Laws 2026-06-16T17:49:09Z

Solutions of hyperbolic conservation laws exhibit both smooth structures across large scales and sharp localized features such as shocks and contact discontinuities, making them difficult to approximate accurately with existing neural operators. The Fourier Neural Operator (FNO) captures long-range interactions well but tends to smear localized structures through excessive numerical dissipation. To address this, we propose a Local-Global Neural Operator (LGNO) that learns a one-step discrete flow map by combining a global FNO branch for representing smooth dynamics at large scales with a local multiresolution branch for enhancing localized discontinuities and nonsmooth features. The model is trained with a one-step loss that combines a physical space prediction term and a spectral penalty on high frequencies to suppress spurious oscillations near steep fronts. On a large collection of benchmarks in one and two dimensions, LGNO consistently outperforms FNO baselines with matched parameter counts, reducing one-step errors by factors of 2-5 and remaining significantly more accurate over long autoregressive rollouts. Most strikingly, although it is trained only on short-time data from a high-order WENO-Z scheme, the long-time rollout of LGNO on a coarse $256^2$ grid exhibits lower numerical dissipation than the same WENO-Z scheme run on a finer $512^2$ grid, while being orders of magnitude cheaper to evaluate. These results suggest that, with an appropriate architecture and training objective, learned operators can effectively learn discrete flow maps. They further suggest that such learned operators have the potential to control long-time numerical dissipation better than the conventional shock-capturing schemes that generate the training data.

2026-06-16T17:49:09Z Hao Wang Chi-Wang Shu Qi Tang http://arxiv.org/abs/2505.19222v3 Asymptotic numerical hypocoercivity of the space-time discontinuous Galerkin method for the Kolmogorov equation 2026-06-16T17:39:27Z

We are concerned with discretisations of the classical Kolmogorov equation by a standard space-time discontinuous Galerkin method. The Kolmogorov equation serves as a simple, yet rich enough in the present context, model problem for a wide range of kinetic-type equations: although it involves diffusion in one of the two spatial dimensions only, the combined nature of the first order transport/drift term and the degenerate diffusion is sufficient to `propagate dissipation' across the spatial domain in its entirety. This is a manifestation of the celebrated concept of hypocoercivity, a term coined and studied extensively by Villani in \cite{villani}. We show that the classical space-time discontinuous Galerkin method admits a corresponding hypocoercivity property at the discrete level, asymptotically for large times. To the best of our knowledge, this is the first result of this kind for any standard Galerkin scheme. This property is shown by proving one part of a discrete inf-sup-type stability result for the method in a family of norms dictated by a modified scalar product motivated by the theory in \cite{villani}. This family of norms contains the full gradient of the numerical solution, thereby allowing for a full spectral gap/Poincaré-type inequality at the discrete level, thus showcasing a subtle, discretisation-parameter-dependent, numerical hypocoercivity property. Further, we show that the space-time discontinuous Galerkin method is inf-sup stable in the family of norms containing the full gradient of the numerical solution, which may be a result of independent interest.

2025-05-25T16:36:09Z Zhaonan Dong Emmanuil H. Georgoulis Philip J. Herbert http://arxiv.org/abs/2606.18200v1 A Diagnostic Software Suite for Auditing Learned PDE Simulators 2026-06-16T17:30:25Z

Learned PDE simulators are increasingly used as low-cost replacements for expensive numerical solvers, but standard relative $L^2$ error does not determine whether a learned model behaves as a coherent numerical time propagator. This paper presents a diagnostic software suite for auditing learned PDE simulators as approximate evolution operators. The suite provides architecture-independent, post hoc diagnostics for relative state error, semigroup consistency, finite-difference generator discrepancy, energy behavior, integral balance, admissibility constraints, perturbation response, and scaling-law consistency. The software is designed around a minimal contract: reference trajectories, a learned propagator or saved predictions, equation metadata, and a diagnostic configuration specifying which structures are meaningful for the problem under study. We validate the suite on five benchmark PDE tasks: two-dimensional incompressible Navier-Stokes, shallow-water dynamics, active matter, three-dimensional compressible Navier-Stokes, and three-dimensional magnetohydrodynamics, using FNO, DeepONet, U-Net, and ResNet-style surrogate models together with controlled underfit and oversmoothed variants. The validation study shows that relative $L^2$ error can remain moderate, or even improve, while structural diagnostics deteriorate substantially. The package therefore supports software-level auditing of learned PDE simulators by reporting an interpretable diagnostic panel rather than collapsing model behavior into a single state-error score.
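One of the listed diagnostics, semigroup consistency, can be sketched generically: a time propagator $\Phi_{\Delta t}$ should satisfy $\Phi_{2\Delta t}(u) \approx \Phi_{\Delta t}(\Phi_{\Delta t}(u))$. The code below is a minimal illustration of that check and does not reflect the suite's actual API; `semigroup_defect`, `exact_decay`, and `euler_decay` are hypothetical names, and the forward-Euler map merely stands in for a model that is not a consistent propagator.

```python
import numpy as np

def semigroup_defect(propagate, u0, dt):
    """Relative mismatch between one step of size 2*dt and two steps of size dt."""
    one_big = propagate(u0, 2.0 * dt)
    two_small = propagate(propagate(u0, dt), dt)
    return float(np.linalg.norm(one_big - two_small) / np.linalg.norm(one_big))

def exact_decay(u, dt):
    """Exact flow of u' = -u, which satisfies the semigroup property."""
    return u * np.exp(-dt)

def euler_decay(u, dt):
    """Forward-Euler map for u' = -u, used as a stand-in for an inconsistent model."""
    return u * (1.0 - dt)

u0 = np.linspace(1.0, 2.0, 8)
defect_exact = semigroup_defect(exact_decay, u0, 0.1)
defect_euler = semigroup_defect(euler_decay, u0, 0.1)
print(defect_exact, defect_euler)
```

For the Euler map the defect is $\Delta t^2 / (1 - 2\Delta t)$, which is nonzero even though each individual step is a reasonable approximation; this is the kind of structural discrepancy that a state-error score alone would not reveal.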

2026-06-16T17:30:25Z 33 pages, 12 tables. Submitted to Computer Physics Communications. Code available at https://github.com/lennonshikhman/diagnostics_for_physics Lennon J. Shikhman http://arxiv.org/abs/2606.18185v1 An Encoder-Transformer Architecture for Recognition of the Jordan Structure of a Matrix 2026-06-16T17:17:23Z

We propose a machine-learning framework for detecting whether a given matrix is a perturbation of a matrix with a large Jordan block. The proposed model achieves high classification accuracy on synthetically generated, robustly perturbed data and outperforms a classical numerical baseline. Moreover, we demonstrate that the learned model generalizes to several classes of matrices not seen during training. These results suggest that the architecture captures structural properties associated with matrix defectiveness.

2026-06-16T17:17:23Z Michał Trojanowski Michał Wojtylak http://arxiv.org/abs/2606.18177v1 A minimizing-movement framework for geometric gradient flows with admissible tangential motion 2026-06-16T17:11:04Z

We develop a minimizing-movement framework for parametric finite element approximations of geometric gradient flows with admissible tangential motion. At each time step, the discrete variational problem combines a metric dissipation term for the normal displacement with a surface Dirichlet energy. The metric determines the normal geometric evolution: the $L^2(Γ)$ metric gives mean curvature flow, while the $H^{-1}(Γ)$ metric gives surface diffusion flow. Tangential velocity is selected independently through weak constraints on the deformation map. The central structural condition is admissibility, namely, that the identity map satisfies the constraint. This condition keeps the identity map available as a comparison function and yields the natural stability estimate. The framework recovers the classical Barrett--Garcke--Nürnberg (BGN) scheme from the unconstrained formulation and the dual minimal-deformation-rate (MDR) scheme from the MDR constraint. We further introduce two new admissible variants: an admissible BGN scheme and a relaxed MDR scheme. For the resulting fully discrete schemes, we prove existence and uniqueness under natural nondegeneracy assumptions and establish unconditional energy stability. Numerical experiments compare the admissible and classical schemes and illustrate their stability properties and mesh-quality behavior.

2026-06-16T17:11:04Z Xiaoxiao Liu Quan Zhao http://arxiv.org/abs/2606.18175v1 A Convex Quasilinearization Method for Solving Nonlinear PDEs with Physics-Informed Neural Networks 2026-06-16T17:09:59Z

We present a numerical method for the forward solution of nonlinear partial differential equations (PDEs) in which Bellman-Kalaba quasilinearization reduces the nonlinear problem to a sequence of linear subproblems, each discretized by collocation onto a trial space that is linear in its parameters and solved by a single direct linear least-squares QR factorization. The trial space, which we term Linear-in-Learnables (LiL), comprises representations whose trainable parameters enter linearly, including random-feature extreme learning machines, spectral polynomial bases, and trigonometric expansions, each implemented as a physics-informed neural network. The method thus replaces the nonconvex gradient-based training that limits standard PINNs with a convex per-step solve. We establish local Newton-Kantorovich convergence of the outer iteration to a residual-limited neighborhood under an explicit smallness condition, with the limiting accuracy governed by the best-approximation residual of the trial space rather than by an optimization tolerance. The method, denoted LiL-Q, is assessed on seven benchmarks spanning scalar nonlinear PDEs (Bratu, viscous Burgers, Buckley-Leverett), coupled systems (plane-strain elasticity and the incompressible Navier-Stokes equations in two and three spatial dimensions), and steady-state Darcy flow with heterogeneous permeability. Across these problems, LiL-Q converges in single-digit outer iterations in most cases, even at the coarsest basis sizes and independent of the parameter count. When the exact solution lies in the span of the trial space, the method recovers it to machine precision in a single solve. On the Navier-Stokes benchmarks, it matches or exceeds published PINN solvers with up to two orders of magnitude fewer trainable parameters, without gradient-based optimization.

2026-06-16T17:09:59Z Preprint. 56 pages, 18 figures. Code: https://github.com/awojinrin/lilq-pinn Gbenga T. Awojinrin Abdul-Akeem Olawoyin Rami M. Younis http://arxiv.org/abs/2606.18173v1 An algorithm to exactly compute minimal upper bounds in the Loewner order 2026-06-16T17:08:55Z

The Loewner order on Hermitian matrices is a partial order that compares matrices in terms of positive semidefiniteness. The Loewner order plays a key role in many fields such as optimization, numerical linear algebra, control theory, operator theory, and quantum information. A fundamental difficulty is that two or more Hermitian matrices do not necessarily have a unique minimal upper bound (or maximal lower bound). In this paper, we propose an iterative method to exactly compute a minimal upper bound for any finite collection of $n\times n$ Hermitian matrices. It is shown that the algorithm terminates in at most $n$ iterations. The exactitude of the algorithm is proved using standard results from finite-dimensional linear algebra. A self-contained proof of an algebraic characterization of minimality originally explored by Stott is provided. We illustrate the algorithm in examples and also provide an implementation of the algorithm in Python.
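To make the partial order concrete, the sketch below tests $A \preceq B$ by checking that $B - A$ is positive semidefinite. It is not the paper's algorithm or its Python implementation; the helper `loewner_leq`, its tolerance, and the example matrices are illustrative assumptions. The example shows two Hermitian matrices that are incomparable in the Loewner order yet share a common upper bound.

```python
import numpy as np

def loewner_leq(A, B, tol=1e-12):
    """Return True if A <= B in the Loewner order, i.e. B - A is positive semidefinite."""
    D = B - A
    D = (D + D.conj().T) / 2.0          # symmetrize against rounding noise
    return bool(np.linalg.eigvalsh(D).min() >= -tol)

A = np.diag([1.0, 0.0])
B = np.diag([0.0, 1.0])
I2 = np.eye(2)

a_below_b = loewner_leq(A, B)   # False: B - A has eigenvalue -1
b_below_a = loewner_leq(B, A)   # False: A - B has eigenvalue -1
a_below_i = loewner_leq(A, I2)  # True
b_below_i = loewner_leq(B, I2)  # True
print(a_below_b, b_below_a, a_below_i, b_below_i)
```

Because neither matrix dominates the other, any upper bound must be a third matrix, and in general several minimal choices exist, which is the difficulty the abstract refers to.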

2026-06-16T17:08:55Z 20 pages, 2 figures Adam Humeniuk Gabriel Jarry-Bolduc Patrick Pascua Nejaunie Williams