https://arxiv.org/api/uGXyJ9rkEpzUtPLGgYKCYdOtFGw2026-06-13T10:47:32Z49733015http://arxiv.org/abs/2412.08059v7Parameter optimization for restarted mixed precision iterative sparse solver2026-06-11T17:50:16ZThe problem of optimal precision switching for the conjugate gradient (CG) method applied to sparse linear systems is considered. A sparse matrix is defined as an $n\!\times\!n$ matrix with $m\!=\!O(n)$ nonzero entries. The algorithm first computes an approximate solution in single precision with tolerance $\varepsilon_1$, then switches to double precision to refine the solution to the required stopping tolerance $\varepsilon_2$. Based on estimates of system matrix parameters -- computed in time which does not exceed $1\%$ of the time needed to solve the system in double precision -- we determine the optimal value of $\varepsilon_1$ that minimizes total computation time. This value is obtained by classifying the matrix using the $k$-nearest neighbors method on a small precomputed sample. Classification relies on a feature vector comprising: the matrix size $n$, the number of nonzeros $m$, the pseudo-diameter of the matrix sparsity graph, and the average rate of residual norm decay during the early CG iterations in single precision. We show that, in addition to the matrix condition number, the diameter of the sparsity graph influences the growth of rounding errors during iterative computations. The proposed algorithm reduces the computational complexity of the CG -- expressed in equivalent double-precision iterations -- by more than $17\%$ on average across the considered matrix types in a sequential setting. The resulting speedup is at most $1.5\%$ worse than that achieved with the optimal (oracle) choice of $\varepsilon_1$.
While the impact of matrix structure on Krylov subspace method convergence is well understood, the use of the sparsity graph diameter as a predictive feature for rounding error growth in mixed-precision CG appears to be novel. To the best of our knowledge, no prior work employs graph diameter to guide precision switching in iterative linear solvers.2024-12-11T03:02:58Z51 pages, 5 figuresAlexander V. Prolubnikovhttp://arxiv.org/abs/2512.07004v4Accurate Models of NVIDIA Tensor Cores2026-06-11T17:47:01ZMatrix multiplication is a fundamental operation in both training of neural networks and inference. To accelerate matrix multiplication, Graphics Processing Units (GPUs) implement it in hardware. Due to the increased throughput over software-based matrix multiplication, these multipliers are increasingly used outside of AI, to accelerate various applications in scientific computing. However, matrix multipliers targeted at AI are at present not compliant with IEEE 754 floating-point arithmetic behaviour, with different vendors offering different numerical features. This leads to non-reproducible results across different generations of GPU architectures, at the matrix multiply-accumulate instruction level. To study numerical characteristics of matrix multipliers - such as rounding behaviour, accumulator width, normalization points, extra carry bits, and others - test vectors are typically constructed. Yet, these vectors may or may not distinguish between different hardware models, and due to limited hardware availability, their reliability across many different platforms remains largely untested. We present software models for emulating the inner product behavior of low- and mixed-precision matrix multipliers in the V100, A100, H100 and B200 data center GPUs in most supported input formats of interest to mixed-precision algorithm developers: 8-, 16-, and 19-bit floating point.
These matrix multiplier models are first approximated by determining the numerical features via test vectors designed to trigger outputs sensitive to bit-level differences in the implementation, followed by semi-exhaustive comparison (randomised input vectors of $10^7$ values) between the models and the actual GPU matrix multipliers - this process is repeated until the model is bit accurate.2025-12-07T21:13:18ZFaizan A. KhattakMantas Mikaitishttp://arxiv.org/abs/2606.13549v1A general-purpose global regularization method for 3D volume integral operators2026-06-11T16:29:43ZSingular volume integral operators associated with constant-coefficient partial differential operators extend the applicability of potential theory to inhomogeneous problems, for example arising from nonlinearities or variable coefficients. Typically the PDE kernels in these operators give rise to singularities at all $\mathcal{O}(1/h^3)$ volume discretization/evaluation points in a mesh of characteristic size $h$, while the slowly-decaying nature of such kernels gives rise to long-range interactions that require coupling to fast summation algorithms. The presented method uses Green's identities to regularize a wide variety of both scalar-valued and vector-valued volume integral operators by use of a certain regularizing volume density interpolant. The analysis shows how the regularizing effect of the interpolant is global in the sense that the interpolation quality increases in an exactly compensatory fashion as the distance to the Green's function singularity decreases. High-order convergence estimates with tabulated simplex quadratures are established, including with exact representation of curved domains.2026-06-11T16:29:43Z28 pages, 4 figuresThomas G. AndersonMarc BonnetLuiz M. 
FariaCarlos Pérez-Arancibiahttp://arxiv.org/abs/2603.08415v2Discontinuous Galerkin approximation of a nonlinear multiphysics problem arising in ultrasound-enhanced drug delivery2026-06-11T15:46:04ZMotivated by simulations of ultrasound-enhanced drug delivery, this work presents the numerical analysis of a mathematical model that captures the influence of ultrasound waves on the diffusivity of the drug. The system under study consists of the Westervelt wave equation, accounting for the nonlinear propagation of ultrasound, coupled to a convection-diffusion equation modeling the drug concentration. In particular, drug delivery is affected by ultrasound through a pressure-dependent diffusion coefficient. The Westervelt equation is supplemented by linear absorbing boundary conditions as a means of reducing spurious reflections off the boundaries of computational domains. For spatial discretization of this multiphysics system, we employ a discontinuous Galerkin approach on simplicial meshes. Under suitable assumptions on the exact pressure and the mesh size, we first establish well-posedness, non-degeneracy, and optimal convergence rates in the energy norm for the semi-discrete pressure subproblem. The smallness of the semi-discrete pressure is then used to establish the well-posedness and convergence of the wave--convection-diffusion system under suitable regularity of the exact concentration. Finally, theoretical findings are illustrated through numerical experiments.2026-03-09T14:13:07ZFemke de WitVanja Nikolićhttp://arxiv.org/abs/2606.13482v1A Stabilized Multilevel B-Spline-Based Fast Integral Method for the Solution of the Electric Field Integral Equation2026-06-11T15:32:30ZWe present a multilevel B-spline-based fast integral method for the solution of the electric field integral equation (EFIE), combining fast Fourier transformation (FFT)-compatible kernel interpolation with robust high-order interpolation. 
Existing FFT-accelerated global Lagrange-based approaches rely on equidistant interpolation points and can, therefore, suffer from Runge-type instabilities at high interpolation orders, limiting robust high-accuracy compression. In contrast, B-splines on equidistant knot vectors overcome these instabilities and enable robust high-order interpolation for accurate matrix compression. Replacing Lagrange interpolation by B-spline interpolation is, however, non-trivial: B-spline coefficients do not coincide with function values at the interpolation points, and the associated sampling matrices can become ill-conditioned. To address these challenges, we introduce a knot-removal stabilization strategy, combined with exact interlevel transfers based on knot insertion, yielding accurate, well-conditioned multilevel interpolation. Moreover, we propose a factorization strategy that preserves the null space of the scalar potential operator up to machine precision and is compatible with low-frequency preconditioning techniques. Numerical results for both canonical and realistic geometries demonstrate robust high-order interpolation without the breakdown observed for Lagrange-based approaches and confirm $\mathcal{O}(N)$ complexity.2026-06-11T15:32:30ZDanijel JukićBernd HofmannThomas F. EibertSimon B. Adrianhttp://arxiv.org/abs/2510.02111v2Coarse scrambling for Sobol' and Niederreiter sequences2026-06-11T15:19:25ZWe introduce coarse scrambling, a novel randomization for digital sequences that permutes blocks of digits in a mixed-radix representation. This construction is designed to preserve the powerful $(0,\mathbb{e},d)$-sequence property of the underlying points. For sufficiently smooth integrands, we prove that this method achieves the canonical $O(n^{-3+ε})$ variance decay rate, matching that of standard Owen's scrambling. 
Crucially, we show that its maximal gain coefficient grows only logarithmically with dimension, $O(\log d)$, thus providing theoretical robustness against the curse of dimensionality affecting scrambled Sobol' sequences. Numerical experiments validate these findings and illustrate a practical trade-off: while Owen's scrambling is superior for integrands sensitive to low-dimensional projections, coarse scrambling is competitive for functions with low effective truncation dimension.2025-10-02T15:20:49ZKosuke Suzukihttp://arxiv.org/abs/2606.13457v1Reduced basis algorithm for solving nonlinear differential equations on quantum computers2026-06-11T15:13:38ZAs quantum computing moves toward scientific computing applications, nonlinear differential equations remain a central challenge since quantum evolution is intrinsically linear. In this work, we introduce a reduced basis algorithm (RBA) for polynomial nonlinear ordinary differential equations (ODEs) and spatially discretized partial differential equations (PDEs). After time discretization, the method composes the resulting polynomial update map over $m$ timesteps, identifies the reduced monomial basis appearing in this composed map, and constructs a linear RBA operator whose action recovers the exact $m$-timestep nonlinear dynamics. Thus, at the level of the chosen discrete update rule, the method introduces no additional approximation error beyond the time discretization error. The qubit number requirement is governed by the size of the reduced monomial basis. For an $n$-dimensional polynomial ODE system of degree $p>1$, the lifted register requires at most $q_m^{\mathrm{ODE}} = O(nm\log p)$ qubits in the full basis scenario. For PDEs discretized on $N^D$ grid points, a locality-based construction requires at most $q_m^{\mathrm{PDE}} = O(D\log N + n m^{D+1}\log p)$ qubits. Hence, the dependence on the grid size remains logarithmic, while the nonlinear overhead is controlled by local reduced basis size. 
The main computational burden is moved from the quantum computer to a classical preprocessing step, where the reduced monomial basis and RBA operator are constructed for the chosen timestep window. Through numerical tests on the Lorenz system and the one-dimensional Burgers equation, we verify that the RBA reproduces the corresponding discrete-time nonlinear dynamics exactly, while exposing the trade-off between timestep composition, reduced basis growth, and locality.2026-06-11T15:13:38ZMonica LăcătuşMatthias MöllerSauro Succihttp://arxiv.org/abs/2505.16345v2Convergence analysis of GMRES applied to Helmholtz problems near resonances2026-06-11T15:06:04ZThe finite element solution of Helmholtz problems near resonant or quasi-resonant frequencies poses significant challenges, as iterative solvers typically suffer from severely degraded convergence. We analyze the convergence behavior of GMRES applied to linear systems arising from such configurations. Theoretical convergence estimates are derived based on harmonic Ritz values, highlighting their proximity to small eigenvalues as a key determining factor. We further examine deflation strategies and their interplay with preconditioning techniques, using the Complex Shifted Laplacian preconditioner as a case study. Numerical experiments on resonant and quasi-resonant test cases validate the theoretical framework and demonstrate the effectiveness of deflation strategies. This study provides new insights and practical guidance for analyzing and improving iterative solvers for time-harmonic problems near resonances.2025-05-22T07:59:18ZVictorita DoleanPierre MarchandAxel ModaveTimothée Raynaudhttp://arxiv.org/abs/2606.13434v1Momentum Space Algorithm for Electronic Structure of Double-Incommensurate Trilayer Graphene2026-06-11T14:59:34ZNumerical algorithms for computing the electronic structure of incommensurate 2D materials using ab initio models are critical for predicting material properties and guiding experiment. 
For bilayers, momentum space and continuum models have been introduced to approximate observables of ab initio tight-binding models using a momentum-space description, even though the tight-binding model lacks the periodicity required for Bloch theory. A similar structure has been introduced for double-incommensurate trilayers using a continuum model, where the three lattices are all mutually incommensurate. However, this description leads to a four-dimensional lattice space, and the density of states was observed to converge poorly.
In this work, we introduce a momentum space framework for double-incommensurate trilayer graphene, together with an efficient truncation scheme of the four-dimensional lattice that drastically improves convergence of the density of states and momentum local density of states (a parallel object to classical band structure). We implement this algorithm on an ab initio model of twisted trilayer graphene and validate convergence estimates. We further verify numerically that the momentum space algorithm, inherently higher order than the continuum model as it is an exact transformation of the tight-binding model, captures altered band behavior near the flat bands at magic angles.2026-06-11T14:59:34Z54 pages, 8 figuresKen BeardDaniel Massatthttp://arxiv.org/abs/2606.13429v1A Scalable Deflated Conjugate Gradient Solver for the Time-Dependent Pseudo-Stress Stokes Problem2026-06-11T14:57:25ZWe propose a novel iterative solution framework for the unsteady Stokes equations in the pseudo-stress formulation. When solving this class of problems by using implicit time-integration schemes, standard solvers suffer from deteriorating convergence properties for small time steps, independently of the chosen space discretisation method. This is due to the singular modes of the dev-dev operator. For this reason, we introduce a computational framework obtained by combining a deflated Conjugate Gradient method with a W-cycle multigrid scheme that employs a Restricted Additive Schwarz smoother. The key point is to choose the deflation subspace so that the inner system to be solved within a deflated Conjugate Gradient scheme corresponds to a Laplace problem defined on the singular modes of the original dev-dev operator. This construction turns out to be independent of the spatial discretisation method and allows one to use efficient multigrid iterative solvers. 
Numerical experiments show that the proposed strategy significantly accelerates the Conjugate Gradient convergence and provides stable performance with respect to the time step, confirming its robustness for solving linear systems in the pseudo-stress framework.2026-06-11T14:57:25ZAlessandra CancriniGabriele CiaramellaPaola F. Antoniettihttp://arxiv.org/abs/2605.13648v4Sticky CIR process with potential: invariant measure and exact sampling2026-06-11T14:37:20ZWe study the sticky Cox--Ingersoll--Ross (CIR) process in one dimension, a diffusion on $[0,\infty)$ with a sticky boundary condition at the origin, arising as the marginal process in a sparse Bayesian inference framework based on Hadamard--Langevin dynamics. For the parameter range $δ\in(1,2)$, in which the origin is accessible but not absorbing, we prove well-posedness of the process and uniqueness of its invariant measure, which is a mixture of a point mass at zero and a weighted gamma-type density on the interior. We derive an explicit Green's function for the resolvent in terms of confluent hypergeometric functions, and use this to construct an exact sampler for the invariant measure in the zero-potential case. For a non-trivial potential $G$, we establish existence and uniqueness of the tilted invariant measure via a Girsanov change of measure, and develop two sampling algorithms: a Metropolis--Hastings corrected sampler that targets the invariant measure exactly, and a cheaper, biased unadjusted Langevin algorithm (ULA) for a boundary-clamped variant of which we prove a first-order expansion of the stationary bias with an explicit constant: the leading error is a rank-one transfer of mass $K_\star h|\log h| $ onto the atom, so the total-variation bias is of exact order $h|\log h | $ -- independent of $δ$ -- whenever the potential has nonzero boundary drift. 
Numerical experiments confirm the predicted behaviour: the Metropolis--Hastings sampler achieves the target invariant measure at all step sizes, while the ULA bias follows the proven first-order law, including its constant.2026-05-13T15:07:10ZTony Shardlowhttp://arxiv.org/abs/2606.13357v1Linear convergence of iterative contour integral-based eigensolvers for nonlinear eigenvalue problems2026-06-11T13:45:14ZSolving nonlinear eigenvalue problems is an important and challenging task in scientific computing. Contour integral-based approaches are attractive for such eigenvalue problems because they reliably target all eigenvalues in a prescribed domain. However, unlike in the linear case, many traditional methods of this type, such as Beyn's method, lack an inherent iterative refinement mechanism. Consequently, achieving high accuracy requires high-quality quadrature rules for approximating the contour integral, which often leads to prohibitive computational costs. A notable exception is the so-called NLFEAST algorithm, which combines contour integral techniques with a nonlinear Rayleigh--Ritz extraction step. In this work, we propose a general framework of iterative contour integral-based methods for nonlinear eigenvalue problems that includes NLFEAST. This allows us to prove linear convergence of NLFEAST under mild assumptions and also explains why certain nonlinear eigensolvers do not combine well with iterative methods. Numerical experiments confirm our theoretical findings; in particular that NLFEAST can achieve high accuracy even with a limited number of quadrature nodes, significantly outperforming Beyn's method on challenging problems.2026-06-11T13:45:14ZDaniel KressnerYuqi LiuJose E. 
RomanMeiyue ShaoNian Shaohttp://arxiv.org/abs/2606.13339v1A Note About Algebraic $(s, t)$-Weak Tractability Of Linear Tensor Product Problems In The Worst-Case Setting2026-06-11T13:29:28ZThis paper is devoted to discussing the linear tensor product problems in the worst-case setting. We consider algorithms that use finitely many evaluations of arbitrary continuous linear functionals. We investigate algebraic $(s, t)$-weak tractability (ALG-$(s, t)$-WT) under the absolute error criterion in the case $λ_1 > 1$, where $λ_1$ is the square of the univariate maximal singular value. We solve the problem by giving the necessary and sufficient conditions for ALG-$(s, t)$-WT on univariate singular values and fill the gap left open.2026-06-11T13:29:28Z11 pagesZirong LiuHeping Wanghttp://arxiv.org/abs/2606.13308v1Subdivision-based isogeometric analysis for axisymmetric electromagnetic problems2026-06-11T13:05:30ZThis paper applies a subdivision-based isogeometric method to solve the axisymmetric Maxwell eigenvalue problem. The reduction to an $H^1$-formulation allows the use of a Catmull-Clark construction for both geometry and field discretization. The approach yields a numerical solution for the electric field, which is $C^1$-continuous everywhere except at extraordinary vertices. This is demonstrated by computing the eigenmodes of a TESLA 9-cell cavity, showing smoother fields with less numerical noise than conventional methods. The convergence rate of the method is numerically analyzed and is in agreement with rates observed in the literature.2026-06-11T13:05:30ZDevin BalianSebastian SchöpsMelina Merkelhttp://arxiv.org/abs/2606.00274v2Error bounds for approximate posteriors from likelihood-informed reduced-order models2026-06-11T12:56:44ZIn the design of computational methods for Bayesian inverse problems, costly forward model evaluations make it difficult to sample from or compute the posterior. 
This motivates the need for approximate forward models that are cheaper to evaluate. We consider reduced-order forward models which exploit the lower-dimensional structure in the Bayesian inverse problem by projecting to the "likelihood-informed subspace" of the parameter space where the prior-to-posterior update is significant. However, the theoretical properties of these reduced-order forward models and their impact on the solution of the Bayesian inverse problem are not always well understood. In this work we consider linear Gaussian inverse problems with a possibly singular prior covariance matrix. We analyse a recently proposed reduced-order model which uses a Petrov-Galerkin projection to likelihood-informed subspaces that arise in optimal low-rank approximations of the posterior covariance matrix. We bound the error in the resulting approximation of the root prior-preconditioned Hessian of the data misfit. Based on this we also bound the errors of the approximate posterior covariance and mean. Our analysis shows that this reduced-order model recovers the exact posterior when the rank of the reduced-order model is equal to the "intrinsic dimension" of the inverse problem, i.e. the rank of the prior-preconditioned Hessian. Two numerical experiments from structural engineering illustrate the performance of our bounds.2026-05-29T19:07:50ZHan Cheng LieJakob ScheffelsElisabeth Ullmann