http://arxiv.org/abs/2306.17541v5 Rigorous Function Calculi in Ariadne 2025-09-17T10:29:35Z

Almost all problems in applied mathematics, including the analysis of dynamical systems, deal with spaces of real-valued functions on Euclidean domains in their formulation and solution. In this paper, we describe the tool Ariadne, which provides a rigorous calculus for working with Euclidean functions. We first introduce the Ariadne framework, which is based on a clean separation of objects as providing exact, effective, validated and approximate information. We then discuss the function calculus as implemented in Ariadne, including polynomial function models, which are the fundamental class for concrete computations. We then consider the solution of some core problems of functional analysis, namely algebraic equations and differential equations, and briefly discuss their use for the analysis of hybrid systems. We will give examples of C++ and Python code for performing the various calculations. Finally, we will discuss progress on extensions, including improvements to the function calculus and extensions to more complicated classes of systems.
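
As a rough illustration of what "validated" information means, the following minimal Python sketch (assuming plain intervals with outward rounding via `math.nextafter`) encloses each result so that the exact real value is guaranteed to lie inside. The `Interval` class and `from_decimal` helper are hypothetical and are not Ariadne's C++ or Python API; Ariadne's polynomial function models are far more sophisticated than endpoint intervals.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] that is guaranteed to contain an exact real value."""
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Round-to-nearest may land on either side of the exact result, so
        # step one float outwards on each side to keep a valid enclosure.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other: "Interval") -> "Interval":
        # The exact range of x*y is bounded by the four endpoint products.
        products = (self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi)
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))

    def contains(self, x) -> bool:
        # Compare exactly, using rational arithmetic.
        return Fraction(self.lo) <= Fraction(x) <= Fraction(self.hi)


def from_decimal(text: str) -> Interval:
    """Enclose a decimal literal such as '0.1', which binary floats cannot represent exactly."""
    x = float(text)
    return Interval(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))
```

For example, `from_decimal("0.1")` yields an interval containing the exact value 1/10, and sums and products of such intervals remain enclosures of the exact results, at the price of slowly growing width.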

2023-06-30T10:53:27Z Logical Methods in Computer Science, Volume 21, Issue 3 (September 18, 2025), lmcs:11531. Pieter Collins, Luca Geretti, Sanja Zivanovic Gonzalez, Davide Bresolin, Tiziano Villa. doi:10.46298/lmcs-21(3:28)2025

http://arxiv.org/abs/2509.13006v1 Efficient Compilation of Algorithms into Compact Linear Programs 2025-09-16T12:18:41Z

Linear Programming (LP) is widely applied in industry and is a key component of various other mathematical problem-solving techniques. Recent work introduced an LP compiler translating polynomial-time, polynomial-space algorithms into polynomial-size LPs using intuitive high-level programming languages, offering a promising alternative to manually specifying each set of constraints through Algebraic Modeling Languages (AMLs). However, the resulting LPs, while polynomial in size, are often extremely large, posing challenges for existing LP solvers. In this paper, we propose a novel approach for generating substantially smaller LPs from algorithms. Our goal is to establish minimum-size compact LP formulations for problems in P having natural formulations with exponential extension complexities. Our broader vision is to enable the systematic generation of Compact Integer Programming (CIP) formulations for problems with exponential-size IPs having polynomial-time separation oracles. To this end, we introduce a hierarchical linear pipelining technique that decomposes nested program structures into synchronized regions with well-defined execution transitions, which are functions of compile-time parameters. This decomposition allows us to localize LP constraints and variables within each region, significantly reducing LP size without loss of generality and ensuring that the resulting LP remains valid for all inputs of size $n$. We demonstrate the effectiveness of our method on two benchmark problems -- the makespan problem, which has exponential extension complexity, and the weighted minimum spanning tree problem -- both of which have exponential-size natural LPs. Our results show up to a $25$-fold reduction in LP size and substantial improvements in solver performance across both commercial and non-commercial LP solvers.

2025-09-16T12:18:41Z Preliminary version will appear in CASCON 2025. Shermin Khosravi, David Bremner

http://arxiv.org/abs/2509.11781v1 A Computational Framework and Implementation of Implicit Priors in Bayesian Inverse Problems 2025-09-15T11:03:04Z

Solving Bayesian inverse problems typically involves deriving a posterior distribution using Bayes' rule, followed by sampling from this posterior for analysis. Sampling methods, such as general-purpose Markov chain Monte Carlo (MCMC), are commonly used, but they require prior and likelihood densities to be explicitly provided. In cases where expressing the prior explicitly is challenging, implicit priors offer an alternative, encoding prior information indirectly. These priors have gained increased interest in recent years, with methods like Plug-and-Play (PnP) priors and Regularized Linear Randomize-then-Optimize (RLRTO) providing computationally efficient alternatives to standard MCMC algorithms. However, the abstract concept of implicit priors for Bayesian inverse problems has yet to be systematically explored, and little effort has been made to unify different kinds of implicit priors. This paper presents a computational framework for implicit priors and clarifies how they differ from explicit priors. We also introduce an implementation of various implicit priors within the CUQIpy Python package for Computational Uncertainty Quantification in Inverse Problems. Using this implementation, we showcase several implicit prior techniques by applying them to a variety of inverse problems, from image processing to parameter estimation in partial differential equations.
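
Randomize-then-optimize methods generate samples by solving perturbed optimization problems rather than by evaluating a density. The sketch below, assuming a linear forward model with Gaussian noise and a zero-mean Gaussian prior, shows the plain linear RTO step in NumPy: each sample solves a least-squares problem with randomly perturbed data and prior mean, and in this linear-Gaussian case the samples follow the exact posterior. Roughly speaking, the regularized variant adds a convex regularization term or constraint (for example nonnegativity) to this optimization, which is omitted here. The function names are hypothetical and are not part of CUQIpy.

```python
import numpy as np


def linear_rto_samples(A, y, sigma, delta, n_samples, rng):
    """Posterior samples for y = A x + e, e ~ N(0, sigma^2 I), prior x ~ N(0, delta^-1 I).

    Each sample minimizes ||A x / sigma - (y / sigma + e1)||^2 + ||sqrt(delta) x - e2||^2
    with fresh standard normal perturbations e1, e2."""
    m, n = A.shape
    # Stacked least-squares operator [A / sigma; sqrt(delta) I].
    K = np.vstack([A / sigma, np.sqrt(delta) * np.eye(n)])
    # One perturbed right-hand side per column.
    rhs = np.vstack([
        (y / sigma)[:, None] + rng.standard_normal((m, n_samples)),
        rng.standard_normal((n, n_samples)),
    ])
    X, *_ = np.linalg.lstsq(K, rhs, rcond=None)
    return X.T  # shape (n_samples, n)


def gaussian_posterior(A, y, sigma, delta):
    """Closed-form posterior mean and covariance, for comparison."""
    n = A.shape[1]
    precision = A.T @ A / sigma**2 + delta * np.eye(n)
    cov = np.linalg.inv(precision)
    return cov @ (A.T @ y) / sigma**2, cov
```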

2025-09-15T11:03:04Z Jasper M. Everink, Chao Zhang, Amal M. A. Alghamdi, Rémi Laumont, Nicolai A. B. Riis, Jakob S. Jørgensen

http://arxiv.org/abs/2509.10613v1 pySigLib -- Fast Signature-Based Computations on CPU and GPU 2025-09-12T18:00:14Z

Signature-based methods have recently gained significant traction in machine learning for sequential data. In particular, signature kernels have emerged as powerful discriminators and training losses for generative models on time-series, notably in quantitative finance. However, existing implementations do not scale to the dataset sizes and sequence lengths encountered in practice. We present pySigLib, a high-performance Python library offering optimised implementations of signatures and signature kernels on CPU and GPU, fully compatible with PyTorch's automatic differentiation. Beyond an efficient software stack for large-scale signature-based computation, we introduce a novel differentiation scheme for signature kernels that delivers accurate gradients at a fraction of the runtime of existing libraries.
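
The signature of a path can be built up segment by segment using Chen's identity. As a minimal sketch, assuming a piecewise-linear path and truncation at level 2, the NumPy function below computes the first two signature levels; `signature_level2` is a hypothetical name and this is not pySigLib's API, which provides optimised CPU/GPU kernels and signature kernels.

```python
import numpy as np


def signature_level2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array of shape (length, d). Returns S1 of shape (d,) and S2 of shape (d, d)."""
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    S1 = np.zeros(d)
    S2 = np.zeros((d, d))
    for inc in np.diff(path, axis=0):
        # A single linear segment has signature (inc, inc ⊗ inc / 2).
        # Chen's identity: S2(a * b) = S2(a) + S1(a) ⊗ S1(b) + S2(b),
        # so S2 must be updated with the old S1 before S1 is extended.
        S2 += np.outer(S1, inc) + np.outer(inc, inc) / 2
        S1 += inc
    return S1, S2
```

The antisymmetric part of `S2` is the Lévy area; for the path (0,0) to (1,0) to (1,1) it equals 0.5, while the symmetric part always equals `np.outer(S1, S1) / 2`.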

2025-09-12T18:00:14Z Daniil Shmelev, Cristopher Salvi

http://arxiv.org/abs/2509.10226v1 Matrix-Free Evaluation Strategies for Continuous and Discontinuous Galerkin Discretizations on Unstructured Tetrahedral Grids 2025-09-12T13:18:12Z

This study presents novel strategies for improving the node-level performance of matrix-free evaluation of continuous and discontinuous Galerkin spatial discretizations on unstructured tetrahedral grids. In our approach, the underlying integrals of a generic finite-element operator are computed cell-by-cell through numerical quadrature using tabulated dense local matrices of shape functions, achieving high throughput for low to moderate polynomial degrees. By employing dense matrix-matrix products instead of matrix-vector products for the cell-wise interpolation, the method reaches over $60\%$ of peak performance. The optimization strategies exploit explicit data parallelism to enhance computational efficiency, complemented by a hierarchical mesh reordering algorithm that improves data locality. The matrix-free implementation achieves up to a $6\times$ speedup compared to a global sparse matrix-based approach at a polynomial degree of three. The effectiveness of the method is demonstrated through numerical experiments on the Poisson and Navier--Stokes equations. The Poisson operator is preconditioned by a hybrid multigrid scheme that combines auxiliary continuous finite-element spaces, polynomial coarsening and geometric coarsening where possible, while employing algebraic multigrid on the coarse mesh. Within the preconditioner, the implementation transitions between the matrix-free and matrix-based strategies for optimal efficiency. Finally, we analyze the strong scaling behavior of the Poisson and Helmholtz operators, demonstrating the method's potential to solve large real-world problems.
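
The gather, local-apply, scatter pattern behind matrix-free operator evaluation can be sketched in a simplified setting. The NumPy example below assumes 1D linear (P1) elements on a uniform mesh with an exact local stiffness matrix instead of quadrature on tetrahedra; in the same spirit as the batched approach described above, the local operator is applied to all cells at once with one matrix-matrix product. Both function names are hypothetical, and the dense assembled matrix is only there for checking.

```python
import numpy as np


def assemble_stiffness(n_cells, h):
    """Global P1 stiffness matrix on a uniform 1D mesh (dense, for checking only)."""
    K = np.zeros((n_cells + 1, n_cells + 1))
    k_local = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    for c in range(n_cells):
        K[c:c + 2, c:c + 2] += k_local
    return K


def apply_stiffness_matrix_free(u, n_cells, h):
    """Compute K @ u cell by cell without ever forming K."""
    k_local = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    # Cell-to-DOF map: cell c owns DOFs c and c + 1; shape (2, n_cells).
    conn = np.stack([np.arange(n_cells), np.arange(1, n_cells + 1)])
    # Gather the local DOF values of every cell as columns, then apply the
    # local operator to all cells with a single matrix-matrix product.
    local_out = k_local @ u[conn]
    # Scatter-add the cell contributions into the global result vector.
    y = np.zeros_like(u)
    np.add.at(y, conn, local_out)
    return y
```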

2025-09-12T13:18:12Z 26 pages, 13 figures, submitted to SIAM Journal on Scientific Computing. Dominik Still, Niklas Fehn, Wolfgang A. Wall, Martin Kronbichler

http://arxiv.org/abs/2509.06439v1 Relational Algebras for Subset Selection and Optimisation 2025-09-08T08:35:46Z

The database community lacks a unified relational query language for subset selection and optimisation queries, limiting both user expression and query optimiser reasoning about such problems. Decades of research (latterly under the rubric of prescriptive analytics) have produced powerful evaluation algorithms with incompatible, ad-hoc SQL extensions that specify and filter through distinct mechanisms. We present the first unified algebraic foundation for these queries, introducing relational exponentiation to complete the fundamental algebraic operations alongside union (addition) and cross product (multiplication). First, we extend relational algebra to complete domain relations (relations defined by characteristic functions rather than explicit extensions), achieving the expressiveness of NP-complete/hard problems while simultaneously providing query safety for finite inputs. Second, we introduce solution sets, a higher-order relational algebra over sets of relations that naturally expresses search spaces as functions $f: \mathrm{Base} \to \mathrm{Decision}$, yielding $|\mathrm{Decision}|^{|\mathrm{Base}|}$ candidate relations. Third, we provide structure-preserving translation semantics from solution sets to standard relational algebra, enabling mechanical translation to existing evaluation algorithms. This framework achieves the expressiveness of the most powerful prior approaches while providing the theoretical clarity and compositional properties absent in previous work. We demonstrate the capabilities these algebras open up through a polymorphic SQL where standard clauses seamlessly express data management, subset selection, and optimisation queries within a single paradigm.
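
The search-space semantics of solution sets can be illustrated by brute force: enumerate every function from Base to Decision, filter by a constraint, and rank by an objective. The Python sketch below does this for a small 0/1 subset-selection (knapsack-style) query; it is not the paper's algebra or its polymorphic SQL, and the helper names and example data are hypothetical. Enumeration grows as $|\mathrm{Decision}|^{|\mathrm{Base}|}$, which is why translation to dedicated evaluation algorithms matters.

```python
from itertools import product


def solution_set(base, decisions):
    """Yield every function f: base -> decisions, represented as a dict.

    There are len(decisions) ** len(base) such functions (candidate relations)."""
    base = list(base)
    for choice in product(decisions, repeat=len(base)):
        yield dict(zip(base, choice))


def best_subset(items, capacity):
    """0/1 subset selection: items maps name -> (weight, value).

    Filter the solution set by the weight limit, then keep the candidate
    with the largest total value."""
    def weight(f):
        return sum(items[name][0] * f[name] for name in items)

    def value(f):
        return sum(items[name][1] * f[name] for name in items)

    feasible = (f for f in solution_set(items, (0, 1)) if weight(f) <= capacity)
    return max(feasible, key=value)
```

For instance, with items `{"a": (2, 3), "b": (3, 4), "c": (4, 5)}` and capacity 5, the best candidate selects `a` and `b`.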

2025-09-08T08:35:46Z 15 pages main text, 28 pages appendices. David Robert Pratten, Luke Mathieson, Fahimeh Ramezani

http://arxiv.org/abs/2509.05666v1 Accuracy of Mathematical Functions in Julia 2025-09-06T10:04:11Z

Basic computer arithmetic operations, such as $+$, $\times$, or $\div$, are correctly rounded, whilst mathematical functions such as $e^x$, $\ln(x)$, or $\sin(x)$ in general are not, meaning that separate implementations may provide different results when presented with exactly the same input, and that their accuracy may differ. We present a methodology and a software tool that is suited for exhaustive and non-exhaustive testing of mathematical functions of Julia in various floating-point formats. The software tool is useful to the users of Julia, to quantify the level of accuracy of the mathematical functions and interpret possible effects of errors on their scientific computation codes that depend on these functions. It is also useful to the developers and maintainers of the functions in Julia Base, to test the modifications to existing functions and to test the accuracy of new functions. The software (a test bench) is designed to be easy to set up for running the accuracy tests in automatic regression testing. Our focus is to provide software that is user friendly and avoids the need for specialised knowledge of floating-point arithmetic or the workings of mathematical functions; users only need to supply a list of formats, choose the rounding modes, and specify the input space search strategies based on how long they can afford the testing to run. We have utilized the test bench to determine the errors of a subset of mathematical functions in the latest version of Julia, for binary16, binary32, and binary64 IEEE 754 floating-point formats, and found errors of $0.49$ to $0.51$ ULPs in binary16, and $0.5$ to $2.4$ ULPs in binary32 and binary64. The functions that may be correctly rounded (error of $0.5$ ULP) in all three formats are sqrt and cbrt. The following functions may be correctly rounded only for binary16: sinh, asin, cospi, sinpi, atanh, log2, tanh.
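
A ULP-error measurement of the kind described above can be sketched in a few lines. The Python/NumPy function below is a hypothetical helper, not the Julia test bench: it evaluates a binary32 function on sample inputs and compares it with a binary64 reference computed via the `math` module, assuming the binary64 result is accurate enough to serve as the reference for binary32 (testing binary64 functions would need a higher-precision reference).

```python
import math

import numpy as np


def ulp_error_binary32(f32, f_ref, xs):
    """Maximum error, in binary32 ULPs, of f32 over the inputs xs.

    f32   : function evaluated in binary32 (e.g. np.sqrt on a float32 input)
    f_ref : higher-precision reference (here binary64, e.g. math.sqrt)
    """
    worst = 0.0
    for x in np.asarray(xs, dtype=np.float32):
        approx = float(f32(x))
        ref = f_ref(float(x))  # float(x) is exact: every binary32 value fits in binary64
        # Spacing of binary32 numbers at the reference value's magnitude.
        ulp = float(np.spacing(np.float32(abs(ref))))
        worst = max(worst, abs(approx - ref) / ulp)
    return worst


if __name__ == "__main__":
    xs = np.random.default_rng(0).uniform(0.5, 100.0, 1000)
    print(ulp_error_binary32(np.sqrt, math.sqrt, xs))
```

Because IEEE 754 requires square root to be correctly rounded, the measured error for `np.sqrt` should not exceed about 0.5 ULP.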

2025-09-06T10:04:11Z Mantas Mikaitis, Tejaswa Rizyal

http://arxiv.org/abs/2402.02523v3 FEniCSx-pctools: Tools for PETSc block linear algebra preconditioning in FEniCSx 2025-09-05T13:54:36Z

Solving partial differential equations with the finite element method leads to large linear systems of equations that must be solved. When these systems have a natural block structure due to multiple field variables, using iterative solvers with carefully designed preconditioning strategies that exploit the underlying physical structure becomes necessary for an efficient and scalable solution process. FEniCSx Preconditioning Tools (FEniCSx-pctools) is a software package that eases the specification of PETSc (Portable, Extensible Toolkit for Scientific Computation) block preconditioning strategies on linear systems assembled using the DOLFINx finite element solver of the FEniCS Project. The package automatically attaches all necessary metadata so that preconditioning strategies can be applied via PETSc's standard options database to monolithic and block assembled systems. The documented examples include a simple mixed Poisson system and a more complex pressure convection-diffusion approach to preconditioning the Navier-Stokes equations. We show weak parallel scaling on a fully coupled temperature-Navier-Stokes system up to 8192 MPI (Message Passing Interface) processes, demonstrating the applicability of the approach to large-scale problems.
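
Why block preconditioners that exploit the physical structure pay off can be seen on a generic saddle-point matrix $K = \begin{pmatrix} A & B^T \\ B & 0 \end{pmatrix}$ with $A$ symmetric positive definite and $B$ of full row rank. With the block-diagonal preconditioner $\mathrm{diag}(A, S)$, where $S = B A^{-1} B^T$ is the exact Schur complement, the preconditioned matrix has only the three eigenvalues $1$ and $(1 \pm \sqrt{5})/2$, so a Krylov method such as GMRES terminates in at most three iterations in exact arithmetic. The dense NumPy sketch below checks this numerically on random data; it does not use FEniCSx-pctools, DOLFINx or PETSc, the function names are hypothetical, and in practice $S$ is replaced by a cheap approximation.

```python
import numpy as np


def saddle_point_system(n, m, rng):
    """Random saddle-point matrix [[A, B^T], [B, 0]] with A SPD and B (m x n) full row rank."""
    G = rng.standard_normal((n, n))
    A = G @ G.T + n * np.eye(n)
    B = rng.standard_normal((m, n))
    K = np.block([[A, B.T], [B, np.zeros((m, m))]])
    return K, A, B


def block_diagonal_preconditioner(A, B):
    """P = diag(A, S) with the exact Schur complement S = B A^{-1} B^T."""
    n, m = A.shape[0], B.shape[0]
    S = B @ np.linalg.solve(A, B.T)
    return np.block([[A, np.zeros((n, m))], [np.zeros((m, n)), S]])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    K, A, B = saddle_point_system(6, 3, rng)
    P = block_diagonal_preconditioner(A, B)
    # Expect 1 (multiplicity n - m) and (1 ± sqrt(5)) / 2 (multiplicity m each).
    print(np.sort_complex(np.linalg.eigvals(np.linalg.solve(P, K))))
```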

2024-02-04T15:11:04Z 12 pages, 8 figures, 1 table. Martin Řehoř, Jack S. Hale

http://arxiv.org/abs/2510.15884v1 Generalized Methodology for Determining Numerical Features of Hardware Floating-Point Matrix Multipliers: Part I 2025-09-03T05:46:08Z

Numerical features of matrix multiplier hardware units in NVIDIA and AMD data centre GPUs have recently been studied. Features such as rounding, normalisation, and internal precision of the accumulators are of interest. In this paper, we extend the methodology for analysing those features to consumer-grade NVIDIA GPUs by implementing an architecture-independent test scheme for various input and output precision formats. Unlike current approaches, the proposed test vector generation method neither performs an exhaustive search nor relies on hard-coded constants that are device-specific, yet remains applicable to a wide range of mixed-precision formats. We have applied the scheme to the RTX-3060 (Ampere architecture) and Ada RTX-1000 (Ada Lovelace architecture) graphics cards and determined numerical features of matrix multipliers for binary16, TensorFloat32, and bfloat16 input floating-point formats and binary16 and binary32 IEEE 754 output formats. Our methodology allowed us to determine that the numerical features of RTX-3060, a consumer-grade GPU, are identical to those of the A100, a data centre GPU. We do not expect our code to require any changes for performing analysis of matrix multipliers on newer NVIDIA GPUs, Hopper or Blackwell, and their future successors, and any input/output format combination, including the latest 8-bit floating-point formats.
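
The basic idea of probing an accumulator with crafted inputs can be shown without a GPU. The NumPy sketch below simulates two hypothetical dot-product units, one that rounds every partial sum to binary16 and one that accumulates in binary32, and distinguishes them with a single test vector chosen so that a binary16 accumulator hits a rounding tie. This is a toy illustration, not the paper's test-vector generator; real matrix-multiply units may add several products at once with their own internal precision and rounding, which is what a systematic methodology has to uncover.

```python
import numpy as np


def dot_fp16_accumulator(a, b):
    """Simulated unit: binary16 inputs, each partial sum rounded to binary16."""
    acc = np.float16(0)
    for x, y in zip(a, b):
        acc = np.float16(acc + np.float16(x) * np.float16(y))
    return acc


def dot_fp32_accumulator(a, b):
    """Simulated unit: binary16 inputs, accumulation carried out in binary32."""
    acc = np.float32(0)
    for x, y in zip(a, b):
        acc = np.float32(acc + np.float32(np.float16(x)) * np.float32(np.float16(y)))
    return acc


def probe_accumulator(dot):
    """Classify the accumulator width with one crafted test vector.

    1 + 2**-11 lies exactly halfway between the binary16 numbers 1 and 1 + 2**-10,
    so a binary16 accumulator rounds it back to 1 (ties-to-even) both times,
    whereas a wider accumulator keeps both small terms and returns 1 + 2**-10."""
    a = [1.0, 2.0 ** -11, 2.0 ** -11]
    b = [1.0, 1.0, 1.0]
    result = float(dot(a, b))
    return "binary16" if result == 1.0 else "wider than binary16"
```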

2025-09-03T05:46:08Z Accepted for IEEE HPEC 2025. Faizan A Khattak, Mantas Mikaitis

http://arxiv.org/abs/2509.02840v1 Fast and Accurate SVD-Type Updating in Streaming Data 2025-09-02T21:17:37Z

For a datastream, the change over a short interval is often of low rank. For high-throughput information arranged in matrix format, recomputing an optimal SVD approximation after each step is typically prohibitive. Instead, incremental and truncated updating strategies are used, which may not scale for large truncation ranks. Therefore, we propose a set of efficient new algorithms that update a bidiagonal factorization and are comparable in accuracy to the SVD methods. In particular, we develop a compact Householder-type algorithm that decouples a sparse part from a low-rank update and has about half the memory requirements of standard bidiagonalization methods. A second algorithm based on Givens rotations has only about 10 flops per rotation and scales quadratically with the problem size, compared to a typical cubic scaling. The algorithm is therefore effective for processing high-throughput updates, as we demonstrate in tracking large subspaces of recommendation systems and networks, and when compared to well-known software such as LAPACK or the incremental SVD.
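
Givens rotations are the building block of the second algorithm. Since the paper's bidiagonal updating scheme is not reproduced here, the sketch below uses a generic stand-in: restoring the triangular form of an upper-Hessenberg matrix (a situation that arises, for example, when updating a QR factorization) with $n-1$ rotations, each touching only two rows, for $O(n^2)$ total work. The function names are hypothetical; NumPy is assumed.

```python
import math

import numpy as np


def givens(a, b):
    """Return (c, s) such that [[c, s], [-s, c]] @ [a, b] = [r, 0] with r = hypot(a, b)."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r


def hessenberg_to_triangular(H):
    """Reduce an upper-Hessenberg matrix to upper-triangular form with n - 1 Givens
    rotations. Each rotation updates two rows from column k onwards, so the total
    cost is O(n^2) rather than the O(n^3) of a fresh factorization."""
    R = np.array(H, dtype=float)
    n = R.shape[0]
    for k in range(n - 1):
        c, s = givens(R[k, k], R[k + 1, k])
        top = R[k, k:].copy()
        bottom = R[k + 1, k:].copy()
        R[k, k:] = c * top + s * bottom
        R[k + 1, k:] = -s * top + c * bottom  # zeroes the subdiagonal entry R[k+1, k]
    return R
```

Because the rotations are orthogonal, the result satisfies $R^T R = H^T H$.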

2025-09-02T21:17:37Z Johannes J. Brust, Michael A. Saunders

http://arxiv.org/abs/2509.01855v1 A Million-Point Fast Trajectory Optimization Solver 2025-09-02T00:47:59Z

One might argue that solving a trajectory optimization problem over a million grid points is preposterous. How about solving such a problem in an incredibly fast computational time? On a small form-factor processor? Algorithmic details that make possible this trifecta of breakthroughs are presented in this paper. The computational mathematics that deliver these advancements are: (i) a Birkhoff-theoretic discretization of optimal control problems, (ii) matrix-free linear algebra leveraging Krylov-subspace methods, and (iii) a near-perfect Birkhoff preconditioner that helps achieve $\mathcal{O}(1)$ iteration speed with respect to the grid size $N$. A key enabler of this high performance is the computation of Birkhoff matrix-vector products in $\mathcal{O}(N\log(N))$ time using fast Fourier transform techniques that eliminate traditional computational bottlenecks. This unprecedented scale and speed are demonstrated numerically on a practical astrodynamics problem.
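
The FFT trick for fast structured matrix-vector products can be illustrated on the simplest case, a circulant matrix, which the discrete Fourier transform diagonalizes. The NumPy sketch below computes such a product in $\mathcal{O}(N\log N)$ time and checks it against the explicit $\mathcal{O}(N^2)$ product. It is not the paper's Birkhoff matrix-vector product, only the general idea, and the function names are hypothetical.

```python
import numpy as np


def circulant_matvec_fft(c, x):
    """Multiply the circulant matrix with first column c by x in O(N log N).

    C @ x is the circular convolution of c and x, which the DFT turns into
    an elementwise product of spectra."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real


def circulant_dense(c):
    """Explicit N x N circulant matrix C[i, j] = c[(i - j) mod N], for checking."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return np.asarray(c)[idx]
```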

2025-09-02T00:47:59Z 20 pages, 7 figures, AAS Paper 25-689. Proceedings of the 2025 AAS/AIAA Astrodynamics Specialist Conference, Boston, Massachusetts, August 10-14, 2025. A. Javeed, D. P. Kouri, D. Ridzal, J. D. Steinman, I. M. Ross

http://arxiv.org/abs/2303.05385v3 PyGenStability: Multiscale community detection with generalized Markov Stability 2025-08-28T17:00:39Z

We present PyGenStability, a general-use Python software package that provides a suite of analysis and visualisation tools for unsupervised multiscale community detection in graphs. PyGenStability finds optimized partitions of a graph at different levels of resolution by maximizing the generalized Markov Stability quality function with the Louvain or Leiden algorithms. The package includes automatic detection of robust graph partitions and allows the flexibility to choose quality functions for weighted undirected, directed and signed graphs, and to include other user-defined quality functions.
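
The quality function being optimized can be evaluated directly for a given partition. The NumPy sketch below assumes the classical Markov Stability of a continuous-time random walk on an undirected graph, $r(t, H) = \operatorname{tr}\left[H^T \left(\Pi e^{-t D^{-1} L} - \pi \pi^T\right) H\right]$, where $L = D - A$, $\pi$ is the stationary distribution, $\Pi = \mathrm{diag}(\pi)$ and $H$ is the partition indicator matrix. It only scores a partition (it does not run Louvain or Leiden), covers none of the generalized variants, and is not PyGenStability's API; `markov_stability` is a hypothetical name.

```python
import numpy as np


def markov_stability(A, communities, t):
    """Markov Stability of a partition for a continuous-time random walk.

    A: symmetric adjacency matrix (no isolated nodes); communities: label per node."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)
    two_m = d.sum()
    pi = d / two_m  # stationary distribution of the random walk
    # D^{-1} L is similar to the symmetric L_sym = I - D^{-1/2} A D^{-1/2},
    # so exp(-t D^{-1} L) can be built from an eigendecomposition of L_sym.
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_isqrt[:, None] * A * d_isqrt[None, :]
    w, V = np.linalg.eigh(L_sym)
    exp_sym = (V * np.exp(-t * w)) @ V.T
    # Pi exp(-t D^{-1} L) = D^{1/2} exp(-t L_sym) D^{1/2} / (2m)
    flow = np.sqrt(d)[:, None] * exp_sym * np.sqrt(d)[None, :] / two_m
    labels = np.asarray(communities)
    H = (labels[:, None] == np.unique(labels)[None, :]).astype(float)
    return float(np.trace(H.T @ (flow - np.outer(pi, pi)) @ H))
```

The trivial one-community partition always scores zero, while for two triangles joined by a single edge the natural two-community split scores positively.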

2023-03-08T16:22:03Z Alexis Arnaudon, Juni Schindler, Robert L. Peach, Adam Gosztolai, Maxwell Hodges, Michael T. Schaub, Mauricio Barahona. doi:10.1145/3651225

http://arxiv.org/abs/2506.12501v2 Implementation of McMurchie-Davidson algorithm for Gaussian AO integrals suited for SIMD processors 2025-08-28T04:53:46Z

We report an implementation of the McMurchie-Davidson evaluation scheme for 1- and 2-particle Gaussian AO integrals designed for processors with Single Instruction Multiple Data (SIMD) instruction sets. As in our recent MD implementation for graphical processing units (GPUs) [J. Chem. Phys. 160, 244109 (2024)], variable-sized batches of shellsets of integrals are evaluated at a time. By optimizing for floating-point instruction throughput rather than minimizing the number of operations, this approach achieves up to 50% of the theoretical hardware peak FP64 performance on many common SIMD-equipped platforms (AVX2, AVX512, NEON), which translates to speedups of up to 30 over the state-of-the-art one-shellset-at-a-time implementation of Obara-Saika-type schemes in Libint for a variety of primitive and contracted integrals. As with our previous work, we rely on standard C++ (including the std::simd standard library feature to be included in the 2026 ISO C++ standard) without any explicit code generation, to keep the code base small and portable. The implementation is part of the open source LibintX library freely available at https://github.com/ValeevGroup/libintx.
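
The Hermite expansion coefficients at the heart of the McMurchie-Davidson scheme satisfy a simple recurrence. With $p = a + b$, $\mu = ab/p$, and $X_{PA}$, $X_{PB}$ the distances from the Gaussian product centre $P = (aA + bB)/p$ to $A$ and $B$, it reads $E_t^{i+1,j} = \frac{1}{2p} E_{t-1}^{ij} + X_{PA} E_t^{ij} + (t+1) E_{t+1}^{ij}$ (and analogously in $j$ with $X_{PB}$), starting from $E_0^{00} = e^{-\mu (A-B)^2}$. The Python sketch below evaluates this textbook scalar recurrence for a 1D overlap integral, one shell pair at a time (the 3D overlap is a product of three such factors); it does not reflect LibintX's batched SIMD implementation, and `overlap_1d` is a hypothetical name.

```python
import math
from functools import lru_cache


def overlap_1d(i, j, a, b, A, B):
    """1D overlap of (x-A)^i exp(-a(x-A)^2) and (x-B)^j exp(-b(x-B)^2)
    via McMurchie-Davidson Hermite expansion coefficients E_t^{ij}."""
    p = a + b
    mu = a * b / p
    Qx = A - B
    X_PA = -b * Qx / p  # P - A
    X_PB = a * Qx / p   # P - B

    @lru_cache(maxsize=None)
    def E(i, j, t):
        if i < 0 or j < 0 or t < 0 or t > i + j:
            return 0.0
        if i == j == t == 0:
            return math.exp(-mu * Qx * Qx)
        if i > 0:  # recurrence on the angular momentum at centre A
            return (E(i - 1, j, t - 1) / (2 * p) + X_PA * E(i - 1, j, t)
                    + (t + 1) * E(i - 1, j, t + 1))
        # otherwise recurrence on the angular momentum at centre B
        return (E(i, j - 1, t - 1) / (2 * p) + X_PB * E(i, j - 1, t)
                + (t + 1) * E(i, j - 1, t + 1))

    # Only the t = 0 Hermite Gaussian integrates to a non-zero value.
    return E(i, j, 0) * math.sqrt(math.pi / p)
```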

2025-06-14T13:33:08Z Andrey Asadchev, Edward F. Valeev. doi:10.1021/acs.jpca.5c04136

http://arxiv.org/abs/2508.18202v1 Uncertain data assimilation for urban wind flow simulations with OpenLB-UQ 2025-08-25T17:01:36Z

Accurate prediction of urban wind flow is essential for urban planning, pedestrian safety, and environmental management. Yet, it remains challenging due to uncertain boundary conditions and the high cost of conventional CFD simulations. This paper presents the use of the modular and efficient uncertainty quantification (UQ) framework OpenLB-UQ for urban wind flow simulations. We specifically use the lattice Boltzmann method (LBM) coupled with a stochastic collocation (SC) approach based on generalized polynomial chaos (gPC). The framework introduces a relative-error noise model for inflow wind speeds based on real measurements. The model is propagated through a non-intrusive SC LBM pipeline using sparse-grid quadrature. Key quantities of interest, including mean flow fields, standard deviations, and vertical profiles with confidence intervals, are efficiently computed without altering the underlying deterministic solver. We demonstrate this on a real urban scenario, highlighting how uncertainty localizes in complex flow regions such as wakes and shear layers. The results show that the SC LBM approach provides accurate, uncertainty-aware predictions with significant computational efficiency, making OpenLB-UQ a practical tool for real-time urban wind analysis.
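
The non-intrusive character of stochastic collocation means that the deterministic solver is only evaluated at chosen nodes. The NumPy sketch below assumes a single standard normal random variable $\xi$ in a relative-error inflow model $u = u_0 (1 + \sigma \xi)$ and uses a one-dimensional Gauss-Hermite rule in place of sparse-grid quadrature; `model` is a hypothetical stand-in for an LBM solve returning a quantity of interest. This is not OpenLB-UQ's interface.

```python
import numpy as np


def collocation_mean_std(model, u0, rel_sigma, n_nodes=5):
    """Non-intrusive stochastic collocation in one random dimension.

    The uncertain inflow speed is u = u0 * (1 + rel_sigma * xi) with xi ~ N(0, 1).
    The deterministic model is evaluated only at Gauss-Hermite nodes, and the
    mean and standard deviation are obtained by quadrature."""
    # Probabilists' Hermite rule: weight function exp(-xi^2 / 2).
    xi, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / w.sum()  # normalize to a probability measure
    values = np.array([model(u0 * (1.0 + rel_sigma * x)) for x in xi])
    mean = np.dot(w, values)
    var = np.dot(w, (values - mean) ** 2)
    return mean, np.sqrt(var)
```

For a polynomial model such as `lambda u: u ** 2`, a five-node rule integrates mean and variance exactly; for $u_0 = 10$ and $\sigma = 0.1$ this gives a mean of 101 and a variance of 402.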

2025-08-25T17:01:36Z Mingliang Zhong, Dennis Teutscher, Adrian Kummerländer, Mathias J. Krause, Martin Frank, Stephan Simonis

http://arxiv.org/abs/2409.12067v4 Fitting Multilevel Factor Models 2025-08-25T13:02:26Z

We examine a special case of the multilevel factor model, with covariance given by a multilevel low-rank (MLR) matrix \cite{parshakova2023factor}. We develop a novel, fast implementation of the expectation-maximization algorithm, tailored for multilevel factor models, to maximize the likelihood of the observed data. This method accommodates any hierarchical structure and maintains linear time and storage complexities per iteration. This is achieved through a new efficient technique for computing the inverse of the positive definite MLR matrix. We show that the inverse of a positive definite MLR matrix is also an MLR matrix with the same sparsity in factors, and we use the recursive Sherman-Morrison-Woodbury matrix identity to obtain the factors of the inverse. Additionally, we present an algorithm that computes the Cholesky factorization of an expanded matrix with linear time and space complexities, yielding the covariance matrix as its Schur complement. This paper is accompanied by an open-source package that implements the proposed methods.
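
The single-level case of the inverse property described above is the Sherman-Morrison-Woodbury identity for a diagonal-plus-low-rank matrix, $(D + FF^T)^{-1} = D^{-1} - D^{-1}F\,(I + F^T D^{-1} F)^{-1} F^T D^{-1}$, whose right-hand side is again diagonal plus a rank-$r$ term. The NumPy sketch below returns the inverse in that factored form; it does not implement the recursive multilevel algorithm of the paper, and `diag_plus_lowrank_inverse` is a hypothetical name.

```python
import numpy as np


def diag_plus_lowrank_inverse(d, F):
    """Inverse of D + F F^T (D = diag(d) with d > 0, F of shape (n, r)).

    Returns (d_inv, G) with (D + F F^T)^{-1} = diag(d_inv) - G G^T, so the
    inverse is again diagonal plus rank r, computed in O(n r^2) work."""
    d_inv = 1.0 / d
    DinvF = d_inv[:, None] * F                # D^{-1} F
    small = np.eye(F.shape[1]) + F.T @ DinvF  # I + F^T D^{-1} F (r x r, SPD)
    # Write small^{-1} = C^{-T} C^{-1} with small = C C^T (Cholesky), so that
    # D^{-1} F small^{-1} F^T D^{-1} = G G^T with G = D^{-1} F C^{-T}.
    C = np.linalg.cholesky(small)
    G = np.linalg.solve(C, DinvF.T).T
    return d_inv, G
```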

2024-09-18T15:39:12Z Tetiana Parshakova, Trevor Hastie, Stephen Boyd