http://arxiv.org/abs/2601.23237v1 Applications of QR-based Vector-Valued Rational Approximation 2026-01-30T18:06:46Z

Several applications of the QR-AAA algorithm, a greedy scheme for vector-valued rational approximation, are presented. The focus is on demonstrating the flexibility and practical effectiveness of QR-AAA in a variety of computational settings, including Stokes flow computation, multivariate rational approximation, function extension, the development of novel quadrature methods and near-field approximation in the boundary element method.

2026-01-30T18:06:46Z Simon Dirckx http://arxiv.org/abs/2208.14314v4 Cardinal Optimizer (COPT) User Guide 2026-01-30T03:43:15Z

Cardinal Optimizer is a high-performance mathematical programming solver for efficiently solving large-scale optimization problems. This documentation provides a basic introduction to the Cardinal Optimizer.

2022-08-30T14:52:44Z Dongdong Ge Qi Huangfu Zizhuo Wang Jian Wu Yinyu Ye http://arxiv.org/abs/2601.22200v1 Adaptive Benign Overfitting (ABO): Overparameterized RLS for Online Learning in Non-stationary Time-series 2026-01-29T15:58:01Z

Overparameterized models have recently challenged conventional learning theory by exhibiting improved generalization beyond the interpolation limit, a phenomenon known as benign overfitting. This work introduces Adaptive Benign Overfitting (ABO), extending the recursive least-squares (RLS) framework to this regime through a numerically stable formulation based on orthogonal-triangular updates. A QR-based exponentially weighted RLS (QR-EWRLS) algorithm is introduced, combining random Fourier feature mappings with forgetting-factor regularization to enable online adaptation under non-stationary conditions. The orthogonal decomposition prevents the numerical divergence associated with covariance-form RLS while retaining adaptability to evolving data distributions. Experiments on nonlinear synthetic time series confirm that the proposed approach maintains bounded residuals and stable condition numbers while reproducing the double-descent behavior characteristic of overparameterized models. Applications to forecasting foreign exchange and electricity demand show that ABO is highly accurate (comparable to baseline kernel methods) while achieving speed improvements of between 20 and 40 percent. The results provide a unified view linking adaptive filtering, kernel approximation, and benign overfitting within a stable online learning framework.
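
The orthogonal-triangular update mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `QRRecursiveLeastSquares`, the helper `rff_features`, the ridge-style initialization `delta`, and the use of a full re-triangularization via `numpy.linalg.qr` at each step are all assumptions made for illustration. The idea is to keep an upper-triangular factor `R` and a vector `z` such that the weights solve `R w = z`, and to fold in each new sample by re-triangularizing the stacked, forgetting-factor-scaled system instead of updating an inverse covariance matrix.

```python
import numpy as np


def rff_features(x, W, b):
    """Random Fourier features sqrt(2/D) * cos(W x + b) (hypothetical helper)."""
    D = W.shape[0]
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)


class QRRecursiveLeastSquares:
    """Exponentially weighted RLS in square-root (QR) form; illustrative only."""

    def __init__(self, dim, lam=0.99, delta=1e-3):
        self.lam = lam                          # forgetting factor in (0, 1]
        self.R = np.sqrt(delta) * np.eye(dim)   # assumed ridge-style initialization
        self.z = np.zeros(dim)

    def update(self, phi, y):
        s = np.sqrt(self.lam)
        d = self.z.size
        # Stack the down-weighted old system [sR | sz] on top of the new row [phi | y].
        top = np.hstack([s * self.R, (s * self.z)[:, None]])
        bottom = np.append(phi, y)[None, :]
        # Re-triangularize; the orthogonal factor is never stored.
        _, T = np.linalg.qr(np.vstack([top, bottom]))
        self.R = T[:d, :d]
        self.z = T[:d, d]

    def weights(self):
        # R is upper triangular, so a triangular solver could be used here instead.
        return np.linalg.solve(self.R, self.z)
```

With `lam=1.0` and a small `delta`, feeding noise-free samples of a linear model recovers its weights up to the regularization error. The sketch does not treat the overparameterized regime specially, where the decaying initial regularization can leave `R` poorly conditioned.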

2026-01-29T15:58:01Z 32 pages, 3 figures, 10 tables Luis Ontaneda Mijares Nick Firoozye http://arxiv.org/abs/2405.03456v2 Floating Point Compression of Hierarchical Matrix Formats and its Impact on Matrix-Vector Multiplication 2026-01-28T20:08:27Z

Matrix-vector multiplication forms the basis of many iterative solution algorithms and as such is also an important algorithm for hierarchical matrices, which are used to represent dense data in an optimized form by applying low-rank compression. However, due to its low computational intensity, the performance of matrix-vector multiplication is typically limited by the available memory bandwidth on parallel systems. With floating point compression the memory footprint can be optimized, which reduces the stress on the memory subsystem and thereby increases performance. We will look into the compression of different formats of hierarchical matrices and how this can be used to speed up the corresponding matrix-vector multiplication.

2024-05-06T13:29:14Z Ronald Kriemann http://arxiv.org/abs/2602.00125v1 MiniTensor: A Lightweight, High-Performance Tensor Operations Library 2026-01-27T21:21:59Z

We present MiniTensor, an open source tensor operations library that focuses on minimalism, correctness, and performance. MiniTensor exposes a familiar PyTorch-like Python API while it executes performance critical code in a Rust engine. The core supports dense $n$ dimensional tensors, broadcasting, reductions, matrix multiplication, reverse mode automatic differentiation, a compact set of neural network layers, and standard optimizers. In this paper, we describe the design of MiniTensor's architecture, including its efficient memory management, dynamic computation graph for gradients, and integration with Python via PyO3. We also compare the install footprint with PyTorch and TensorFlow to demonstrate that MiniTensor achieves a package size of only a few megabytes, several orders of magnitude smaller than mainstream frameworks, while preserving the essentials needed for research and development on CPUs. The repository can be found at https://github.com/neuralsorcerer/minitensor

2026-01-27T21:21:59Z Soumyadip Sarkar http://arxiv.org/abs/2601.17708v1 High-Order Mesh r-Adaptivity with Tangential Relaxation and Guaranteed Mesh Validity 2026-01-25T05:42:25Z

High-order meshes are crucial for achieving optimal convergence rates in curvilinear domains, preserving symmetry, and aligning with key flow features in moving mesh simulations, but their quality is challenging to control. In prior work, we have developed techniques based on the Target-Matrix Optimization Paradigm (TMOP) to adapt a given high-order mesh to the geometry and solution of the partial differential equation (PDE). Here, we extend this framework to address two key gaps in the literature for high-order mesh r-adaptivity. First, we introduce tangential relaxation on curved surfaces using solely the discrete mesh representation, eliminating the need for access to the underlying geometry (e.g., a CAD model). Second, we ensure a continuously positive Jacobian determinant throughout the domain. This determinant positivity is essential for using the high-order mesh resulting from r-adaptivity with arbitrary quadrature schemes in simulations. The proposed approach is demonstrated to be robust using a variety of numerical experiments.

2026-01-25T05:42:25Z 13 pages, 10 figures Ketan Mittal Veselin Dobrev Tzanio Kolev Vladimir Tomov http://arxiv.org/abs/2602.00075v1 Dimensional Peeking for Low-Variance Gradients in Zeroth-Order Discrete Optimization via Simulation 2026-01-21T04:23:06Z

Gradient-based optimization methods are commonly used to identify local optima in high-dimensional spaces. When derivatives cannot be evaluated directly, stochastic estimators can provide approximate gradients. However, these estimators' perturbation-based sampling of the objective function introduces variance that can lead to slow convergence. In this paper, we present dimensional peeking, a variance reduction method for gradient estimation in discrete optimization via simulation. By lifting the sampling granularity from scalar values to classes of values that follow the same control flow path, we increase the information gathered per simulation evaluation. Our derivation from an established smoothed gradient estimator shows that the method does not introduce any bias. We present an implementation via a custom numerical data type to transparently carry out dimensional peeking over C++ programs. Variance reductions by factors of up to 7.9 are observed for three simulation-based optimization problems with high-dimensional input. The optimization progress compared to three meta-heuristics shows that dimensional peeking increases the competitiveness of zeroth-order optimization for discrete and non-convex simulations.

2026-01-21T04:23:06Z Accepted at ACM SIGSIM PADS 2026 Philipp Andelfinger Wentong Cai http://arxiv.org/abs/2601.12220v1 Canonicalization of Batched Einstein Summations for Tuning Retrieval 2026-01-18T01:50:28Z

We present an algorithm for normalizing \emph{Batched Einstein Summation} expressions by mapping mathematically equivalent formulations to a unique normal form. Batches of einsums with the same Einstein notation that exhibit substantial data reuse appear frequently in finite element methods (FEM), numerical linear algebra, and computational chemistry. To effectively exploit this temporal locality for high performance, we consider groups of einsums in batched form. Representations of equivalent batched einsums may differ due to index renaming, permutations within the batch, and the commutativity and associativity of the multiplication operation. The lack of a canonical representation hinders the reuse of optimization and tuning knowledge in software systems. To this end, we develop a novel encoding of batched einsums as colored graphs and apply graph canonicalization to derive a normal form. In addition to the canonicalization algorithm, we propose a representation of einsums using functional array operands and provide a strategy to transfer transformations operating on the normal form to \emph{functional batched einsums} that exhibit the same normal form, which is crucial for fusing surrounding computations for memory-bound einsums. We evaluate our approach against JAX and observe a geomean speedup of $4.7\times$ for einsums from the TCCG benchmark suite and an FEM solver.
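
To make the index-renaming source of ambiguity concrete, the following sketch normalizes the index names of a single einsum subscript string by renaming indices in order of first appearance. This is a deliberately simplified illustration, not the graph canonicalization described above: the function name `normalize_index_names` is hypothetical, and the approach does not account for operand permutations, batching, or commutativity.

```python
def normalize_index_names(subscripts: str) -> str:
    """Rename einsum indices to a, b, c, ... in order of first appearance.

    Handles index renaming only; equivalent einsums that differ by operand
    order will still map to different strings.
    """
    mapping = {}
    out = []
    for ch in subscripts:
        if ch.isalpha():
            if ch not in mapping:
                # Assign the next unused letter to this index.
                mapping[ch] = chr(ord("a") + len(mapping))
            out.append(mapping[ch])
        else:
            # Keep separators such as ',' and '->' unchanged.
            out.append(ch)
    return "".join(out)
```

For example, `"ik,kj->ij"` and `"xy,yz->xz"` both describe a matrix product and map to the same string, whereas `"kj,ik->ij"` (operands swapped) does not, which is the kind of case a graph-based normal form is needed for.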

2026-01-18T01:50:28Z Kaushik Kulkarni Andreas Klöckner http://arxiv.org/abs/2601.17028v1 PALMA: A Lightweight Tropical Algebra Library for ARM-Based Embedded Systems 2026-01-17T23:42:52Z

Tropical algebra, including max-plus, min-plus, and related idempotent semirings, provides a unifying framework in which many optimization problems that are nonlinear in classical algebra become linear. This property makes tropical methods particularly well suited for shortest paths, scheduling, throughput analysis, and discrete event systems. Despite their theoretical maturity and practical relevance, existing tropical algebra implementations primarily target desktop or server environments and remain largely inaccessible on resource-constrained embedded platforms, where such optimization problems are most acute. We present PALMA (Parallel Algebra Library for Max-plus Applications), a lightweight, dependency-free C library that brings tropical linear algebra to ARM-based embedded systems. PALMA implements a generic semiring abstraction with SIMD-accelerated kernels, enabling a single computational framework to support shortest paths, bottleneck paths, reachability, scheduling, and throughput analysis. The library supports five tropical semirings, dense and sparse (CSR) representations, tropical closure, and spectral analysis via maximum cycle mean computation. We evaluate PALMA on a Raspberry Pi 4 and demonstrate peak performance of 2,274 MOPS, speedups of up to 11.9 times over classical Bellman-Ford for single-source shortest paths, and sub-10 microsecond scheduling solves for real-time control workloads. Case studies in UAV control, IoT routing, and manufacturing systems show that tropical algebra enables efficient, predictable, and unified optimization directly on embedded hardware. PALMA is released as open-source software under the MIT license.
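
The claim that shortest paths become linear algebra over the min-plus semiring can be sketched in a few lines. The code below is a NumPy illustration of the general technique, not PALMA's C API; the function names `minplus_matmul` and `minplus_closure` are assumptions, and nonnegative edge weights are assumed so that the closure is well defined.

```python
import numpy as np


def minplus_matmul(A, B):
    """Min-plus product: C[i, j] = min over k of A[i, k] + B[k, j]."""
    return np.min(A[:, :, None] + B[None, :, :], axis=1)


def minplus_closure(A):
    """All-pairs shortest path lengths by repeated min-plus squaring.

    A[i, j] is the weight of edge i -> j, or np.inf if there is no edge.
    """
    n = A.shape[0]
    # Add the min-plus identity (0 on the diagonal, inf elsewhere).
    identity = np.where(np.eye(n, dtype=bool), 0.0, np.inf)
    D = np.minimum(A, identity)
    steps = 1
    while steps < n - 1:
        # Squaring doubles the maximum number of edges per path covered.
        D = minplus_matmul(D, D)
        steps *= 2
    return D
```

Here addition plays the role of multiplication and `min` the role of addition, so the shortest-path recurrence is an ordinary matrix power in the tropical sense. The dense broadcast uses O(n^3) memory and is only meant for small examples.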

2026-01-17T23:42:52Z Open-source software available at https://github.com/ReFractals/palma Gnankan Landry Regis N'guessan http://arxiv.org/abs/2601.10828v1 High-Order Lie Derivatives from Taylor Series in the ADTAYL Package 2026-01-15T20:01:32Z

High-order Lie derivatives are essential in nonlinear systems analysis. If done symbolically, their evaluation becomes increasingly expensive as the order increases. We present a compact and efficient numerical approach for computing Lie derivatives of scalar, vector, and covector fields using the MATLAB ADTAYL package. The method exploits a fact noted by Röbenack: that these derivatives coincide, up to factorial scaling, with the Taylor coefficients of expressions built from a Taylor expansion about a trajectory point and, when required, the associated variational matrix. Computational results for a gantry crane model demonstrate orders-of-magnitude speedups over symbolic evaluation using the MATLAB Symbolic Math Toolbox.
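
For the scalar-field case, the factorial-scaling relation can be written out as follows. The notation ($f$ for the vector field, $h$ for the scalar field, $c_k$ for the Taylor coefficients) is assumed here for illustration; the vector and covector cases, which involve the variational matrix, are not covered by this sketch.

```latex
% Trajectory: \dot{x}(t) = f(x(t)), \quad x(0) = x_0.
% Lie derivatives: L_f^0 h = h, \quad L_f^{k+1} h = \nabla (L_f^k h) \cdot f.
\[
  h\bigl(x(t)\bigr) = \sum_{k \ge 0} c_k \, t^k,
  \qquad
  c_k = \frac{1}{k!} \, L_f^k h(x_0)
  \quad\Longleftrightarrow\quad
  L_f^k h(x_0) = k! \, c_k .
\]
% Check with f(x) = x, h(x) = x (scalar state):
% x(t) = x_0 e^{t}, so c_k = x_0 / k! and L_f^k h(x_0) = x_0 for all k.
```

The relation follows from the chain rule: differentiating $h(x(t))$ once along the trajectory gives $\nabla h \cdot f = L_f h$, and repeating this $k$ times gives $L_f^k h$.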

2026-01-15T20:01:32Z 16 pages Nedialko S. Nedialkov John D. Pryce http://arxiv.org/abs/2601.07827v1 Tensor Algebra Processing Primitives (TAPP): Towards a Standard for Tensor Operations 2026-01-12T18:58:31Z

To address the absence of a universal standard interface for tensor operations, we introduce the Tensor Algebra Processing Primitives (TAPP), a C-based interface designed to decouple the application layer from hardware-specific implementations. We provide a mathematical formulation of tensor contractions and a reference implementation to ensure correctness and facilitate the validation of optimized kernels. Developed through community consensus involving academic and industrial stakeholders, TAPP aims to enable performance portability and resolve dependency challenges. The viability of the standard is demonstrated through successful integrations with the TBLIS and cuTENSOR libraries, as well as the DIRAC quantum chemistry package.

2026-01-12T18:58:31Z 45 pages, 5 figures Jan Brandejs Niklas Hörnblad Edward F. Valeev Alexander Heinecke Jeff Hammond Devin Matthews Paolo Bientinesi http://arxiv.org/abs/2410.21231v2 $\texttt{skwdro}$: a library for Wasserstein distributionally robust machine learning 2026-01-09T15:45:40Z

We present skwdro, a Python library for training robust machine learning models. The library is based on distributionally robust optimization using Wasserstein distances, popular in optimal transport and machine learning. The goal of the library is to make the training of robust models easier for a wide audience by proposing a wrapper for PyTorch modules, enabling robustification of a model's loss with minimal code changes. It comes with scikit-learn compatible estimators for some popular objectives. The core of the implementation relies on an entropic smoothing of the original robust objective, in order to ensure maximal model flexibility. The library is available at https://github.com/iutzeler/skwdro and the documentation at https://skwdro.readthedocs.io.

2024-10-28T17:16:00Z 7 pages 2 figures Florian Vincent Waïss Azizian Franck Iutzeler Jérôme Malick http://arxiv.org/abs/2601.04901v1 Rigorous numerical computation of the Stokes multipliers for linear differential equations with single level one 2026-01-08T12:56:55Z

We describe a practical algorithm for computing the Stokes multipliers of a linear differential equation with polynomial coefficients at an irregular singular point of single level one. The algorithm follows a classical approach based on Borel summation and numerical ODE solving, but avoids a large amount of redundant work compared to a direct implementation. It applies to differential equations of arbitrary order, with no genericity assumption, and is suited to high-precision computations. In addition, we present an open-source implementation of this algorithm in the SageMath computer algebra system and illustrate its use with several examples. Our implementation supports arbitrary-precision computations and automatically provides rigorous error bounds. The article assumes minimal prior knowledge of the asymptotic theory of meromorphic differential equations and provides an elementary introduction to the linear Stokes phenomenon that may be of independent interest.

2026-01-08T12:56:55Z Michèle Loday-Richaud LMO Marc Mezzarobba LIX Pascal Remy LMV http://arxiv.org/abs/2601.02506v1 Star Formation in Galaxy Collisions: Dependence on Impact Velocity and Gas Mass of Galaxies in GADGET-4 Simulations 2026-01-05T19:22:40Z

This work investigates variations in the star formation rate during galaxy collisions when the initial conditions of velocity and gas mass are altered. For this purpose, hydrodynamic simulations were performed using the GADGET-4 code, with initial conditions generated by the Galstep and SnapshotJoiner programs. Systems of two galaxies on a head-on collision course were modeled with relative initial velocities ranging from 100 km/s to 1000 km/s, considering two scenarios: the first with identical galaxies, and the second with galaxies of different sizes. In simulations of systems with higher initial relative velocities, both scenarios showed more intense peaks in the star formation rate, triggered by the first contact of the collision, followed by a strong decline caused by gas dispersion. In contrast, for systems with lower initial velocities, mergers between galaxies were observed, leading to multiple peaks in the star formation rate. A greater initial distance between galaxies has also been linked to whether or not the galaxy system merges, since it implies longer timescales for gravitational action, which leads to higher relative velocities at the moment of collision. Furthermore, the star formation rate in galaxies was found to have a clear dependence on initial gas content. Overall, our results show that the relative impact velocity, the initial distance between the galaxies, and the gas content are important parameters for analyzing the star formation rate in colliding galaxies.

2026-01-05T19:22:40Z 17 pages and 17 figures. Keywords: star formation; galaxy mergers; hydrodynamic simulations; GADGET-4 Gustavo Neves Pereira Paulo Laerte Natti http://arxiv.org/abs/2504.02117v2 Vectorized Parallel in Time methods for low-order discretizations with application to Porous Media problems 2026-01-05T17:26:45Z

High-order methods have shown great potential to overcome performance issues of simulations of partial differential equations (PDEs) on modern hardware, yet many users stick to low-order, matrix-based simulations, in particular in porous media applications. Heterogeneous coefficients and low regularity of the solution are reasons not to employ high-order discretizations. We present a new approach for the simulation of instationary PDEs that partially mitigates these performance problems. By reformulating the original problem, we derive a parallel-in-time time integrator that increases the arithmetic intensity and introduces additional structure into the problem. This helps accelerate matrix-based simulations on modern hardware architectures. Based on a system for multiple time steps, we formulate a matrix equation that can be solved using vectorized solvers like Block Krylov methods. The structure of this approach makes it applicable to a wide range of linear and nonlinear problems. In our numerical experiments we present first results for three different PDEs: a linear convection-diffusion equation, a nonlinear diffusion-reaction equation, and a realistic example based on the Richards' equation.

2025-04-02T20:26:22Z Christian Engwer Alexander Schell Nils-Arne Dreier