http://arxiv.org/abs/2602.22371v1 Quadratization of Autonomous Partial Differential Equations: Theory and Algorithms 2026-02-25T20:08:07Z

Quadratization for partial differential equations (PDEs) is a process that transforms a nonquadratic PDE into a quadratic form by introducing auxiliary variables. This symbolic transformation has been used in diverse fields to simplify the analysis, simulation, and control of nonlinear and nonquadratic PDE models. This paper presents a rigorous definition of PDE quadratization and theoretical results for the quadratization problem of spatially one-dimensional PDEs (including results on existence and complexity), and introduces QuPDE, an algorithm based on symbolic computation and discrete optimization that outputs a quadratization for any spatially one-dimensional polynomial or rational PDE. To date, this algorithm is the first computational tool for finding quadratizations of PDEs. We demonstrate QuPDE's performance by applying it to fourteen nonquadratic PDEs in diverse areas such as fluid mechanics, space physics, chemical engineering, and biological processes. QuPDE delivers a low-order quadratization in each case, uncovering quadratic transformations with fewer auxiliary variables than those previously discovered in the literature for some examples, and finding quadratizations for systems that had not been transformed to quadratic form before.
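
As a small illustration of the idea (a textbook-style example chosen here, not one taken from the paper), consider the cubic reaction-diffusion equation $u_t = u_{xx} + u^3$. Assuming the single auxiliary variable $w = u^2$, the chain rule gives a system whose right-hand sides are at most quadratic in $u$, $w$, and their spatial derivatives:

```latex
\begin{aligned}
u_t &= u_{xx} + u\,w, \\
w_t &= 2u\,u_t = 2u\,u_{xx} + 2u^4 = 2u\,u_{xx} + 2w^2 .
\end{aligned}
```

Here one auxiliary variable suffices; the abstract's notion of a low-order quadratization refers to keeping the number of such variables small.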

2026-02-25T20:08:07Z Albani Olivieri Gleb Pogudin Boris Kramer http://arxiv.org/abs/2409.07563v4 MPPI-Generic: A CUDA Library for Stochastic Trajectory Optimization 2026-02-24T19:21:16Z

This paper introduces a new C++/CUDA library for GPU-accelerated stochastic optimization called MPPI-Generic. It provides implementations of Model Predictive Path Integral control, Tube-Model Predictive Path Integral Control, and Robust Model Predictive Path Integral Control, and allows these algorithms to be used with many pre-existing dynamics models and cost functions. Furthermore, researchers can create their own dynamics models or cost functions following our API definitions without needing to change the actual Model Predictive Path Integral Control code. Finally, we compare computational performance to other popular implementations of Model Predictive Path Integral Control over a variety of GPUs to show the real-time performance our library enables. Library code can be found at: https://acdslab.github.io/mppi-generic-website/ .
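
To make the underlying update concrete, here is a minimal CPU-only sketch of one Model Predictive Path Integral iteration in plain Python. It does not use the MPPI-Generic API: the toy dynamics `x' = u`, the cost, and the names `rollout_cost` and `mppi_update` are all assumptions made for illustration, and the control-cost term of the full algorithm is omitted.

```python
import math
import random

def rollout_cost(x0, controls, dt=0.1, target=1.0):
    """Accumulated cost of a control sequence on the toy model x' = u."""
    x, cost = x0, 0.0
    for u in controls:
        x += u * dt                    # explicit Euler step of the toy dynamics
        cost += (x - target) ** 2      # quadratic state cost (hypothetical choice)
    return cost

def mppi_update(x0, nominal, num_samples=256, sigma=1.0, lam=1.0, rng=None):
    """One iteration: sample noisy controls, weight by exp(-cost/lam), average."""
    rng = rng or random.Random(0)
    noises = [[rng.gauss(0.0, sigma) for _ in nominal] for _ in range(num_samples)]
    costs = [rollout_cost(x0, [u + e for u, e in zip(nominal, eps)])
             for eps in noises]
    best = min(costs)                  # shift by the minimum for numerical stability
    weights = [math.exp(-(c - best) / lam) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    # New control = nominal + importance-weighted average of the perturbations.
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises))
            for t, u in enumerate(nominal)]

nominal = [0.0] * 10
updated = mppi_update(0.0, nominal)
print(rollout_cost(0.0, nominal), rollout_cost(0.0, updated))
```

The rollouts are independent of one another, which is what makes the method a natural fit for GPU parallelization.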

2024-09-11T18:31:33Z Renamed ros2 comparisons to nav2 after feedback. Also added more tests on Jetson Orin Nano in the appendix Bogdan Vlahov Jason Gibson Manan Gandhi Evangelos A. Theodorou http://arxiv.org/abs/2603.04446v1 Threadle: A Memory-Efficient Network Storage and Query Engine for Large, Multilayer, and Mixed-mode Networks 2026-02-24T02:23:26Z

We present Threadle, an open-source, high-performance, and memory-efficient network storage and query engine written in C#. Designed for working with full-population networks derived from administrative register data, which represent very large, multilayer, mixed-mode networks with millions of nodes and billions of edges, Threadle addresses a fundamental limitation of existing network libraries: the inability to efficiently handle two-mode (bipartite) data at scale. Threadle's core innovation is a pseudo-projection approach that allows two-mode layers to be queried as if they were projected into one-mode form, without ever materializing the memory-prohibitive projection. We demonstrate that a network with 20 million nodes containing layers equivalent to 8 trillion projected edges can be stored in approximately 20 GB of RAM -- a compression ratio exceeding 2000:1 compared to materialized projection. Additionally, Threadle provides native support for multilayer mixed-mode networks, an integrated node attribute manager, and a CLI frontend with 50+ commands for the construction, processing, file handling, and management of very large heterogeneous networks. Threadle is freely available at https://www.threadle.dev and can be obtained either as precompiled binaries for Windows, macOS, and Linux, or compiled directly from source. Supplementing Threadle is threadleR, an R frontend that enables advanced sampling- and traversal-based analyses on very large, heterogeneous, multilayer, mixed-mode population-scale networks.
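
The pseudo-projection idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Threadle's C# implementation or its CLI; the class `TwoModeLayer` and its method names are hypothetical.

```python
from collections import defaultdict

class TwoModeLayer:
    """Two-mode (affiliation) layer queried as if projected, without storing edges."""

    def __init__(self):
        self.members = defaultdict(set)        # affiliation id -> node ids
        self.affiliations = defaultdict(set)   # node id -> affiliation ids

    def add_membership(self, node, affiliation):
        self.members[affiliation].add(node)
        self.affiliations[node].add(affiliation)

    def has_edge(self, a, b):
        """True if a and b would be adjacent in the one-mode projection."""
        if a == b:
            return False
        shared_a = self.affiliations.get(a, set())
        shared_b = self.affiliations.get(b, set())
        return not shared_a.isdisjoint(shared_b)

    def neighbors(self, node):
        """Projected neighbours, computed on demand from shared affiliations."""
        result = set()
        for affiliation in self.affiliations.get(node, ()):
            result |= self.members[affiliation]
        result.discard(node)
        return result

layer = TwoModeLayer()
for person in ("ann", "bo", "cy", "di"):
    layer.add_membership(person, "workplace_1")
layer.add_membership("di", "school_7")
layer.add_membership("eve", "school_7")
print(layer.has_edge("ann", "cy"), sorted(layer.neighbors("di")))
```

An affiliation with n members costs n stored memberships here, whereas its materialized projection would need n(n-1)/2 edges, which is where the memory savings described above come from.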

2026-02-24T02:23:26Z 9 pages, 1 figure, 3 listings Carl Nordlund Yukun Jiao http://arxiv.org/abs/2602.20226v1 trainsum -- A Python package for quantics tensor trains 2026-02-23T16:41:02Z

We present trainsum, a versatile Python package for doing computations with multidimensional quantics tensor trains: https://github.com/fh-igd-iet/trainsum. Using the Array API standard together with opt_einsum, trainsum allows the effortless approximation of tensors or functions by tensor trains independent of their shape or dimensionality. Once approximated, our package can perform normal arithmetic operations with quantics tensor trains, including addition, Einstein summations and element-wise transformations. It can therefore be used for generic computations with applications in simulation, data compression, machine learning and data analysis.
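
For readers unfamiliar with the quantics format, the following standalone sketch shows the representation itself. It deliberately avoids the trainsum API (none of the names below come from the package): a function sampled on a grid of 2**n points is indexed by the n binary digits of the grid index, and for `exp(a*x)` the resulting tensor train has rank 1.

```python
import math

def quantics_bits(i, n):
    """Binary digits of i, most significant first (the quantics indices)."""
    return [(i >> (n - 1 - k)) & 1 for k in range(n)]

def tt_eval(cores, bits):
    """Evaluate a tensor train entry: multiply the matrices cores[k][bit_k]."""
    vec = [1.0]
    for core, b in zip(cores, bits):
        mat = core[b]                  # (r_left x r_right) matrix for this digit
        vec = [sum(vec[i] * mat[i][j] for i in range(len(vec)))
               for j in range(len(mat[0]))]
    return vec[0]

def exp_cores(a, n):
    """Rank-1 quantics cores for f(x) = exp(a*x) sampled at x = i / 2**n."""
    # Digit k carries weight 2**-(k+1) in x, so exp(a*x) factorizes over digits.
    return [[[[1.0]], [[math.exp(a * 2.0 ** -(k + 1))]]] for k in range(n)]

n, a, i = 10, -2.0, 777
cores = exp_cores(a, n)
print(tt_eval(cores, quantics_bits(i, n)), math.exp(a * i / 2 ** n))
```

In this example the train stores 2n numbers instead of 2**n samples; functions with less structure need larger ranks.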

2026-02-23T16:41:02Z Paul Haubenwallner Matthias Heller http://arxiv.org/abs/2511.09943v2 SeQuant Framework for Symbolic and Numerical Tensor Algebra. I. Core Capabilities 2026-02-19T13:13:51Z

SeQuant is an open-source library for symbolic algebra of tensors over commutative (scalar) and non-commutative (operator) rings. The key innovation supporting most of its functionality is a graph-theoretic tensor network (TN) canonicalizer that can handle tensor networks with symmetries faster than their standard group-theoretic counterparts. The TN canonicalizer is used for routine simplification of conventional tensor expressions, for optimizing application of Wick's theorem (used to canonicalize products of tensors over operator fields), and for manipulation of the intermediate representation leading to the numerical evaluation. Notable features of SeQuant include support for noncovariant tensor networks (which often arise from tensor decompositions) and for tensors with modes that depend parametrically on indices of other tensor modes (such dependencies between degrees of freedom are naturally viewed as nesting of tensors, "tensors of tensors" arising in block-wise data compressions in data science and modern quantum simulation). SeQuant blurs the line between pure symbolic manipulation/code generation and numerical evaluation by including compiler-like components to optimize and directly interpret tensor expressions using external numerical tensor algebra frameworks. The SeQuant source code is available at https://github.com/ValeevGroup/SeQuant.

2025-11-13T04:17:05Z Bimal Gaudel Robert G. Adam Ajay Melekamburath Conner Masteran Nakul Teke Azam Besharatnik Andreas Köhn Edward F. Valeev 10.1063/5.0311913 http://arxiv.org/abs/2602.17151v1 ARCANE: Scalable high-degree cubature formulae for simulating SDEs without Monte Carlo error 2026-02-19T07:51:48Z

Monte Carlo sampling is the standard approach for estimating properties of solutions to stochastic differential equations (SDEs), but accurate estimates require huge sample sizes. Lyons and Victoir (2004) proposed replacing independently sampled Brownian driving paths with "cubature formulae", deterministic weighted sets of paths that match Brownian "signature moments" up to some degree $D$. They prove that cubature formulae exist for arbitrary $D$, but explicit constructions are difficult and have only reached $D=7$, too small for practical use. We present ARCANE, an algorithm that efficiently and automatically constructs cubature formulae of arbitrary degree. It reproduces the state of the art in seconds and reaches $\boldsymbol{D=19}$ within hours on modest hardware. In simulations across multiple different SDEs and error metrics, our cubature formulae robustly achieve an error orders of magnitude smaller than Monte Carlo with the same number of paths.
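
A finite-dimensional analogue may help fix ideas. The sketch below is not the ARCANE algorithm and does not work on path space; it only shows, for a single standard Gaussian variable, how a small deterministic weighted point set (the classical three-point Gauss-Hermite rule) matches moments up to a given degree. The function name is an assumption.

```python
import math

# Three-point Gauss-Hermite rule for Z ~ N(0, 1): exact for polynomials
# of degree at most 5, with no sampling error.
POINTS = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def cubature_expectation(f):
    """Deterministic estimate of E[f(Z)] as a weighted sum over the points."""
    return sum(w * f(z) for w, z in zip(WEIGHTS, POINTS))

# Moments of degree <= 5 are reproduced (E[Z^2] = 1, E[Z^4] = 3);
# degree 6 is not (the true value is 15).
for degree in range(7):
    print(degree, cubature_expectation(lambda z: z ** degree))
```

Cubature on Wiener space, as described in the abstract, replaces the points by paths and the polynomial moments by signature moments.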

2026-02-19T07:51:48Z 57 pages Peter Koepernik Thomas Coxon James Foster http://arxiv.org/abs/2602.15613v1 Algorithmic differentiation for domain specific languages in C++ with expression templates 2026-02-17T14:42:07Z

The application of operator overloading algorithmic differentiation (AD) to computer programs in order to compute the derivative is quite common. However, replacing the underlying computational floating-point type with the specialized type of an AD tool has two problems. First, the memory structure of the program is changed and floating-point data is interleaved with identifiers from AD. This prevents the compiler from performing optimizations such as SIMD optimizations. Second, the AD tool does not see any domain-specific operations, e.g., linear algebra operations, that the program uses. This prevents the AD tool from using specialized algorithms in such places. We propose a new AD tool that is tailored to such situations. The memory structure of the primal data is retained by associating an identifier with each entity, e.g., a matrix, and not with each floating-point value, e.g., an element of the matrix. Operations on such entities can then be annotated and a generator is used to create the AD overloads. We demonstrate that this approach provides performance comparable to that of other specializations. In addition, the run-time factor is below the theoretical 4.5 of reverse AD for programs that are written purely with linear algebra entities and operations.
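
The entity-level idea can be illustrated with a toy reverse-mode tape. The paper's tool is written in C++ with expression templates; the Python sketch below is not that tool, and the names `Tape`, `Vec`, `dot`, and `gradient` are hypothetical. It only shows one identifier per vector and one tape entry per linear algebra operation.

```python
class Tape:
    """Records one backward closure per entity-level operation."""
    def __init__(self):
        self.backward_ops = []
        self.next_id = 0

class Vec:
    """A vector entity: plain float storage plus ONE identifier for the whole vector."""
    def __init__(self, tape, data):
        self.tape = tape
        self.data = list(data)     # primal values stay contiguous, no per-element ids
        self.id = tape.next_id
        tape.next_id += 1

def _accumulate(adjoints, vec, contribution):
    current = adjoints.setdefault(vec.id, [0.0] * len(vec.data))
    for i, c in enumerate(contribution):
        current[i] += c

def dot(a, b):
    """Dot product recorded as a single tape entry with a hand-written adjoint."""
    out = Vec(a.tape, [sum(x * y for x, y in zip(a.data, b.data))])
    def backward(adjoints):
        seed = adjoints.get(out.id)
        if seed is None:           # this result does not influence the output
            return
        _accumulate(adjoints, a, [seed[0] * y for y in b.data])
        _accumulate(adjoints, b, [seed[0] * x for x in a.data])
    a.tape.backward_ops.append(backward)
    return out

def gradient(output, wrt):
    """Reverse sweep: seed the output adjoint with 1 and replay the tape backwards."""
    adjoints = {output.id: [1.0]}
    for backward in reversed(output.tape.backward_ops):
        backward(adjoints)
    return adjoints.get(wrt.id, [0.0] * len(wrt.data))

tape = Tape()
a, b = Vec(tape, [1.0, 2.0, 3.0]), Vec(tape, [4.0, 5.0, 6.0])
print(gradient(dot(a, b), a))
```

A per-scalar tool would instead record one multiplication and one addition per element, each with its own identifier.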

2026-02-17T14:42:07Z 14 pages, 14 figures Max Sagebaum Nicolas R. Gauger http://arxiv.org/abs/2602.12824v1 Explicit Euclidean division algorithms for some degree 8 number rings 2026-02-13T11:19:38Z

This article focuses on some rings of integers of number fields which are known to be norm-Euclidean domains, but for which no explicit algorithm computing the Euclidean division has yet been studied or implemented. The rings of integers we are interested in were proven to be Euclidean by H.W. Lenstra, Jr in 1978; they include the $n$-th cyclotomic rings for $n=15,20,24$. We present an algorithm performing Euclidean division in these rings based on Lenstra's proof and a closest vector computation by Conway and Sloane, and study its complexity. We give a complete implementation of the algorithm in SageMath. We also estimate the size of the remainders obtained when computing Euclidean divisions with this algorithm.
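
The simplest norm-Euclidean ring, the Gaussian integers Z[i], shows what an explicit Euclidean division looks like. This is a much easier case than the degree 8 rings treated in the article (where the rounding step is replaced by a closest vector computation), and the paper's implementation is in SageMath; the Python sketch and its function names are assumptions for illustration.

```python
def round_div(n, d):
    """Nearest integer to n/d for d > 0, using integer arithmetic only."""
    return (2 * n + d) // (2 * d)

def gauss_divmod(a, b):
    """Euclidean division in Z[i]; a and b are (re, im) integer pairs, b != 0."""
    (ar, ai), (br, bi) = a, b
    norm_b = br * br + bi * bi
    # a / b = a * conj(b) / N(b); round each coordinate to the nearest integer.
    qr = round_div(ar * br + ai * bi, norm_b)
    qi = round_div(ai * br - ar * bi, norm_b)
    # Remainder r = a - q*b, which satisfies N(r) <= N(b) / 2.
    rr = ar - (qr * br - qi * bi)
    ri = ai - (qr * bi + qi * br)
    return (qr, qi), (rr, ri)

q, r = gauss_divmod((27, 23), (8, 1))
print(q, r)
```

Rounding to the nearest lattice point works here because every complex number lies within distance sqrt(1/2) of a Gaussian integer, so the remainder norm is at most half the norm of the divisor.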

2026-02-13T11:19:38Z Christophe Levrat http://arxiv.org/abs/2602.12178v1 Systematic Analysis of Penalty-Optimised Illumination Design for Tomographic Volumetric Additive Manufacturing via the Extendable Framework TVAM AID Using the Core Imaging Library 2026-02-12T17:09:52Z

Tomographic Volumetric Additive Manufacturing (TVAM) is a novel manufacturing method that allows for the fast creation of objects of complex geometry in layerless fashion. The process is based on the solidification of photopolymer that occurs when a sufficient threshold dose of light-energy is absorbed. In order to create complex shapes, an illumination plan must be designed to force solidification in some desired areas while leaving other regions liquid. Determining an illumination plan can be considered as an optimisation problem where a variety of objective functionals (penalties) can be used. This work considers a selection of penalty functions and their impact on selected printing metrics, linking the shape of penalty functions to ranges of light-energy dose levels in in-part regions that should be printed and out-of-part regions that should remain liquid. Further, the threshold parameters that are typically used to demarcate minimum light-energy for in-part regions and maximum light-energy for out-of-part regions are investigated systematically as design parameters on both existing and new methods. This enables the characterisation of their effects on some selected printing metrics as well as informed selection for default values. This work is underpinned by a reproducible and extensible framework, TVAM Adaptive Illumination Design (TVAM AID), which makes use of the open-source Core Imaging Library (CIL) that is designed for tomographic imaging with an emphasis on reconstruction. The foundation of TVAM AID which is presented here can hence be easily enhanced by existing functionality in CIL, thus lowering the barrier to entry and encouraging use of strategies that already exist for reconstruction optimisation.

2026-02-12T17:09:52Z 22 Pages, 19 Figures Nicole Pellizzon Richard Huber Jon Spangenberg Jakob Sauer Jørgensen http://arxiv.org/abs/2602.11843v1 Fast Evaluation of Truncated Neumann Series by Low-Product Radix Kernels 2026-02-12T11:37:31Z

Truncated Neumann series $S_k(A)=I+A+\cdots+A^{k-1}$ are used in approximate matrix inversion and polynomial preconditioning. In dense settings, matrix-matrix products dominate the cost of evaluating $S_k$. Naive evaluation needs $k-1$ products, while splitting methods reduce this to $O(\log k)$. Repeated squaring, for example, uses $2\log_2 k$ products, so further gains require higher-radix kernels that extend the series by $m$ terms per update. Beyond the known radix-5 kernel, explicit higher-radix constructions were not available, and the existence of exact rational kernels was unclear. We construct radix kernels for $T_m(B)=I+B+\cdots+B^{m-1}$ and use them to build faster series algorithms. For radix 9, we derive an exact 3-product kernel with rational coefficients, which is the first exact construction beyond radix 5. This kernel yields $5\log_9 k\approx 1.58\log_2 k$ products, a 21% reduction from repeated squaring. For radix 15, numerical optimization yields a 4-product kernel that matches the target through degree 14 but has nonzero spillover (extra terms) at degrees $\ge 15$. Because spillover breaks the standard telescoping update, we introduce a residual-based radix-kernel framework that accommodates approximate kernels and retains coefficient $(\mu_m+2)/\log_2 m$. Within this framework, radix 15 attains $6/\log_2 15\approx 1.54$, the best known asymptotic rate. Numerical experiments support the predicted product-count savings and associated runtime trends.
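
The repeated-squaring baseline mentioned above can be written out directly. The sketch below covers only that radix-2 baseline, not the paper's radix-9 or radix-15 kernels; it uses small dense matrices as nested lists, and the helper names are assumptions.

```python
def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matadd(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_doubling(A, levels):
    """Return (S_k(A), product count) for k = 2**levels via repeated squaring."""
    S, P, products = identity(len(A)), A, 0    # S = S_1 = I, P = A^1
    for _ in range(levels):
        S = matadd(S, matmul(P, S))            # S_{2k} = S_k + A^k S_k
        P = matmul(P, P)                       # A^{2k} = (A^k)^2
        products += 2
    return S, products

def neumann_naive(A, k):
    """Reference evaluation of I + A + ... + A^(k-1) with k - 1 products."""
    S, P = identity(len(A)), identity(len(A))
    for _ in range(k - 1):
        P = matmul(P, A)
        S = matadd(S, P)
    return S

A = [[0.5, 0.25], [0.0, 0.5]]
S8, count = neumann_doubling(A, 3)
print(count, S8, neumann_naive(A, 8))
```

Each doubling costs two products, which gives the $2\log_2 k$ count quoted above; the final squaring of the power is not needed for the last level, so a careful implementation can save one product.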

2026-02-12T11:37:31Z Piyush Sao http://arxiv.org/abs/2602.11381v1 MEmilio -- A high performance Modular EpideMIcs simuLatIOn software for multi-scale and comparative simulations of infectious disease dynamics 2026-02-11T21:15:26Z

Epidemic and pandemic preparedness and rapid outbreak response rely on timely, trustworthy evidence. Mathematical models are crucial for supporting timely and reliable evidence generation for public health decision-making, with models spanning approaches from compartmental and metapopulation models to detailed agent-based simulations. Yet, the accompanying software ecosystem remains fragmented across model types, spatial resolutions, and computational targets, making models harder to compare, extend, and deploy at scale. Here we present MEmilio, a modular, high-performance framework for epidemic simulation that harmonizes the specification and execution of diverse dynamic epidemiological models within a unified architecture. MEmilio couples an efficient C++ simulation core with coherent model descriptions and a user-friendly Python interface, enabling workflows that run on laptops as well as high-performance computing systems. Standardized representations of space, demography, and mobility support straightforward adaptations in resolution and population size, facilitating systematic inter-model comparisons and ensemble studies. The framework integrates readily with established tools for uncertainty quantification and parameter inference, supporting a broad range of applications from scenario exploration to calibration. Finally, strict software-engineering practices, including extensive unit and continuous integration testing, promote robustness and minimize the risk of errors as the framework evolves. By unifying implementations across modeling paradigms, MEmilio aims to lower barriers to reuse and generalize models, enable principled comparisons of implicit assumptions, and accelerate the development of novel approaches that strengthen modeling-based outbreak preparedness.

2026-02-11T21:15:26Z 47 pages, 6 figures Julia Bicker Carlotta Gerstein David Kerkmann Sascha Korf René Schmieding Anna Wendler Henrik Zunker Daniel Abele Maximilian Betz Khoa Nguyen Lena Plötzke Kilian Volmer Agatha Schmidt Nils Waßmuth Patrick Lenz Daniel Richter Hannah Tritzschak Ralf Hannemann-Tamas Julian Litz Paul Johannssen Marielena Borges Annika Jungklaus Manuel Heger Annalena Lange Elisabeth Kluth Kathrin Rack Vincent Wieland Jonas Arruda Sebastian Binder Margrit Klitz Martin Siggel Manuel Dahmen Achim Basermann Michael Meyer-Hermann Jan Hasenauer Martin J. Kühn http://arxiv.org/abs/2506.14976v2 New Time Integrators and Capabilities in SUNDIALS Versions 6.2.0-7.4.0 2026-02-10T00:01:25Z

SUNDIALS is a well-established numerical library that provides robust and efficient time integrators and nonlinear solvers. This paper overviews several significant improvements and new features added over the last three years to support scientific simulations run on high-performance computing systems. Notably, three new classes of one-step methods have been implemented: low storage Runge-Kutta, symplectic partitioned Runge-Kutta, and operator splitting. In addition, we describe new time step adaptivity support for multirate methods, adjoint sensitivity analysis capabilities for explicit Runge-Kutta methods, additional options for Anderson acceleration in nonlinear solvers, and improved error handling and logging.
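
As a reminder of what a symplectic partitioned Runge-Kutta method does, here is the classical Stormer-Verlet (leapfrog) scheme in plain Python. It does not call SUNDIALS; the function name and the harmonic oscillator test problem are assumptions chosen for illustration.

```python
def leapfrog(q, p, dt, steps, force):
    """Stormer-Verlet integration of q' = p, p' = force(q) (unit mass)."""
    for _ in range(steps):
        p += 0.5 * dt * force(q)   # half step in momentum
        q += dt * p                # full step in position
        p += 0.5 * dt * force(q)   # second half step in momentum
    return q, p

def energy(q, p):
    """Hamiltonian of the unit harmonic oscillator."""
    return 0.5 * (p * p + q * q)

q, p = leapfrog(1.0, 0.0, dt=0.1, steps=10_000, force=lambda x: -x)
print(energy(1.0, 0.0), energy(q, p))
```

The position and momentum are updated by separate sub-steps, which is the partitioned structure; for this problem the energy error stays bounded over long integrations instead of drifting.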

2025-06-17T21:04:31Z Steven B. Roberts Mustafa Ağgül Daniel R. Reynolds Cody J. Balos David J. Gardner Carol S. Woodward 10.1145/3797888 http://arxiv.org/abs/2402.01373v3 cmaes: A Simple yet Practical Python Library for CMA-ES 2026-02-07T02:16:59Z

The covariance matrix adaptation evolution strategy (CMA-ES) has been highly effective in black-box continuous optimization, as demonstrated by its success in both benchmark problems and various real-world applications. To address the need for an accessible and powerful tool in this domain, we developed cmaes, a simple and practical Python library for CMA-ES. cmaes is characterized by its simplicity, offering intuitive use and high code readability. This makes it suitable for quick use of CMA-ES, as well as for educational purposes and seamless integration into other libraries. Despite its simple design, cmaes maintains advanced functionality. It incorporates recent advancements in CMA-ES, such as learning rate adaptation for challenging scenarios, transfer learning, mixed-variable optimization, and multi-objective optimization capabilities. These advanced features are accessible through a user-friendly API, ensuring that cmaes can be easily adopted in practical applications. We present cmaes as a strong candidate for a practical Python CMA-ES library aimed at practitioners. The software is available under the MIT license at https://github.com/CyberAgentAILab/cmaes.

2024-02-02T12:55:10Z Masahiro Nomura Masashi Shibata Ryoki Hamano http://arxiv.org/abs/2602.05490v1 Report on the second Toulouse Tensor Workshop 2026-02-05T09:51:39Z

This report documents the program of the second Toulouse Tensor Workshop which took place at the University of Toulouse on September 17-19, 2025, and summarizes the main points of discussion. This workshop follows the first Workshop (CECAM workshop on Tensor Contraction Library Standardization), which took place in Toulouse one year earlier, on May 24-25, 2024 and led to the formation of a tensor standardization working group, which has since specified a low-level standard interface for tensor operations available freely on GitHub. The 2025 workshop brought together developers of applications which rely extensively on tensor computations such as quantum many-body simulations in chemistry and physics (material science and electronic structure calculations), as well as developers and experts of tensor software who have the know-how to provide the technical support for such applications. The workshop enabled the community to provide feedback on the specified low-level interface and how it can be further refined. It also initiated a discussion on how the standardization efforts should be oriented in the near future, in particular on what the higher-level interfaces should be and how to tackle other requirements of the community such as tensor decompositions, symmetric tensors and structured sparsity support.

2026-02-05T09:51:39Z 27 pages Jan Brandejs Trond Saue Andre Severo Pereira Gomes Lucas Visscher Paolo Bientinesi http://arxiv.org/abs/2602.01996v1 Optimizing Tensor Train Decomposition in DNNs for RISC-V Architectures Using Design Space Exploration and Compiler Optimizations 2026-02-02T11:56:36Z

Deep neural networks (DNNs) have become indispensable in many real-life applications like natural language processing and autonomous systems. However, deploying DNNs on resource-constrained devices, e.g., in RISC-V platforms, remains challenging due to the high computational and memory demands of fully connected (FC) layers, which dominate resource consumption. Low-rank factorization (LRF) offers an effective approach to compressing FC layers, but the vast design space of LRF solutions involves complex trade-offs among FLOPs, memory size, inference time, and accuracy, making the LRF process complex and time-consuming. This paper introduces an end-to-end LRF design space exploration methodology and a specialized design tool for optimizing FC layers on RISC-V processors. Using Tensor Train Decomposition (TTD) offered by the TensorFlow T3F library, the proposed work prunes the LRF design space by excluding, first, inefficient decomposition shapes and, second, solutions with poor inference performance on RISC-V architectures. Compiler optimizations are then applied to enhance custom T3F layer performance, minimizing inference time and boosting computational efficiency. On average, our TT-decomposed layers run 3x faster than IREE and 8x faster than Pluto on the same compressed model. This work provides an efficient solution for deploying DNNs on edge and embedded devices powered by RISC-V architectures.
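
To give a sense of why the design space is large, the sketch below counts the parameters of a TT-factorized FC layer for one choice of shape and ranks. It does not use the T3F API or the paper's tool; the function name and the example shapes and ranks are assumptions.

```python
from math import prod

def tt_matrix_params(row_factors, col_factors, ranks):
    """Parameter count of a TT-matrix with cores of shape (r[k], m[k], n[k], r[k+1])."""
    assert len(row_factors) == len(col_factors) == len(ranks) - 1
    assert ranks[0] == 1 and ranks[-1] == 1
    return sum(ranks[k] * m * n * ranks[k + 1]
               for k, (m, n) in enumerate(zip(row_factors, col_factors)))

# A 1024 x 1024 FC layer factorized as (4*8*8*4) x (4*8*8*4) with TT-ranks 8.
rows, cols, ranks = (4, 8, 8, 4), (4, 8, 8, 4), (1, 8, 8, 8, 1)
dense = prod(rows) * prod(cols)
compressed = tt_matrix_params(rows, cols, ranks)
print(dense, compressed, dense / compressed)
```

Every factorization of the layer dimensions and every rank tuple gives a different point in the trade-off between size, FLOPs, inference time, and accuracy, which is the space the exploration methodology prunes.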

2026-02-02T11:56:36Z 36 pages, 16 figures, this is the author-accepted version of the article published in ACM Transactions on Embedded Computing Systems (TECS), Vol. 24, No. 6 ACM Transactions on Embedded Computing Systems 24, 6, Article 171 (October 2025), 34 pages Theologos Anthimopoulos Milad Kokhazadeh Vasilios Kelefouras Benjamin Himpel Georgios Keramidas 10.1145/3768624