http://arxiv.org/abs/2506.21804v1
Large-Scale Simulations of Turbulent Flows using Lattice Boltzmann Methods on Heterogeneous High Performance Computers
2025-06-11T11:14:01Z
Current GPU-accelerated supercomputers promise to enable large-scale simulations of turbulent flows. Lattice Boltzmann Methods (LBM) are particularly well-suited to fulfilling this promise due to their intrinsic compatibility with highly parallel execution on both SIMD CPUs and GPUs. A novel LBM scheme for wall-modeled LES in complex geometries is described with a special focus on the efficient implementation in the open source LBM framework OpenLB. Detailed scalability results are provided for all HoreKa partitions, utilizing up to 128 nodes and covering problem sizes up to 18 billion cells.
2025-06-11T11:14:01Z
Annual report of LBRG's usage of the HoreKa supercomputer within the scope of the NHR JARDS project CPE. Submitted to the HLRS Results and Review Workshop 2025
Adrian Kummerländer, Fedor Bukreev, Yuji Shimojima, Shota Ito, Mathias J. Krause

http://arxiv.org/abs/2506.06373v1
El0ps: An Exact L0-regularized Problems Solver
2025-06-04T13:47:43Z
This paper presents El0ps, a Python toolbox providing several utilities to handle L0-regularized problems related to applications in machine learning, statistics, and signal processing, among other fields. In contrast to existing toolboxes, El0ps allows users to define custom instances of these problems through a flexible framework, provides a dedicated solver achieving state-of-the-art performance, and offers several built-in machine learning pipelines.
Our aim with El0ps is to provide a comprehensive tool which opens new perspectives for the integration of L0-regularized problems in practical applications.
2025-06-04T13:47:43Z
Théo Guyard, Cédric Herzet, Clément Elvira

http://arxiv.org/abs/2506.03766v1
Conventional and Fuzzy Data Envelopment Analysis with deaR
2025-06-04T09:29:23Z
deaR is a recently developed R package for data envelopment analysis (DEA) that implements a large number of conventional and fuzzy models, along with super-efficiency models, cross-efficiency analysis, Malmquist index, bootstrapping, and metafrontier analysis. It should be noted that deaR is the only package to date that incorporates Kao-Liu, Guo-Tanaka and possibilistic fuzzy models. The versatility of the package allows the user to work with different returns to scale and orientations, as well as to consider special features, namely non-controllable, non-discretionary or undesirable variables. Moreover, it includes novel graphical representations that can help the user to display the results. This paper is a comprehensive description of deaR, reviewing all implemented models and giving examples of use.
2025-06-04T09:29:23Z
41 pages, 9 figures
Vicente J. Bolos, Rafael Benitez, Vicente Coll-Serrano

http://arxiv.org/abs/2506.03729v1
IntLevPy: A Python library to classify and model intermittent and Lévy processes
2025-06-04T09:03:58Z
This paper gives a comprehensive description of the IntLevPy package, a Python library designed for simulating and analyzing intermittent and Lévy processes. The package includes functionalities for process simulation, including full parameter estimation and fitting optimization for both families of processes, moment calculation, and classification methods. The classification methodology utilizes adjusted-$R^2$ and a novel performance measure Γ, enabling the distinction between intermittent and Lévy processes. IntLevPy integrates iterative parameter optimization with simulation-based validation.
This paper provides an in-depth user guide covering IntLevPy software architecture, installation, validation workflows, and usage examples. In this way, IntLevPy facilitates systematic exploration of these two broad classes of stochastic processes, bridging theoretical models and practical applications.
2025-06-04T09:03:58Z
6 pages, 2 figures
SoftwareX 31 (2025) 102334
Shailendra Bhandari, Pedro Lencastre, Sergiy Denysov, Yurii Bystryk, Pedro G. Lind
10.1016/j.softx.2025.102334

http://arxiv.org/abs/2304.01906v4
Torch-Choice: A PyTorch Package for Large-Scale Choice Modeling with Python
2025-06-03T23:38:14Z
$\texttt{torch-choice}$ is an open-source library for flexible, fast choice modeling with Python and PyTorch. $\texttt{torch-choice}$ provides a $\texttt{ChoiceDataset}$ data structure to manage databases flexibly and memory-efficiently. The paper demonstrates constructing a $\texttt{ChoiceDataset}$ from databases of various formats and functionalities of $\texttt{ChoiceDataset}$. The package implements two widely used models, namely the multinomial logit and nested logit models, and supports regularization during model estimation. The package incorporates the option to take advantage of GPUs for estimation, allowing it to scale to massive datasets while being computationally efficient. Models can be initialized using either R-style formula strings or Python dictionaries. We conclude with a comparison of the computational efficiencies of $\texttt{torch-choice}$ and $\texttt{mlogit}$ in R as (1) the number of observations increases, (2) the number of covariates increases, and (3) the item sets expand.
Finally, we demonstrate the scalability of $\texttt{torch-choice}$ on large-scale datasets.
2023-04-04T16:00:48Z
Tianyu Du, Ayush Kanodia, Susan Athey

http://arxiv.org/abs/2506.02647v1
Multilevel Stochastic Gradient Descent for Optimal Control Under Uncertainty
2025-06-03T09:00:43Z
We present a multilevel stochastic gradient descent method for the optimal control of systems governed by partial differential equations under uncertain input data. The gradient descent method used to find the optimal control leverages a parallel multilevel Monte Carlo method as stochastic gradient estimator. As a result, we achieve precise control over the stochastic gradient's bias, introduced by numerical approximation, and its sampling error, arising from the use of incomplete gradients, while optimally managing computational resources. We show that the method exhibits linear convergence in the number of optimization steps while avoiding the cost of computing the full gradient at the highest fidelity. Numerical experiments demonstrate that the method significantly outperforms the standard (mini-) batched stochastic gradient descent method in terms of convergence speed and accuracy. The method is particularly well-suited for high-dimensional control problems, taking advantage of parallel computing resources and a distributed multilevel data structure. Additionally, we evaluate and implement different step size strategies, optimizer schemes, and budgeting techniques.
The method's performance is studied using a two-dimensional elliptic subsurface diffusion problem with log-normal coefficients and Matérn covariance.
2025-06-03T09:00:43Z
Niklas Baumgarten, David Schneiderhan

http://arxiv.org/abs/2506.06495v1
Optimizing Optimizations: Case Study on Detecting Specific Types of Mathematical Optimization Constraints with E-Graphs in JijModeling
2025-06-02T05:11:49Z
In solving mathematical optimization problems efficiently, it is crucial to make use of information about specific types of constraints, such as the one-hot or Special-Ordered Set (SOS) constraints. In many cases, exploiting such information gives asymptotically better execution time. JijModeling, an industrial-strength mathematical optimization modeller, achieves this by separating the symbolic representation of an optimization problem from the input data. In this paper, we report a real-world case study on a constraint detection mechanism modulo the algebraic congruence using e-graphs, and describe heuristic criteria for designing rewriting systems. We give benchmarking results that show the performance impact of the constraint detection mechanism.
We also introduce egg_recursive, a utility library for writing egg-terms as recursive abstract syntax trees, reducing the burden of writing and maintaining complex terms in S-expressions.
2025-06-02T05:11:49Z
To be presented at EGRAPHS '25 https://pldi25.sigplan.org/home/egraphs-2025
Hiromi Ishii (Jij, Inc), Taro Shimizu (Jij, Inc), Toshiki Teramura (Jij, Inc)

http://arxiv.org/abs/2502.16517v2
Annotation-guided AoS-to-SoA conversions and GPU offloading with data views in C++
2025-05-31T17:28:04Z
The C++ programming language provides classes and structs as fundamental modeling entities. Consequently, C++ code tends to favour array-of-structs (AoS) for encoding data sequences, even though structure-of-arrays (SoA) yields better performance for some calculations. We propose a C++ language extension based on attributes that allows developers to guide the compiler in selecting memory arrangements, i.e., to select the optimal choice between AoS and SoA dynamically depending on both the execution context and algorithm step. The compiler can then automatically convert data into the preferred format prior to the calculations and convert results back afterward. The compiler handles all the complexity of determining which data to convert and how to manage data transformations. Our implementation realises the compiler-extension for the new annotations in Clang and demonstrates their effectiveness through a smoothed particle hydrodynamics (SPH) code, which we evaluate on an Intel CPU, an ARM CPU, and a Grace-Hopper GPU. While the separation of concerns between data structure and operators is elegant and provides performance improvements, the new annotations do not eliminate the need for performance engineering. Instead, they challenge conventional performance wisdom and necessitate rethinking how to write efficient implementations.
2025-02-23T09:43:23Z
Pawel K. Radtke, Tobias Weinzierl
10.1002/cpe.70199

http://arxiv.org/abs/2504.07409v2
RLibm-MultiRound: Correctly Rounded Math Libraries Without Worrying about the Application's Rounding Mode
2025-05-29T20:45:05Z
Our RLibm project generates a single implementation for an elementary function that produces correctly rounded results for multiple rounding modes and representations with up to 32 bits. Such implementations are appealing for developing fast reference libraries without double rounding issues. The key insight is to build polynomials that produce the correctly rounded result for a representation with two additional bits when compared to the largest target representation and with the "non-standard" round-to-odd rounding mode, which makes double rounding the RLibm math library result to any smaller target representation innocuous. The resulting approximations generated by the RLibm approach are implemented with machine supported floating-point operations with the round-to-nearest rounding mode. When an application uses a rounding mode other than the round-to-nearest mode, the RLibm math library saves the application's rounding mode, changes the system's rounding mode to round-to-nearest, computes the correctly rounded result, and restores the application's rounding mode. This frequent change of rounding modes has a performance cost.
This paper proposes two new methods, which we call rounding-invariant outputs and rounding-invariant input bounds, to avoid the frequent changes to the rounding mode and the dependence on the round-to-nearest mode. First, our new rounding-invariant outputs method proposes using the round-to-zero rounding mode to implement RLibm's polynomial approximations. We propose fast, error-free transformations to emulate a round-to-zero result from any standard rounding mode without changing the rounding mode. Second, our rounding-invariant input bounds method factors any rounding error due to different rounding modes using interval bounds in the RLibm pipeline. Both methods make a different set of trade-offs and improve the performance of resulting libraries by more than 2X.
2025-04-10T03:02:52Z
31 pages
Sehyeok Park, Justin Kim, Santosh Nagarakatte

http://arxiv.org/abs/2505.23565v1
DRO: A Python Library for Distributionally Robust Optimization in Machine Learning
2025-05-29T15:39:12Z
We introduce dro, an open-source Python library for distributionally robust optimization (DRO) for regression and classification problems. The library implements 14 DRO formulations and 9 backbone models, enabling 79 distinct DRO methods. Furthermore, dro is compatible with both scikit-learn and PyTorch. Through vectorization and optimization approximation techniques, dro reduces runtime by 10x to over 1000x compared to baseline implementations on large-scale datasets. Comprehensive documentation is available at https://python-dro.org.
2025-05-29T15:39:12Z
Jiashuo Liu, Tianyu Wang, Henry Lam, Hongseok Namkoong, Jose Blanchet

http://arxiv.org/abs/2505.13315v2
KHRONOS: a Kernel-Based Neural Architecture for Rapid, Resource-Efficient Scientific Computation
2025-05-26T01:40:00Z
Contemporary models of high dimensional physical systems are constrained by the curse of dimensionality and a reliance on dense data.
We introduce KHRONOS (Kernel Expansion Hierarchy for Reduced Order, Neural Optimized Surrogates), an AI framework for model-based, model-free, and model-inversion tasks. KHRONOS constructs continuously differentiable target fields with a hierarchical composition of per-dimension kernel expansions, which are tensorized into modes and then superposed. We evaluate KHRONOS on a canonical 2D Poisson equation benchmark: across 16 to 512 degrees of freedom (DoFs), it obtained L_2-square errors of 5e-4 down to 6e-11. This represents a greater than 100-fold gain over Kolmogorov Arnold Networks (which themselves report a 100 times improvement on MLPs/PINNs with 100 times fewer parameters) when controlling for the number of parameters. This also represents a 1e6-fold improvement in L_2-square error compared to standard linear FEM at comparable DoFs. Inference complexity is dominated by inner products, yielding sub-millisecond full-field predictions that scale to an arbitrary resolution. For inverse problems, KHRONOS facilitates rapid, iterative level set recovery in only a few forward evaluations, with sub-microsecond per sample latency. KHRONOS's scalability, expressivity, and interpretability open new avenues in constrained edge computing, online control, computer vision, and beyond.
2025-05-19T16:29:07Z
Reza T. Batley, Sourav Saha

http://arxiv.org/abs/2505.17430v1
SEvoBench : A C++ Framework For Evolutionary Single-Objective Optimization Benchmarking
2025-05-23T03:23:10Z
We present SEvoBench, a modern C++ framework for evolutionary computation (EC), specifically designed to systematically benchmark evolutionary single-objective optimization algorithms. The framework features modular implementations of Particle Swarm Optimization (PSO) and Differential Evolution (DE) algorithms, organized around three core components: (1) algorithm construction with reusable modules, (2) efficient benchmark problem suites, and (3) parallel experimental analysis.
Experimental evaluations demonstrate the framework's superior performance in benchmark testing and algorithm comparison. Case studies further validate its capabilities in algorithm hybridization and parameter analysis. Compared to existing frameworks, SEvoBench demonstrates three key advantages: (i) highly efficient and reusable modular implementations of PSO and DE algorithms, (ii) accelerated benchmarking through parallel execution, and (iii) enhanced computational efficiency via SIMD (Single Instruction Multiple Data) vectorization for large-scale problems.
2025-05-23T03:23:10Z
9 pages, 9 figures
Yongkang Yang, Jian Zhao, Tengfei Yang
10.1145/3712255.3734350

http://arxiv.org/abs/2405.15573v2
Uniform H-matrix Compression with Applications to Boundary Integral Equations
2025-05-20T20:14:58Z
Boundary integral equations lead to dense system matrices when discretized, yet they are data-sparse. Using the $\mathcal{H}$-matrix format, this sparsity is exploited to achieve $\mathcal{O}(N\log N)$ complexity for storage and multiplication by a vector. This is achieved purely algebraically, based on low-rank approximations of subblocks, and hence the format is also applicable to a wider range of problems. The $\mathcal{H}^2$-matrix format improves the complexity to $\mathcal{O}(N)$ by introducing a recursive structure onto subblocks on multiple levels. However, in many cases this comes with a large proportionality constant, making the $\mathcal{H}^2$-matrix format advantageous mostly for large problems. In this paper we investigate the usefulness of a matrix format that lies in between these two: Uniform $\mathcal{H}$-matrices. An algebraic compression algorithm is introduced to transform a regular $\mathcal{H}$-matrix into a uniform $\mathcal{H}$-matrix, which maintains the asymptotic complexity.
Using examples of the BEM formulation of the Helmholtz equation, we show that this scheme lowers the storage requirement and execution time of the matrix-vector product without significantly impacting the construction time.
2024-05-24T14:03:42Z
Kobe Bruyninckx, Daan Huybrechs, Karl Meerbergen
10.1137/24M1665209

http://arxiv.org/abs/2505.14790v1
Generalised Burnside and Dixon algorithms for irreducible projective representations
2025-05-20T18:00:39Z
Based on the recently proposed character theory of projective representations of finite groups, we generalise several algorithms for computing character tables and matrices of irreducible linear representations to projective representations. In particular, we present an algorithm based on that of Burnside to compute the characters of all irreducible projective representations of a finite group with a given Schur multiplier, and transpose it to exact integer arithmetic following Dixon's character table algorithm. We also describe an algorithm based on that of Dixon to split a projective representation into irreducible subspaces in floating-point arithmetic, and discuss how it can be used to compute matrices for all projective irreps with a given multiplier. Our algorithms bypass the construction of the representation group of the Schur multiplier, which makes them especially attractive for floating-point computations, where exact values of the multiplier are not necessarily available.
2025-05-20T18:00:39Z
11 pages
Attila Szabó

http://arxiv.org/abs/2505.13980v1
Symbolic and Numerical Tools for $L_{\infty}$-Norm Calculation
2025-05-20T06:26:45Z
The computation of the $L_{\infty}$-norm is an important issue in $H_{\infty}$ control, particularly for analyzing system stability and robustness. This paper focuses on symbolic computation methods for determining the $L_{\infty}$-norm of finite-dimensional linear systems, highlighting their advantages in achieving exact solutions where numerical methods often encounter limitations.
Key techniques such as Sturm-Habicht sequences, Rational Univariate Representations (RUR), and Cylindrical Algebraic Decomposition (CAD) are surveyed, with an emphasis on their theoretical foundations, practical implementations, and specific applicability to $L_{\infty}$-norm computation. A comparative analysis is conducted between symbolic and conventional numerical approaches, underscoring scenarios in which symbolic computation provides superior accuracy, particularly in parametric cases. Benchmark evaluations reveal the strengths and limitations of both approaches, offering insights into the trade-offs involved. Finally, the discussion addresses the challenges of symbolic computation and explores future opportunities for its integration into control theory, particularly for robust and stable system analysis.
2025-05-20T06:26:45Z
Grace Younes, Alban Quadrat, Fabrice Rouillier