http://arxiv.org/abs/2512.21678v2
Some Patterns of Duplications in the outputs of Mersenne Twister Pseudorandom Number Generator MT19937
2026-01-02T00:34:40Z
The Mersenne Twister MT19937 pseudorandom number generator, introduced by the last two authors in 1998, is still widely used. It passes all existing statistical tests, except for the linear complexity test, which measures the ratio of the even-odd of the number of 1's among specific bits (and hence should not be important for most applications). Harase reported that MT19937 is rejected by some birthday-spacing tests, which are rather artificially designed. In this paper, we report that MT19937 fails in a natural test based on the distribution of run-lengths on which we found an identical value in the output 32-bit integers. The number of observations of the run-length 623 is some 40 times larger than the expectation (and than the numbers of the observations of 622 and 624, etc.), which implies that the corresponding p-value is almost 0.
We mathematically analyze the phenomena, and obtain a theorem which explains these failures. It seems not to be a serious defect of MT19937, because finding the defect requires astronomical efforts. Still, the phenomena should be reported to the academic society relating to pseudorandom number generation.
2025-12-25T13:45:06Z
8 pages, no figure
Alain Schumacher, Takuji Nishimura, Makoto Matsumoto

http://arxiv.org/abs/2505.19304v2
f4ncgb: High Performance Gröbner Basis Computations in Free Algebras
2026-01-01T22:11:41Z
We present f4ncgb, a new open-source C++ library for Gröbner basis computations in free algebras, which transfers recent advancements in commutative Gröbner basis software to the noncommutative setting. As our experiments show, f4ncgb establishes a new state-of-the-art for noncommutative Gröbner basis computations. We also discuss implementation details and design choices.
2025-05-25T20:24:22Z
21 pages, 2 figures, 3 tables
Maximilian Heisinger, Clemens Hofstadler

http://arxiv.org/abs/2410.09053v3
Fast Symbolic Integer-Linear Spectra
2025-12-31T22:14:03Z
Here we contribute a fast symbolic eigenvalue solver for matrices whose eigenvalues are $\mathbb{Z}$-linear combinations of their entries, alongside efficient general and stochastic $M^{X}$ generators. Users can interact with a few degrees of freedom to create linear operators, making high-dimensional symbolic analysis feasible for when numerical analyses are insufficient.
2024-09-18T06:59:39Z
Jonny Luntzel, Abraham Miller

http://arxiv.org/abs/2508.12835v2
Rapid Variable Resolution Particle Initialization for Complex Geometries
2025-12-31T13:35:04Z
The accuracy of meshless methods like Smoothed Particle Hydrodynamics (SPH) is highly dependent on the quality of the particle distribution. Existing particle initialization techniques often struggle to simultaneously achieve adaptive resolution, handle intricate boundaries, and efficiently generate well-packed distributions inside and outside a boundary.
This work presents a fast and robust particle initialization method that achieves these goals using standard SPH building blocks. Our approach enables simultaneous initialization of fluid and solid regions, supports arbitrary geometries, and achieves high-quality, quasi-uniform particle arrangements without complex procedures like surface bonding. Extensive results in both 2D and 3D demonstrate that the obtained particle distributions exhibit good boundary conformity, low spatial disorder, and minimal density variation, all with significantly reduced computational cost compared to existing approaches. This work paves the way for automated particle initialization to accurately model flow in and around bodies with meshless methods, particularly with SPH.
2025-08-18T11:19:10Z
39 pages, 24 figures
Computer Physics Communications 320 (2026) 109992
Navaneet Villodi, Prabhu Ramachandran
10.1016/j.cpc.2025.109992

http://arxiv.org/abs/2510.16172v2
Fast, Differentiable, GPU-Accelerated Ray Tracing for Multiple Diffraction and Reflection Paths
2025-12-31T13:24:02Z
We present a fast, differentiable, GPU-accelerated optimization method for ray path tracing in environments containing planar reflectors and straight diffraction edges. Based on Fermat's principle, our approach reformulates the path-finding problem as the minimization of total path length, enabling efficient parallel execution on modern GPU architectures. Unlike existing methods that require separate algorithms for reflections and diffractions, our unified formulation maintains consistent problem dimensions across all interaction sequences, making it particularly suitable for vectorized computation. Through implicit differentiation, we achieve efficient gradient computation without differentiating through solver iterations, significantly outperforming traditional automatic differentiation approaches.
Numerical simulations demonstrate convergence rates comparable to specialized Newton methods while providing superior scalability for large-scale applications. The method integrates seamlessly with differentiable programming libraries such as JAX and DrJIT, enabling new possibilities in inverse design and optimization for wireless propagation modeling. The source code is openly available at https://github.com/jeertmans/fpt-jax.
2025-10-17T19:27:00Z
5 pages, 3 figures, accepted at EuCAP 2026
Jérome Eertmans, Sophie Lequeu, Benoît Legat, Laurent Jacques, Claude Oestges

http://arxiv.org/abs/2505.12980v2
Algorithms for Nonlinear Mixed-Integer Location Estimation
2025-12-31T06:05:43Z
For three decades, carrier-phase observations have been used to obtain the most accurate location estimates using global navigation satellite systems (GNSS). These estimates are computed by minimizing a nonlinear mixed-integer least-squares problem. Existing algorithms linearize the problem, orthogonally project it to eliminate real variables, and then solve the integer least-squares problem. There is now considerable interest in developing similar localization techniques for terrestrial and indoor settings. We show that algorithms that linearize first fail in these settings and we propose several algorithms for computing the estimates. Some of our algorithms are elimination algorithms that start by eliminating the non-linear terms in the constraints; others construct a geometric arrangement that allows us to efficiently enumerate integer solutions (in polynomial time). We focus on simplified localization problems in which the measurements are range (distance) measurements and carrier phase range measurements, with no nuisance parameters. The simplified problem allows us to focus on the core question of untangling the nonlinearity and the integer nature of some parameters.
We show using simulations that the new algorithms are effective at close ranges at which the linearize-first approach fails.
2025-05-19T11:17:17Z
Ophir Uziel, Efi Fogel, Dan Halperin, Sivan Toledo

http://arxiv.org/abs/2511.03566v3
Improving Directions in Mixed Integer Bilevel Linear Optimization
2025-12-30T18:33:53Z
We consider the central role of improving directions in solution methods for mixed integer bilevel linear optimization problems (MIBLPs). Current state-of-the-art methods for solving MIBLPs employ the branch-and-cut framework originally developed for solving mixed integer linear optimization problems. This approach relies on oracles for two kinds of subproblems: those for checking whether a candidate pair of leader's and follower's decisions is bilevel feasible, and those required for generating valid inequalities. Typically, these two types of oracles are managed separately, but in this work, we explore their close connection and propose a solution framework based on solving a single type of subproblem: determining whether there exists a so-called improving feasible direction for the follower's problem. Solution of this subproblem yields information that can be used both to check feasibility and to generate strong valid inequalities. Building on prior works, we expose the foundational role of improving directions in enforcing the follower's optimality condition and extend a previously known hierarchy of optimality-based relaxations to the mixed-integer setting, showing that the associated relaxed feasible regions coincide exactly with the closure associated with intersection cuts derived from improving directions. Numerical results with an implementation using a modified version of the open source solver MibS show that this approach can yield practical improvements.
2025-11-05T15:48:56Z
Federico Battista, Ted K. Ralphs

http://arxiv.org/abs/2312.04018v4
Ricci-Notation Tensor Framework for Model-based Approaches to Imaging
2025-12-29T22:57:04Z
Model-based approaches to imaging, like specialized image enhancements in astronomy, facilitate explanations of relationships between observed inputs and computed outputs. These models may be expressed with extended matrix-vector (EMV) algebra, especially when they involve only scalars, vectors, and matrices, and with n-mode or index notations, when they involve multidimensional arrays, also called numeric tensors or, simply, tensors. While this paper features an example, inspired by exoplanet imaging, that employs tensors to reveal (inverse) 2D fast Fourier transforms in an image enhancement model, the work is actually about the tensor algebra and software, or tensor frameworks, available for model-based imaging. The paper proposes a Ricci-notation tensor (RT) framework, comprising a dual-variant index notation, with Einstein summation convention, and codesigned object-oriented software, called the RTToolbox for MATLAB. Extensions to Ricci notation offer novel representations for entrywise, pagewise, and broadcasting operations popular in EMV frameworks for imaging. Complementing the EMV algebra computable with MATLAB, the RTToolbox demonstrates programmatic and computational efficiency via careful design of numeric tensor and dual-variant index classes.
Compared to its closest competitor, also a numeric tensor framework that uses index notation, the RT framework enables superior ways to model imaging problems and, thereby, to develop solutions.
2023-12-07T03:25:40Z
15 pages, 7 figures, 5 tables
Journal of Imaging Science and Technology, 68(4), 2024
Dileepan Joseph (Electrical and Computer Engineering, University of Alberta)
10.2352/J.ImagingSci.Technol.2024.68.4.040504

http://arxiv.org/abs/2502.03000v5
Armadillo: An Efficient Framework for Numerical Linear Algebra
2025-12-29T13:37:07Z
A major challenge in the deployment of scientific software solutions is the adaptation of research prototypes to production-grade code. While high-level languages like MATLAB are useful for rapid prototyping, they lack the resource efficiency required for scalable production applications, necessitating translation into lower level languages like C++. Further, for machine learning and signal processing applications, the underlying linear algebra primitives, generally provided by the standard BLAS and LAPACK libraries, are unwieldy and difficult to use, requiring manual memory management and other tedium. To address this challenge, the Armadillo C++ linear algebra library provides an intuitive interface for writing linear algebra expressions that are easily compiled into efficient production-grade implementations. We describe the expression optimisations we have implemented in Armadillo, exploiting template metaprogramming.
We demonstrate that these optimisations result in considerable efficiency gains on a variety of benchmark linear algebra expressions.
2025-02-05T08:52:37Z
International Conference on Computer and Automation Engineering, 2025
Conrad Sanderson, Ryan Curtin
10.1109/ICCAE64891.2025.10980539

http://arxiv.org/abs/2507.09435v2
GeoWarp: An automatically differentiable and GPU-accelerated implicit MPM framework for geomechanics based on NVIDIA Warp
2025-12-27T05:18:52Z
The material point method (MPM), a hybrid Lagrangian-Eulerian particle method, is increasingly used to simulate large-deformation and history-dependent behavior of geomaterials. While explicit time integration dominates current MPM implementations due to its algorithmic simplicity, such schemes are unsuitable for quasi-static and long-term processes typical in geomechanics. Implicit MPM formulations are free of these limitations but remain less adopted, largely due to the difficulty of computing the Jacobian matrix required for Newton-type solvers, especially when consistent tangent operators should be derived for complex constitutive models. In this paper, we introduce GeoWarp -- an implicit MPM framework for geomechanics built on NVIDIA Warp -- that exploits GPU parallelism and reverse-mode automatic differentiation to compute Jacobians without manual derivation. To enhance efficiency, we develop a sparse Jacobian construction algorithm that leverages the localized particle-grid interactions intrinsic to MPM. The framework is verified through forward and inverse examples in large-deformation elastoplasticity and coupled poromechanics. Results demonstrate that GeoWarp provides a robust, scalable, and extensible platform for differentiable implicit MPM simulation in computational geomechanics.
2025-07-13T00:11:36Z
Adv. Eng. Softw. 212 (2026) 104072
Yidong Zhao, Xuan Li, Chenfanfu Jiang, Jinhyun Choo
10.1016/j.advengsoft.2025.104072

http://arxiv.org/abs/2109.04193v2
OGRe: An Object-Oriented General Relativity Package for Mathematica
2025-12-25T02:09:03Z
We present OGRe, a modern Mathematica package for tensor calculus, designed to be both powerful and user-friendly. The package can be used in a variety of contexts where tensor calculations are needed, in both mathematics and physics, but it is especially suitable for general relativity. By implementing an object-oriented design paradigm, OGRe allows calculating arbitrarily complicated tensor formulas easily, and automatically transforms between index configurations and coordinate systems behind the scenes as needed, eliminating user errors by making it impossible for the user to combine tensors in inconsistent ways. Other features include displaying tensors in various forms, automatic calculation of curvature tensors and geodesic equations, easy importing and exporting of tensors between sessions, optimized algorithms and parallelization for improved performance, and more.
2021-09-06T00:31:23Z
4 pages, final version published in JOSS. NOTE: The software has been updated since this publication. Full and up-to-date documentation and source code for the latest version are available at https://github.com/bshoshany/OGRe
Journal of Open Source Software, 6(65), 3416 (2021)
Barak Shoshany
10.21105/joss.03416

http://arxiv.org/abs/2409.03803v3
OGRePy: An Object-Oriented General Relativity Package for Python
2025-12-24T23:02:19Z
OGRePy is a modern, open-source Python package designed to perform symbolic tensor calculations, with a particular focus on applications in general relativity. Built on an object-oriented architecture, OGRePy encapsulates tensors, metrics, and coordinate systems as self-contained objects, automatically handling raising and lowering of indices, coordinate transformations, contractions, partial or covariant derivatives, and all tensor operations.
By leveraging the capabilities of SymPy and Jupyter Notebook, OGRePy provides a robust, user-friendly environment that facilitates both research and teaching in general relativity and differential geometry. This Python package reproduces the functionality of the popular Mathematica package OGRe, while greatly improving upon it by making use of Python's native object-oriented syntax. In this paper, we describe OGRePy's design and implementation, and discuss its potential for reuse across research and education in mathematics and physics.
2024-09-05T03:40:27Z
4 pages, final version published in JORS. NOTE: The software has been updated since this publication. Full and up-to-date documentation and source code for the latest version are available at https://github.com/bshoshany/OGRePy
Journal of Open Research Software, 13: 9 (2025)
Barak Shoshany
10.5334/jors.558

http://arxiv.org/abs/2512.20015v1
HeylandCircle: A Computational Framework for the Geometric Reconstruction of the Heyland Circle Diagram
2025-12-23T03:15:50Z
The Heyland circle diagram is a classical graphical tool for representing the steady-state behavior of induction machines using no-load and blocked-rotor test data. While widely used in alternating-current machinery texts, the diagram is typically presented as a hand-constructed aid and lacks a standardized computational formulation. This paper presents HeylandCircle, a computational framework that reconstructs the classical Heyland circle diagram directly from standard test parameters. The framework formalizes the traditional geometric construction as a deterministic, reproducible sequence of geometric operations, establishing a clear mapping between measured data, fixed geometric objects, and steady-state operating points. Quantities such as power factor, slip, output power, torque, and efficiency are obtained through explicit geometric relationships on the constructed diagram.
Validation using a representative textbook example demonstrates close agreement with classical results. The framework provides a computational realization of the traditional Heyland diagram suitable for instruction, analysis, and systematic extension.
2025-12-23T03:15:50Z
8 pages, 1 figure
Anubhav Gupta, Abhinav Gupta

http://arxiv.org/abs/2512.22215v1
SPUMA: a minimally invasive approach to the GPU porting of OPENFOAM
2025-12-22T10:04:36Z
High Performance Computing (HPC) on hybrid clusters represents a significant opportunity for Computational Fluid Dynamics (CFD), especially when modern accelerators are utilized effectively. However, despite the widespread adoption of GPUs, programmability remains a challenge, particularly in open-source contexts. In this paper, we present SPUMA, a full GPU porting of OPENFOAM targeting NVIDIA and AMD GPUs. The implementation strategy is based on a portable programming model and the adoption of a memory pool manager that leverages the unified memory feature of modern GPUs. This approach is discussed alongside several numerical tests conducted on two pre-exascale clusters in Europe, LUMI and Leonardo, which host AMD MI250X and NVIDIA A100 GPUs, respectively. In the performance analysis section, we present results related to memory usage profiling and kernel wall-time, the impact of the memory pool, and energy consumption obtained by simulating the well-known DrivAer industrial test case. GPU utilization strongly affects strong scalability results, reaching 65% efficiency on both LUMI and Leonardo when approaching a load of 8 million cells per GPU. Weak scalability results, obtained on 20 GPUs with the OpenFOAM native multigrid solver, range from 75% on Leonardo to 85% on LUMI. Notably, efficiency is no lower than 90% when switching to the NVIDIA AmgX linear algebra solver.
Our tests also reveal that one A100 GPU on Leonardo is equivalent to 200-300 Intel Sapphire Rapids cores, provided the GPUs are sufficiently oversubscribed (more than 10 million cells per GPU). Finally, energy consumption is reduced by up to 82% compared to analogous simulations executed on CPUs.
2025-12-22T10:04:36Z
43 pages
Simone Bnà, Giuseppe Giaquinto, Ettore Fadiga, Tommaso Zanelli, Francesco Bottau

http://arxiv.org/abs/2406.06095v4
An extension of C++ with memory-centric specifications for HPC to reduce memory footprints and streamline MPI development
2025-12-20T09:40:31Z
C++ leans towards a memory-inefficient storage of structs: The compiler inserts padding bits, while it is not able to exploit knowledge about the range of integers, enums or bitsets. Furthermore, the language provides no support for arbitrary floating-point precisions. We propose a language extension based upon attributes through which developers can guide the compiler as to what memory arrangements would be beneficial: Can multiple booleans or integers with limited range be squeezed into one bit field, do floating-point numbers hold fewer significant bits than in the IEEE standard, and is a programmer willing to trade attribute ordering guarantees for a more compact object representation? The extension offers the opportunity to fall back to normal alignment and native C++ floating point representations via plain C++ assignments, no dependencies upon external libraries are introduced, and the resulting code remains (syntactically) standard C++. As MPI remains the de-facto standard for distributed memory calculations in C++, we furthermore propose additional attributes which streamline the MPI datatype modelling in combination with our memory optimisation extensions. Our work implements the language annotations within LLVM and demonstrates their potential impact through smoothed particle hydrodynamics benchmarks.
They uncover the potential gains in terms of performance and development productivity.
2024-06-10T08:26:27Z
Pawel K. Radtke, Cristian G. Barrera-Hinojosa, Mladen Ivkovic, Tobias Weinzierl