https://arxiv.org/api/l6lC/PIJnQeT+sNA0A8ROtktyu8
Updated: 2026-06-13T10:39:33Z

http://arxiv.org/abs/2606.05466v1
Look Before You Leap: Checking in on Type Tag Checking
Updated: 2026-06-03T21:44:02Z
Tagging of generic dynamic values is important in symbolic-computation and dynamic-language systems, but the trade-offs change as machine architectures and workloads evolve. In particular, old folklore about boxed values, immediate values, and type tags must be recalibrated from time to time. We revisit the performance of badged object headers, low-bit tagging, and two NaN-boxing layouts on a range of platforms in use today, including AArch64 and x86-64 architectures from different manufacturers. The experiments isolate two distinct effects: the cost avoided by not heap-allocating common scalar values, and the cost avoided by obtaining tag information from the value word rather than by performing a heap read. The results show that several local bit operations are often cheaper than opening a heap object to obtain a tag or small value. Low-bit tagging remains the simplest and usually fastest choice for mostly symbolic workloads, while NaN-boxing is close in access cost and avoids the time and space of heap allocation for ordinary floating-point values.
Published: 2026-06-03T21:44:02Z
Authors: Stephen M. Watt

http://arxiv.org/abs/2606.05017v1
GoldenFloat: A Phi-Derived Static-Split Floating-Point Family from GF4 to GF256 with a Lucas-Exact Integer Identity
Updated: 2026-06-03T15:41:16Z
We present a hardware-oriented description of GoldenFloat (GF), a static-split floating-point family generated by a single closed rule, and three concrete artefacts: (i) an open multi-width RTL generator covering GF4-GF256 with a continuous-integration differential sweep against a correctly-rounded reference; (ii) an integer-backed Lucas-exact accumulator path verified at 500-digit precision for n = 1, ..., 256; and (iii) a GF16 FPGA codec passing a 35-of-35 testbench at 323 MHz on Artix-7 (Xilinx XC7A35T).
For each total width N >= 4, the exponent width is e = round((N-1)/phi^2) with fraction f = N-1-e and phi = (1+sqrt(5))/2. The rule reproduces the realised exponent widths of nine formats (9/9) and extends consistently to GF128, GF512, GF1024. The rule is positioned alongside posit, takum, OCP-MX, and the IEEE P3109 multi-width float draft. We make no per-rung accuracy or superiority claim against any of them. The breadth/toolchain-coherence framing is recorded as an open conjecture with a pre-registered falsification path. A falsification ledger (FL-002) records open questions and the experiments that would settle them. An RTL-correctness erratum dated 2026-05-31 is reported; the fabricated TTSKY26b dies carry the defective multiplier portfolio, and the corrected generator is the regeneration baseline.
Published: 2026-06-03T15:41:16Z
Comments: 19 pages, single-file LaTeX, ASCII source. RTL generator and CI artefacts at github.com/gHashTag/goldenfloat-preprint
Authors: Dmitrii Vasiliev

http://arxiv.org/abs/2606.04670v1
Fitting scattered data with optional monotonicity constraints on GPU: LipFit package
Updated: 2026-06-03T09:51:56Z
This paper presents a method of multivariate scattered data interpolation and approximation that produces optimal Lipschitz-continuous approximation, subject to the desired monotonicity constraints. This method relies on tight upper and lower approximations to the data, and is similar in its spirit to the nearest-neighbour approximation but does not suffer from discontinuities. Local Lipschitz interpolation and Lipschitz smoothing are also presented. This approach falls under the umbrella of instance-based approximation with no training phase, and it is suitable for GPU-based parallelisation.
LipFit, a GPU-friendly Python package that implements the methods discussed, is also presented.
Published: 2026-06-03T09:51:56Z
Authors: Gleb Beliakov

http://arxiv.org/abs/2601.08082v3
Hierarchical Recursive Precision for Accelerating Symmetric Linear Solves on MXUs
Updated: 2026-05-29T20:18:39Z
Symmetric positive-definite system solvers based on Cholesky factorization are fundamental to many scientific applications, such as climate modeling. We present a portable, nested recursive mixed-precision solver designed for Matrix Processing Units (MXUs), including NVIDIA Tensor Cores (H200) and AMD Matrix Cores (MI300X), that assigns low-precision FP16 arithmetic to large off-diagonal blocks, while preserving high precision on diagonal blocks to ensure numerical stability. The solver is implemented in Julia, providing a high-level, hardware-agnostic interface. We demonstrate up to a 5.07x speedup relative to the diagonal-precision vendor baseline, with 100x better accuracy than pure half precision on H200, providing higher accuracy than low-precision at higher speed than high-precision. Positive performance trends are also observed on MI300X, demonstrating broad applicability across GPUs.
Published: 2026-01-12T23:46:20Z
Comments: 10 pages, 11 figures
Authors: Vicki Carrica, Rabab Alomairy, Evelyne Ringoot, Alan Edelman

http://arxiv.org/abs/2605.00172v2
FitED: A User-Centric, Extensible Software Environment for Robust Peak-Profile and General Functional Data Fitting
Updated: 2026-05-29T14:14:45Z
Reliable parameter extraction from experimental data is essential for quantitative analysis across spectroscopy, diffraction, photoluminescence, chromatography, microscopy, and time-resolved measurements. However, nonlinear fitting often remains difficult to reproduce, especially when complex models, correlated parameters, uncertain derived quantities, and user-dependent fitting choices are involved.
We present FitED, a Python-based desktop application for nonlinear fitting of one-dimensional scientific data that combines an accessible graphical interface with a transparent and flexible numerical backend. FitED supports conventional peak profiles, including Gaussian, Lorentzian, Pseudo-Voigt, and exact area-normalized Voigt functions, as well as arbitrary user-defined analytical models for broader experimental applications. The software integrates local and global-search-assisted optimization strategies, automated model initialization, repeated stability testing, parameter-correlation analysis, and covariance-based propagation of uncertainty for derived quantities. By combining interactive usability with uncertainty-aware analysis and structured export of fitting results, FitED provides a practical platform for reproducible and interpretable fitting of experimental data. The software is intended to support both routine analysis and advanced model evaluation while preserving the parameter-level control required by experimental researchers.
Published: 2026-04-30T19:44:02Z
Authors: Mustafa Mahmoud Aboulsaad

http://arxiv.org/abs/2605.23830v2
IntegrateUnitary.jl: A Julia package for symbolic integration over Haar measures
Updated: 2026-05-29T14:12:05Z
Symbolic integration over the Haar measure of compact groups is a computational cornerstone in quantum information science and random matrix theory. We present \texttt{IntegrateUnitary.jl}, a comprehensive Julia package for computing exact expectations of polynomial functions over a wide range of compact groups ($U(d)$, $O(d)$, $Sp(d)$, and $SU(d)$ for balanced polynomials), circular and Gaussian ensembles, Ginibre ensembles, permutation groups, random pure states, and unitary $t$-designs.
The package provides a fully open-source implementation of the Weingarten calculus and Wick contractions with broad symbolic-$d$ support for entry-wise and trace-polynomial integrals, while selected workflows currently require concrete integer dimensions (including higher pure trace moments $|\mathrm{tr}(U)|^{2k}$ for $k > 1$ and HCIZ with \texttt{SymbolicMatrix} inputs, and direct matrix-valued integration of \texttt{SymbolicMatrix}/\texttt{SymbolicMatrixProduct} expressions), automatic asymptotic expansions, a high-level symbolic trace interface that reconstructs Weingarten graphs from index-free expressions, and a bridge to \texttt{ITensors.jl} for tensor network averaging. We discuss the underlying algorithms, including the Murnaghan-Nakayama rule and symplectic-orthogonal duality, and demonstrate that the package efficiently handles high-degree moments and quantum information metrics.
Published: 2026-05-22T16:36:35Z
Authors: Łukasz Pawela, Zbigniew Puchała

http://arxiv.org/abs/2605.22378v2
Fast computation of Ehrhart polynomials of Gelfand--Tsetlin polytopes via Macdonald reciprocity
Updated: 2026-05-28T11:57:56Z
We describe an efficient method for computing the Ehrhart polynomial of Gelfand--Tsetlin polytopes arising from Kostka coefficients. The key idea is to exploit Ehrhart--Macdonald reciprocity: evaluating the Ehrhart polynomial at negative integers reduces to counting \emph{strict} Gelfand--Tsetlin patterns, which are often zero or very small for low dilations. Combined with an adaptive strategy that chooses the cheapest evaluation point (positive or negative) at each step, this yields substantial practical speedups compared to general-purpose polytope software. We benchmark against $\mathtt{OSCAR}$/$\mathtt{polymake}$, and illustrate the broader applicability of the method through order polytopes and permutation posets.
The implementation is available in the Rust \texttt{kostka} package, with related optimizations also incorporated in the new \texttt{lrcalc-rs} replacement for \texttt{lrcalc}.
Published: 2026-05-21T12:09:25Z
Comments: 12 pages
Authors: Per Alexandersson

http://arxiv.org/abs/2605.25282v2
Computing statistical solutions of a Mach 2000 astrophysical jet
Updated: 2026-05-28T09:26:17Z
The simulation of extreme Mach astrophysical flows is traditionally viewed through the lens of deterministic positivity-preserving schemes. However, due to Kelvin--Helmholtz instabilities and shock anomalies, the multi-dimensional Euler equations admit a variety of non-unique entropy solutions in turbulent regimes. Here, we computationally explore the limits of weak-strong uniqueness of a Mach 2000 jet by defining the statistical solution as the pushforward of a probability measure through a vectorial lattice Boltzmann method operator. Utilizing optimized CUDA kernels, we compute an ensemble of 1000 Monte Carlo samples across a sequence of highly refined spatial grids of up to 3.2 million cells and subsequently post-process the empirical measures via memory-mapped CPU streaming. We contrast the strong sample-wise $L^1$ error divergence with the convergence of the probability measure in the 1-point Wasserstein distance via empirical Cauchy rates. Our results demonstrate that while individual flow realizations physically diverge due to chaotic shear-layer instabilities, the statistical solution converges to an admissible limit measure at a rate of 0.5. Consequently, we provide numerical evidence that the statistical solution to the considered problem is non-Dirac and remains stable in the extreme compressible regime.
Published: 2026-05-24T22:34:25Z
Authors: Stephan Simonis, Gauthier Wissocq

http://arxiv.org/abs/2605.29208v1
libhmm: A Modern C++20 Library for Hidden Markov Models with Correct MLE Emission M-Steps
Updated: 2026-05-28T00:42:19Z
We describe libhmm, a C++20 library for Hidden Markov Model parameter estimation, sequence decoding, and model selection.
libhmm addresses two gaps in existing software: the absence of a well-maintained, zero-dependency C++ HMM library suitable for embedding in production systems, and the widespread use of method-of-moments (MOM) approximations in the emission distribution M-step of the Baum-Welch algorithm. The library implements correct maximum likelihood estimators for sixteen continuous and discrete emission distributions, including an ECME algorithm for the location-scale Student-t distribution, Newton-Raphson maximization for Gamma, Beta, Weibull, and Negative Binomial distributions, and the von Mises distribution for circular data. All forward-backward and Viterbi calculations operate in full log-space. SIMD acceleration is provided for AVX-512, AVX2, SSE2, and ARM NEON via compile-time dispatch with scalar fallback. Python bindings are available via the companion package pylibhmm. We compare libhmm against established C and C++ HMM libraries and against published R reference packages on five real-data benchmarks, and discuss the architectural tradeoffs made in the design.
Published: 2026-05-28T00:42:19Z
Comments: 17 pages, 3 figures, 8 tables
Authors: Gary Wolfman

http://arxiv.org/abs/2602.22631v2
TorchLean: Formalizing Neural Networks in Lean
Updated: 2026-05-24T04:59:07Z
Neural networks are increasingly deployed in scientific, safety critical, and mission critical pipelines, yet verification and analysis are often performed outside the programming environment that defines and runs the model. This creates a semantic gap between the executed network and the analyzed artifact: guarantees can depend on implicit conventions about operator semantics, tensor layouts, preprocessing, floating-point behavior, graph transformations, accelerated kernels, and external certificates. We present TorchLean, a unified framework for formalizing, executing, and verifying neural networks in Lean 4.
TorchLean treats learned models as executable programs and mathematical objects with a shared semantics for computation, verification, and theorem proving. The framework provides a PyTorch style API for typed tensors, layers, objectives, optimizers, automatic differentiation, and graph programs, with eager and compiled execution paths that lower to a common computation-graph representation. TorchLean supports exact and finite-precision tensor semantics, verified reverse-mode differentiation, interval and affine bound propagation, CROWN/LiRPA style certificate checking, import/export workflows, and CUDA-backed execution through explicit FFI boundaries. It also includes semantic layers for attention and FlashAttention, state-space sequence models, diffusion and sampling processes, probability kernels, reinforcement-learning objectives and Markov decision processes, and self-supervised objectives such as masked autoencoding, JEPA-style predictive views, and variance/correlation-based anti-collapse losses. Together, these components provide a semantic foundation for verified machine learning, where executable neural network artifacts, verification procedures, runtime boundaries, and mathematical claims can be stated and related inside one theorem-proving environment.
Published: 2026-02-26T05:11:44Z
Comments: 55 pages
Authors: Robert Joseph George, Jennifer Cruden, Will Adkisson, Xiangru Zhong, Huan Zhang, Anima Anandkumar

http://arxiv.org/abs/2602.14289v2
Parallel Sparse and Data-Sparse Factorization-based Linear Solvers
Updated: 2026-05-22T15:57:25Z
Efficient solutions of large-scale, ill-conditioned and indefinite algebraic equations are ubiquitously needed in numerous computational fields, including multiphysics simulations, machine learning, and data science. Because of their robustness and accuracy, direct solvers are crucial components in building a scalable solver toolchain.
In this chapter, we will review recent advances of sparse direct solvers along two axes: 1) reducing communication and latency costs in both task- and data-parallel settings, and 2) reducing computational complexity via low-rank and other compression techniques such as hierarchical matrix algebra. In addition to algorithmic principles, we also illustrate the key parallelization challenges and best practices to deliver high speed and reliability on modern heterogeneous parallel machines.
Published: 2026-02-15T19:40:14Z
Authors: Xiaoye Sherry Li, Yang Liu

http://arxiv.org/abs/2510.15881v2
ParamRF: A JAX-native Framework for Declarative Circuit Modelling
Updated: 2026-05-21T08:43:39Z
This work introduces ParamRF: a Python library for efficient, parametric modelling of radio frequency (RF) circuits. Built on top of the next-generation computational library JAX, as well as the object-oriented wrapper Equinox, the framework provides an easy-to-use, declarative modelling interface, without sacrificing performance. By representing circuits as JAX PyTrees and leveraging just-in-time compilation, models are compiled as pure functions into an optimized, algebraic graph. Since the resultant functions are JAX-native, this allows computation on CPUs, GPUs, or TPUs, providing integration with a wide range of solvers. Further, thanks to JAX's automatic differentiation, gradients with respect to both frequency and circuit parameters can be calculated for any circuit model outputs. This allows for more efficient optimization, as well as exciting new analysis opportunities. We showcase ParamRF's typical use-case of fitting a model to measured data via its built-in fitting engines, which include classical optimizers like L-BFGS and SLSQP, as well as modern Bayesian samplers such as PolyChord and BlackJAX. The result is a flexible framework for frequency-domain circuit modelling, fitting and analysis.
Published: 2025-08-28T14:48:35Z
Comments: 6 pages, 4 code listings. Code available at https://github.com/paramrf/paramrf
Authors: Gary V. C. Allen, Dirk I. L. de Villiers

http://arxiv.org/abs/2605.22039v1
Secure and Parallel Determinant Computation for Large-Scale Matrices in Edge Environments
Updated: 2026-05-21T06:23:17Z
The advent of edge computing has enabled resource-constrained clients to delegate intensive computational tasks to distributed edge servers, especially within Internet of Things (IoT) environments. Among such tasks, Matrix Determinant Computation (MDC) remains critical for applications in control systems, cryptography, and machine learning. However, the cubic complexity of traditional determinant algorithms makes them unsuitable for real-time processing in constrained edge scenarios.
We propose a Secure Parallel Determinant Computation (SPDC) framework, which provides strong security guarantees, including privacy-preserving MDC, across N distributed edge servers. The framework achieves privacy through Composite Element Distortion (CED) - a lightweight encryption method that combines Element-wise Obfuscation (EWO) and the Panth Rotation Theorem (PRT) to conceal both structural and numerical matrix content while preserving determinant properties. Parallel LU decomposition is used to distribute encrypted matrix blocks across an arbitrary number of untrusted edge servers, enabling efficient and scalable determinant computation. A one-way communication model further reduces coordination overhead by eliminating inter-server interactions. To ensure result integrity with minimal client burden, we further introduce two verification algorithms: Q_2, a probabilistic scalar method, and Q_3, a deterministic and low-complexity alternative.
Mathematical analysis demonstrates that the proposed framework provides strong privacy and security guarantees, low computational overhead, and deployment flexibility - making it well-suited for secure, scalable, and real-time MDC in distributed edge-assisted systems.
Published: 2026-05-21T06:23:17Z
Comments: 15 pages, 7 figures, 5 tables. This paper was first made public in October 2024 and subsequently posted as v1 on TechRxiv (Dec 10, 2025): https://doi.org/10.36227/techrxiv.176539387.75109768/v1. The present arXiv submission is identical to that version (v1)
Authors: Prajwal Panth

http://arxiv.org/abs/2605.20884v1
Solving Multivariate Polynomial Systems and Rectangular Multiparameter Eigenvalue Problems with MacaulayLab
Updated: 2026-05-20T08:22:48Z
We present the Matlab toolbox MacaulayLab, which implements numerical linear algebra algorithms for solving multivariate polynomial systems and rectangular multiparameter eigenvalue problems. Its structure and functionality are the result of several years of research and algorithmic development. We demonstrate how the software works and compare its performance with other software packages, such as PNLA, PHCpack, and MultiParEig. Some core features of MacaulayLab are the fact that it solves two key problems via one common approach, works independently of the chosen polynomial basis and monomial order, and is capable of dealing with positive-dimensional solution sets at infinity. The toolbox (including its future updates) and a large collection of test problems are freely available online.
Published: 2026-05-20T08:22:48Z
Comments: Since the manuscript is currently undergoing review, we anticipate several improvements in the upcoming revision. In particular, the comparison with other software packages will be expanded significantly
Authors: Christof Vermeersch, Bart De Moor

http://arxiv.org/abs/2605.18686v2
critband: A Python Package for Critical Bandwidth Analysis of Multimodal Distributions
Updated: 2026-05-19T17:56:25Z
Multimodal density estimation is a fundamental problem in scientific computing.
Determining the number of modes in a distribution is a core numerical challenge with applications across ecology, economics, genomics, and astronomy. While the R ecosystem provides mature tools through the multimode package, the Python ecosystem has lacked an equivalent cohesive implementation. We present critband, a Python package for critical bandwidth bimodality detection based on Silverman's kernel density approach. The package implements critical bandwidth search with a robust bracketed mode-count solver and FFT-accelerated KDE, and provides additional features including k-mode detection, component decomposition, bimodality strength quantification, and excess mass estimation. Validation against twelve benchmark cases spanning separation regimes, unequal variances, unequal weights, and small sample sizes shows stable estimates for clearly separated cases and expected instability for boundary cases. Performance benchmarks show critband is typically 3-10 times faster per case than R's modetest() in the tested setup.
Published: 2026-05-18T17:23:41Z
Comments: 12 pages
Authors: Ruiyu Zhang, Qihao Wang
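
For readers unfamiliar with critical bandwidth search, the following is a minimal pure-Python sketch of Silverman's idea as summarised in the critband abstract above: bisect for the smallest Gaussian-kernel bandwidth at which the density estimate shows at most k modes. It is not critband's API or its solver. The function names, the fixed evaluation grid, and the tolerance are assumptions made for illustration, and the density is computed by direct summation rather than by an FFT-accelerated KDE.

```python
import math


def kde_on_grid(data, h, grid):
    """Unnormalised Gaussian kernel density estimate at each grid point."""
    return [sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data) for x in grid]


def count_modes(values):
    """Count interior local maxima in a sequence of density values."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] < values[i] >= values[i + 1]
    )


def critical_bandwidth(data, k=1, grid_size=400, tol=1e-3):
    """Approximate the smallest bandwidth giving at most k modes (hypothetical helper).

    Relies on the mode count of a Gaussian KDE not increasing as h grows,
    which is what makes a bracketed bisection valid.
    """
    lo_x, hi_x = min(data), max(data)
    span = hi_x - lo_x
    if span == 0:
        raise ValueError("data must contain at least two distinct values")

    # Evaluation grid padded by one data range on each side (an assumption).
    grid = [lo_x - span + 3 * span * i / (grid_size - 1) for i in range(grid_size)]

    def modes(h):
        return count_modes(kde_on_grid(data, h, grid))

    # Bracket: grow the upper end until the estimate has at most k modes.
    lo, hi = span * 1e-3, span
    while modes(hi) > k:
        hi *= 2.0

    # Bisection keeps the invariant modes(hi) <= k.
    while hi - lo > tol * span:
        mid = 0.5 * (lo + hi)
        if modes(mid) > k:
            lo = mid
        else:
            hi = mid
    return hi
```

For two well-separated clusters, `critical_bandwidth(data, k=1)` should come out much larger than `critical_bandwidth(data, k=2)`, because far more smoothing is needed to merge the clusters into a single mode than to smooth each cluster internally.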