http://arxiv.org/abs/2406.08646v2
PETSc/TAO Developments for GPU-Based Early Exascale Systems
2024-11-14T19:49:25Z
The Portable Extensible Toolkit for Scientific Computation (PETSc) library provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization via the Toolkit for Advanced Optimization (TAO). PETSc is used in dozens of scientific fields and is an important building block for many simulation codes. During the U.S. Department of Energy's Exascale Computing Project, the PETSc team has made substantial efforts to enable efficient utilization of the massive fine-grain parallelism present within exascale compute nodes and to enable performance portability across exascale architectures. We recap some of the challenges that designers of numerical libraries face in such an endeavor, and then discuss the many developments we have made, which include the addition of new GPU backends, features supporting efficient on-device matrix assembly, better support for asynchronicity and GPU kernel concurrency, and new communication infrastructure.
We evaluate the performance of these developments on some pre-exascale systems as well as the early exascale systems Frontier and Aurora, using compute kernel, communication layer, solver, and mini-application benchmark studies, and then close with a few observations drawn from our experiences on the tension between portable performance and other goals of numerical libraries.
2024-06-12T21:11:46Z
17 pages
Richard Tran Mills, Mark Adams, Satish Balay, Jed Brown, Jacob Faibussowitsch, Toby Isaac, Matthew Knepley, Todd Munson, Hansol Suh, Stefano Zampini, Hong Zhang, Junchao Zhang

http://arxiv.org/abs/2411.06631v1
SequentialSamplingModels.jl: Simulating and Evaluating Cognitive Models of Response Times in Julia
2024-11-10T23:46:37Z
Sequential sampling models (SSMs) are a widely used framework describing decision-making as a stochastic, dynamic process of evidence accumulation. SSMs' popularity across cognitive science has driven the development of various software packages that lower the barrier for simulating, estimating, and comparing existing SSMs. Here, we present a software tool, SequentialSamplingModels.jl (SSM.jl), designed to make SSM simulations more accessible to Julia users, and to integrate with the Julia ecosystem. We demonstrate the basic use of SSM.jl for simulation, plotting, and Bayesian inference.
2024-11-10T23:46:37Z
Proceedings of the JuliaCon Conferences, 7(78):186, 2025
Kianté Fernandez, Dominique Makowski, Christopher Fisher
10.21105/jcon.00186

http://arxiv.org/abs/2411.03851v1
On a probabilistic global optimizer derived from the Walker slice sampling
2024-11-06T11:40:43Z
This article presents a zeroth order probabilistic global optimization algorithm -- SwiftNav -- for (not necessarily convex) functions over a compact domain.
A discretization procedure is deployed on the compact domain, starting with a small step-size $h > 0$ and subsequently adaptively refining it in the course of a simulated annealing routine utilizing the Walker slice and the Gibbs sampler, in order to identify a set of global optimizers up to good precision. SwiftNav is parallelizable, which helps with scalability as the dimension of decision variables increases. Several numerical experiments are included here to demonstrate the effectiveness and accuracy of SwiftNav in high-dimensional benchmark optimization problems.
2024-11-06T11:40:43Z
18 pages, 16 figures
Aditya Gupta, Souvik Das, Debasish Chatterjee

http://arxiv.org/abs/2411.03501v1
The Python LevelSet Toolbox (LevelSetPy)
2024-11-05T20:31:23Z
This paper describes open-source scientific contributions in Python surrounding the numerical solutions to hyperbolic Hamilton-Jacobi (HJ) partial differential equations viz., their implicit representation on co-dimension one surfaces; dynamics evolution with levelsets; spatial derivatives; total variation diminishing Runge-Kutta integration schemes; and their applications to the theory of reachable sets. They are increasingly finding applications in multiple research domains such as reinforcement learning, robotics, control engineering and automation. We describe the library components, illustrate usage with an example, and provide comparisons with existing implementations. This GPU-accelerated package allows for easy portability to many modern libraries for the numerical analyses of the HJ equations.
We also provide a CPU implementation in Python that is significantly faster than existing alternatives.
2024-11-05T20:31:23Z
The 63rd IEEE Conference on Decision and Control, Milan, 2024
Lekan Molu

http://arxiv.org/abs/2412.16161v1
Antiassociative algebra in R: introducing the evitaicossa package
2024-10-31T16:31:26Z
In this short article I introduce the evitaicossa package which provides functionality for antiassociative algebras in the R programming language; it is available on CRAN at https://CRAN.R-project.org/package=evitaicossa.
2024-10-31T16:31:26Z
6 pages
Robin K. S. Hankin

http://arxiv.org/abs/2401.05868v2
Efficient N-to-M Checkpointing Algorithm for Finite Element Simulations
2024-10-30T23:57:13Z
In this work, we introduce a new algorithm for N-to-M checkpointing in finite element simulations. This new algorithm allows efficient saving/loading of functions representing physical quantities associated with the mesh representing the physical domain. Specifically, the algorithm allows for using different numbers of parallel processes for saving and loading, allowing for restarting and post-processing on the process count appropriate to the given phase of the simulation and other conditions. For demonstration, we implemented this algorithm in PETSc, the Portable, Extensible Toolkit for Scientific Computation, and added a convenient high-level interface into Firedrake, a system for solving partial differential equations using finite element methods. We evaluated our new implementation by saving and loading data involving 8.2 billion finite element degrees of freedom using 8,192 parallel processes on ARCHER2, the UK National Supercomputing Service.
2024-01-11T12:20:50Z
author accepted manuscript
SIAM SISC 46(6):B830-B859 (2024)
David A. Ham, Vaclav Hapla, Matthew G.
Knepley, Lawrence Mitchell, Koki Sagiyama
10.1137/23M1613724

http://arxiv.org/abs/2410.22652v1
Development of a Python-Based Software for Calculating the Jones Polynomial: Insights into the Behavior of Polymers and Biopolymers
2024-10-30T02:41:42Z
This thesis details a Python-based software designed to calculate the Jones polynomial, a vital mathematical tool from Knot Theory used for characterizing the topological and geometrical complexity of curves in \( \mathbb{R}^3 \), which is essential in understanding physical systems of filaments, including the behavior of polymers and biopolymers. The Jones polynomial serves as a topological invariant capable of distinguishing between different knot structures. This capability is fundamental to characterizing the architecture of molecular chains, such as proteins and DNA. Traditional computational methods for deriving the Jones polynomial have been limited by closure-schemes and high execution costs, which can be impractical for complex structures like those that appear in real life. This software implements methods that significantly reduce calculation times, allowing for more efficient and practical applications in the study of biological polymers. It utilizes a divide-and-conquer approach combined with parallel computing and applies recursive Reidemeister moves to optimize the computation, transitioning from an exponential to a near-linear runtime for specific configurations. This thesis provides an overview of the software's functions, detailed performance evaluations using protein structures as test cases, and a discussion of the implications for future research and potential algorithmic improvements.
2024-10-30T02:41:42Z
Caleb Musfeldt

http://arxiv.org/abs/2411.00819v1
A Bellman-Ford algorithm for the path-length-weighted distance in graphs
2024-10-28T15:31:34Z
Consider a finite directed graph without cycles in which the arrows are weighted.
We present an algorithm for the computation of a new distance, called path-length-weighted distance, which has proven useful for graph analysis in the context of fraud detection. The idea is that the new distance explicitly takes into account the size of the paths in the calculations. Thus, although our algorithm is based on arguments similar to those at work for the Bellman-Ford and Dijkstra methods, it is in fact essentially different. We lay out the appropriate framework for its computation, showing the constraints and requirements for its use, along with some illustrative examples.
2024-10-28T15:31:34Z
20 pages, 10 figures
R. Arnau, J. M. Calabuig, L. M. García Raffi, E. A. Sánchez Pérez, S. Sanjuan
10.3390/math12162590

http://arxiv.org/abs/2405.07819v2
Local Adjoints for Simultaneous Preaccumulations with Shared Inputs
2024-10-27T19:11:54Z
In shared-memory parallel automatic differentiation, inputs that are shared among simultaneous thread-local preaccumulations lead to data races if Jacobians are accumulated with a single, shared vector of adjoint variables. In this work, we discuss the benefits and tradeoffs of re-enabling such preaccumulations by a transition to suitable local adjoints. We propose different vector- and map-based approaches for storing local adjoint variables and analyze them with respect to memory consumption, memory allocation, and adjoint variable access times in the context of simultaneous preaccumulations in multiple threads. We implement the approaches in CoDiPack and benchmark them in parallel discrete adjoint computations in the multiphysics simulation suite SU2.
2024-05-13T15:01:18Z
12 pages, 5 figures. Updated and extended all parts of the paper
Johannes Blühdorn, Nicolas R.
Gauger
10.1137/1.9781611979039.13

http://arxiv.org/abs/2312.08006v2
Performance of linear solvers in tensor-train format on current multicore architectures
2024-10-24T14:02:01Z
Tensor networks are a class of algorithms aimed at reducing the computational complexity of high-dimensional problems. They are used in an increasing number of applications, from quantum simulations to machine learning. Exploiting data parallelism in these algorithms is key to using modern hardware. However, there are several ways to map required tensor operations onto linear algebra routines ("building blocks"). Optimizing this mapping impacts the numerical behavior, so computational and numerical aspects must be considered hand-in-hand. In this paper we discuss the performance of solvers for low-rank linear systems in the tensor-train format (also known as matrix-product states). We consider three popular algorithms: TT-GMRES, MALS, and AMEn. We illustrate their computational complexity based on the example of discretizing a simple high-dimensional PDE in, e.g., $50^{10}$ grid points. This shows that the projection to smaller sub-problems for MALS and AMEn reduces the number of floating-point operations by orders of magnitude. We suggest optimizations regarding orthogonalization steps, singular value decompositions, and tensor contractions. In addition, we propose a generic preconditioner based on a TT-rank-1 approximation of the linear operator.
Overall, we obtain roughly a 5x speedup over the reference algorithm for the fastest method (AMEn) on a current multicore CPU.
2023-12-13T09:28:09Z
28 pages, 8 figures, submitted to IJHPCA
Melven Röhrig-Zöllner, Manuel Joey Becklas, Jonas Thies, Achim Basermann

http://arxiv.org/abs/2410.15963v1
An Efficient Local Optimizer-Tracking Solver for Differential-Algebraic Equations with Optimization Criteria
2024-10-21T12:48:12Z
A sequential solver for differential-algebraic equations with embedded optimization criteria (DAEOs) was developed to take advantage of the theoretical work done by Deussen et al. Solvers of this type separate the optimization problem from the differential equation and solve each individually. The new solver relies on the reduction of a DAEO to a sequence of differential inclusions separated by jump events. These jump events occur when the global solution to the optimization problem jumps to a new value. Without explicit treatment, these events will reduce the order of convergence of the integration step to one. The solver implements a "local optimizer tracking" procedure to detect and correct these jump events. Local optimizer tracking is much less expensive than running a deterministic global optimizer at every time step. This preserves the order of convergence of the integrator component without sacrificing performance to perform deterministic global optimization at every time step. The newly developed solver produces correct solutions to DAEOs and runs much faster than sequential DAEO solvers that rely only on global optimization.
2024-10-21T12:48:12Z
8 pages, 5 figures
Alexander Fleming, Jens Deussen, Uwe Naumann

http://arxiv.org/abs/2410.12942v1
modOpt: A modular development environment and library for optimization algorithms
2024-10-16T18:30:23Z
Recent advances in computing hardware and modeling software have given rise to new applications for numerical optimization.
These new applications occasionally uncover bottlenecks in existing optimization algorithms and necessitate further specialization of the algorithms. However, such specialization requires expert knowledge of the underlying mathematical theory and the software implementation of existing algorithms. To address this challenge, we present modOpt, an open-source software framework that facilitates the construction of optimization algorithms from modules. The modular environment provided by modOpt enables developers to tailor an existing algorithm for a new application by only altering the relevant modules. modOpt is designed as a platform to support students and beginner developers in quickly learning and developing their own algorithms. With that aim, the entirety of the framework is written in Python, and it is well-documented, well-tested, and hosted open-source on GitHub. Several additional features are embedded into the framework to assist both beginner and advanced developers. In addition to providing stock modules, the framework also includes fully transparent implementations of pedagogical optimization algorithms in Python. To facilitate testing and benchmarking of new algorithms, the framework features built-in visualization and recording capabilities, interfaces to modeling frameworks such as OpenMDAO and CSDL, interfaces to general-purpose optimization algorithms such as SNOPT and SLSQP, an interface to the CUTEst test problem set, etc. In this paper, we present the underlying software architecture of modOpt, review its various features, discuss several educational and performance-oriented algorithms within modOpt, and present numerical studies illustrating its unique benefits.
2024-10-16T18:30:23Z
37 pages with 13 figures. For associated code, see https://github.com/LSDOlab/modopt
Anugrah Jo Joshy, John T.
Hwang

http://arxiv.org/abs/2410.12614v1
Mixed-precision finite element kernels and assembly: Rounding error analysis and hardware acceleration
2024-10-16T14:32:10Z
In this paper we develop the first fine-grained rounding error analysis of finite element (FE) cell kernels and assembly. The theory includes mixed-precision implementations and accounts for hardware-acceleration via matrix multiplication units, thus providing theoretical guidance for designing reduced- and mixed-precision FE algorithms on CPUs and GPUs. Guided by this analysis, we introduce hardware-accelerated mixed-precision implementation strategies which are provably robust to low-precision computations. Indeed, these algorithms are accurate to the lower-precision unit roundoff with an error constant that is independent from: the conditioning of FE basis function evaluations, the ill-posedness of the cell, the polynomial degree, and the number of quadrature nodes. Consequently, we present the first AMX-accelerated FE kernel implementations on Intel Sapphire Rapids CPUs. Numerical experiments demonstrate that the proposed mixed- (single/half-) precision algorithms are up to 60 times faster than their double precision equivalent while being orders of magnitude more accurate than their fully half-precision counterparts.
2024-10-16T14:32:10Z
Keywords: Mixed precision, finite element method, finite element kernel and assembly, rounding error analysis, hardware acceleration, matrix units, Intel AMX
M. Croci, G. N. Wells

http://arxiv.org/abs/2407.15973v3
Mixed Precision Block-Jacobi Preconditioner: Algorithms, Performance Evaluation and Feature Analysis
2024-10-15T13:17:13Z
In this paper, we propose two mixed precision algorithms for the Block-Jacobi preconditioner (BJAC): a fixed low precision strategy and an adaptive precision strategy.
We evaluate the performance improvement of the proposed mixed precision BJAC preconditioners combined with the preconditioned conjugate gradient algorithm using problems including diffusion equations and radiation hydrodynamics equations. Numerical results show that, compared to the uniform high precision PCG algorithm, the mixed precision preconditioners can achieve speedups from 1.3 to 1.8 without sacrificing accuracy. Furthermore, we observe the phenomenon of convergence delay in some test cases for the mixed precision preconditioners, and further analyse the matrix features associated with the convergence delay behavior.
2024-07-22T18:35:05Z
Ningxi Tian, Silu Huang, Xiaowen Xu

http://arxiv.org/abs/2404.10143v2
Computing with Hypergeometric-Type Terms
2024-10-14T23:01:11Z
Take a multiplicative monoid of sequences in which the multiplication is given by Hadamard product. The set of linear combinations of interleaving monoid elements then yields a ring. For hypergeometric sequences, the resulting ring is a subring of the ring of holonomic sequences. We present two algorithms in this setting: one for computing holonomic recurrence equations from hypergeometric-type normal forms and the other for finding products of hypergeometric-type terms. These are newly implemented commands in our Maple package $HyperTypeSeq$, available at \url{https://github.com/T3gu1a/HyperTypeSeq}, which we also describe.
2024-04-15T21:24:18Z
Mainly correcting a miscopy of the explicit formula that the code outputs for the sequence at https://oeis.org/A212579 (see equation (3)). This is the version considered for ISSAC'24 software presentation
Bertrand Teguia Tabuguia