http://arxiv.org/abs/2402.13925v1 UMAT4COMSOL: An Abaqus user material (UMAT) subroutine wrapper for COMSOL 2024-02-21T16:45:02Z

We present a wrapper that allows Abaqus user material subroutines (UMATs) to be used as an External Material library in the software COMSOL Multiphysics. The wrapper, written in C, transforms COMSOL's external material subroutine inputs and outputs into Fortran-coded Abaqus UMAT inputs and outputs by means of a consistent variable transformation. This significantly facilitates conducting coupled, multi-physics studies employing the advanced material models that the solid mechanics community has developed over the past decades. We exemplify the potential of our new framework, UMAT4COMSOL, by conducting numerical experiments in the areas of elastoplasticity, hyperelasticity and crystal plasticity. The source code, detailed documentation and example tutorials are freely available for download at www.empaneda.com/codes.

2024-02-21T16:45:02Z S. Lucarini E. Martínez-Pañeda http://arxiv.org/abs/2310.03755v2 Physics Informed Neural Network Code for 2D Transient Problems (PINN-2DT) Compatible with Google Colab 2024-02-19T21:08:45Z

We present an open-source Physics Informed Neural Network environment for simulations of transient phenomena on two-dimensional rectangular domains, with the following features: (1) it is compatible with Google Colab, which allows automatic execution in a cloud environment; (2) it supports two-dimensional time-dependent PDEs; (3) it provides a simple interface for defining the residual loss, boundary condition loss and initial loss, together with their weights; (4) it supports Neumann and Dirichlet boundary conditions; (5) it allows customizing the number of layers and neurons per layer, as well as using an arbitrary activation function; (6) the learning rate and number of epochs are available as parameters; (7) it automatically differentiates the PINN with respect to spatial and temporal variables; (8) it provides routines for plotting the convergence (with running average), the initial conditions learnt, 2D and 3D snapshots from the simulation, and movies; (9) it includes a library of problems: (a) non-stationary heat transfer; (b) wave equation modeling a tsunami; (c) atmospheric simulations including thermal inversion; (d) tumor growth simulations.

2023-09-24T07:08:36Z 21 pages, 13 figures Paweł Maczuga Maciej Sikora Maciej Skoczeń Przemysław Rożnawski Filip Tłuszcz Marcin Szubert Marcin Łoś Witold Dzwinel Keshav Pingali Maciej Paszyński http://arxiv.org/abs/2402.11868v1 Recent Extensions of the ZKCM Library for Parallel and Accurate MPS Simulation of Quantum Circuits 2024-02-19T06:24:25Z

The C++ library ZKCM and its extension library ZKCM_QC have been developed since 2011 for multiple-precision matrix computation and accurate matrix-product-state (MPS) quantum circuit simulation, respectively. This report describes recent progress in extending these libraries, mainly for parallel processing with the OpenMP and CUDA frameworks.

2024-02-19T06:24:25Z 6 pages, 2 figures, under review in the post-conference Proc. CCP2023 Akira SaiToh http://arxiv.org/abs/2402.09983v1 Optimistix: modular optimisation in JAX and Equinox 2024-02-15T14:49:18Z

We introduce Optimistix: a nonlinear optimisation library built in JAX and Equinox. Optimistix introduces a novel, modular approach for its minimisers and least-squares solvers. This modularity relies on new practical abstractions for optimisation which we call search and descent, and which generalise classical notions of line search, trust-region, and learning-rate algorithms. It provides high-level APIs and solvers for minimisation, nonlinear least-squares, root-finding, and fixed-point iteration. Optimistix is available at https://github.com/patrick-kidger/optimistix.

2024-02-15T14:49:18Z 8 pages, 4 figures, 2 tables Jason Rader Terry Lyons Patrick Kidger http://arxiv.org/abs/2304.06935v3 Groebner.jl: A package for Gröbner bases computations in Julia 2024-02-12T16:25:18Z

We present Groebner.jl, a Julia package for computing Groebner bases with the F4 algorithm. Groebner.jl is efficient, portable, open-source software. It works over integers modulo a prime and over the rationals, supports basic multi-threading, and specializes in computation in the degree reverse lexicographical monomial ordering. The implementation incorporates various symbolic computation techniques and leverages the Julia type system and tooling, which allows Groebner.jl to compete with the existing state of the art, in many instances outperform it, and exceed it in extensibility. Groebner.jl is freely available at https://github.com/sumiya11/Groebner.jl.

2023-04-14T05:47:34Z 10 pages Alexander Demin Shashi Gowda http://arxiv.org/abs/2401.17345v2 Reproducibility, energy efficiency and performance of pseudorandom number generators in machine learning: a comparative study of python, numpy, tensorflow, and pytorch implementations 2024-02-10T12:09:18Z

Pseudo-Random Number Generators (PRNGs) have become ubiquitous in machine learning technologies because numerous methods rely on them. The field of machine learning holds the potential for substantial advancements across various domains, as exemplified by recent breakthroughs in Large Language Models (LLMs). However, despite the growing interest, concerns persist about reproducibility and energy consumption. Reproducibility is crucial for robust scientific inquiry and explainability, while energy efficiency underscores the imperative to conserve finite global resources. This study investigates whether the leading PRNGs employed in machine learning languages, libraries, and frameworks uphold statistical quality and numerical reproducibility when compared to the original C implementation of the respective PRNG algorithms. Additionally, we evaluate the time efficiency and energy consumption of the various implementations. Our experiments encompass Python, NumPy, TensorFlow, and PyTorch, using the Mersenne Twister, PCG, and Philox algorithms. Remarkably, we verified that the temporal performance of machine learning technologies closely aligns with that of C-based implementations, and in some instances is even superior. On the other hand, it is noteworthy that ML technologies consumed only 10% more energy than their C-implementation counterparts. However, while statistical quality was found to be comparable, numerical reproducibility across different platforms for identical seeds and algorithms was not achieved.
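
As a small illustration of what seed-based reproducibility means, the sketch below uses only Python's standard `random` module, whose core generator is the Mersenne Twister. It checks that two generators seeded identically in the same interpreter produce identical streams. The seed values and the helper name `draw` are arbitrary choices made here, and the sketch says nothing about the cross-platform or cross-library comparisons that the study itself performs.

```python
import random


def draw(seed, n=5):
    """Return the first n floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent Mersenne Twister instance
    return [rng.random() for _ in range(n)]


# Same seed, same implementation: the streams should match exactly.
run_a = draw(12345)
run_b = draw(12345)

# A different seed should give a different stream.
run_c = draw(54321)

same_seed_matches = run_a == run_b
different_seed_differs = run_a != run_c
print(same_seed_matches, different_seed_differs)
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the stream isolated from any other code that draws random numbers.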

2024-01-30T15:44:14Z 20 pages, 10 tables, 1 figure Benjamin Antunes David R. C Hill http://arxiv.org/abs/2108.12981v2 The ensmallen library for flexible numerical optimization 2024-02-09T13:07:00Z

We overview the ensmallen numerical optimization library, which provides a flexible C++ framework for mathematical optimization of user-supplied objective functions. Many types of objective functions are supported, including general, differentiable, separable, constrained, and categorical. A diverse set of pre-built optimizers is provided, including Quasi-Newton optimizers and many variants of Stochastic Gradient Descent. The underlying framework facilitates the implementation of new optimizers. Optimization of an objective function typically requires supplying only one or two C++ functions. Custom behavior can be easily specified via callback functions. Empirical comparisons show that ensmallen outperforms other frameworks while providing more functionality. The library is available at https://ensmallen.org and is distributed under the permissive BSD license.

2021-08-30T03:49:21Z Journal of Machine Learning Research, Vol. 22, No. 166, 2021 Ryan R. Curtin Marcus Edel Rahul Ganesh Prabhu Suryoday Basak Zhihao Lou Conrad Sanderson http://arxiv.org/abs/2312.01709v4 A New Challenging Curve Fitting Benchmark Test Set for Global Optimization 2024-02-07T19:27:42Z

Benchmark sets are extremely important for evaluating and developing global optimization algorithms and related solvers. A new test set, the PCC benchmark, is proposed, for the first time specifically for nonlinear curve-fitting optimization problems, with the aim of helping developers investigate and compare the performance of different global optimization solvers and develop more effective optimization algorithms. Compared with the well-known classical nonlinear curve fitting benchmark set from the National Institute of Standards and Technology (NIST) of the USA, the most distinguishing features of the PCC benchmark are small problem dimensions, unconstrained problems with a free search domain, and a high level of difficulty in obtaining globally optimal solutions. These features make the PCC benchmark not only suitable for validating the effectiveness of different global optimization algorithms, but also better suited to verifying and comparing related solvers. Seven of the world's leading global optimization solvers, namely Baron, Antigone, Couenne, Lingo, Scip, Matlab-GA and 1stOpt, are used to test the NIST and PCC benchmarks thoroughly in terms of both effectiveness and efficiency. The results show that the NIST benchmark is relatively simple and not suitable for global optimization testing, whereas the PCC benchmark is a unique, challenging and effective test set for global optimization.

2023-12-04T07:52:42Z Peicong Cheng Peicheng Cheng http://arxiv.org/abs/2212.13977v2 Fast and energy-efficient derivatives risk analysis: Streaming option Greeks on Xilinx and Intel FPGAs 2024-02-02T12:29:47Z

Whilst FPGAs have enjoyed success in accelerating high-frequency financial workloads for some time, their use for quantitative finance, which is the use of mathematical models to analyse financial markets and securities, has been far more limited to date. Currently, CPUs are the most common architecture for such workloads, and an important question is whether FPGAs can ameliorate some of the bottlenecks encountered on those architectures. In this paper we extend our previous work accelerating the industry standard Securities Technology Analysis Center's (STAC®) derivatives risk analysis benchmark STAC-A2™, by first porting this from our previous Xilinx implementation to an Intel Stratix-10 FPGA, exploring the challenges encountered when moving from one FPGA architecture to another and the suitability of techniques. We then present a host-data-streaming approach that ultimately outperforms our previous version on a Xilinx Alveo U280 FPGA by up to 4.6 times and requires 9 times less energy at the largest problem size, while outperforming the CPU and GPU versions by up to 8.2 and 5.2 times respectively. The result of this work is a significant enhancement in FPGA performance against the previous version for this industry standard benchmark running on both Xilinx and Intel FPGAs, and furthermore an exploration of optimisation and porting techniques that can be applied to other HPC workloads.

2022-12-28T17:51:54Z This work uses a STAC benchmark. Whilst this was approved at the time, they have asked that we remove the paper, as it needs to be made more explicit that these are unofficial ports that are entirely independent from any vendor and do not follow STAC rules. As we are comparing vendor hardware in the paper, it was felt that this could easily be mistaken as representing something that the paper is not. Mark Klaisoongnoen Nick Brown Oliver Brown 10.1109/H2RC56700.2022.00008 http://arxiv.org/abs/2312.13527v4 MindOpt Adapter for CPLEX Benchmarking Performance Analysis 2024-02-01T03:09:51Z

This report provides a comprehensive analysis of the performance of MindOpt Adapter for CPLEX 12.9 in benchmark testing. CPLEX, recognized as a robust Mixed Integer Programming (MIP) solver, has faced some scrutiny regarding its performance on MIPLIB 2017 when configured with default settings. MindOpt Adapter aims to enhance CPLEX's performance by automatically applying improved configurations for solving optimization problems. Our testing demonstrates that MindOpt Adapter for CPLEX successfully solved 232 of the 240 problems in the MIPLIB 2017 benchmark set. This performance surpasses all the other solvers in terms of the number of problems solved and the geometric mean of running times. The report provides a comparison of the benchmark results against the outcomes achieved by CPLEX under its default configuration.
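
The geometric mean of running times mentioned above can be sketched as follows. The running times in the example are made-up illustrative numbers, not results from the report, and the shifted variant with a shift of 10 seconds is an assumption about common benchmarking practice rather than something the report states.

```python
import math


def geometric_mean(times):
    """Geometric mean of strictly positive running times."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


def shifted_geometric_mean(times, shift=10.0):
    """Geometric mean of (t + shift), minus shift.

    The shift damps the influence of very small running times.
    The default of 10.0 is an assumed value chosen for illustration.
    """
    logs = [math.log(t + shift) for t in times]
    return math.exp(sum(logs) / len(logs)) - shift


# Hypothetical running times in seconds for one solver on four instances.
example_times = [0.5, 12.0, 300.0, 3600.0]

arithmetic = sum(example_times) / len(example_times)
print(arithmetic, geometric_mean(example_times), shifted_geometric_mean(example_times))
```

Unlike the arithmetic mean, which here is dominated by the single 3600-second instance, the geometric mean weights relative speed-ups equally across instances.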

2023-12-21T01:59:33Z Mou Sun Tao Li Wotao Yin http://arxiv.org/abs/2401.17184v1 Rigorous Error Analysis for Logarithmic Number Systems 2024-01-30T17:12:56Z

Logarithmic Number Systems (LNS) hold considerable promise in helping reduce the number of bits needed to represent a high dynamic range of real numbers with finite precision, and also efficiently support multiplication and division. However, under LNS, addition and subtraction turn into non-linear functions that must be approximated, typically using precomputed table-based functions. Additionally, multiple layers of error correction are typically needed to improve result accuracy. Unfortunately, previous efforts have not characterized the resulting error bound. We provide the first rigorous analysis of LNS, covering detailed techniques such as co-transformation that are crucial to implementing subtraction with reasonable accuracy. We provide theorems capturing the error due to table interpolations, the finite precision of pre-computed values in the tables, and the error introduced by fixed-point multiplications involved in LNS implementations. We empirically validate our analysis using a Python implementation, showing that our analytical bounds are tight, and that our testing campaign generates inputs diverse enough to almost match (but not exceed) the analytical bounds. We close with discussions on how to adapt our analysis to LNS systems with different bases and also discuss many pragmatic ramifications of our work in the broader arena of scientific computing and machine learning.
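
To make the idea concrete, here is a minimal sketch of base-2 LNS arithmetic on positive values. Multiplication reduces to adding exponents, while addition needs the non-linear function log2(1 + 2^d), which is approximated here by a table with linear interpolation. This is not the paper's implementation: the function names, the table step of 1/64 and the table range are assumptions chosen for illustration, floating point stands in for fixed point, and sign handling, subtraction and co-transformation are omitted.

```python
import math


def to_lns(x):
    """Represent a positive real by its base-2 logarithm."""
    return math.log2(x)


def from_lns(e):
    return 2.0 ** e


def lns_mul(a, b):
    """Multiplication of values is addition of their logarithms."""
    return a + b


def phi_add(d):
    """Exact addition function log2(1 + 2**d), used for d <= 0."""
    return math.log2(1.0 + 2.0 ** d)


def make_table(step, d_min):
    """Tabulate phi_add at d = 0, -step, -2*step, ..., d_min."""
    n = int(round(-d_min / step))
    return [phi_add(-k * step) for k in range(n + 1)]


def phi_add_interp(d, table, step):
    """Linear interpolation in the table, valid for d_min <= d <= 0."""
    pos = -d / step
    k = min(int(pos), len(table) - 2)
    frac = pos - k
    return table[k] + frac * (table[k + 1] - table[k])


def lns_add(a, b, table, step):
    """log2(2**a + 2**b) = max(a, b) + phi_add(-|a - b|)."""
    hi, lo = max(a, b), min(a, b)
    return hi + phi_add_interp(lo - hi, table, step)


STEP = 1.0 / 64.0                 # assumed table spacing
TABLE = make_table(STEP, -16.0)   # assumed table range

product = from_lns(lns_mul(to_lns(3.0), to_lns(4.0)))
total = from_lns(lns_add(to_lns(3.0), to_lns(5.0), TABLE, STEP))
print(product, total)
```

The gap between `phi_add_interp` and `phi_add` is the kind of table-interpolation error that the analysis bounds; shrinking `STEP` reduces it at the cost of a larger table.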

2024-01-30T17:12:56Z 42 pages, 14 figures, 6 tables Thanh Son Nguyen Alexey Solovyev Ganesh Gopalakrishnan http://arxiv.org/abs/2401.16369v1 Mixed-Order Meshes through rp-adaptivity for Surface Fitting to Implicit Geometries 2024-01-29T18:10:01Z

Computational analysis with the finite element method requires geometrically accurate meshes. It is well known that high-order meshes can accurately capture curved surfaces with fewer degrees of freedom in comparison to low-order meshes. Existing techniques for high-order mesh generation typically output meshes with the same polynomial order for all elements. However, high-order elements away from curvilinear boundaries or interfaces increase the computational cost of the simulation without increasing geometric accuracy. In prior work, we presented an approach for generating body-fitted uniform-order meshes that takes a given mesh and morphs it to align with the surface of interest, prescribed as the zero isocontour of a level-set function. We extend this method to generate mixed-order meshes such that curved surfaces of the domain are discretized with high-order elements, while low-order elements are used elsewhere. Numerical experiments demonstrate the robustness of the approach and show that it can be used to generate mixed-order meshes that are much more efficient than high uniform-order meshes. The proposed approach is purely algebraic, and extends to different types of elements (quadrilaterals/triangles/tetrahedra/hexahedra) in two and three dimensions.

2024-01-29T18:10:01Z 14 pages, 11 figures Ketan Mittal Veselin A. Dobrev Patrick Knupp Tzanio Kolev Franck Ledoux Claire Roche Vladimir Z. Tomov http://arxiv.org/abs/2401.14117v1 Evaluation of POSIT Arithmetic with Accelerators 2024-01-25T11:54:44Z

We present an evaluation of 32-bit POSIT arithmetic through its implementation as accelerators on FPGAs and GPUs. POSIT, a floating-point number format, adaptively changes the size of its fractional part. We developed hardware designs for FPGAs and software for GPUs to accelerate linear algebra operations using Posit(32,2) arithmetic. Our FPGA- and GPU-based accelerators in Posit(32,2) arithmetic significantly accelerated the Cholesky and LU decomposition algorithms for dense matrices. In terms of numerical accuracy, Posit(32,2) arithmetic is approximately 0.5 to 1.0 digits more accurate than the standard 32-bit format, especially when the norm of the elements of the input matrix is close to 1. Evaluating power consumption, we observed that the power efficiency of the accelerators ranged from 0.043 to 0.076 Gflops/watt for the LU decomposition in Posit(32,2) arithmetic. The power efficiency of the latest GPUs as accelerators of Posit(32,2) arithmetic is better than that of the evaluated FPGA chip.
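
The statement that a posit adaptively changes the size of its fractional part can be illustrated with a small software decoder. The sketch below follows the usual posit layout (sign, variable-length regime, up to `es` exponent bits, then fraction) and returns both the decoded value and the number of fraction bits available for that bit pattern. It is an illustrative reference decoder written for this listing, with a function name chosen here, and has no connection to the FPGA or GPU implementations evaluated in the paper.

```python
def decode_posit(bits, n=32, es=2):
    """Decode an n-bit posit with es exponent bits.

    Returns (value, fraction_bits). NaR is returned as (nan, 0).
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0, 0
    if bits == 1 << (n - 1):
        return float("nan"), 0  # NaR (not a real)

    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # negative posits use two's complement
    body = bits & ((1 << (n - 1)) - 1)

    # Regime: a run of identical bits starting just below the sign bit.
    first = (body >> (n - 2)) & 1
    run = 0
    pos = n - 2
    while pos >= 0 and ((body >> pos) & 1) == first:
        run += 1
        pos -= 1
    k = run - 1 if first else -run
    pos -= 1                      # skip the bit that terminates the regime
    remaining = max(pos + 1, 0)   # bits left for exponent and fraction

    exp_bits = min(es, remaining)
    exponent = 0
    if exp_bits:
        exponent = (body >> (remaining - exp_bits)) & ((1 << exp_bits) - 1)
    exponent <<= es - exp_bits    # truncated exponent bits count as zero

    frac_bits = remaining - exp_bits
    fraction = body & ((1 << frac_bits) - 1)
    significand = 1.0 + (fraction / (1 << frac_bits) if frac_bits else 0.0)

    value = significand * 2.0 ** (k * (1 << es) + exponent)
    return (-value if sign else value), frac_bits


for pattern in (0x40000000, 0x44000000, 0x60000000, 0x7FFFFFFF):
    print(hex(pattern), decode_posit(pattern))
```

Values near 1 keep 27 fraction bits in Posit(32,2), more than the 23 explicit fraction bits of IEEE 754 single precision, while values of very large or very small magnitude spend those bits on the regime instead.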

2024-01-25T11:54:44Z 11 pages, 8 figures; Published in HPCAsia '24: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region, January 2024, Pages 62-72 Naohito Nakasato Yuki Murakami Fumiya Kono Maho Nakata 10.1145/3635035.3635046 http://arxiv.org/abs/2309.06306v2 CDL: A fast and flexible library for the study of permutation sets with structural restrictions 2024-01-25T11:04:46Z

In this paper, we introduce CDL, a software library designed for the analysis of permutations and linear orders subject to various structural restrictions. Prominent examples of these restrictions include pattern avoidance, a topic of interest in both computer science and combinatorics, and "never conditions" utilized in social choice and voting theory. CDL offers a range of fundamental functionalities, including identifying the permutations that meet specific restrictions and determining the isomorphism of such sets. To facilitate exploration of large permutation sets or domains, CDL incorporates multiple search strategies and heuristics.
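
Pattern avoidance, one of the restrictions named above, can be illustrated by brute force. A permutation contains a pattern if some subsequence of it has the same relative order as the pattern, and avoids it otherwise. The sketch below is a naive enumeration written for illustration. It does not use CDL or reflect its API or search strategies, and the function names are chosen here.

```python
from itertools import combinations, permutations


def contains_pattern(perm, pattern):
    """True if some subsequence of perm is order-isomorphic to pattern."""
    k = len(pattern)
    for idx in combinations(range(len(perm)), k):
        values = [perm[i] for i in idx]
        if all(
            (values[i] < values[j]) == (pattern[i] < pattern[j])
            for i in range(k)
            for j in range(i + 1, k)
        ):
            return True
    return False


def avoiders(n, pattern):
    """All permutations of 1..n that avoid the given pattern."""
    return [
        p for p in permutations(range(1, n + 1))
        if not contains_pattern(p, pattern)
    ]


counts = [len(avoiders(n, (1, 2, 3))) for n in range(1, 6)]
print(counts)
```

For any pattern of length three the counts follow the Catalan numbers, which gives a convenient sanity check. The factorial cost of this enumeration is also why dedicated search strategies and heuristics matter for large permutation sets.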

2023-09-12T15:17:16Z 7 pages Bei Zhou Klas Markstrōm Søren Riis 10.1016/j.softx.2024.101951 http://arxiv.org/abs/2401.14077v1 LongMemory.jl: Generating, Estimating, and Forecasting Long Memory Models in Julia 2024-01-25T10:55:38Z

LongMemory.jl is a package for time series long memory modelling in Julia. The package provides functions to generate long memory, estimate model parameters, and forecast. Generating methods include fractional differencing, stochastic error duration, and cross-sectional aggregation. Estimators include the classic ones used to estimate the Hurst effect, those inspired by log-periodogram regression, and parametric ones. Forecasting is provided for all parametric estimators. Moreover, the package adds plotting capabilities to illustrate long memory dynamics and forecasting. This article presents the theoretical developments for long memory modelling, shows examples using the data included with the package, and compares the properties of LongMemory.jl with current alternatives, including benchmarks. For some of the theoretical developments, LongMemory.jl provides the first publicly available implementation in any programming language. A notable feature of this package is that all functions are implemented in the same programming language, taking advantage of the ease of use and speed provided by Julia. Therefore, all code is accessible to the user. Multiple dispatch, a novel feature of the language, is used to speed computations and provide consistent calls to related methods. The package is related to the R packages LongMemoryTS and fracdiff.
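
Of the generating methods listed, fractional differencing is the simplest to sketch. The filter (1 - L)^d expands into weights w_0 = 1, w_k = w_{k-1} (k - 1 - d) / k, and applying the inverse filter (1 - L)^(-d) to white noise yields a long memory series. The code below is a plain Python illustration of that recursion, truncated at the start of the sample. It is not the LongMemory.jl API, and the function names, the seed and the value d = 0.4 are assumptions chosen here.

```python
import random


def frac_diff_weights(d, n):
    """First n coefficients of the expansion of (1 - L)**d."""
    weights = [1.0]
    for k in range(1, n):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


def frac_filter(series, d):
    """Apply (1 - L)**d to a series, truncating at the sample start."""
    w = frac_diff_weights(d, len(series))
    return [
        sum(w[k] * series[t - k] for k in range(t + 1))
        for t in range(len(series))
    ]


def generate_long_memory(n, d, seed=0):
    """Fractionally integrate Gaussian white noise: x = (1 - L)**(-d) e."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return frac_filter(noise, -d), noise


series, noise = generate_long_memory(50, 0.4)
recovered = frac_filter(series, 0.4)  # differencing undoes the integration
print(frac_diff_weights(-0.4, 4))
```

The weights of the inverse filter decay hyperbolically rather than geometrically, which is what produces the slowly decaying autocorrelations characteristic of long memory.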

2024-01-25T10:55:38Z J. Eduardo Vera-Valdés 10.21105/joss.07708