http://arxiv.org/abs/2405.10130v1 PyOptInterface: Design and implementation of an efficient modeling language for mathematical optimization 2024-05-16T14:29:02Z

This paper introduces the design and implementation of PyOptInterface, a modeling language for mathematical optimization embedded in the Python programming language. PyOptInterface uses a lightweight and compact data structure to efficiently bridge high-level entities in optimization models, such as variables and constraints, to the internal indices of optimizers. It supports a variety of optimization solvers and a range of common problem classes. We provide benchmarks that demonstrate the competitive performance of PyOptInterface compared with other state-of-the-art modeling languages.
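The task of mapping stable handles for variables onto an optimizer's internal indices can be made concrete with a small Python sketch. It assumes a solver that renumbers its columns contiguously after each deletion. The class `HandleIndexMap` and its methods are hypothetical and are not PyOptInterface's API or its data structure. Handles are issued monotonically. A sorted list of deleted handles then gives each surviving handle's current column index through one binary search.

```python
from bisect import bisect_left, insort
from typing import List, Optional


class HandleIndexMap:
    """Hypothetical map from stable variable handles to contiguous solver column indices.

    Handles are issued in increasing order and never reused. After a deletion the
    solver is assumed to shift every later column down by one, so a surviving
    handle's column index is the handle minus the number of smaller deleted handles.
    """

    def __init__(self) -> None:
        self._next_handle = 0
        self._deleted: List[int] = []  # kept sorted for binary search

    def add(self) -> int:
        handle = self._next_handle
        self._next_handle += 1
        return handle

    def delete(self, handle: int) -> None:
        if self.solver_index(handle) is None:
            raise KeyError(f"handle {handle} is unknown or already deleted")
        insort(self._deleted, handle)

    def solver_index(self, handle: int) -> Optional[int]:
        if not 0 <= handle < self._next_handle:
            return None
        pos = bisect_left(self._deleted, handle)  # number of deleted handles < handle
        if pos < len(self._deleted) and self._deleted[pos] == handle:
            return None  # this handle itself was deleted
        return handle - pos
```

For example, after adding five variables and deleting handles 1 and 3, `solver_index(4)` is 2. Lookups cost O(log n). Deletions cost O(n) here because of the list insertion, which is the trade-off a more compact structure would aim to improve.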

2024-05-16T14:29:02Z 10 pages Yue Yang Chenhui Lin Luo Xu Wenchuan Wu

http://arxiv.org/abs/2405.08631v1 A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent 2024-05-14T14:10:48Z

We develop fast and scalable algorithms based on block-coordinate descent to solve the group lasso and the group elastic net for generalized linear models along a regularization path. Special attention is given to the case where the loss is the usual least-squares (Gaussian) loss. We show that each block-coordinate update can be solved efficiently using Newton's method and further improved using an adaptive bisection method, solving these updates with a quadratic convergence rate. Our benchmarks show that our package adelie runs 3 to 10 times faster than the next fastest package on a wide array of both simulated and real datasets. Moreover, we demonstrate that our package is also a competitive lasso solver, matching the performance of the popular lasso package glmnet.
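In the Gaussian-loss case, a single group update reduces to a scalar root-finding problem. The Python sketch below is a minimal illustration under stated assumptions and is not adelie's implementation. It takes the group subproblem $\min_\beta \tfrac12 \beta^\top A\beta - v^\top\beta + \lambda\lVert\beta\rVert_2$, with $A$ already diagonalized to positive eigenvalues $d_i$. The parameter names `d`, `v` and `lam` are hypothetical.

- If $\lVert v\rVert_2 \le \lambda$, the whole group is zero.
- Otherwise $\beta_i = r v_i/(d_i r + \lambda)$, where $r = \lVert\beta\rVert_2$ solves $\sum_i v_i^2/(d_i r + \lambda)^2 = 1$.

Plain Newton's method is used on that scalar equation. The paper's adaptive bisection is not reproduced.

```python
import math
from typing import List


def group_update(d: List[float], v: List[float], lam: float,
                 tol: float = 1e-12, max_iter: int = 100) -> List[float]:
    """Solve min_b 0.5*sum(d_i*b_i^2) - v.b + lam*||b||_2 for one group.

    Assumes the group's Gram matrix is already diagonal with entries d_i > 0.
    """
    if math.sqrt(sum(vi * vi for vi in v)) <= lam:
        return [0.0] * len(v)  # optimality condition: the whole group is zero
    # phi(r) = sum v_i^2/(d_i r + lam)^2 - 1 is convex and decreasing for r >= 0,
    # so Newton's method started at r = 0 increases monotonically toward the root.
    r = 0.0
    for _ in range(max_iter):
        phi = sum(vi * vi / (di * r + lam) ** 2 for di, vi in zip(d, v)) - 1.0
        dphi = sum(-2.0 * di * vi * vi / (di * r + lam) ** 3 for di, vi in zip(d, v))
        step = phi / dphi
        r -= step
        if abs(step) <= tol * max(1.0, r):
            break
    # Stationarity (A + (lam/r) I) b = v gives b_i = r v_i / (d_i r + lam).
    return [r * vi / (di * r + lam) for di, vi in zip(d, v)]
```

With `d = [1, 1]`, `v = [3, 4]` and `lam = 1`, the root is $r = 4$ and the update is $v(1 - \lambda/\lVert v\rVert) = (2.4, 3.2)$, which matches the closed form for $A = I$.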

2024-05-14T14:10:48Z James Yang Trevor Hastie

http://arxiv.org/abs/2405.10973v1 Adaptation of XAI to Auto-tuning for Numerical Libraries 2024-05-12T09:00:56Z

Concerns have arisen regarding the unregulated use of artificial intelligence (AI) outputs, which could lead to various societal issues. While humans routinely validate information, manually inspecting the vast volumes of AI-generated results is impractical. Therefore, automation and visualization are imperative. In this context, Explainable AI (XAI) technology is gaining prominence; it aims to streamline AI model development and alleviate the burden of explaining AI outputs to users. Simultaneously, software auto-tuning (AT) technology has emerged, aiming to reduce the man-hours required for performance tuning in numerical calculations. AT is a potent tool for reducing costs in parameter optimization and high-performance programming for numerical computing. The synergy between AT mechanisms and AI technology is noteworthy, with AI finding extensive applications in AT. However, applying AI to AT mechanisms introduces challenges in AI model explainability. This research focuses on XAI for AI models integrated into two different processes for practical numerical computations: performance parameter tuning of accuracy-guaranteed numerical calculations and sparse iterative algorithms.

2024-05-12T09:00:56Z This article has been submitted to Special Session: Performance Optimization and Auto-Tuning of Software on Multicore/Manycore Systems (POAT), In conjunction with IEEE MCSoC-2024 (Dec 16-19, 2024, Days Hotel & Suites by Wyndham Fraser Business Park, Kuala Lumpur) 2024 IEEE 17th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Shota Aoki Takahiro Katagiri Satoshi Ohshima Masatoshi Kawai Toru Nagai Tetsuya Hoshino 10.1109/MCSoC64144.2024.00095

http://arxiv.org/abs/2308.14222v3 Accurate complex Jacobi rotations 2024-05-11T17:17:00Z

This note shows how to compute, to high relative accuracy under mild assumptions, complex Jacobi rotations for diagonalization of Hermitian matrices of order two, using the correctly rounded functions $\mathtt{cr\_hypot}$ and $\mathtt{cr\_rsqrt}$, proposed for standardization in the C programming language as recommended by the IEEE-754 floating-point standard. Rounding to nearest (ties to even) and non-stop arithmetic are assumed. The numerical examples compare the observed relative errors in the rotations' elements with the theoretical bounds, and show that the maximal observed departure of the rotations' determinants from unity is smaller than that of the transformations computed by LAPACK.
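As a point of reference, the textbook (Rutishauser-style) construction of such a rotation can be sketched in Python. The construction has two steps:

1. Remove the phase of the off-diagonal element, which leaves a real symmetric 2x2 matrix.
2. Apply the classical real Jacobi formulas to that matrix.

`math.hypot` stands in for `cr_hypot`. Python exposes no `cr_rsqrt`, so `1.0 / math.hypot(1.0, t)` is used instead. This is a minimal sketch, not the note's algorithm, and it carries none of the note's accuracy guarantees. The function name `complex_jacobi_rotation` is hypothetical.

```python
import math


def complex_jacobi_rotation(a: float, b: complex, c: float):
    """Unitary 2x2 U (row-major nested lists) with U^H H U diagonal,
    where H = [[a, b], [conj(b), c]] is Hermitian (a and c real).

    Textbook construction; math.hypot stands in for cr_hypot.
    """
    if b == 0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    abs_b = abs(b)
    conj_phase = (b / abs_b).conjugate()  # e^{-i*phi}, where b = |b| e^{i*phi}
    # With D = diag(1, e^{-i*phi}), D^H H D = [[a, |b|], [|b|, c]] is real symmetric.
    tau = (c - a) / (2.0 * abs_b)
    # Smaller-magnitude root of t^2 + 2*tau*t - 1 = 0, written to avoid cancellation.
    t = math.copysign(1.0, tau) / (abs(tau) + math.hypot(1.0, tau))
    cs = 1.0 / math.hypot(1.0, t)  # cos(theta) = 1 / sqrt(1 + t^2)
    sn = t * cs                    # sin(theta)
    # U = D @ [[cs, sn], [-sn, cs]]
    return [[complex(cs), complex(sn)],
            [-sn * conj_phase, cs * conj_phase]]
```

For $H = \begin{pmatrix}1 & 2+i\\ 2-i & 3\end{pmatrix}$, the product $U^H H U$ has off-diagonal entries at rounding level. Its diagonal holds the eigenvalues $2 \pm \sqrt{6}$.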

2023-08-27T22:46:18Z Supplementary material is available in the https://github.com/venovako/AccJac and https://github.com/venovako/libpvn repositories. This is a slightly extended and enhanced version of the manuscript accepted for publication in the Journal of Computational and Applied Mathematics. J. Comput. Appl. Math. 450 (2024) 116003 Vedran Novaković 10.1016/j.cam.2024.116003

http://arxiv.org/abs/2401.11404v2 PlasmoData.jl -- A Julia Framework for Modeling and Analyzing Complex Data as Graphs 2024-05-10T20:53:38Z

Datasets encountered in scientific and engineering applications appear in complex formats (e.g., images, multivariate time series, molecules, video, text strings, networks). Graph theory provides a unifying framework to model such datasets and enables the use of powerful tools that can help analyze, visualize, and extract value from data. In this work, we present PlasmoData.jl, an open-source Julia framework that uses concepts of graph theory to facilitate the modeling and analysis of complex datasets. The core of our framework is a general data modeling abstraction, which we call a DataGraph. We show how the abstraction and software implementation can be used to represent diverse data objects as graphs and to enable the use of tools from topology, graph theory, and machine learning (e.g., graph neural networks) to conduct a variety of tasks. We illustrate the versatility of the framework by using real datasets: i) an image classification problem using topological data analysis to extract features from the graph model to train machine learning models; ii) a disease outbreak problem where we model multivariate time series as graphs to detect abnormal events; and iii) a technology pathway analysis problem where we highlight how we can use graphs to navigate connectivity. Our discussion also highlights how PlasmoData.jl leverages native Julia capabilities to enable compact syntax, scalable computations, and interfaces with diverse packages.
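The image-classification use case, where a graph model of an image supplies topological features to a classifier, can be illustrated with a small Python sketch. PlasmoData.jl itself is a Julia package, and none of its API is used here. The helpers `grid_graph`, `components_below` and `betti0_curve` are hypothetical. Pixels become nodes carrying their intensity, and 4-neighbours are joined by edges. Counting the connected components of the sublevel-set subgraph at a sequence of thresholds gives a simple Betti-0 curve that can serve as a feature vector.

```python
from typing import Dict, List, Sequence, Tuple

Node = Tuple[int, int]


def grid_graph(image: Sequence[Sequence[float]]):
    """Pixels become nodes weighted by intensity; edges join 4-neighbours."""
    rows, cols = len(image), len(image[0])
    nodes: Dict[Node, float] = {(i, j): image[i][j] for i in range(rows) for j in range(cols)}
    edges: List[Tuple[Node, Node]] = []
    for i in range(rows):
        for j in range(cols):
            if i + 1 < rows:
                edges.append(((i, j), (i + 1, j)))
            if j + 1 < cols:
                edges.append(((i, j), (i, j + 1)))
    return nodes, edges


def components_below(nodes, edges, threshold: float) -> int:
    """Connected components of the subgraph induced by nodes with weight <= threshold."""
    parent = {n: n for n, w in nodes.items() if w <= threshold}

    def find(x: Node) -> Node:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving (union-find)
            x = parent[x]
        return x

    for u, v in edges:
        if u in parent and v in parent:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    return len({find(n) for n in parent})


def betti0_curve(image, thresholds: Sequence[float]) -> List[int]:
    """Feature vector: component counts of the sublevel sets at each threshold."""
    nodes, edges = grid_graph(image)
    return [components_below(nodes, edges, t) for t in thresholds]
```

For the 3x3 image `[[0, 5, 0], [5, 5, 5], [0, 5, 0]]`, the four dark corners are separate components at threshold 0. Everything merges into one component at threshold 5.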

2024-01-21T05:04:38Z 62 pages, 18 figures, 8 tables David L Cole Victor M Zavala

http://arxiv.org/abs/2405.05640v1 Experience and Analysis of Scalable High-Fidelity Computational Fluid Dynamics on Modular Supercomputing Architectures 2024-05-09T09:28:43Z

The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture at scale through domain partitioning, where the computational domain is split between a Booster module powered by GPUs and a Cluster module with conventional CPU nodes. We investigate several different flow cases and computer systems based on the modular supercomputing architecture (MSA). We observe that, for our simulations, incorporating different computing architectures is seldom worthwhile because of the communication overhead and load-balancing issues it incurs, especially when I/O is also considered; but when the simulation at hand requires more than the combined global memory of the GPUs, utilizing additional CPUs to increase the available memory can be fruitful. We support our results with a simple performance model to assess when running across modules might be beneficial. As MSA is becoming more widespread and efforts to increase system utilization are growing more important, our results give insight into when and how a monolithic application can utilize and spread out to more than one module and obtain a faster time to solution.
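The trade-off can be illustrated with a toy load-balancing model. This is an assumption-laden sketch, not the performance model from the paper. All symbols are hypothetical:

- $W$ is the work per time step.
- $R_G$ and $R_C$ are the sustained throughputs of the GPU (Booster) and CPU (Cluster) partitions.
- $C$ is the extra per-step cost of inter-module communication.
- $f$ is the fraction of the domain placed on the GPUs.

```latex
T(f) = \max\!\left(\frac{fW}{R_G},\ \frac{(1-f)W}{R_C}\right) + C,
\qquad f^\ast = \frac{R_G}{R_G + R_C},
\qquad T(f^\ast) = \frac{W}{R_G + R_C} + C,

\text{split beats GPU-only } \Bigl(T_G = \tfrac{W}{R_G}\Bigr)
\iff C < \frac{W}{R_G} - \frac{W}{R_G + R_C} = \frac{W\,R_C}{R_G\,(R_G + R_C)}.
```

When $R_C \ll R_G$, the right-hand side is roughly $T_G \cdot R_C/R_G$, a small fraction of the GPU-only time. Even modest communication overhead then cancels the gain. The model's exception is the memory-bound case, where the problem does not fit on the GPUs and $T_G$ is not available at all.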

2024-05-09T09:28:43Z 13 pages, 5 figures, 3 tables, preprint Martin Karp Estela Suarez Jan H. Meinke Måns I. Andersson Philipp Schlatter Stefano Markidis Niclas Jansson

http://arxiv.org/abs/2405.09644v1 Optimizing Tensor Contraction Paths: A Greedy Algorithm Approach With Improved Cost Functions 2024-05-08T09:25:39Z

Finding efficient tensor contraction paths is essential for a wide range of problems, including model counting, quantum circuits, graph problems, and language models. There exist several approaches to finding efficient paths, such as the greedy and random-greedy algorithms of Optimized Einsum (opt_einsum), and the greedy algorithm and hypergraph partitioning approach employed in cotengra. However, these algorithms require substantial computational time and resources to find efficient contraction paths. In this paper, we introduce a novel approach based on the greedy algorithm of opt_einsum that computes efficient contraction paths in less time. Moreover, our approach can compute paths even for large problems where modern algorithms fail.
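The greedy strategy the paper builds on can be sketched in a few lines of Python. This is a minimal illustration with hypothetical names, not opt_einsum's or the paper's code. The sketch works as follows:

- Each tensor is a set of index labels.
- An index is summed out as soon as neither the output nor any remaining tensor needs it.
- At every step, the pair that minimizes a pluggable cost function is contracted.

The default cost, output size minus the sizes of the two inputs, is an assumption chosen because it rewards contractions that shrink memory. Alternative cost functions, such as the paper's, would be passed through the `cost` parameter. The returned path follows a convention similar to opt_einsum's: positions refer to the current list, and the result is appended at the end.

```python
from itertools import combinations
from math import prod
from typing import Callable, Dict, FrozenSet, List, Optional, Sequence, Tuple

Indices = FrozenSet[str]


def greedy_path(inputs: Sequence[str], output: str, sizes: Dict[str, int],
                cost: Optional[Callable[[Indices, Indices, Indices], float]] = None
                ) -> List[Tuple[int, int]]:
    """Greedy pairwise contraction order (hypothetical sketch)."""

    def size(idx: Indices) -> int:
        return prod(sizes[i] for i in idx)

    def default_cost(a: Indices, b: Indices, result: Indices) -> float:
        # Assumed cost: memory of the result minus memory of its two inputs.
        return size(result) - size(a) - size(b)

    cost = cost or default_cost
    tensors: List[Indices] = [frozenset(t) for t in inputs]
    out = frozenset(output)
    path: List[Tuple[int, int]] = []
    while len(tensors) > 1:
        best = None
        for i, j in combinations(range(len(tensors)), 2):
            others = set().union(*(t for k, t in enumerate(tensors) if k not in (i, j)))
            # Keep an index only if the output or a remaining tensor still needs it.
            result = (tensors[i] | tensors[j]) & (out | others)
            c = cost(tensors[i], tensors[j], result)
            if best is None or c < best[0]:
                best = (c, i, j, result)
        _, i, j, result = best
        path.append((i, j))
        tensors = [t for k, t in enumerate(tensors) if k not in (i, j)] + [result]
    return path
```

For the chain `ab,bc,cd->ad` with sizes a=2, b=3, c=4, d=5, the sketch first contracts `bc` with `cd` (cost -17) and then contracts `ab` with the intermediate `bd`. This gives the path `[(1, 2), (0, 1)]`.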

2024-05-08T09:25:39Z Sheela Orgler Mark Blacher

http://arxiv.org/abs/2405.04172v1 An efficient active-set method with applications to sparse approximations and risk minimization 2024-05-07T10:14:33Z

In this paper, we present an efficient active-set method for the solution of convex quadratic programming problems with general piecewise-linear terms in the objective, with applications to sparse approximations and risk minimization. The algorithm is derived by combining a proximal method of multipliers (PMM) with a standard semismooth Newton method (SSN), and is shown to be globally convergent under minimal assumptions. Further local linear (and potentially superlinear) convergence is shown under standard additional conditions. The major computational bottleneck of the proposed approach arises from the solution of the associated SSN linear systems. These are solved using a Krylov-subspace method, accelerated by certain novel general-purpose preconditioners which are shown to be optimal with respect to the proximal penalty parameters. The preconditioners are easy to store and invert, since they exploit the structure of the nonsmooth terms appearing in the problem's objective to significantly reduce their memory requirements. We showcase the efficiency, robustness, and scalability of the proposed solver on a variety of problems arising in risk-averse portfolio selection, $L^1$-regularized partial differential equation constrained optimization, quantile regression, and binary classification via linear support vector machines. We provide computational evidence, on real-world datasets, to demonstrate the ability of the solver to efficiently and competitively handle a diverse set of medium- and large-scale optimization instances.
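The link between semismooth Newton and active sets is easiest to see for the simplest piecewise-linear term, $\lambda\lVert x\rVert_1$. In this minimal Python sketch (hypothetical helper names, and far simpler than the paper's general piecewise-linear setting), the proximal map is soft-thresholding. One element of its generalized Jacobian is a 0/1 diagonal matrix whose nonzero entries mark the active set. An SSN linear system built from that element therefore involves only the variables in the active set.

```python
from typing import List


def prox_l1(x: List[float], lam: float) -> List[float]:
    """Soft-thresholding: the proximal map of lam*||.||_1, componentwise."""
    return [max(abs(xi) - lam, 0.0) * (1.0 if xi > 0 else -1.0) for xi in x]


def prox_l1_jacobian_diag(x: List[float], lam: float) -> List[int]:
    """Diagonal of one element of the generalized Jacobian of prox_l1.

    1 where |x_i| > lam, 0 where |x_i| < lam; at the kink |x_i| == lam any value
    in [0, 1] is valid, and 0 is chosen here.
    """
    return [1 if abs(xi) > lam else 0 for xi in x]


def active_set(x: List[float], lam: float) -> List[int]:
    """Indices whose Jacobian entry is 1, i.e. the variables kept in the Newton system."""
    return [i for i, flag in enumerate(prox_l1_jacobian_diag(x, lam)) if flag]
```

For `x = [3.0, -0.5, -2.0]` and `lam = 1.0`, the proximal point is `[2.0, 0.0, -1.0]` and the active set is `[0, 2]`.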

2024-05-07T10:14:33Z arXiv admin note: substantial text overlap with arXiv:2302.14497, arXiv:2201.10211 Spyridon Pougkakiotis Jacek Gondzio Dionysis Kalogerias

http://arxiv.org/abs/2404.17039v2 Differentiating Through Linear Solvers 2024-05-06T19:15:43Z

Computer programs containing calls to linear solvers are a known challenge for automatic differentiation. Previous publications advise against differentiating through the low-level solver implementation, and instead advocate for high-level approaches that express the derivative in terms of a modified linear system that can be solved with a separate solver call. Despite this ubiquitous advice, we are not aware of prior work comparing the accuracy of both approaches. With this article we thus empirically study a simple question: What happens if we ignore common wisdom, and differentiate through linear solvers?
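For reference, the high-level approach rests on the standard identities below, which are textbook matrix calculus rather than results specific to this article. For $Ax = b$ with $A$ nonsingular, differentiating both sides gives the tangent (forward-mode) rule. Transposing that rule gives the adjoint (reverse-mode) rule for an output seed $\bar{x}$. Each rule needs one extra solve, with $A$ or $A^\top$ respectively:

```latex
\dot{A}\,x + A\,\dot{x} = \dot{b}
\;\Longrightarrow\;
\dot{x} = A^{-1}\bigl(\dot{b} - \dot{A}\,x\bigr),
\qquad
\bar{b} = A^{-\top}\bar{x},
\qquad
\bar{A} = -\,\bar{b}\,x^{\top}.
```

Differentiating through the solver instead applies automatic differentiation to the solver's own sequence of operations. That is the alternative the article studies empirically.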

2024-04-25T21:05:01Z Paul Hovland Jan Hückelheim

http://arxiv.org/abs/2407.04706v1 Minimization of Nonlinear Energies in Python Using FEM and Automatic Differentiation Tools 2024-05-03T12:51:59Z

This contribution examines the capabilities of the Python ecosystem for solving nonlinear energy minimization problems, with a particular focus on transitioning from traditional MATLAB methods to Python's advanced computational tools, such as automatic differentiation. We demonstrate Python's streamlined approach to minimizing nonlinear energies by analyzing three benchmark problems: the p-Laplacian, the Ginzburg-Landau model, and Neo-Hookean hyperelasticity. This approach requires only the energy functional itself, making it a simple and efficient way to solve this category of problems. The results show that the implementation is about ten times faster than the MATLAB implementation for large-scale problems. Our findings highlight Python's efficiency and ease of use in scientific computing, establishing it as a preferable choice for implementing sophisticated mathematical models and accelerating the development of numerical simulations.
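The workflow of supplying only the energy functional can be mimicked without any third-party package. The Python sketch below is a minimal illustration under stated assumptions, not the authors' code, which relies on established automatic-differentiation tools. A tiny forward-mode dual-number class differentiates a 1D P1 finite-element discretization of the p-Laplacian energy. The mesh is uniform, with homogeneous Dirichlet conditions and $f = 1$. Plain gradient descent with a fixed step then minimizes the energy. `Dual`, `energy`, `grad`, `minimize` and the step size are all hypothetical. Forward mode costs one pass per unknown, so for real problems reverse mode is the usual choice for gradients.

```python
from typing import Callable, List


class Dual:
    """Forward-mode AD value val + der*eps with eps**2 == 0 (hypothetical minimal class)."""

    def __init__(self, val: float, der: float = 0.0) -> None:
        self.val, self.der = val, der

    @staticmethod
    def _lift(o) -> "Dual":
        return o if isinstance(o, Dual) else Dual(float(o))

    def __add__(self, o) -> "Dual":
        o = Dual._lift(o)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __sub__(self, o) -> "Dual":
        o = Dual._lift(o)
        return Dual(self.val - o.val, self.der - o.der)

    def __rsub__(self, o) -> "Dual":
        return Dual._lift(o) - self

    def __mul__(self, o) -> "Dual":
        o = Dual._lift(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__

    def __truediv__(self, o: float) -> "Dual":  # division by plain numbers only
        return Dual(self.val / o, self.der / o)

    def __abs__(self) -> "Dual":
        return Dual(abs(self.val), self.der if self.val >= 0 else -self.der)

    def __pow__(self, p: float) -> "Dual":  # assumes val >= 0 and p > 1
        if self.val == 0.0:
            return Dual(0.0, 0.0)
        return Dual(self.val ** p, p * self.val ** (p - 1) * self.der)


def energy(u: List, h: float, p: float = 2.0, f: float = 1.0):
    """P1 energy: sum over elements of h*|u'|^p/p, minus sum over nodes of h*f*u_i."""
    total = 0.0
    for e in range(len(u) - 1):
        total = total + h * abs((u[e + 1] - u[e]) / h) ** p / p
    for ui in u[1:-1]:
        total = total - h * f * ui
    return total


def grad(energy_fn: Callable, u: List[float]) -> List[float]:
    """One forward-mode pass per interior node; boundary entries stay 0 (Dirichlet)."""
    g = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        seeded = list(u)
        seeded[i] = Dual(u[i], 1.0)
        g[i] = energy_fn(seeded).der
    return g


def minimize(energy_fn: Callable, n_elements: int, step: float, iters: int) -> List[float]:
    u = [0.0] * (n_elements + 1)  # nodal values with u[0] = u[n] = 0
    for _ in range(iters):
        g = grad(energy_fn, u)
        u = [ui - step * gi for ui, gi in zip(u, g)]
    return u


h = 0.1
u = minimize(lambda v: energy(v, h), n_elements=10, step=0.02, iters=1500)
```

For $p = 2$, the discrete minimizer coincides at the nodes with the exact solution $u(x) = x(1-x)/2$, so `u[5]` converges to 0.125. Other values of `p > 1` only change the argument passed to `energy`.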

2024-05-03T12:51:59Z 13 pages, 7 figures, conference PPAM 2024, Ostrava Michal Béreš Jan Valdman

http://arxiv.org/abs/2303.02205v2 The Awkward World of Python and C++ 2024-05-01T19:30:36Z

There are undeniable benefits to binding Python and C++ to take advantage of the best features of both languages. This is especially relevant to the HEP and other scientific communities that have invested heavily in C++ frameworks and are rapidly moving their data analyses to Python. Version 2 of Awkward Array, a Scikit-HEP Python library, introduces a set of header-only C++ libraries that do not depend on any application binary interface. Users can directly include these libraries in their compilation instead of linking against platform-specific libraries. This new development makes the integration of Awkward Arrays into other projects easier and more portable, as the implementation is easily separable from the rest of the Awkward Array codebase. The code is minimal; it does not include all of the code needed to use Awkward Arrays in Python, nor does it include references to Python or pybind11. C++ users can use it to make arrays and then copy them to Python without any specialized data types: only raw buffers, strings, and integers. This C++ code also simplifies just-in-time (JIT) compilation in ROOT. This implementation approach addresses some drawbacks, such as the difficulty of packaging projects with native dependencies. In this paper, we demonstrate the technique of integrating C++ and Python using a header-only approach. We also describe the implementation of a new LayoutBuilder and a GrowableBuffer. Furthermore, we discuss examples of wrapping the C++ data into Awkward Arrays and exposing Awkward Arrays to C++ without copying them.
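The idea of passing data across the language boundary as nothing but raw buffers and integers can be illustrated with the layout Awkward Array uses for variable-length lists: an offsets buffer plus a flat content buffer. The Python sketch below is hypothetical. It is neither the C++ `LayoutBuilder` nor `GrowableBuffer`, and only imitates their roles. `PanelBuffer` grows by adding fixed-size panels instead of reallocating. `build_list_offsets` then fills an offsets buffer and a content buffer that could be handed over as plain arrays.

```python
from typing import Generic, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


class PanelBuffer(Generic[T]):
    """Hypothetical append-only buffer made of fixed-capacity panels.

    Growing adds a panel instead of reallocating and copying what is already
    stored; to_list() concatenates the panels once at the end.
    """

    def __init__(self, panel_size: int = 4) -> None:
        self._panel_size = panel_size
        self._panels: List[List[T]] = [[]]
        self._length = 0

    def append(self, item: T) -> None:
        if len(self._panels[-1]) == self._panel_size:
            self._panels.append([])  # new panel; earlier data is not moved
        self._panels[-1].append(item)
        self._length += 1

    def __len__(self) -> int:
        return self._length

    def to_list(self) -> List[T]:
        return [item for panel in self._panels for item in panel]


def build_list_offsets(rows: Sequence[Sequence[T]]) -> Tuple[List[int], List[T]]:
    """Fill offsets and flat content buffers for a list of variable-length lists.

    Row r occupies content[offsets[r]:offsets[r + 1]].
    """
    offsets: PanelBuffer[int] = PanelBuffer()
    content: PanelBuffer[T] = PanelBuffer()
    offsets.append(0)
    for row in rows:
        for item in row:
            content.append(item)
        offsets.append(len(content))
    return offsets.to_list(), content.to_list()
```

For `[[1, 2], [], [3]]`, the buffers are `offsets = [0, 2, 2, 3]` and `content = [1, 2, 3]`. Those two plain integer arrays are all a receiver needs to rebuild the nested structure.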

2023-03-03T20:33:50Z 6 pages, 2 figures; submitted to ACAT 2022 proceedings Manasvi Goyal Ianna Osborne Jim Pivarski

http://arxiv.org/abs/2403.14844v2 Extrapolating Solution Paths of Polynomial Homotopies towards Singularities with PHCpack and phcpy 2024-05-01T19:04:28Z

PHCpack is a software package for polynomial homotopy continuation, which provides a robust path tracker [Telen, Van Barel, Verschelde, SISC 2020]. This tracker computes the radius of convergence of Newton's method, estimates the distance to the nearest path, and then applies Padé approximants to predict the next point on the path. A priori step-size control is less sensitive to finely tuned tolerances than a posteriori step-size control and is therefore robust. The Python interface phcpy is extended with a new step-by-step tracker, which is used to experiment with extrapolation methods to accurately locate the singular points at the end of solution paths.
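The predictor idea has two parts: read a singularity off the Taylor coefficients of the path, and extrapolate with a rational model rather than a polynomial one. Both can be sketched for the simplest case of an $[L/1]$ Padé approximant. This minimal Python sketch is not phcpy's API. The function names, the safety factor `beta` and the cap `max_step` are hypothetical. The denominator root of the $[L/1]$ approximant equals the Fabry ratio $c_L/c_{L+1}$. Its modulus estimates the radius of convergence and so yields a step size chosen before the step is taken.

```python
from typing import Callable, Sequence


def pade_l1(c: Sequence[complex], L: int) -> Callable[[complex], complex]:
    """[L/1] Pade approximant p(t) / (1 - q t) matching Taylor coefficients c[0..L+1]."""
    q = c[L + 1] / c[L]
    # (1 - q t) * sum c_k t^k has coefficients c_k - q c_{k-1}; the t^(L+1) term vanishes.
    num = [c[0]] + [c[k] - q * c[k - 1] for k in range(1, L + 1)]

    def approx(t: complex) -> complex:
        return sum(a * t ** k for k, a in enumerate(num)) / (1 - q * t)

    return approx


def pole_estimate(c: Sequence[complex], L: int) -> complex:
    """Pole of the [L/1] approximant, i.e. the Fabry ratio c[L] / c[L+1]."""
    return c[L] / c[L + 1]


def a_priori_step(c: Sequence[complex], L: int, beta: float = 0.5,
                  max_step: float = 0.1) -> float:
    """Step = beta * estimated distance to the nearest singularity, capped at max_step."""
    return min(max_step, beta * abs(pole_estimate(c, L)))
```

For the series of $1/(1 - t/2)$, with coefficients $c_k = 2^{-k}$, the $[L/1]$ approximant reproduces the function exactly and places the pole at $t = 2$. With `beta = 0.5`, the step would be 1.0 if uncapped.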

2024-03-21T21:28:47Z Accepted by the 8th International Congress on Mathematical Software 2024 Jan Verschelde Kylash Viswanathan

http://arxiv.org/abs/2405.00326v1 A Communication Avoiding and Reducing Algorithm for Symmetric Eigenproblem for Very Small Matrices 2024-05-01T05:19:49Z

In this paper, a parallel symmetric eigensolver for very small matrices in massively parallel processing is considered. We define very small matrices as those that fit in the per-node cache of a supercomputer. We assume that these sizes also match the exascale computing requirements of current production runs of an application. To minimize communication time, we added several communication-avoiding and communication-reducing algorithms based on Message Passing Interface (MPI) non-blocking implementations. A performance evaluation using up to all nodes of the FX10 system indicates that (1) the MPI non-blocking implementation is 3x as efficient as the baseline implementation, (2) the hybrid MPI execution is 1.9x faster than the pure MPI execution, and (3) our proposed solver is 2.3x and 22x faster than a ScaLAPACK routine with optimized blocking size and cyclic-cyclic distribution, respectively.

2024-05-01T05:19:49Z This article was submitted to Parallel Computing on December 9, 2013. This article was also published in IPSJ SIG Notes, Vol. 2015-HPC-148, Vol.2, pp.1-17 (February 23, 2015). (a non-reviewed technical report) Takahiro Katagiri Jun'ichi Iwata Kazuyuki Uchida

http://arxiv.org/abs/2405.01599v1 Xabclib: A Fully Auto-tuned Sparse Iterative Solver 2024-05-01T00:14:47Z

In this paper, we propose a general application programming interface named OpenATLib for auto-tuning (AT). OpenATLib is designed to make AT functions reusable. Using OpenATLib, we develop a fully auto-tuned sparse iterative solver named Xabclib. Xabclib has several novel run-time AT functions. First, the following new implementations of sparse matrix-vector multiplication (SpMV) for thread processing are implemented: (1) non-zero elements; (2) omission of zero-elements computation for vector reduction; (3) branchless segmented scan (BSS). According to the performance evaluation and the comparison with conventional implementations, the following results are obtained: (1) a 14x speedup for non-zero elements and zero-elements computation omission for symmetric SpMV; (2) a 4.62x speedup by using BSS. We also develop a "numerical computation policy" that can optimize memory space and computational accuracy. Using the policy, we obtain the following: (1) an average 1/45 memory space reduction; (2) avoidance of the "fault convergence" situation, which is a problem with conventional solvers.
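Sparse matrix-vector multiplication via a segmented scan can be illustrated in Python for a matrix stored in CSR (compressed row storage). This is a minimal sketch with hypothetical function names, not Xabclib's implementation. The method has three parts:

- The per-nonzero products form one flat array.
- A head flag marks the first nonzero of each row.
- An inclusive segmented sum restarts at every head by multiplying the running sum by `1 - flag` instead of branching.

Python gains no speed from this. The point is the data flow a branchless vectorized kernel would follow. A conventional row-by-row loop is included for comparison.

```python
from typing import List, Sequence


def spmv_csr(indptr: Sequence[int], indices: Sequence[int],
             data: Sequence[float], x: Sequence[float]) -> List[float]:
    """Conventional CSR SpMV: one inner loop per row."""
    y = []
    for r in range(len(indptr) - 1):
        s = 0.0
        for k in range(indptr[r], indptr[r + 1]):
            s += data[k] * x[indices[k]]
        y.append(s)
    return y


def spmv_segmented_scan(indptr: Sequence[int], indices: Sequence[int],
                        data: Sequence[float], x: Sequence[float]) -> List[float]:
    """SpMV as products + head flags + inclusive segmented sum."""
    prods = [data[k] * x[indices[k]] for k in range(len(data))]
    heads = [0] * len(data)
    for r in range(len(indptr) - 1):
        if indptr[r] < indptr[r + 1]:
            heads[indptr[r]] = 1  # first nonzero of a non-empty row
    scan = [0.0] * len(data)
    acc = 0.0
    for k, p in enumerate(prods):
        acc = acc * (1 - heads[k]) + p  # arithmetic reset instead of an if
        scan[k] = acc
    # Row r's sum is the scan value at its last nonzero; empty rows give 0.
    return [scan[indptr[r + 1] - 1] if indptr[r + 1] > indptr[r] else 0.0
            for r in range(len(indptr) - 1)]
```

For the matrix `[[1, 0, 2], [0, 0, 0], [3, 4, 0]]` and `x = [1, 2, 3]`, both functions return `[7.0, 0.0, 11.0]`.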

2024-05-01T00:14:47Z This article was submitted to SC11 and was also published as a preprint on ResearchGate in April 2011. Please refer to: https://www.researchgate.net/publication/258223774_Xabclib_A_Fully_Auto-tuned_Sparse_Iterative_Solver Takahiro Katagiri Takao Sakurai Mitsuyoshi Igai Shoji Itoh Satoshi Ohshima Hisayasu Kuroda Ken Naono Kengo Nakajima

http://arxiv.org/abs/2310.00001v2 AsaPy: A Python Library for Aerospace Simulation Analysis 2024-04-30T02:02:41Z

AsaPy is a custom-made Python library designed to simplify and optimize the analysis of aerospace simulation data. Rather than introducing new methodologies, it combines various established techniques into a unified, specialized platform. It offers a range of features, including design-of-experiments methods, statistical analysis techniques, machine learning algorithms, and data visualization tools. AsaPy's flexibility and customizability make it a viable solution for engineers and researchers who need to quickly gain insights into aerospace simulations. AsaPy is built on top of popular scientific computing libraries, ensuring high performance and scalability. In this work, we provide an overview of the key features and capabilities of AsaPy, followed by an exposition of its architecture and demonstrations of its effectiveness through use cases in military operational simulations. We also evaluate how other simulation tools deal with data science, highlighting AsaPy's strengths and advantages. Finally, we discuss potential use cases and applications of AsaPy and outline future directions for the development and improvement of the library.
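Latin hypercube sampling is one widely used design-of-experiments method. The sketch below is a hypothetical pure-Python version, not AsaPy's API. Each input range is split into `n_samples` equal strata, and one point is drawn uniformly inside each stratum. The strata are shuffled independently per dimension, so every one-dimensional projection covers all strata exactly once.

```python
import random
from typing import List, Optional, Sequence, Tuple


def latin_hypercube(n_samples: int, bounds: Sequence[Tuple[float, float]],
                    seed: Optional[int] = None) -> List[List[float]]:
    """Latin hypercube design: one point per stratum in every dimension (hypothetical helper)."""
    rng = random.Random(seed)
    columns = []
    for lo, hi in bounds:
        width = (hi - lo) / n_samples
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent stratum order per dimension
        # Uniform jitter inside each stratum: lo + (s + U[0, 1)) * width.
        columns.append([lo + (s + rng.random()) * width for s in strata])
    return [list(point) for point in zip(*columns)]
```

For example, `latin_hypercube(10, [(0.0, 1.0), (100.0, 200.0)], seed=0)` returns ten two-dimensional points. Exactly one point falls in each tenth of either range. Compared with plain random sampling, this guarantees marginal coverage at the same budget.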

2023-07-12T00:02:37Z Joao P. A. Dantas Samara R. Silva Vitor C. F. Gomes Andre N. Costa Adrisson R. Samersla Diego Geraldo Marcos R. O. A. Maximo Takashi Yoneyama