https://arxiv.org/api/n3riah3ZQxAs5QFb6YmnXO59QHQ 2026-06-22T16:20:42Z 2664 480 15 http://arxiv.org/abs/2408.00718v1 A Multi-Reference Relaxation Enforced Neighborhood Search Heuristic in SCIP 2024-08-01T17:03:02Z This paper proposes and evaluates a Multi-Reference Relaxation Enforced Neighborhood Search (MRENS) heuristic within the SCIP solver. This study marks the first integration and evaluation of MRENS in a full-fledged MILP solver, specifically coupled with the recently introduced Lagromory separator for generating multiple reference solutions. Computational experiments on the MIPLIB 2017 benchmark set show that MRENS, with multiple reference solutions, improves the solver's ability to find higher-quality feasible solutions compared to single-reference approaches. This study highlights the potential of multi-reference approaches for enhancing primal heuristics in MILP solvers. 2024-08-01T17:03:02Z six pages, new primal heuristic in SCIP, mixed integer linear optimization Suresh Bolusani Gioni Mexi Mathieu Besançon Mark Turner http://arxiv.org/abs/2309.11808v2 Unlocking massively parallel spectral proper orthogonal decompositions in the PySPOD package 2024-07-31T08:55:58Z We propose a parallel (distributed) version of the spectral proper orthogonal decomposition (SPOD) technique. The parallel SPOD algorithm distributes the dataset along its spatial dimension while leaving the time dimension undivided. This keeps the fast Fourier transform of the data in time non-distributed, thereby avoiding the bottlenecks associated with a distributed FFT. The parallel SPOD algorithm is implemented in the PySPOD (https://github.com/MathEXLab/PySPOD) library and uses the Message Passing Interface (MPI) standard, accessed from Python via mpi4py (https://mpi4py.readthedocs.io/en/stable/). An extensive performance evaluation of the parallel package is provided, including strong and weak scalability analyses.
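Why distributing space rather than time helps can be sketched in NumPy. If each process holds every time sample for a subset of spatial points, the blockwise FFT in time is purely local, and the small per-frequency matrix used by the method of snapshots is just a sum of per-process contributions. This is a sketch under stated assumptions (Hann windowing, 50% block overlap, uniform weights, and a Python sum standing in for an MPI Allreduce); it does not use PySPOD's or mpi4py's API, and all function names are hypothetical.

```python
import numpy as np

def fft_blocks(q, nfft, overlap):
    """Split a (time, space) array into overlapping Hann-windowed blocks
    and FFT each block along the time axis only."""
    step = nfft - overlap
    n_blocks = (q.shape[0] - overlap) // step
    window = np.hanning(nfft)[:, None]  # same taper for every spatial point
    blocks = np.stack([q[b * step:b * step + nfft] * window for b in range(n_blocks)])
    return np.fft.fft(blocks, axis=1)  # shape (n_blocks, nfft, n_space)

def gram_at_frequency(qhat, freq, weights):
    """Weighted Gram matrix Q_f^H W Q_f (n_blocks x n_blocks) at one frequency."""
    qf = qhat[:, freq, :].T  # (n_space, n_blocks)
    return qf.conj().T @ (weights[:, None] * qf)

def spod_energies(gram, n_blocks):
    """SPOD energies via the method of snapshots, in descending order."""
    return np.linalg.eigvalsh(gram / n_blocks)[::-1]

rng = np.random.default_rng(0)
nt, nx, nfft, overlap, freq = 256, 40, 64, 32, 3
q = rng.standard_normal((nt, nx))
q -= q.mean(axis=0)    # SPOD works on fluctuations about the mean
weights = np.ones(nx)  # hypothetical uniform quadrature weights

# Serial reference: all spatial points on one "process".
qhat = fft_blocks(q, nfft, overlap)
n_blocks = qhat.shape[0]
gram_serial = gram_at_frequency(qhat, freq, weights)

# Distributed in space: each hypothetical rank FFTs only its own columns;
# summing the small Gram matrices stands in for an MPI Allreduce.
gram_parallel = sum(
    gram_at_frequency(fft_blocks(q[:, cols], nfft, overlap), freq, weights[cols])
    for cols in np.array_split(np.arange(nx), 4)
)

energies_serial = spod_energies(gram_serial, n_blocks)
energies_parallel = spod_energies(gram_parallel, n_blocks)
```

The only quantity that has to cross process boundaries in this sketch is an n_blocks × n_blocks matrix per frequency, which is tiny compared with the distributed data.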
The open-source library enables the analysis of large datasets of interest across the scientific community. Here, we present applications in fluid dynamics and geophysics that would be extremely difficult (if not impossible) to achieve without a parallel algorithm. This work opens the path toward modal analyses of big quasi-stationary data, helping to uncover previously unexplored spatio-temporal patterns. 2023-09-21T06:28:07Z Marcin Rogowski Brandon C. Y. Yeung Oliver T. Schmidt Romit Maulik Lisandro Dalcin Matteo Parsani Gianmarco Mengaldo http://arxiv.org/abs/2407.20026v1 JAX-SSO: Differentiable Finite Element Analysis Solver for Structural Optimization and Seamless Integration with Neural Networks 2024-07-29T14:02:33Z Differentiable numerical simulations of physical systems have attracted growing attention in the past few years with the development of automatic differentiation tools. This paper presents JAX-SSO, a differentiable finite element analysis solver built with JAX, Google's high-performance computing library, to assist efficient structural design in the built environment. Using the adjoint method and automatic differentiation, JAX-SSO evaluates gradients of physical quantities efficiently and automatically, enabling accurate sensitivity calculation in structural optimization problems. Written in Python and JAX, JAX-SSO fits naturally into the machine learning ecosystem, so it can be seamlessly integrated with neural networks to train machine learning models that incorporate physics. Moreover, JAX-SSO supports GPU acceleration to further speed up finite element analysis. Several examples are presented to showcase the capabilities and efficiency of JAX-SSO: i) shape optimization of grid-shells and continuous shells; ii) size (thickness) optimization of continuous shells; iii) simultaneous shape and topology optimization of continuous shells; and iv) training of physics-informed neural networks for structural optimization.
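The adjoint sensitivity idea can be sketched on a deliberately tiny, hypothetical model: a clamped 1D bar of linear elements whose cross-sectional areas are the design variables, with compliance C = f^T u as the objective. Because dC/da_e = -lam^T (dK/da_e) u with K^T lam = f, one extra linear solve yields the whole gradient. This sketch uses plain NumPy with hand-derived formulas rather than JAX or JAX-SSO's API (JAX-SSO obtains such derivatives through automatic differentiation); all names are illustrative.

```python
import numpy as np

def assemble_stiffness(areas, young=1.0, length=1.0):
    """Stiffness of a 1D bar with len(areas) linear elements, clamped at x = 0."""
    n_el = len(areas)
    le = length / n_el
    k_elem = np.array([[1.0, -1.0], [-1.0, 1.0]])
    k = np.zeros((n_el + 1, n_el + 1))
    for e, a in enumerate(areas):
        k[e:e + 2, e:e + 2] += a * young / le * k_elem
    return k[1:, 1:]  # drop the clamped node's row and column

def compliance(areas, load):
    u = np.linalg.solve(assemble_stiffness(areas), load)
    return load @ u

def compliance_gradient_adjoint(areas, load):
    """dC/da_e = -lam^T (dK/da_e) u, where K^T lam = dC/du = load."""
    k = assemble_stiffness(areas)
    u = np.linalg.solve(k, load)
    lam = np.linalg.solve(k.T, load)  # one adjoint solve serves every design variable
    unit = np.eye(len(areas))
    # K is linear in the areas, so assembling with a unit vector gives dK/da_e exactly.
    return np.array([-lam @ assemble_stiffness(unit[e]) @ u for e in range(len(areas))])

areas = np.array([1.0, 0.8, 1.2, 0.5])
load = np.zeros(len(areas))
load[-1] = 1.0  # unit axial load at the free end

grad_adjoint = compliance_gradient_adjoint(areas, load)

# Central finite differences, for comparison only.
h = 1e-6
grad_fd = np.array([
    (compliance(areas + h * d, load) - compliance(areas - h * d, load)) / (2 * h)
    for d in np.eye(len(areas))
])
```

For a unit tip load the closed form is dC/da_e = -L_e / (E a_e^2), and both the finite differences and this closed form agree with the adjoint gradient; the adjoint route costs one extra solve no matter how many design variables there are.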
We believe that JAX-SSO can facilitate research related to differentiable physics and machine learning to further address problems in structural and architectural design. 2024-07-29T14:02:33Z Gaoyuan Wu http://arxiv.org/abs/2407.19987v1 HOBOTAN: Efficient Higher Order Binary Optimization Solver with Tensor Networks and PyTorch 2024-07-29T13:20:11Z In this study, we introduce HOBOTAN, a new solver designed for Higher Order Binary Optimization (HOBO). HOBOTAN supports both CPU and GPU, with the GPU version built on PyTorch for speed and scalability. The solver uses tensor networks to solve combinatorial optimization problems, encoding the problem in a HOBO tensor and performing tensor contractions as needed. Additionally, by combining techniques such as batch processing for tensor optimization and binary-based integer encoding, we significantly improve the efficiency of combinatorial optimization. In the future, using more GPUs is expected to provide greater computational power, with efficient cooperation among multiple GPUs enabling high scalability. Moreover, HOBOTAN is designed within the framework of quantum computing, thus providing insights for future quantum computer applications. This paper details the design, implementation, performance evaluation, and scalability of HOBOTAN, demonstrating its effectiveness. 2024-07-29T13:20:11Z Shoya Yasuda Shunsuke Sotobayashi Yuichiro Minato http://arxiv.org/abs/2403.04273v2 GenML: A Python Library to Generate the Mittag-Leffler Correlated Noise 2024-07-28T07:50:50Z Mittag-Leffler correlated noise (M-L noise) plays a crucial role in the dynamics of complex systems, yet the scientific community has lacked tools for its direct generation. Addressing this gap, our work introduces GenML, a Python library specifically designed for generating M-L noise.
We detail the architecture and functionality of GenML and its underlying algorithmic approach, which enables the precise simulation of M-L noise. The effectiveness of GenML is validated through quantitative analyses of autocorrelation functions and diffusion behaviors, showing that it accurately reproduces theoretical noise properties. GenML enables the effective use of M-L noise data in numerical simulations and data-driven methods for describing complex systems, moving beyond purely theoretical modeling. 2024-03-07T07:13:12Z 7 pages, 4 figures Xiang Qu Hui Zhao Wenjie Cai Gongyi Wang Zihan Huang http://arxiv.org/abs/2403.05048v2 Efficient Calculations for Inverse of $k$-diagonal Circulant Matrices and Cyclic Banded Matrices 2024-07-26T13:07:06Z $k$-diagonal circulant matrices and cyclic banded matrices are widely used in numerical simulations and signal processing of circular linear systems. Algorithms that compute the inverses of these two types of matrices with explicitly linear or quadratic complexity are rare. We find that the inverse of a $k$-diagonal circulant matrix can be uniquely determined by a recursive formula, which can be derived in $O(k^3 \log n + k^4)$ time. Similarly, the inverse of a cyclic banded matrix can be uniquely determined by a series of recursive formulas, whose initial terms are computable in $O(k^3 n + k^5)$ time. The additional costs of computing the complete inverses of these two types of matrices are $kn$ and $kn^2$, respectively. Our calculations enable rapid representation, with most steps defined by explicit formulas. Additionally, most algorithms for inverting $k$-diagonal circulant matrices rely on the fast Fourier transform, which is not applicable to finite fields, whereas our algorithms can be applied to computations in finite fields.
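For comparison, the FFT-based baseline that this abstract contrasts its recursive method with can be sketched in a few lines of NumPy: the DFT diagonalizes every circulant matrix, so the inverse is again circulant and its first column is the inverse DFT of the reciprocal eigenvalues, at O(n log n) cost. This is the standard floating-point approach, not the authors' algorithm, and because it relies on the FFT it does not carry over directly to the finite-field setting the abstract targets; the function names are hypothetical.

```python
import numpy as np

def circulant(first_col):
    """Dense circulant matrix with C[i, j] = first_col[(i - j) % n]."""
    n = len(first_col)
    return np.column_stack([np.roll(first_col, j) for j in range(n)])

def circulant_inverse_first_col(first_col, tol=1e-12):
    """First column of C^{-1}; the inverse of a circulant matrix is circulant."""
    eigenvalues = np.fft.fft(first_col)  # the DFT diagonalizes every circulant matrix
    if np.any(np.abs(eigenvalues) < tol):
        raise np.linalg.LinAlgError("circulant matrix is (numerically) singular")
    return np.fft.ifft(1.0 / eigenvalues)

# k = 3: main diagonal plus one sub- and one super-diagonal, both wrapping around.
n = 8
c = np.zeros(n)
c[0], c[1], c[-1] = 4.0, 1.0, 1.0
c_inv = circulant_inverse_first_col(c).real  # imaginary part is round-off for real c
```

Only the first column of the inverse is computed; forming the full dense inverse from it, as in circulant(c_inv), adds O(n^2) work.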
2024-03-08T04:50:04Z Chen Wang Hailong Yu Chao Wang http://arxiv.org/abs/2402.02290v2 Goodness-of-Fit and Clustering of Spherical Data: the QuadratiK package in R and Python 2024-07-25T04:43:32Z We introduce the QuadratiK package, which incorporates innovative data analysis methodologies. The presented software, implemented in both R and Python, offers a comprehensive set of goodness-of-fit tests and clustering techniques using kernel-based quadratic distances, thereby bridging the gap between the statistical and machine learning literatures. Our software implements one-, two-, and k-sample tests for goodness of fit, providing an efficient and mathematically sound way to assess the fit of probability distributions. Its expanded capabilities include tests for uniformity on the d-dimensional sphere based on Poisson kernel densities. Particularly noteworthy is a unique clustering algorithm tailored to spherical data, which leverages a mixture of Poisson kernel-based densities on the sphere. Alongside this, our software includes graphical functions that help users validate, visualize, and represent clustering results, enhancing the interpretability and usability of the analysis. In summary, our R and Python packages serve as a powerful suite of tools, offering researchers and practitioners the means to delve deeper into their data, draw robust inferences, and conduct potentially impactful analyses across a wide array of disciplines. 2024-02-03T23:04:32Z 36 pages, 9 figures Giovanni Saraceno Marianthi Markatou Raktim Mukhopadhyay Mojgan Golzy http://arxiv.org/abs/2408.05217v1 Implementing a Restricted Function Space Class in Firedrake 2024-07-24T16:47:01Z This work documents the implementation of a $\texttt{RestrictedFunctionSpace}$ class in Firedrake, a Python library that numerically solves partial differential equations using the finite element method.
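How Dirichlet conditions can pollute an eigensolver can be sketched with a small 1D Laplace eigenproblem K x = λ M x discretized by linear elements on [0, 1]. Imposing the conditions by zeroing boundary rows and columns and putting 1 on the diagonal keeps the boundary unknowns in the system and adds one artificial eigenvalue per constrained node (here exactly 1, fixed by the chosen diagonal entries). Removing the boundary unknowns altogether, which is the general idea behind a restricted space, leaves only the physical eigenvalues, the smallest approximating π². This NumPy sketch illustrates that idea under these assumptions; it is not Firedrake's implementation or API, and the names are hypothetical.

```python
import numpy as np

def p1_matrices(n_el):
    """Stiffness and mass matrices for -u'' on [0, 1] with linear elements."""
    h = 1.0 / n_el
    k = np.zeros((n_el + 1, n_el + 1))
    m = np.zeros((n_el + 1, n_el + 1))
    for e in range(n_el):
        k[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
        m[e:e + 2, e:e + 2] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    return k, m

def generalized_eigvals(k, m):
    """Eigenvalues of K x = lam M x (K symmetric, M SPD), ascending, via Cholesky."""
    l_inv = np.linalg.inv(np.linalg.cholesky(m))
    return np.linalg.eigvalsh(l_inv @ k @ l_inv.T)

n_el = 10
k, m = p1_matrices(n_el)
boundary = [0, n_el]

# Strategy 1: keep every node, zero boundary rows/columns, put 1 on the diagonal.
k_bc, m_bc = k.copy(), m.copy()
for i in boundary:
    k_bc[i, :] = 0.0
    k_bc[:, i] = 0.0
    m_bc[i, :] = 0.0
    m_bc[:, i] = 0.0
    k_bc[i, i] = 1.0
    m_bc[i, i] = 1.0
eig_full_space = generalized_eigvals(k_bc, m_bc)

# Strategy 2: a "restricted" space that simply has no boundary unknowns.
interior = np.arange(1, n_el)
eig_restricted = generalized_eigvals(k[np.ix_(interior, interior)],
                                     m[np.ix_(interior, interior)])
```

Apart from the two artificial values equal to 1, both strategies give the same spectrum; the restricted version never produces the spurious modes in the first place.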
This includes an introduction to the current $\texttt{FunctionSpace}$ class in Firedrake and its key features. The limitations of Firedrake's solvers when imposing Dirichlet boundary conditions with the current $\texttt{FunctionSpace}$ class are explored, along with what the $\texttt{RestrictedFunctionSpace}$ class does differently to remove these issues. These are considered both mathematically and in code, as an abstraction of the mathematical ideas presented. Finally, the benefits of the $\texttt{RestrictedFunctionSpace}$ class to the user are considered and demonstrated through tests and comparisons. In particular, the eigensolver in Firedrake is improved by using the $\texttt{RestrictedFunctionSpace}$, which removes the eigenvalues associated with a system's Dirichlet boundary conditions. 2024-07-24T16:47:01Z MSci Research Project, 51 pages, 19 figures Emma Rothwell http://arxiv.org/abs/2407.13726v1 Compressing Structured Tensor Algebra 2024-07-18T17:25:17Z Tensor algebra is a crucial component of data-intensive workloads such as machine learning and scientific computing. As the complexity of data grows, scientists often face a dilemma between highly specialized dense tensor algebra and the efficient structure-aware algorithms provided by sparse tensor algebra. In this paper, we introduce DASTAC, a framework that propagates the captured high-level structure of tensors down to low-level code generation by incorporating techniques such as automatic data layout compression, polyhedral analysis, and affine code generation. Our methodology reduces memory footprint by automatically detecting the best data layout, benefits heavily from polyhedral optimizations, leverages further optimizations, and enables parallelization through MLIR.
Through extensive experimentation, we show that DASTAC achieves speedups of one to two orders of magnitude over TACO, a state-of-the-art sparse tensor compiler, and StructTensor, a state-of-the-art structured tensor algebra compiler, with a significantly lower memory footprint. 2024-07-18T17:25:17Z Mahdi Ghorbani Emilien Bauer Tobias Grosser Amir Shaikhha http://arxiv.org/abs/2307.05740v2 Minimum Cost Loop Nests for Contraction of a Sparse Tensor with a Tensor Network 2024-07-15T18:38:54Z Sparse tensor decomposition and completion are common in numerous applications, ranging from machine learning to computational quantum chemistry. Typically, the main bottleneck in optimizing these models is the contraction of a single large sparse tensor with a network of several dense matrices or tensors (SpTTN). Prior work on high-performance tensor decomposition and completion has focused on performance and scalability optimizations for specific SpTTN kernels. We present algorithms and a runtime system for identifying and executing the most efficient loop nest for any SpTTN kernel. We consider both enumeration of such loop nests for autotuning and efficient algorithms for finding the lowest-cost loop nest for simpler metrics, such as buffer size or cache miss models. Our runtime system identifies the best choice of loop nest without user guidance and also provides a distributed-memory parallelization of SpTTN kernels. We evaluate our framework using both real-world and synthetic tensors. Our results demonstrate that our approach outperforms available general-purpose state-of-the-art libraries and matches the performance of specialized codes.
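The effect of loop-nest choice on one common SpTTN kernel, the matricized tensor times Khatri-Rao product (MTTKRP) of a third-order sparse tensor, can be sketched in plain Python. Both loop orders compute M[i, r] = Σ_{j,k} T[i, j, k] B[j, r] C[k, r], but the fiber-wise order contracts over k into a small buffer first and multiplies by B[j, :] once per (i, j) fiber instead of once per nonzero. This is a hand-written illustration with a hypothetical coordinate-list input, not the authors' runtime system or cost model.

```python
import numpy as np
from collections import defaultdict

def mttkrp_per_nonzero(coords, vals, b, c, n_i):
    """Loop over nonzeros: M[i, :] += T[i, j, k] * B[j, :] * C[k, :]."""
    out = np.zeros((n_i, b.shape[1]))
    for (i, j, k), v in zip(coords, vals):
        out[i] += v * b[j] * c[k]  # two length-R products per nonzero
    return out

def mttkrp_per_fiber(coords, vals, b, c, n_i):
    """Loop nest i -> j -> k with a length-R buffer; B[j, :] is applied
    once per (i, j) fiber rather than once per nonzero."""
    fibers = defaultdict(list)
    for (i, j, k), v in zip(coords, vals):
        fibers[(i, j)].append((k, v))
    out = np.zeros((n_i, b.shape[1]))
    for (i, j), entries in fibers.items():
        buf = np.zeros(b.shape[1])
        for k, v in entries:
            buf += v * c[k]     # contract over k first
        out[i] += buf * b[j]    # then apply the factor for index j
    return out

rng = np.random.default_rng(1)
shape, rank, nnz = (5, 6, 7), 3, 40
flat = rng.choice(int(np.prod(shape)), size=nnz, replace=False)  # distinct positions
coords = list(zip(*np.unravel_index(flat, shape)))
vals = rng.standard_normal(nnz)
b = rng.standard_normal((shape[1], rank))
c = rng.standard_normal((shape[2], rank))

m_nonzero = mttkrp_per_nonzero(coords, vals, b, c, shape[0])
m_fiber = mttkrp_per_fiber(coords, vals, b, c, shape[0])

# Dense reference, only feasible because the example is tiny.
dense = np.zeros(shape)
dense[np.unravel_index(flat, shape)] = vals
reference = np.einsum("ijk,jr,kr->ir", dense, b, c)
```

The trade-off is typical of the loop-nest choices the abstract describes: fewer multiplications in exchange for an intermediate buffer and a grouping step.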
2023-07-11T19:08:06Z 15 pages, 7 figures Raghavendra Kanakagiri Edgar Solomonik 10.1145/3626183.3659985 http://arxiv.org/abs/2407.10372v1 MPAT: Modular Petri Net Assembly Toolkit 2024-07-15T00:41:02Z We present a Python package called Modular Petri Net Assembly Toolkit (MPAT) that lets users easily create large-scale, modular Petri Nets for various spatial configurations, including extensive spatial grids or those derived from shapefiles, augmented with heterogeneous information layers. Petri Nets are powerful discrete event system modeling tools in computational biology and engineering. However, their utility for the automated construction of large-scale spatial models has been limited by gaps in existing modeling software packages. MPAT addresses this gap by supporting the development of modular Petri Net models with flexible spatial geometries. 2024-07-15T00:41:02Z SoftwareX, 2024, Volume 28, pgs 1-8 Stefano Chiaradonna Petar Jevtic Beckett Sterner 10.1016/j.softx.2024.101913 http://arxiv.org/abs/2407.09621v1 Acceleration of Tensor-Product Operations with Tensor Cores 2024-07-12T18:16:38Z In this paper, we explore the acceleration of tensor product operations in finite element methods, leveraging the computational power of the NVIDIA A100 GPU Tensor Cores. We provide an accessible overview of the necessary mathematical background and discuss our implementation strategies. Our study focuses on two common programming approaches for NVIDIA Tensor Cores: the C++ Warp Matrix Functions in nvcuda::wmma and the inline Parallel Thread Execution (PTX) instructions mma.sync.aligned. A significant focus is placed on the adoption of the versatile inline PTX instructions combined with a conflict-free shared memory access pattern, which is key to unlocking superior performance. When benchmarked against traditional CUDA Cores, our approach yields a 2.3-fold increase in double-precision performance, reaching 8 TFLOPS, 45% of the theoretical maximum.
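Tensor-product finite element operators are commonly applied by sum factorization: instead of forming the Kronecker-product matrix A ⊗ B ⊗ C, the coefficient array is contracted one direction at a time. For n points per direction this replaces n^6 multiply-adds with 3n^4, and every step is a small dense matrix product, the kind of work Tensor Cores accelerate. The following NumPy sketch shows only this algebraic structure; it does not model wmma or mma.sync kernels, precision, or shared-memory layout, and the function name is hypothetical.

```python
import numpy as np

def apply_kron3_factored(a, b, c, x):
    """Apply (A kron B kron C) to x, stored as an (n_a, n_b, n_c) array,
    one direction at a time (sum factorization)."""
    y = np.einsum("ip,pqr->iqr", a, x)     # contract the first index
    y = np.einsum("jq,iqr->ijr", b, y)     # then the second
    return np.einsum("kr,ijr->ijk", c, y)  # then the third

rng = np.random.default_rng(2)
n = 4
a, b, c = (rng.standard_normal((n, n)) for _ in range(3))
x = rng.standard_normal((n, n, n))

factored = apply_kron3_factored(a, b, c, x)

# Reference: form the n^3 x n^3 Kronecker matrix explicitly (viable only for tiny n).
explicit = (np.kron(np.kron(a, b), c) @ x.reshape(-1)).reshape(n, n, n)
```

The explicit reference needs n^6 storage, which is exactly what the factored form avoids.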
Furthermore, in half-precision computations, numerical experiments demonstrate a fourfold performance improvement in solving the Poisson equation in 3D using the flexible GMRES (FGMRES) method with a multigrid preconditioner, while maintaining the same discretization error as in double-precision computations. These results highlight the considerable benefits of using Tensor Cores for finite element operators with tensor products, achieving an optimal balance between computational speed and precision. 2024-07-12T18:16:38Z Cu Cui http://arxiv.org/abs/2207.14341v3 Tensor Decompositions for Count Data that Leverage Stochastic and Deterministic Optimization 2024-07-12T01:48:43Z There is growing interest in extending low-rank matrix decompositions to multi-way arrays, or tensors. One fundamental low-rank tensor decomposition is the canonical polyadic decomposition (CPD). The challenge of fitting a low-rank, nonnegative CPD model to Poisson-distributed count data is of particular interest. Several popular algorithms use local search methods to approximate the maximum likelihood estimator (MLE) of the Poisson CPD model. This work presents two new algorithms that extend state-of-the-art local methods for Poisson CPD. Hybrid GCP-CPAPR combines Generalized Canonical Decomposition (GCP) with stochastic optimization and CP Alternating Poisson Regression (CPAPR), a deterministic algorithm, to increase the probability of converging to the MLE compared with either method used alone. Restarted CPAPR with SVDrop uses a heuristic based on the singular values of the CPD model unfoldings to identify convergence toward optimizers that are not the MLE and restarts within the feasible domain of the optimization problem, thus reducing overall computational cost when using a multi-start strategy. We provide empirical evidence that our approaches outperform existing methods in converging to the Poisson CPD MLE. 2022-07-18T04:02:56Z Jeremy M. Myers Daniel M. Dunlavy
http://arxiv.org/abs/2204.13740v3 Black-Scholes Option Pricing on Intel CPUs and GPUs: Implementation on SYCL and Optimization Techniques 2024-07-10T17:27:44Z The Black-Scholes option pricing problem is one of the most widely used financial benchmarks. We explore the possibility of developing high-performance portable code in the SYCL (Data Parallel C++) programming language. We start from a C++ code parallelized with OpenMP and show optimization techniques that are beneficial on modern Intel Xeon CPUs. Then, we port the code to SYCL and consider important optimization aspects on CPUs and GPUs (device-friendly memory access patterns, appropriate data management, and vector data types). We show that the developed SYCL code is only 10% slower than the optimized C++ code when running on CPUs, while achieving reasonable performance on Intel GPUs. We hope that our experience of developing and optimizing the code in SYCL can be useful to other researchers who plan to port their high-performance C++ codes to SYCL to get all the benefits of single-source programming. 2022-04-28T18:54:47Z 15 pages, 2 figures Lecture Notes in Computer Science, vol 13708 (Springer, Cham), 2022, pp. 48-62 Elena Panova Valentin Volokitin Anton Gorshkov Iosif Meyerov 10.1007/978-3-031-22941-1_4 http://arxiv.org/abs/2308.05244v2 Hybrid approach to the joint spectral radius computation 2024-07-08T13:54:29Z In this paper, we propose a modification to the invariant polytope algorithm (ipa) using ideas from the finite expressible tree algorithm (feta) by Möller and Reif. We show that our new feta-flavoured ipa applies to a wider range of matrix families. 2023-08-09T22:31:17Z Thomas Mejstrik Ulrich Reif
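The quantity that ipa and feta compute can be bracketed directly from its definition. For every product length k, the largest ρ(P)^(1/k) over all products P of k matrices from the family is a lower bound on the joint spectral radius, and the largest ||P||^(1/k) is an upper bound. The brute-force NumPy sketch below evaluates both; it is an elementary baseline for illustration, not ipa, feta, or the hybrid method, and the function name is hypothetical.

```python
import itertools
from functools import reduce

import numpy as np

def jsr_bounds(matrices, k):
    """Elementary JSR bounds from all products of length k:
    max rho(P)^(1/k) <= JSR <= max ||P||_2^(1/k)."""
    lower = upper = 0.0
    for word in itertools.product(matrices, repeat=k):
        p = reduce(np.matmul, word)  # product of the k chosen matrices
        lower = max(lower, np.max(np.abs(np.linalg.eigvals(p))) ** (1.0 / k))
        upper = max(upper, np.linalg.norm(p, 2) ** (1.0 / k))
    return lower, upper

a1 = np.array([[1.0, 1.0], [0.0, 1.0]])
a2 = np.array([[1.0, 0.0], [1.0, 1.0]])
golden = (1.0 + np.sqrt(5.0)) / 2.0

lower1, upper1 = jsr_bounds([a1, a2], k=1)
lower2, upper2 = jsr_bounds([a1, a2], k=2)
```

For this pair, products of length 2 already give matching lower and upper bounds equal to the golden ratio, which is therefore the pair's joint spectral radius. In general the bounds tighten only slowly as k grows, while the number of products grows as m^k for m matrices.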