https://arxiv.org/api/ZXNy+jTGkjVdFJWCQ2MM/j/Om48 2026-06-22T12:29:58Z 2664 435 15 http://arxiv.org/abs/2410.09837v1 Tomographic Model Based Iterative Reconstruction of Symmetric Objects 2024-10-13T13:35:30Z Computed Tomography (CT) reconstruction of objects with cylindrical symmetry can be performed with a single projection. When the measured rays are parallel, and the axis of symmetry is perpendicular to the optical axis, the data can be modeled with the so-called Abel Transform. The Abel Transform has been extensively studied and many methods exist for accurate reconstruction. However, most CT geometries are cone-beam rather than parallel-beam. Using Abel methods for reconstruction in these cases can lead to distortions and reconstruction artifacts. Here, we develop analytic and model-based iterative reconstruction (MBIR) methods to reconstruct symmetric objects with an arbitrary axis of symmetry from a cone-beam geometry. The MBIR methods demonstrate superior results relative to the analytic inversion methods by mitigating artifacts and reducing noise while retaining fine image features. We demonstrate the efficacy of our methods using simulated and experimentally acquired x-ray and neutron projections. 2024-10-13T13:35:30Z Kyle M. Champley Ibrahim Oksuz Matthew G. Bisbee Joseph W. Tringe Brian Maddox http://arxiv.org/abs/2410.09764v1 Adaptive finite element methods based on flux and stress equilibration using FEniCSx 2024-10-13T07:41:44Z This contribution shows how a-posteriori error estimators based on equilibrated fluxes - H(div) functions fulfilling the underlying conservation law - can be implemented in FEniCSx. To this end, dolfinx_eqlb is introduced, its algorithmic structure is described, and classical benchmarks for adaptive solution procedures for the Poisson problem and linear elasticity are presented. 
2024-10-13T07:41:44Z Maximilian Brodbeck Fleurianne Bertrand Tim Ricken http://arxiv.org/abs/2410.09497v1 Multigrid methods for the Stokes problem on GPU systems 2024-10-12T11:22:43Z This paper presents a matrix-free multigrid method for solving the Stokes problem, discretized using $H^{\text{div}}$-conforming discontinuous Galerkin methods. We employ a Schur complement method combined with the fast diagonalization method for the efficient evaluation of the local solver within the multiplicative Schwarz smoother. This approach operates directly on both the velocity and pressure spaces, eliminating the need for a global Schur complement approximation. By leveraging the tensor product structure of Raviart-Thomas elements and an optimized, conflict-free shared memory access pattern, the matrix-free operator evaluation demonstrates excellent performance numbers, reaching over one billion degrees of freedom per second on a single NVIDIA A100 GPU. Numerical results indicate efficiency comparable to that of the three-dimensional Poisson problem. 2024-10-12T11:22:43Z Cu Cui Guido Kanschat 10.1016/j.compfluid.2025.106703 http://arxiv.org/abs/2212.02698v3 CDOpt: A Python Package for a Class of Riemannian Optimization 2024-10-12T08:02:15Z Optimization over the embedded submanifold defined by constraints $c(x) = 0$ has attracted much interest over the past few decades due to its wide applications in various areas. Plenty of related optimization packages have been developed based on Riemannian optimization approaches, which rely on some basic geometrical materials of Riemannian manifolds, including retractions, vector transports, etc. These geometrical materials can be challenging to determine in general. Existing packages only accommodate a few well-known manifolds whose geometrical materials are easily accessible. For other manifolds which are not contained in these packages, the users have to develop the geometric materials by themselves. 
In addition, it is not always tractable to adopt advanced features from various state-of-the-art unconstrained optimization solvers to Riemannian optimization approaches. We introduce CDOpt (available at https://cdopt.github.io/), a user-friendly Python package for a class of Riemannian optimization problems. Based on constraint dissolving approaches, Riemannian optimization problems are transformed into their equivalent unconstrained counterparts in CDOpt. Therefore, solving Riemannian optimization problems through CDOpt directly benefits from various existing solvers and the rich expertise gained over decades for unconstrained optimization. Moreover, all the computations in CDOpt related to any manifold in question are conducted on its constraint expression, hence users can easily define new manifolds in CDOpt without any background in differential geometry. Furthermore, CDOpt extends the neural layers from PyTorch and Flax, thus allowing users to train manifold-constrained neural networks directly with solvers for unconstrained optimization. Extensive numerical experiments demonstrate that CDOpt is highly efficient and robust in solving various classes of Riemannian optimization problems. 2022-12-06T01:43:29Z 48 pages Nachuan Xiao Xiaoyin Hu Xin Liu Kim-Chuan Toh http://arxiv.org/abs/2409.02969v3 LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch 2024-10-11T16:31:46Z Multiobjective optimization problems (MOPs) are prevalent in machine learning, with applications in multi-task learning, learning under fairness or robustness constraints, etc. Instead of reducing multiple objective functions into a scalar objective, MOPs aim to optimize for the so-called Pareto optimality or Pareto set learning, which involves optimizing more than one objective function simultaneously, over models with thousands / millions of parameters. 
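To make "Pareto optimality" concrete: when every objective is minimized, a point a dominates a point b if a is no worse than b in every objective and strictly better in at least one; the Pareto set consists of the points that no other point dominates. The sketch below only illustrates this definition on a finite list of objective vectors; it is not LibMOON's API and not one of its gradient-based methods, and the function names are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (simple O(n^2) scan).

    A point never dominates itself, so no identity check is needed.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    objs = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
    # (3.0, 3.0) is dominated by (2.0, 2.0); the other three are mutually non-dominated.
    print(pareto_front(objs))
```

Scalarization methods collapse such vectors into one number per point, whereas Pareto set learning targets the whole non-dominated set.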
Existing benchmark libraries for MOPs mainly focus on evolutionary algorithms, most of which are zeroth-order / meta-heuristic methods that do not effectively utilize higher-order information from objectives and cannot scale to large-scale models with thousands / millions of parameters. In light of the above gap, this paper introduces LibMOON, the first multiobjective optimization library that supports state-of-the-art gradient-based methods, provides a fair benchmark, and is open-sourced for the community. 2024-09-04T07:44:43Z NeurIPS 2024 Xiaoyuan Zhang Liang Zhao Yingying Yu Xi Lin Yifan Chen Han Zhao Qingfu Zhang http://arxiv.org/abs/2405.10478v2 GridapTopOpt.jl: A scalable Julia toolbox for level set-based topology optimisation 2024-10-11T05:34:30Z In this paper we present GridapTopOpt, an extendable framework for level set-based topology optimisation that can be readily distributed across a personal computer or high-performance computing cluster. The package is written in Julia and uses the Gridap package ecosystem for parallel finite element assembly from arbitrary weak formulations of partial differential equations (PDEs) along with the scalable solvers from the Portable and Extendable Toolkit for Scientific Computing (PETSc). The resulting user interface is intuitive and easy to use, allowing for the implementation of a wide range of topology optimisation problems with a syntax that is near one-to-one with the mathematical notation. Furthermore, we implement automatic differentiation to help mitigate the bottleneck associated with the analytic derivation of sensitivities for complex problems. GridapTopOpt is capable of solving a range of benchmark and research topology optimisation problems with large numbers of degrees of freedom. This educational article demonstrates the usability and versatility of the package by describing the formulation and step-by-step implementation of several distinct topology optimisation problems. 
The driver scripts for these problems are provided and the package source code is available at https://github.com/zjwegert/GridapTopOpt.jl. 2024-05-17T00:36:26Z Struct Multidisc Optim 68, 22 (2025) Zachary J. Wegert Jordi Manyer Connor Mallon Santiago Badia Vivien J. Challis 10.1007/s00158-024-03927-3 http://arxiv.org/abs/2410.07552v1 Methods for Few-View CT Image Reconstruction 2024-10-10T02:49:06Z Computed Tomography (CT) is an essential non-destructive three-dimensional imaging modality used in medicine, security screening, and inspection of manufactured components. Typical CT data acquisition entails the collection of a thousand or more projections through the object under investigation over a range of angles covering one hundred eighty degrees or more. It may be desirable or required that the number of projection angles be reduced by one or two orders of magnitude for reasons such as acquisition time or dose. Unless specialized reconstruction algorithms are applied, reconstructing with fewer views will result in streak artifacts and failure to resolve object boundaries at certain orientations. These artifacts may substantially diminish the usefulness of the reconstructed CT volumes. Here we develop constrained and regularized numerical optimization methods to reconstruct CT volumes from 4-28 projections. These methods entail utilization of novel data fidelity and convex and non-convex regularization terms. In addition, the methods outlined here are usually carried out as a sequence of two or three numerical optimization methods. The efficacy of our methods is demonstrated on four measured and three simulated few-view CT data sets. We show that these methods outperform other state-of-the-art few-view numerical optimization methods. 2024-10-10T02:49:06Z Kyle M. Champley Michael B. Zellner Joseph W. Tringe Harry E. 
Martz http://arxiv.org/abs/2410.06770v1 BLAS-like Interface for Binary Tensor Contractions 2024-10-09T11:03:14Z In the world of linear algebra computation, a well-established standard called BLAS (Basic Linear Algebra Subprograms) exists. This standard has been crucial for the development of software using linear algebra operations. Its benefits include portability with efficiency and mitigation of suboptimal re-implementations of linear algebra operations. Multilinear algebra is an extension of linear algebra in which the central objects are tensors, which are generalizations of vectors and matrices. Though tensor operations are becoming more common, they do not have a standard like BLAS. Such standardization would be beneficial and decrease the now-visible replication of work, as many libraries nowadays use their own implementations. This master's thesis aims to work towards such a standard by determining whether or not a BLAS-like interface is possible for the operation binary tensor contraction. To answer this, an interface, together with an implementation, has been developed in the programming language C and tested to see whether it would be sufficient. The interface developed is xGETT(RANKA, EXTA, INCA, A, RANKB, EXTB, INCB, B, CONTS, CONTA, CONTB, PERM, INCC, C). Based on the implementation and tests, it has been deemed sufficient as a BLAS-like interface for binary tensor contractions and possible to use in a BLAS-like standardization of tensor operations. 2024-10-09T11:03:14Z master thesis report, 21 pages, 6 figures Niklas Hörnblad http://arxiv.org/abs/2410.05811v1 lintsampler: Easy random sampling via linear interpolation 2024-10-08T08:42:29Z 'lintsampler' provides a Python implementation of a technique we term 'linear interpolant sampling': an algorithm to efficiently draw pseudo-random samples from an arbitrary probability density function (PDF). First, the PDF is evaluated on a grid-like structure. 
Then, it is assumed that the PDF can be approximated between grid vertices by the (multidimensional) linear interpolant. With this assumption, random samples can be efficiently drawn via inverse transform sampling. lintsampler is primarily written with 'numpy', drawing some additional functionality from 'scipy'. Under the most basic usage of lintsampler, the user provides a Python function defining the target PDF and some parameters describing a grid-like structure to the 'LintSampler' class, and is then able to draw samples via the 'sample' method. Additionally, there is functionality for the user to set the random seed, employ quasi-Monte Carlo sampling, or sample within a premade grid ('DensityGrid') or tree ('DensityTree') structure. 2024-10-08T08:42:29Z Accepted by Journal of Open Source Software. Describes code repository at https://github.com/aneeshnaik/lintsampler Journal of Open Source Software, 2024, 9(102), 6906 Aneesh P. Naik Michael S. Petersen 10.21105/joss.06906 http://arxiv.org/abs/2410.05471v1 Exact sensitivity analysis of Markov reward processes via algebraic geometry 2024-10-07T20:08:02Z We introduce a new approach for deterministic sensitivity analysis of Markov reward processes, commonly used in cost-effectiveness analyses, via reformulation into a polynomial system. Our approach leverages cylindrical algebraic decomposition (CAD), a technique arising from algebraic geometry that provides an exact description of all solutions to a polynomial system. While it is typically intractable to build a CAD for systems with more than a few variables, we show that a special class of polynomial systems, which includes the polynomials arising from Markov reward processes, can be analyzed much more tractably. We establish several theoretical results about such systems and develop a specialized algorithm to construct their CAD, which allows us to perform exact, multi-way sensitivity analysis for common health economic analyses. 
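As background for the Markov reward process setting in the preceding abstract: in an infinite-horizon discounted Markov reward process with transition matrix P, per-cycle reward vector r and discount factor gamma < 1, the expected total reward from each state is v = (I - gamma P)^{-1} r. Because v is a rational function of the entries of P, questions such as "for which parameter values does the outcome cross a threshold" reduce to polynomial conditions, which is the kind of system CAD can describe exactly. The sketch below is not the authors' CAD algorithm; it shows only a conventional grid-based one-way sweep on a hypothetical two-state model, and the model, rewards, discount factor and function names are all illustrative assumptions.

```python
import numpy as np


def mrp_value(P: np.ndarray, r: np.ndarray, gamma: float) -> np.ndarray:
    """Expected discounted total reward per starting state: v = (I - gamma P)^{-1} r."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)


def two_state_model(p: float) -> np.ndarray:
    """Hypothetical 'healthy' -> 'dead' chain; p is the per-cycle death probability."""
    return np.array([[1.0 - p, p],
                     [0.0, 1.0]])  # 'dead' is absorbing


rewards = np.array([1.0, 0.0])  # hypothetical: one unit of benefit per cycle while healthy
gamma = 0.97                    # hypothetical per-cycle discount factor

# One-way sweep over p. Here v_healthy = 1 / (1 - gamma * (1 - p)), a rational function of p;
# a grid only samples it, whereas an exact method characterizes it for all p at once.
for p in np.linspace(0.01, 0.10, 4):
    v = mrp_value(two_state_model(p), rewards, gamma)
    print(f"p={p:.2f}  value(healthy)={v[0]:.3f}")
```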
We develop an open-source software package that implements our algorithm. Finally, we apply it to two case studies, one with synthetic data and one that re-analyzes a previous cost-effectiveness analysis from the literature, demonstrating advantages of our approach over standard techniques. Our software and code are available at: \url{https://github.com/mmaaz-git/markovag}. 2024-10-07T20:08:02Z 46 pages Timothy C. Y. Chan Muhammad Maaz http://arxiv.org/abs/2410.01911v1 A C++ implementation of the discrete adjoint sensitivity analysis method for explicit adaptive Runge-Kutta methods enabled by automatic adjoint differentiation and SIMD vectorization 2024-10-02T18:09:48Z A C++ library for sensitivity analysis of optimisation problems involving ordinary differential equations (ODEs) enabled by automatic differentiation (AD) and SIMD (Single Instruction, Multiple data) vectorization is presented. The discrete adjoint sensitivity analysis method is implemented for adaptive explicit Runge-Kutta (ERK) methods. Automatic adjoint differentiation (AAD) is employed for efficient evaluations of products of vectors and the Jacobian matrix of the right hand side of the ODE system. This approach avoids the low-level drawbacks of the black box approach of employing AAD on the entire ODE solver and opens the possibility to leverage parallelization. SIMD vectorization is employed to compute the vector-Jacobian products concurrently. We study the performance of other methods and implementations of sensitivity analysis and we find that our algorithm presents a small advantage compared to equivalent existing software. 2024-10-02T18:09:48Z 30 pages, 15 figures, preprint Rui Martins Evgeny Lakshtanov http://arxiv.org/abs/2410.01799v1 Efficient $1$-bit tensor approximations 2024-10-02T17:56:32Z We present a spatially efficient decomposition of matrices and arbitrary-order tensors as linear combinations of tensor products of $\{-1, 1\}$-valued vectors. 
For any matrix $A \in \mathbb{R}^{m \times n}$, $$A - R_w = S_w C_w T_w^\top = \sum_{j=1}^w c_j \cdot \mathbf{s}_j \mathbf{t}_j^\top$$ is a {\it $w$-width signed cut decomposition of $A$}. Here $C_w = \operatorname{diag}(\mathbf{c}_w)$ for some $\mathbf{c}_w \in \mathbb{R}^w,$ and $S_w, T_w$, and the vectors $\mathbf{s}_j, \mathbf{t}_j$ are $\{-1, 1\}$-valued. To store $(S_w, T_w, C_w)$, we may pack $w \cdot (m + n)$ bits, and require only $w$ floating point numbers. As a function of $w$, $\|R_w\|_F$ exhibits exponential decay when applied to \textit{f32} matrices with i.i.d. $\mathcal N (0, 1)$ entries. Choosing $w$ so that $(S_w, T_w, C_w)$ has the same memory footprint as a \textit{f16} or \textit{bf16} matrix, the relative error is comparable. Our algorithm yields efficient signed cut decompositions in $20$ lines of pseudocode. It reflects a simple modification from a celebrated 1999 paper [1] of Frieze and Kannan. As a first application, we approximate the weight matrices in the open \textit{Mistral-7B-v0.1} Large Language Model to a $50\%$ spatial compression. Remarkably, all $226$ remainder matrices have a relative error $<6\%$ and the expanded model closely matches \textit{Mistral-7B-v0.1} on the {\it huggingface} leaderboard [2]. Benchmark performance degrades slowly as we reduce the spatial compression from $50\%$ to $25\%$. We optimize our open source \textit{rust} implementation [3] with \textit{simd} instructions on \textit{avx2} and \textit{avx512} architectures. We also extend our algorithm from matrices to tensors of arbitrary order and use it to compress a picture of the first author's cat Angus. 2024-10-02T17:56:32Z 16 pages, one cat picture reused a lot Alex W. 
Neal Riasanovsky Sarah El Kazdadi http://arxiv.org/abs/2404.12703v2 GALÆXI: Solving complex compressible flows with high-order discontinuous Galerkin methods on accelerator-based systems 2024-10-02T08:34:36Z This work presents GALAEXI as a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured meshes leveraging the parallel computing power of modern Graphics Processing Units (GPUs). GALAEXI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) using shock capturing with a finite-volume subcell approach to ensure the stability of the high-order scheme near shocks. This work provides details on the general code design, the parallelization strategy, and the implementation approach for the compute kernels with a focus on the element local mappings between volume and surface data due to the unstructured mesh. GALAEXI exhibits excellent strong scaling properties up to 1024 GPUs if each GPU is assigned a minimum of one million degrees of freedom. To verify its implementation, a convergence study is performed that recovers the theoretical order of convergence of the implemented numerical schemes. Moreover, the solver is validated using both the incompressible and compressible formulations of the Taylor-Green-Vortex at a Mach number of 0.1 and 1.25, respectively. A mesh convergence study shows that the results converge to the high-fidelity reference solution and that the results match the original CPU implementation. Finally, GALAEXI is applied to a large-scale wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37. Here, the supersonic region and shocks at the leading edge are captured accurately and robustly by the implemented shock-capturing approach. It is demonstrated that GALAEXI requires less than half of the energy to carry out this simulation in comparison to the reference CPU implementation. 
This renders GALAEXI a potent tool for accurate and efficient simulations of compressible flows in the realm of exascale computing and the associated new HPC architectures. 2024-04-19T08:21:05Z 19 pages, 12 figures, 3 tables. Accepted Manuscript. Code available at: https://github.com/flexi-framework/galaexi Computer Physics Communications 306 (2025) 109388 Daniel Kempf Marius Kurz Marcel Blind Patrick Kopper Philipp Offenhäuser Anna Schwarz Spencer Starr Jens Keim Andrea Beck 10.1016/j.cpc.2024.109388 http://arxiv.org/abs/2410.14786v1 BDDC Preconditioning on GPUs for Cardiac Simulations 2024-10-01T08:21:10Z In order to understand cardiac arrhythmia, computer models for electrophysiology are essential. In the EuroHPC MicroCARD project, we adapt the current models and leverage modern computing resources to model diseased hearts and their microstructure accurately. Towards this objective, we develop a portable, highly efficient, and performant BDDC preconditioner and solver implementation, demonstrating scalability with over 90% efficiency on up to 100 GPUs. 2024-10-01T08:21:10Z LNCS, volume 14352 (2023), 265-268 Fritz Goebel Terry Cojean Hartwig Anzt 10.1007/978-3-031-48803-0_30 http://arxiv.org/abs/2312.07438v3 Efficient Implementation of Interior-Point Methods for Quantum Relative Entropy 2024-09-28T04:21:26Z Quantum Relative Entropy (QRE) programming is a recently popular and challenging class of convex optimization problems with significant applications in quantum computing and quantum information theory. We are interested in modern interior point (IP) methods based on optimal self-concordant barriers for the QRE cone. A range of theoretical and numerical challenges associated with such barrier functions and the QRE cones have hindered the scalability of IP methods. 
To address these challenges, we propose a series of numerical and linear algebraic techniques and heuristics aimed at enhancing the efficiency of gradient and Hessian computations for the self-concordant barrier function, solving linear systems, and performing matrix-vector products. We also introduce and discuss several concepts related to QRE, such as symmetric quantum relative entropy (SQRE). Furthermore, we introduce a two-phase method for performing facial reduction that can significantly improve the performance of QRE programming. Our new techniques have been implemented in the latest version (DDS 2.2) of the software package DDS. In addition to handling QRE constraints, DDS accepts any combination of several other conic and non-conic convex constraints. Our comprehensive numerical experiments encompass several parts, including 1) a comparison of DDS 2.2 with Hypatia for the nearest correlation matrix problem, 2) using DDS for combining QRE constraints with various other constraint types, and 3) calculating the key rate for quantum key distribution (QKD) channels and presenting results for several QKD protocols. 2023-12-12T17:05:38Z Special Issue of INFORMS Journal on Computing: Quantum Computing and Operations Research Mehdi Karimi Levent Tuncel
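For the quantum relative entropy entry above, the central quantity is S(X || Y) = Tr[X (log X - log Y)] for Hermitian positive-definite X and Y, and the QRE cone is, in its usual form, the closure of the set of triples (t, X, Y) with t >= S(X || Y). The sketch below evaluates S through eigendecompositions and checks cone membership for positive-definite inputs only; it is an illustrative assumption, not DDS code or its barrier, and the function names are hypothetical.

```python
import numpy as np


def logm_pd(X: np.ndarray) -> np.ndarray:
    """Matrix logarithm of a Hermitian positive-definite matrix via X = V diag(w) V^H."""
    w, V = np.linalg.eigh(X)
    if np.any(w <= 0):
        raise ValueError("matrix must be positive definite")
    # Scaling the columns of V by log(w) and multiplying by V^H gives V diag(log w) V^H.
    return (V * np.log(w)) @ V.conj().T


def quantum_relative_entropy(X: np.ndarray, Y: np.ndarray) -> float:
    """S(X || Y) = Tr[X (log X - log Y)]; real because both factors are Hermitian."""
    val = np.trace(X @ (logm_pd(X) - logm_pd(Y)))
    return float(np.real(val))


def in_qre_cone(t: float, X: np.ndarray, Y: np.ndarray, tol: float = 1e-12) -> bool:
    """Check t >= S(X || Y) for positive-definite X, Y (boundary points are not handled)."""
    return quantum_relative_entropy(X, Y) <= t + tol


if __name__ == "__main__":
    X = np.diag([0.5, 0.5])
    Y = np.diag([0.9, 0.1])
    # For commuting (here diagonal) matrices, S reduces to the classical KL divergence
    # of the eigenvalue vectors: 0.5*log(0.5/0.9) + 0.5*log(0.5/0.1).
    print(quantum_relative_entropy(X, Y), in_qre_cone(0.6, X, Y))
```

Interior-point methods of the kind described in the abstract work with a barrier for this cone together with its gradient and Hessian, which is considerably more involved than evaluating S itself.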