Tomographic Model Based Iterative Reconstruction of Symmetric Objects

2024-10-13T13:35:30Z

Computed Tomography (CT) reconstruction of objects with cylindrical symmetry can be performed with a single projection. When the measured rays are parallel, and the axis of symmetry is perpendicular to the optical axis, the data can be modeled with the so-called Abel Transform. The Abel Transform has been extensively studied and many methods exist for accurate reconstruction. However, most CT geometries are cone-beam rather than parallel-beam. Using Abel methods for reconstruction in these cases can lead to distortions and reconstruction artifacts. Here, we develop analytic and model-based iterative reconstruction (MBIR) methods to reconstruct symmetric objects with an arbitrary axis of symmetry from a cone-beam geometry. The MBIR methods demonstrate superior results relative to the analytic inversion methods by mitigating artifacts and reducing noise while retaining fine image features. We demonstrate the efficacy of our methods using simulated and experimentally-acquired x-ray and neutron projections.
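
As a point of reference for the parallel-beam case mentioned above, the forward Abel transform of a radially symmetric function f(r) is the line integral F(y) = ∫ f(sqrt(y² + z²)) dz along rays parallel to the optical axis. The following is a minimal numerical sketch of that integral, not the cone-beam or MBIR method of the paper; the function name `abel_forward` and the grid parameters `z_max` and `nz` are hypothetical choices made for illustration.

```python
import numpy as np

def abel_forward(f, y, z_max=10.0, nz=4001):
    """Parallel-beam projection of a radially symmetric function f(r).

    Integrates f(sqrt(y^2 + z^2)) over z with the trapezoid rule.
    z_max and nz are illustrative; f must be negligible beyond z_max.
    """
    y = np.asarray(y, dtype=float)
    z = np.linspace(-z_max, z_max, nz)
    dz = z[1] - z[0]
    # Radius of every sample point along every ray (rows: rays, cols: z).
    r = np.sqrt(y[:, None] ** 2 + z[None, :] ** 2)
    vals = f(r)
    # Trapezoid rule on a uniform grid, written out to avoid version-specific helpers.
    return dz * (vals.sum(axis=1) - 0.5 * (vals[:, 0] + vals[:, -1]))

# Check against a known pair: f(r) = exp(-r^2) projects to sqrt(pi) * exp(-y^2).
y = np.linspace(0.0, 3.0, 7)
F = abel_forward(lambda r: np.exp(-r ** 2), y)
F_exact = np.sqrt(np.pi) * np.exp(-y ** 2)
```

The Gaussian is a convenient test case because its projection is known in closed form, so `F` and `F_exact` can be compared directly.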

Adaptive finite element methods based on flux and stress equilibration using FEniCSx

2024-10-13T07:41:44Z

This contribution shows how a-posteriori error estimators based on equilibrated fluxes - H(div) functions fulfilling the underlying conservation law - can be implemented in FEniCSx. To this end, dolfinx_eqlb is introduced, its algorithmic structure is described, and classical benchmarks for adaptive solution procedures for the Poisson problem and linear elasticity are presented.

Multigrid methods for the Stokes problem on GPU systems

2024-10-12T11:22:43Z

This paper presents a matrix-free multigrid method for solving the Stokes problem, discretized using $H^{\text{div}}$-conforming discontinuous Galerkin methods. We employ a Schur complement method combined with the fast diagonalization method for the efficient evaluation of the local solver within the multiplicative Schwarz smoother. This approach operates directly on both the velocity and pressure spaces, eliminating the need for a global Schur complement approximation. By leveraging the tensor product structure of Raviart-Thomas elements and an optimized, conflict-free shared memory access pattern, the matrix-free operator evaluation demonstrates excellent performance numbers, reaching over one billion degrees of freedom per second on a single NVIDIA A100 GPU. Numerical results indicate efficiency comparable to that of the three-dimensional Poisson problem.

CDOpt: A Python Package for a Class of Riemannian Optimization

2024-10-12T08:02:15Z

Optimization over the embedded submanifold defined by constraints $c(x) = 0$ has attracted much interest over the past few decades due to its wide applications in various areas. Plenty of related optimization packages have been developed based on Riemannian optimization approaches, which rely on some basic geometrical materials of Riemannian manifolds, including retractions, vector transports, etc. These geometrical materials can be challenging to determine in general. Existing packages only accommodate a few well-known manifolds whose geometrical materials are easily accessible. For other manifolds which are not contained in these packages, the users have to develop the geometrical materials themselves. In addition, it is not always tractable to adopt advanced features from various state-of-the-art unconstrained optimization solvers to Riemannian optimization approaches. We introduce CDOpt (available at https://cdopt.github.io/), a user-friendly Python package for a class of Riemannian optimization problems. Based on constraint dissolving approaches, Riemannian optimization problems are transformed into their equivalent unconstrained counterparts in CDOpt. Therefore, solving Riemannian optimization problems through CDOpt directly benefits from various existing solvers and the rich expertise gained over decades for unconstrained optimization. Moreover, all the computations in CDOpt related to any manifold in question are conducted on its constraints expression, hence users can easily define new manifolds in CDOpt without any background on differential geometry. Furthermore, CDOpt extends the neural layers from PyTorch and Flax, thus allowing users to train manifold constrained neural networks directly by the solvers for unconstrained optimization. Extensive numerical experiments demonstrate that CDOpt is highly efficient and robust in solving various classes of Riemannian optimization problems.

LibMOON: A Gradient-based MultiObjective OptimizatioN Library in PyTorch

2024-10-11T16:31:46Z

Multiobjective optimization problems (MOPs) are prevalent in machine learning, with applications in multi-task learning, learning under fairness or robustness constraints, etc. Instead of reducing multiple objective functions into a scalar objective, MOPs aim to optimize for the so-called Pareto optimality or Pareto set learning, which involves optimizing more than one objective function simultaneously, over models with thousands / millions of parameters. Existing benchmark libraries for MOPs mainly focus on evolutionary algorithms, most of which are zeroth-order / meta-heuristic methods that do not effectively utilize higher-order information from objectives and cannot scale to large-scale models with thousands / millions of parameters. In light of the above gap, this paper introduces LibMOON, the first multiobjective optimization library that supports state-of-the-art gradient-based methods, provides a fair benchmark, and is open-sourced for the community.

GridapTopOpt.jl: A scalable Julia toolbox for level set-based topology optimisation

2024-10-11T05:34:30Z

In this paper we present GridapTopOpt, an extendable framework for level set-based topology optimisation that can be readily distributed across a personal computer or high-performance computing cluster. The package is written in Julia and uses the Gridap package ecosystem for parallel finite element assembly from arbitrary weak formulations of partial differential equations (PDEs) along with the scalable solvers from the Portable and Extendable Toolkit for Scientific Computing (PETSc). The resulting user interface is intuitive and easy-to-use, allowing for the implementation of a wide range of topology optimisation problems with a syntax that is near one-to-one with the mathematical notation. Furthermore, we implement automatic differentiation to help mitigate the bottleneck associated with the analytic derivation of sensitivities for complex problems. GridapTopOpt is capable of solving a range of benchmark and research topology optimisation problems with large numbers of degrees of freedom. This educational article demonstrates the usability and versatility of the package by describing the formulation and step-by-step implementation of several distinct topology optimisation problems. The driver scripts for these problems are provided and the package source code is available at https://github.com/zjwegert/GridapTopOpt.jl.

Methods for Few-View CT Image Reconstruction

2024-10-10T02:49:06Z

Computed Tomography (CT) is an essential non-destructive three-dimensional imaging modality used in medicine, security screening, and inspection of manufactured components. Typical CT data acquisition entails the collection of a thousand or more projections through the object under investigation through a range of angles covering one hundred eighty degrees or more. It may be desirable or required that the number of projection angles be reduced by one or two orders of magnitude for reasons such as acquisition time or dose. Unless specialized reconstruction algorithms are applied, reconstructing with fewer views will result in streak artifacts and failure to resolve object boundaries at certain orientations. These artifacts may substantially diminish the usefulness of the reconstructed CT volumes. Here we develop constrained and regularized numerical optimization methods to reconstruct CT volumes from 4-28 projections. These methods entail utilization of novel data fidelity and convex and non-convex regularization terms. In addition, the methods outlined here are usually carried out as a sequence of two or three numerical optimization methods. The efficacy of our methods is demonstrated on four measured and three simulated few-view CT data sets. We show that these methods outperform other state-of-the-art few-view numerical optimization methods.

BLAS-like Interface for Binary Tensor Contractions

2024-10-09T11:03:14Z

In the world of linear algebra computation, a well-established standard exists called BLAS (Basic Linear Algebra Subprograms). This standard has been crucial for the development of software using linear algebra operations. Its benefits include portability with efficiency and mitigation of suboptimal re-implementations of linear algebra operations. Multilinear algebra is an extension of linear algebra in which the central objects are tensors, which are generalizations of vectors and matrices. Though tensor operations are becoming more common, they do not have a standard like BLAS. Such standardization would be beneficial and decrease the now-visible replication of work, as many libraries nowadays use their own implementations. This master thesis aims to work towards such a standard by discovering whether or not a BLAS-like interface is possible for the operation binary tensor contraction. To answer this, an interface has been developed in the programming language C together with an implementation and tested to see if it would be sufficient. The interface developed is: xGETT(RANKA, EXTA, INCA, A, RANKB, EXTB, INCB, B, CONTS, CONTA, CONTB, PERM, INCC, C). Based on the implementation and tests, it has been deemed sufficient as a BLAS-like interface for binary tensor contractions and possible to use in a BLAS-like standardization for tensor operations.
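
To make the operation concrete, the sketch below shows one binary tensor contraction in NumPy and the same result obtained by permuting, reshaping, and calling a matrix product. It only illustrates what a binary tensor contraction computes; it does not use or model the xGETT interface, and the tensor shapes and index labels are arbitrary choices made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two input tensors sharing one contracted index k of extent 4.
A = rng.standard_normal((3, 4, 5))   # indices (a, k, b)
B = rng.standard_normal((4, 6))      # indices (k, d)

# Binary contraction: C[a, b, d] = sum_k A[a, k, b] * B[k, d]
C = np.einsum("akb,kd->abd", A, B)

# The same contraction expressed as a single matrix product:
# move k to the last axis of A, flatten (a, b) into one row index,
# multiply by B, then restore the free indices.
A_mat = A.transpose(0, 2, 1).reshape(3 * 5, 4)
C_gemm = (A_mat @ B).reshape(3, 5, 6)
```

The second form is the reason a BLAS-like interface is a natural fit: after a permutation of indices, a contraction reduces to a matrix-matrix multiplication.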

lintsampler: Easy random sampling via linear interpolation

2024-10-08T08:42:29Z

'lintsampler' provides a Python implementation of a technique we term 'linear interpolant sampling': an algorithm to efficiently draw pseudo-random samples from an arbitrary probability density function (PDF). First, the PDF is evaluated on a grid-like structure. Then, it is assumed that the PDF can be approximated between grid vertices by the (multidimensional) linear interpolant. With this assumption, random samples can be efficiently drawn via inverse transform sampling. lintsampler is primarily written with 'numpy', drawing some additional functionality from 'scipy'. Under the most basic usage of lintsampler, the user provides a Python function defining the target PDF and some parameters describing a grid-like structure to the 'LintSampler' class, and is then able to draw samples via the 'sample' method. Additionally, there is functionality for the user to set the random seed, employ quasi-Monte Carlo sampling, or sample within a premade grid ('DensityGrid') or tree ('DensityTree') structure.
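
The two steps described above (evaluate the PDF on a grid, then sample from the piecewise-linear interpolant by inverse transform) can be sketched in one dimension with plain numpy. This is an illustrative re-implementation of the idea, not lintsampler's code or API; the function name `sample_linear_interpolant_1d` and its arguments are hypothetical.

```python
import numpy as np

def sample_linear_interpolant_1d(pdf, edges, n, seed=None):
    """Draw n samples from the piecewise-linear interpolant of pdf on edges.

    pdf need not be normalised; it must be non-negative on the grid.
    """
    rng = np.random.default_rng(seed)
    edges = np.asarray(edges, dtype=float)
    f = np.asarray(pdf(edges), dtype=float)
    f0, f1 = f[:-1], f[1:]              # densities at left/right cell vertices
    widths = np.diff(edges)

    # Mass of each cell under the linear interpolant (trapezoid area).
    masses = 0.5 * (f0 + f1) * widths
    cells = rng.choice(len(masses), size=n, p=masses / masses.sum())

    # Inverse transform inside the chosen cell. With t in [0, 1] the CDF is
    # (a*t + (b - a)*t^2/2) / ((a + b)/2); solving CDF(t) = u gives the
    # expression below, written in a form that stays stable when a == b.
    u = 1.0 - rng.random(n)             # uniform on (0, 1], avoids 0/0 when a == 0
    a, b = f0[cells], f1[cells]
    t = u * (a + b) / (a + np.sqrt(a * a + u * (b * b - a * a)))
    return edges[cells] + t * widths[cells]

# Example: unnormalised standard normal on a uniform grid.
samples = sample_linear_interpolant_1d(
    lambda x: np.exp(-0.5 * x ** 2), np.linspace(-5.0, 5.0, 101), 20000, seed=1
)
```

A finer grid makes the interpolant a closer approximation of the target PDF, at the cost of more PDF evaluations.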

Exact sensitivity analysis of Markov reward processes via algebraic geometry

2024-10-07T20:08:02Z

We introduce a new approach for deterministic sensitivity analysis of Markov reward processes, commonly used in cost-effectiveness analyses, via reformulation into a polynomial system. Our approach leverages cylindrical algebraic decomposition (CAD), a technique arising from algebraic geometry that provides an exact description of all solutions to a polynomial system. While it is typically intractable to build a CAD for systems with more than a few variables, we show that a special class of polynomial systems, which includes the polynomials arising from Markov reward processes, can be analyzed much more tractably. We establish several theoretical results about such systems and develop a specialized algorithm to construct their CAD, which allows us to perform exact, multi-way sensitivity analysis for common health economic analyses. We develop an open-source software package that implements our algorithm. Finally, we apply it to two case studies, one with synthetic data and one that re-analyzes a previous cost-effectiveness analysis from the literature, demonstrating advantages of our approach over standard techniques. Our software and code are available at: https://github.com/mmaaz-git/markovag.

A C++ implementation of the discrete adjoint sensitivity analysis method for explicit adaptive Runge-Kutta methods enabled by automatic adjoint differentiation and SIMD vectorization

2024-10-02T18:09:48Z

A C++ library for sensitivity analysis of optimisation problems involving ordinary differential equations (ODEs) enabled by automatic differentiation (AD) and SIMD (Single Instruction, Multiple Data) vectorization is presented. The discrete adjoint sensitivity analysis method is implemented for adaptive explicit Runge-Kutta (ERK) methods. Automatic adjoint differentiation (AAD) is employed for efficient evaluations of products of vectors and the Jacobian matrix of the right-hand side of the ODE system. This approach avoids the low-level drawbacks of the black box approach of employing AAD on the entire ODE solver and opens the possibility to leverage parallelization. SIMD vectorization is employed to compute the vector-Jacobian products concurrently. We study the performance of other methods and implementations of sensitivity analysis and we find that our algorithm presents a small advantage compared to equivalent existing software.

Efficient $1$-bit tensor approximations

2024-10-02T17:56:32Z

We present a spatially efficient decomposition of matrices and arbitrary-order tensors as linear combinations of tensor products of $\{-1, 1\}$-valued vectors. For any matrix $A \in \mathbb{R}^{m \times n}$, $$A - R_w = S_w C_w T_w^\top = \sum_{j=1}^w c_j \cdot \mathbf{s}_j \mathbf{t}_j^\top$$ is a {\it $w$-width signed cut decomposition of $A$}. Here $C_w = \operatorname{diag}(\mathbf{c}_w)$ for some $\mathbf{c}_w \in \mathbb{R}^w,$ and $S_w, T_w$, and the vectors $\mathbf{s}_j, \mathbf{t}_j$ are $\{-1, 1\}$-valued. To store $(S_w, T_w, C_w)$, we may pack $w \cdot (m + n)$ bits, and require only $w$ floating point numbers. As a function of $w$, $\|R_w\|_F$ exhibits exponential decay when applied to \textit{f32} matrices with i.i.d. $\mathcal N (0, 1)$ entries. Choosing $w$ so that $(S_w, T_w, C_w)$ has the same memory footprint as a \textit{f16} or \textit{bf16} matrix, the relative error is comparable. Our algorithm yields efficient signed cut decompositions in $20$ lines of pseudocode. It reflects a simple modification from a celebrated 1999 paper [1] of Frieze and Kannan. As a first application, we approximate the weight matrices in the open \textit{Mistral-7B-v0.1} Large Language Model to a $50\%$ spatial compression. Remarkably, all $226$ remainder matrices have a relative error $<6\%$ and the expanded model closely matches \textit{Mistral-7B-v0.1} on the {\it huggingface} leaderboard [2]. Benchmark performance degrades slowly as we reduce the spatial compression from $50\%$ to $25\%$. We optimize our open source \textit{rust} implementation [3] with \textit{simd} instructions on \textit{avx2} and \textit{avx512} architectures. We also extend our algorithm from matrices to tensors of arbitrary order and use it to compress a picture of the first author's cat Angus.
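
To illustrate the form $A - R_w = S_w C_w T_w^\top$, the sketch below builds a decomposition of that shape with a simple greedy heuristic: alternate sign updates to find a good pair $(\mathbf{s}, \mathbf{t})$, take the least-squares coefficient $c = \mathbf{s}^\top R\, \mathbf{t} / (mn)$, and subtract. This heuristic is an assumption made for illustration and should not be read as the authors' algorithm or their rust implementation; the names `greedy_signed_cuts`, `signs`, and the `sweeps` parameter are hypothetical.

```python
import numpy as np

def signs(x):
    """Map a real vector to {-1, +1}, sending zeros to +1."""
    return np.where(x >= 0, 1.0, -1.0)

def greedy_signed_cuts(A, w, sweeps=5, seed=0):
    """Return S (m x w), c (w,), T (n x w), R with A - R = S @ diag(c) @ T.T."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    R = np.array(A, dtype=float)        # running remainder
    S, T, c = np.empty((m, w)), np.empty((n, w)), np.empty(w)
    for j in range(w):
        s = signs(rng.standard_normal(m))
        for _ in range(sweeps):
            # Each update maximises s^T R t over one sign vector with the other fixed.
            t = signs(R.T @ s)
            s = signs(R @ t)
        # Best scalar for the rank-one term c * s t^T in the Frobenius norm.
        cj = (s @ R @ t) / (m * n)
        R -= cj * np.outer(s, t)
        S[:, j], T[:, j], c[j] = s, t, cj
    return S, c, T, R

A = np.random.default_rng(42).standard_normal((16, 12))
S, c, T, R = greedy_signed_cuts(A, w=8)
```

With this choice of coefficient, each step reduces the squared Frobenius norm of the remainder by $(\mathbf{s}^\top R\, \mathbf{t})^2 / (mn)$, so the remainder norm never increases as $w$ grows.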

GALÆXI: Solving complex compressible flows with high-order discontinuous Galerkin methods on accelerator-based systems

2024-10-02T08:34:36Z

This work presents GALAEXI as a novel, energy-efficient flow solver for the simulation of compressible flows on unstructured meshes leveraging the parallel computing power of modern Graphics Processing Units (GPUs). GALAEXI implements the high-order Discontinuous Galerkin Spectral Element Method (DGSEM) using shock capturing with a finite-volume subcell approach to ensure the stability of the high-order scheme near shocks. This work provides details on the general code design, the parallelization strategy, and the implementation approach for the compute kernels with a focus on the element local mappings between volume and surface data due to the unstructured mesh. GALAEXI exhibits excellent strong scaling properties up to 1024 GPUs if each GPU is assigned a minimum of one million degrees of freedom. To verify its implementation, a convergence study is performed that recovers the theoretical order of convergence of the implemented numerical schemes. Moreover, the solver is validated using both the incompressible and compressible formulation of the Taylor-Green-Vortex at a Mach number of 0.1 and 1.25, respectively. A mesh convergence study shows that the results converge to the high-fidelity reference solution and that the results match the original CPU implementation. Finally, GALAEXI is applied to a large-scale wall-resolved large eddy simulation of a linear cascade of the NASA Rotor 37. Here, the supersonic region and shocks at the leading edge are captured accurately and robustly by the implemented shock-capturing approach. It is demonstrated that GALAEXI requires less than half of the energy to carry out this simulation in comparison to the reference CPU implementation. This renders GALAEXI as a potent tool for accurate and efficient simulations of compressible flows in the realm of exascale computing and the associated new HPC architectures.

BDDC Preconditioning on GPUs for Cardiac Simulations

2024-10-01T08:21:10Z

In order to understand cardiac arrhythmia, computer models for electrophysiology are essential. In the EuroHPC MicroCARD project, we adapt the current models and leverage modern computing resources to model diseased hearts and their microstructure accurately. Towards this objective, we develop a portable, highly efficient, high-performance BDDC preconditioner and solver implementation, demonstrating scalability with over 90% efficiency on up to 100 GPUs.

Efficient Implementation of Interior-Point Methods for Quantum Relative Entropy

2024-09-28T04:21:26Z

Quantum Relative Entropy (QRE) programming is a recently popular and challenging class of convex optimization problems with significant applications in quantum computing and quantum information theory. We are interested in modern interior point (IP) methods based on optimal self-concordant barriers for the QRE cone. A range of theoretical and numerical challenges associated with such barrier functions and the QRE cones have hindered the scalability of IP methods. To address these challenges, we propose a series of numerical and linear algebraic techniques and heuristics aimed at enhancing the efficiency of gradient and Hessian computations for the self-concordant barrier function, solving linear systems, and performing matrix-vector products. We also introduce and deliberate about some interesting concepts related to QRE such as symmetric quantum relative entropy (SQRE). We also introduce a two-phase method for performing facial reduction that can significantly improve the performance of QRE programming. Our new techniques have been implemented in the latest version (DDS 2.2) of the software package DDS. In addition to handling QRE constraints, DDS accepts any combination of several other conic and non-conic convex constraints. Our comprehensive numerical experiments encompass several parts including 1) a comparison of DDS 2.2 with Hypatia for the nearest correlation matrix problem, 2) using DDS for combining QRE constraints with various other constraint types, and 3) calculating the key rate for quantum key distribution (QKD) channels and presenting results for several QKD protocols.