https://arxiv.org/api/k08f+nq0mHanC22CQCz8iWim6uI 2026-06-22T18:28:39Z 2664 510 15 http://arxiv.org/abs/2406.10271v1 Enhancing non-Perl bioinformatic applications with Perl: Building novel, component based applications using Object Orientation, PDL, Alien, FFI, Inline and OpenMP 2024-06-11T18:32:50Z

Component-Based Software Engineering (CBSE) is a methodology that assembles pre-existing, re-usable software components into new applications, which is particularly relevant for fast-moving, data-intensive fields such as bioinformatics. While Perl was used extensively in this field until a decade ago, more recent applications opt for Bioconductor/R or Python. This trend represents a significant missed opportunity for the rapid generation of novel bioinformatic applications out of pre-existing components, since Perl offers a variety of abstractions that can facilitate composition. In this paper, we illustrate the utility of Perl for CBSE through a combination of Object Oriented frameworks, the Perl Data Language (PDL), and facilities for interfacing with non-Perl code through Foreign Function Interfaces and inlining of foreign source code. To do so, we enhance Polyester, an RNA sequencing simulator written in R, and edlib, a fast sequence similarity search library based on the edit distance. The first case study illustrates the near effortless authoring of new, highly performant Perl modules for the simulation of random numbers using the GNU Scientific Library and PDL, and proposes Perl and Perl/C alternatives to the Python tool cutadapt that is used to "trim" polyA tails from biological sequences. For the edlib case, we leverage the power of metaclass programming to endow edlib with coarse, process-based parallelism through the Many Core Engine (MCE) module and fine-grained parallelism through OpenMP, a C/C++/Fortran Application Programming Interface for shared-memory multithreaded processing. These use cases provide proof-of-concept for the Bio::SeqAlignment framework, which can organize heterogeneous components in complex memory and command-line based workflows for the construction of novel bioinformatic tools to analyze data from long-read sequencing platforms, e.g. Nanopore.
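To make the notion of edit distance in this abstract concrete, the following is a minimal Python sketch of the textbook dynamic-programming (Levenshtein) recurrence. It is an illustration only and is not edlib's implementation or part of the Bio::SeqAlignment framework; the function name `edit_distance` is an assumption introduced here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, keeping only two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGTTA", "ACTTAA"))
```

The sketch runs in time proportional to the product of the two sequence lengths, which is the kind of cost that specialised libraries are designed to reduce.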

2024-06-11T18:32:50Z 36 pages, 8 figures Christos Argyropoulos http://arxiv.org/abs/2406.05577v1 Flexible Multi-Dimensional FFTs for Plane Wave Density Functional Theory Codes 2024-06-08T21:17:35Z

Multi-dimensional Fourier transforms are key mathematical building blocks that appear in a wide range of applications, from materials science, physics, and chemistry to machine learning. Over the past years, a multitude of software packages targeting distributed multi-dimensional Fourier transforms have been developed. Most variants attempt to offer efficient implementations for single transforms applied on data mapped onto rectangular grids. However, not all scientific applications conform to this pattern, e.g. plane wave Density Functional Theory codes require multi-dimensional Fourier transforms applied on data represented as batches of spheres. Typically, the implementations for this use case are hand-coded and tailored to the requirements of each application. In this work, we present the Fastest Fourier Transform from Berkeley (FFTB), a distributed framework that offers flexible implementations for both regular/non-regular data grids and batched/non-batched transforms. These implementations are exposed through a user-friendly API that captures most of the use cases. Furthermore, we provide implementations for both CPU and GPU platforms, showing that our approach offers improved execution time and scalability on the HPE Cray EX supercomputer. In addition, we outline the need for flexible implementations for different use cases of the software package.

2024-06-08T21:17:35Z 17 pages, 9 figures Doru Thom Popovici Mauro del Ben Osni Marques Andrew Canning http://arxiv.org/abs/2202.06297v3 Faster Gröbner bases for Lie derivatives of ODE systems via monomial orderings 2024-06-06T21:18:53Z

Symbolic computation for systems of differential equations is often computationally expensive. Many practical differential models take the form of a polynomial or rational ODE system with specified outputs. A basic symbolic approach to analyzing these models is to compute and then symbolically process the polynomial system obtained from sufficiently many Lie derivatives of the output functions with respect to the vector field given by the ODE system. In this paper, we present a method for speeding up Gröbner basis computation for this class of polynomial systems by using a specific monomial ordering, including weights for the variables, derived from the structure of the ODE model. We provide empirical results that show improvement across different symbolic computing frameworks and apply the method to speed up structural identifiability analysis of ODE models.
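As an illustration of what a monomial ordering with variable weights means, the sketch below sorts monomials by weighted degree and breaks ties reverse-lexicographically on the raw exponents. This is a generic weighted ordering written for illustration; the weights, the tie-break rule, and the names `weighted_degree`, `order_key`, and `sort_monomials` are assumptions and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Monomial = Tuple[int, ...]  # exponent vector, e.g. x^2*z -> (2, 0, 1)


def weighted_degree(exps: Monomial, weights: Sequence[int]) -> int:
    """Sum of exponents, each scaled by the weight of its variable."""
    return sum(w * e for w, e in zip(weights, exps))


def order_key(exps: Monomial, weights: Sequence[int]):
    """Sort key: weighted degree first, then a reverse-lexicographic tie-break."""
    return (weighted_degree(exps, weights), tuple(-e for e in reversed(exps)))


def sort_monomials(monomials: List[Monomial], weights: Sequence[int]) -> List[Monomial]:
    """Return the monomials from smallest to largest in the weighted ordering."""
    return sorted(monomials, key=lambda m: order_key(m, weights))


if __name__ == "__main__":
    x, y_squared = (1, 0, 0), (0, 2, 0)
    # With unit weights y^2 outranks x; with a hypothetical weight 3 on x it does not.
    print(sort_monomials([x, y_squared], weights=(1, 1, 1)))
    print(sort_monomials([x, y_squared], weights=(3, 1, 1)))
```

Changing the weights changes which term of a polynomial is treated as its leading term, and the leading terms drive the reductions performed during a Gröbner basis computation.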

2022-02-13T12:40:11Z Mariya Bessonov Ilia Ilmer Tatiana Konstantinova Alexey Ovchinnikov Gleb Pogudin Pedro Soto 10.1145/3666000.3669695 http://arxiv.org/abs/2407.07096v1 Spectral Toolkit of Algorithms for Graphs: Technical Report (2) 2024-06-06T15:32:37Z

Spectral Toolkit of Algorithms for Graphs (STAG) is an open-source library for efficient graph algorithms. This technical report presents the newly implemented components for locality sensitive hashing, kernel density estimation, and fast spectral clustering. The report includes a user's guide to the newly implemented algorithms, experiments and demonstrations of the new functionality, and several technical considerations behind our development.

2024-06-06T15:32:37Z The first STAG report is available at arXiv:2304.03170 Peter Macgregor He Sun http://arxiv.org/abs/2406.00065v1 Parallel Redundancy Removal in lrslib with Application to Projections 2024-05-30T11:24:11Z

We describe a parallel implementation in lrslib for removing redundant halfspaces and finding a minimum representation for an H-representation of a convex polyhedron. By a standard transformation, the same code works for V-representations. We use this approach to speed up the redundancy removal step in Fourier-Motzkin elimination. Computational results are given including a comparison with Clarkson's algorithm, which is particularly fast on highly redundant inputs.

2024-05-30T11:24:11Z David Avis Charles Jordan http://arxiv.org/abs/2405.18966v1 svds-C: A Multi-Thread C Code for Computing Truncated Singular Value Decomposition 2024-05-29T10:24:56Z

This article presents svds-C, an open-source and high-performance C program for accurately and robustly computing truncated SVD, e.g. computing the several largest singular values and corresponding singular vectors. We have re-implemented the algorithm of svds in Matlab in C, based on MKL or OpenBLAS and multi-thread computing, to obtain the parallel program named svds-C. Running on a shared-memory computer, svds-C consumes less time and memory than svds thanks to careful implementation of multi-thread parallelization and memory management. Numerical experiments on different test cases, which are synthetically generated or taken directly from real-world datasets, show that svds-C runs remarkably faster than svds, with an average 4.7X and at most 12X speedup for 16-thread parallel computing on a computer with an Intel CPU, while preserving the same accuracy and consuming about half the memory space. Experimental results also demonstrate that svds-C has similar advantages over svds on a computer with an AMD CPU, and outperforms other state-of-the-art algorithms for truncated SVD in computing time and robustness.

2024-05-29T10:24:56Z 20 pages, accepted by SoftwareX Xu Feng Wenjian Yu Yuyang Xie http://arxiv.org/abs/2406.02579v1 An Open-Source Framework for Efficient Numerically-Tailored Computations 2024-05-29T10:10:53Z

We present a versatile open-source framework designed to facilitate efficient, numerically-tailored Matrix-Matrix Multiplications (MMMs). The framework offers two primary contributions: first, a fine-tuned, automated pipeline for arithmetic datapath generation, enabling highly customizable systolic MMM kernels; second, seamless integration of the generated kernels into user code, irrespective of the programming language employed, without necessitating modifications. The framework demonstrates a systematic enhancement in accuracy per energy cost across diverse High Performance Computing (HPC) workloads displaying a variety of numerical requirements, such as Artificial Intelligence (AI) inference and Sea Surface Height (SSH) computation. For AI inference, we consider a set of state-of-the-art neural network models, namely ResNet18, ResNet34, ResNet50, DenseNet121, DenseNet161, DenseNet169, and VGG11, in conjunction with two datasets, two computer formats, and 27 distinct intermediate arithmetic datapaths. Our approach consistently reduces energy consumption across all cases, with a notable example being the reduction by factors of $3.3\times$ for IEEE754-32 and $1.4\times$ for Bfloat16 during ImageNet inference with ResNet50. This is accomplished while maintaining accuracies of $82.3\%$ and $86\%$, comparable to those achieved with conventional Floating-Point Units (FPUs). In the context of SSH computation, our method achieves fully-reproducible results using double-precision words, surpassing the accuracy of conventional double- and quad-precision arithmetic in FPUs. Our approach enhances SSH computation accuracy by a minimum of $5\times$ and $27\times$ compared to IEEE754-64 and IEEE754-128, respectively, resulting in $5.6\times$ and $15.1\times$ improvements in accuracy per power cost.

2024-05-29T10:10:53Z 6 pages, open-source International Conference on Field Programmable Logic and Applications 2023 Louis Ledoux Marc Casas 10.1109/FPL60245.2023.00011 http://arxiv.org/abs/2312.02113v2 A Framework for Symmetric Self-Intersecting Surfaces 2024-05-27T14:24:00Z

3D printing of surfaces has become an established method for prototyping and visualisation. However, surfaces often contain certain degenerations, such as self-intersecting faces or non-manifold parts, which pose problems in obtaining a 3D printable file. Therefore, it is necessary to examine these degenerations beforehand. Surfaces in three-dimensional space can be represented as embedded simplicial complexes describing a triangulation of the surface. We use this combinatorial description, and the notion of embedded simplicial surfaces (which can be understood as well-behaved surfaces) to give a framework for obtaining 3D printable files. This provides a new perspective on self-intersecting triangulated surfaces in three-dimensional space. Our method first retriangulates a surface using a minimal number of triangles, then computes its outer hull, and finally treats non-manifold parts. To this end, we prove an initialisation criterion for the computation of the outer hull. We also show how symmetry properties can be used to simplify computations. Implementations of the proposed algorithms are given in the computer algebra system GAP4. To verify our methods, we use a dataset of self-intersecting symmetric icosahedra. Exploiting the symmetry of the underlying embedded complex leads to a notable speed-up and enhanced numerical robustness when computing a retriangulation, compared to methods that do not take advantage of symmetry.

2023-12-04T18:46:42Z Updated introduction and added more details and examples Christian Amend Tom Goertzen http://arxiv.org/abs/2402.04711v3 High-dimensional multidisciplinary design optimization for aircraft eco-design / Optimisation multi-disciplinaire en grande dimension pour l'éco-conception avion en avant-projet 2024-05-26T20:02:59Z

The objective of this Philosophiae Doctor (Ph.D.) thesis is to propose an efficient approach for optimizing a multidisciplinary black-box model when the optimization problem is constrained and involves a large number of mixed integer design variables (typically 100 variables). The targeted optimization approach, called Efficient Global Optimization (EGO), is based on a sequential enrichment of an adaptive surrogate model and, in this context, Gaussian process (GP) surrogate models are among the most widely used in engineering problems to approximate time-consuming high-fidelity models. EGO is a heuristic Bayesian optimization (BO) method that performs well in terms of solution quality. However, like any other global optimization method, EGO suffers from the curse of dimensionality, meaning that its performance is satisfactory on lower-dimensional problems but deteriorates as the dimensionality of the optimization search space increases. For realistic aircraft design problems, the number of design variables can even exceed 100 and, thus, solving the problems directly with EGO is ruled out. This is especially true when the problems involve both continuous and categorical variables, which increases the size of the search space even more. In this Ph.D. thesis, effective parameterization tools are investigated, including techniques like partial least squares regression, to significantly reduce the number of design variables. Additionally, Bayesian optimization is adapted to handle discrete variables and high-dimensional spaces in order to reduce the number of evaluations when optimizing innovative aircraft concepts, such as the "DRAGON" hybrid airplane, to reduce their climate impact.
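The sequential enrichment step in EGO is classically driven by the expected improvement (EI) criterion, which scores a candidate point from the surrogate's predicted mean and standard deviation. The sketch below evaluates the standard closed-form EI for minimisation under a Gaussian prediction. It is shown for illustration and is not necessarily the criterion used in the thesis; the function name and arguments are assumptions.

```python
import math


def expected_improvement(mu: float, sigma: float, f_best: float) -> float:
    """Closed-form EI for minimisation, given a Gaussian prediction N(mu, sigma^2).

    mu, sigma: surrogate mean and standard deviation at the candidate point.
    f_best:    best (lowest) objective value observed so far.
    """
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (f_best - mu) * cdf + sigma * pdf


if __name__ == "__main__":
    # A confident prediction near the incumbent versus an uncertain one further away.
    print(expected_improvement(mu=0.9, sigma=0.05, f_best=1.0))
    print(expected_improvement(mu=1.2, sigma=0.50, f_best=1.0))
```

EI is largest where the surrogate predicts either a low mean or a large uncertainty, which is how the criterion trades exploitation against exploration.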

2024-02-07T09:58:52Z PhD Thesis, Université de Toulouse, Toulouse, 2024 on Gaussian Process kernels for Bayesian optimization in high dimension with mixed and hierarchical variables at ISAE-SUPAERO. Keywords: Gaussian process, Black-box optimization, Bayesian inference, Multidisciplinary design optimization, Mixed hierarchical and categorical inputs, Eco-friendly aircraft design Paul Saves http://arxiv.org/abs/2405.16618v1 An efficient optimization model and tabu search-based global optimization approach for continuous p-dispersion problem 2024-05-26T16:25:55Z

Continuous p-dispersion problems with and without boundary constraints are NP-hard optimization problems with numerous real-world applications, notably in facility location and circle packing, which are widely studied in mathematics and operations research. In this work, we concentrate on general cases with a non-convex multiply-connected region that are rarely studied in the literature due to their intractability and the absence of an efficient optimization model. Using the penalty function approach, we design a unified and almost everywhere differentiable optimization model for these complex problems and propose a tabu search-based global optimization (TSGO) algorithm for solving them. Computational results over a variety of benchmark instances show that the proposed model works very well, allowing popular local optimization methods (e.g., the quasi-Newton methods and the conjugate gradient methods) to reach high-precision solutions due to the differentiability of the model. These results further demonstrate that the proposed TSGO algorithm is very efficient and significantly outperforms several popular global optimization algorithms in the literature, improving the best-known solutions for several existing instances in a short computational time. Experimental analyses are conducted to show the influence of several key ingredients of the algorithm on computational performance.

2024-05-26T16:25:55Z Xiangjing Lai Zhenheng Lin Jin-Kao Hao Qinghua Wu http://arxiv.org/abs/2405.14321v2 An 808 Line Phasor-Based Dehomogenisation Matlab Code For Multi-Scale Topology Optimisation 2024-05-24T06:46:31Z

This work presents an 808-line Matlab educational code for combined multi-scale topology optimisation and phasor-based dehomogenisation titled deHomTop808. The multi-scale formulation utilises homogenisation of optimal microstructures to facilitate efficient coarse-scale optimisation. Dehomogenisation allows for a high-resolution single-scale reconstruction of the optimised multi-scale structure, achieving minor losses in structural performance, at a fraction of the computational cost, compared to its large-scale topology optimisation counterpart. The presented code utilises stiffness optimal Rank-2 microstructures to minimise the compliance of a single-load case problem, subject to a volume fraction constraint. By exploiting the inherent efficiency benefits of the phasor-based dehomogenisation procedure, on-the-fly dehomogenisation to a single-scale structure is obtained. The presented code includes procedures for structural verification of the final dehomogenised structure by comparison to the multi-scale solution. The code is introduced in terms of the underlying theory and its major components, including examples and potential extensions, and can be downloaded from https://github.com/peterdorffler/deHomTop808.git.

2024-05-23T08:53:24Z Rebekka Varum Woldseth Ole Sigmund Peter Dørffler Ladegaard Jensen http://arxiv.org/abs/2405.14642v1 GPU Implementations for Midsize Integer Addition and Multiplication 2024-05-23T14:44:49Z

This paper explores practical aspects of using a high-level functional language for GPU-based arithmetic on ``midsize'' integers. By this we mean integers of up to about a quarter million bits, which is sufficient for most practical purposes. The goal is to understand whether it is possible to support efficient nested-parallel programs with a small, flexible code base. We report on GPU implementations for addition and multiplication of integers that fit in one CUDA block, thus leveraging temporal reuse from scratchpad memories. Our key contribution resides in the simplicity of the proposed solutions: We recognize that addition is a straightforward application of scan, which is known to allow efficient GPU implementation. For quadratic multiplication we employ a simple work-partitioning strategy that offers good temporal locality. For FFT multiplication, we efficiently map the computation in the domain of integral fields by finding ``good'' primes that enable almost-full utilization of machine words. In comparison, related work uses complex tiling strategies -- which feel too big a hammer for the job -- or uses the computational domain of reals, which may degrade the magnitude of the base in which the computation is carried out. We evaluate the performance in comparison to the state-of-the-art CGBN library, authored by NvidiaLab, and report that our CUDA prototype outperforms CGBN for integer sizes larger than 32K bits, while offering comparable performance for smaller sizes. Moreover, we are, to our knowledge, the first to report that FFT multiplication outperforms the classical one on the larger sizes that still fit in a CUDA block. Finally, we examine Futhark's strengths and weaknesses for efficiently supporting such computations and find that a compiler pass aimed at efficient sequentialization of excess parallelism would significantly improve performance.
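The observation that addition is an application of scan can be sketched as follows: each limb position is summarised by a (generate, propagate) pair, and carries are resolved by a scan with an associative combine operator. The Python below is a sequential illustration of that standard carry-lookahead idea, using `itertools.accumulate` as the scan. It is not the paper's Futhark or CUDA code, and the 32-bit limb size and all function names are assumptions.

```python
from itertools import accumulate
from typing import List, Tuple

BASE = 1 << 32  # assumed limb size: 32-bit limbs, least significant first


def to_limbs(n: int, count: int) -> List[int]:
    return [(n >> (32 * i)) & (BASE - 1) for i in range(count)]


def from_limbs(limbs: List[int]) -> int:
    return sum(limb << (32 * i) for i, limb in enumerate(limbs))


def combine(lo: Tuple[bool, bool], hi: Tuple[bool, bool]) -> Tuple[bool, bool]:
    """Associative operator on (generate, propagate) pairs; lo is the less significant part."""
    g_lo, p_lo = lo
    g_hi, p_hi = hi
    return (g_hi or (p_hi and g_lo), p_lo and p_hi)


def add_limbs(a: List[int], b: List[int]) -> Tuple[List[int], bool]:
    """Add two equal-length limb arrays; returns (result limbs, final carry)."""
    sums = [x + y for x, y in zip(a, b)]
    # generate: carry out regardless of carry in; propagate: carry out iff carry in
    flags = [(s >= BASE, s == BASE - 1) for s in sums]
    scanned = list(accumulate(flags, combine))  # inclusive scan
    carry_out = [g for g, _ in scanned]
    carry_in = [False] + carry_out[:-1]         # shift to obtain the exclusive scan
    limbs = [(s + int(c)) % BASE for s, c in zip(sums, carry_in)]
    return limbs, (carry_out[-1] if carry_out else False)


if __name__ == "__main__":
    x, y = (1 << 64) - 1, 1
    limbs, carry = add_limbs(to_limbs(x, 3), to_limbs(y, 3))
    print(from_limbs(limbs) == x + y, carry)
```

Because `combine` is associative, the sequential scan here can be replaced by a parallel scan without changing the result, which is the property that makes the formulation suitable for GPUs.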

2024-05-23T14:44:49Z Cosmin E. Oancea Stephen M. Watt http://arxiv.org/abs/2404.06389v2 Raster Forge: Interactive Raster Manipulation Library and GUI for Python 2024-05-19T16:52:01Z

Raster Forge is a Python library and graphical user interface for raster data manipulation and analysis. The tool is focused on remote sensing applications, particularly in wildfire management. It allows users to import, visualize, and process raster layers for tasks such as image compositing or topographical analysis. For wildfire management, it generates fuel maps using predefined models. Its impact extends from disaster management to hydrological modeling, agriculture, and environmental monitoring. Raster Forge can be a valuable asset for geoscientists and researchers who rely on raster data analysis, enhancing geospatial data processing and visualization across various disciplines.

2024-04-09T15:31:48Z Software Impacts, 20, 100657, 2024 Afonso Oliveira Nuno Fachada João P. Matos-Carvalho 10.1016/j.simpa.2024.100657 http://arxiv.org/abs/2405.11065v1 Enabling mixed-precision with the help of tools: A Nekbone case study 2024-05-17T19:42:10Z

Mixed-precision computing has the potential to significantly reduce the cost of exascale computations, but determining when and how to implement it in programs can be challenging. In this article, we consider Nekbone, a mini-application for the CFD solver Nek5000, as a case study, and propose a methodology for enabling mixed-precision with the help of computer arithmetic tools and the roofline model. We evaluate the derived mixed-precision program by combining metrics in three dimensions: accuracy, time-to-solution, and energy-to-solution. Notably, the introduction of mixed-precision in Nekbone reduces time-to-solution by 40.7% and energy-to-solution by 47% on 128 MPI ranks.

2024-05-17T19:42:10Z Yanxiang Chen Pablo de Oliveira Castro Paolo Bientinesi Roman Iakymchuk http://arxiv.org/abs/2308.16731v2 An Efficient Framework for Global Non-Convex Polynomial Optimization over the Hypercube 2024-05-16T17:31:19Z

We present a novel efficient theoretical and numerical framework for solving global non-convex polynomial optimization problems. We analytically demonstrate that such problems can be efficiently reformulated using a non-linear objective over a convex set; further, these reformulated problems possess no spurious local minima (i.e., every local minimum is a global minimum). We introduce an algorithm for solving these resulting problems using the augmented Lagrangian and the method of Burer and Monteiro. We show through numerical experiments that polynomial scaling in dimension and degree is achievable for computing the optimal value and location of previously intractable global polynomial optimization problems in high dimension.

2023-08-31T13:49:47Z Pierre-David Letourneau Dalton Jones Matthew Morse M. Harper Langston