https://arxiv.org/api/ZEa9QUoWQwPyzmgkEvzHbANDVM8 2026-06-30T09:40:50Z 9958 1845 15 http://arxiv.org/abs/2502.06988v1 A Compiler for Operations on Relations with Bag Semantics 2025-02-10T19:29:15Z

We describe an abstract loop-based intermediate representation that can express fused implementations of relational algebra expressions on sets and bags (multisets). The loops are abstracted away from physical data structures thus making it easier to generate, reason about, and perform optimization like fusion on. The IR supports the natural relational algebra as well as complex operators that are used in production database systems, including outer joins, non-equi joins, and differences. We then show how to compile this IR to efficient C++ code that co-iterates over the physical data structures present in the relational algebra expression. Our approach lets us express fusion across disparate operators, leading to a 3.87x speedup (0.77--12.23x) on selected LSQB benchmarks and worst-case optimal triangle queries. We also demonstrate that our compiler generates code of high quality: it has similar sequential performance to Hyper on TPC-H with a 1.00x speedup (0.38--4.34x) and competitive parallel performance with a 0.61x speedup (0.23--1.80x). Finally, our approach is portable across data structures.

2025-02-10T19:29:15Z James Dong Fredrik Kjolstad http://arxiv.org/abs/2402.09087v2 The Vienna Architecture Description Language 2025-02-10T17:04:54Z

The Vienna Architecture Description Language (VADL) is a powerful processor description language (PDL) that enables the concise formal specification of processor architectures. By utilizing a single VADL processor specification, the VADL system exhibits the capability to automatically generate a range of artifacts necessary for rapid design space exploration. These include assemblers, compilers, linkers, functional instruction set simulators, cycle-accurate instruction set simulators, synthesizable specifications in a hardware description language, as well as test cases and documentation. One distinctive feature of VADL lies in its separation of the instruction set architecture (ISA) specification and the microarchitecture (MiA) specification. This segregation allows users the flexibility to combine various ISAs with different MiAs, providing a versatile approach to processor design. In contrast to existing PDLs, VADL's MiA specification operates at a higher level of abstraction, enhancing the clarity and simplicity of the design process. Notably, with a single ISA specification, VADL streamlines compiler generation and maintenance by eliminating the need for intricate compiler-specific knowledge. The original VADL implementation has a restricted copyright. Therefore, the open source implementation OpenVADL was started. This article introduces VADL, compares the original VADL implementation with the ongoing OpenVADL implementation, describes the generator techniques in detail and demonstrates the power of the language and the performance of the generators in an empirical evaluation. The evaluation shows the expressiveness and conciseness of VADL and the efficiency of the generated artifacts.

2024-02-14T11:08:04Z Florian Freitag Linus Halder Simon Himmelbauer Christoph Hochrainer Benedikt Huber Benjamin Kasper Niklas Mischkulnig Michael Nestler Philipp Paulweber Kevin Per Matthias Raschhofer Alexander Ripar Tobias Schwarzinger Johannes Zottele Andreas Krall http://arxiv.org/abs/2503.05703v1 What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces 2025-02-10T14:42:13Z

Code generation and understanding are critical capabilities for large language models (LLMs). Thus, most LLMs are pretrained and fine-tuned on code data. However, these datasets typically treat code as static strings and rarely exploit the dynamic information about their execution. Building upon previous work on trace modeling, we study Execution Tuning (E.T.), a training procedure in which we explicitly model real-world program execution traces without requiring manual test annotations. We train and evaluate models on different execution trace granularities (line and instruction-level) and strategies on the task of output prediction, obtaining around 80% accuracy on CruxEval and MBPP, and showing the advantages of dynamic scratchpads (i.e., self-contained intermediate computations updated by the model rather than accumulated as a history of past computations) on long executions (up to 14k steps). Finally, we discuss E.T.'s practical applications.

2025-02-10T14:42:13Z Jordi Armengol-Estapé Quentin Carbonneaux Tianjun Zhang Aram H. Markosyan Volker Seeker Chris Cummins Melanie Kambadur Michael F. P. O'Boyle Sida Wang Gabriel Synnaeve Hugh James Leather http://arxiv.org/abs/2502.06293v1 RustMC: Extending the GenMC stateless model checker to Rust 2025-02-10T09:33:24Z

RustMC is a stateless model checker that enables verification of concurrent Rust programs. As both Rust and C/C++ compile to LLVM IR, RustMC builds on GenMC which provides a verification framework for LLVM IR. This enables the automatic verification of Rust code and any C/C++ dependencies. This tool paper presents the key challenges we addressed to extend GenMC. These challenges arise from Rust's unique compilation strategy and include intercepting threading operations, handling memory intrinsics and uninitialized accesses. Through case studies adapted from real-world programs, we demonstrate RustMC's effectiveness at finding concurrency bugs stemming from unsafe Rust code, FFI calls to C/C++, and incorrect use of atomic operations.

2025-02-10T09:33:24Z 10 pages with a 2 page abstract, 5 figures with 2 figures in the abstract Oliver Pearce Julien Lange Dan O'Keeffe http://arxiv.org/abs/2411.06086v2 On Decidable and Undecidable Extensions of Simply Typed Lambda Calculus 2025-02-10T06:23:31Z

The decidability of the reachability problem for finitary PCF has been used as a theoretical basis for fully automated verification tools for functional programs. The reachability problem, however, often becomes undecidable for a slight extension of finitary PCF with side effects, such as exceptions, algebraic effects, and references, which hindered the extension of the above verification tools for supporting functional programs with side effects. In this paper, we first give simple proofs of the undecidability of four extensions of finitary PCF, which would help us understand and analyze the source of undecidability. We then focus on an extension with references, and give a decidable fragment using a type system. To our knowledge, this is the first non-trivial decidable fragment that features higher-order recursive functions containing reference cells.

2024-11-09T06:37:05Z Proc. ACM Program. Lang. 9(POPL): 1136-1166 (2025) Naoki Kobayashi 10.1145/3704875 http://arxiv.org/abs/2502.03402v2 Tensor Evolution: A Framework for Fast Evaluation of Tensor Computations using Recurrences 2025-02-07T16:07:14Z

This paper introduces a new mathematical framework for analysis and optimization of tensor expressions within an enclosing loop. Tensors are multi-dimensional arrays of values. They are common in high performance computing (HPC) and machine learning domains. Our framework extends Scalar Evolution - an important optimization pass implemented in both LLVM and GCC - to tensors. Scalar Evolution (SCEV) relies on the theory of `Chain of Recurrences' for its mathematical underpinnings. We use the same theory for Tensor Evolution (TeV). While some concepts from SCEV map easily to TeV -- e.g. element-wise operations; tensors introduce new operations such as concatenation, slicing, broadcast, reduction, and reshape which have no equivalent in scalars and SCEV. Not all computations are amenable to TeV analysis but it can play a part in the optimization and analysis parts of ML and HPC compilers. Also, for many mathematical/compiler ideas, applications may go beyond what was initially envisioned, once others build on it and take it further. We hope for a similar trajectory for the tensor-evolution concept.

2025-02-05T17:43:17Z Javed Absar Samarth Narang Muthu Baskaran http://arxiv.org/abs/2311.02103v2 Relax: Composable Abstractions for End-to-End Dynamic Machine Learning 2025-02-07T00:48:38Z

Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven the demand for their universal deployment across a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and external library calls in a single representation. Relax also introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program, enabling dynamic shape-aware cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on LLMs show that Relax delivers performance competitive with state-of-the-art systems across various GPUs and enables deployment of emerging models to a broader set of emerging environments, including mobile phones, embedded devices, and web browsers.

2023-11-01T23:03:59Z To appear at ASPLOS 2025 (16 pages, 20 figures) Ruihang Lai Junru Shao Siyuan Feng Steven S. Lyubomirsky Bohan Hou Wuwei Lin Zihao Ye Hongyi Jin Yuchen Jin Jiawei Liu Lesheng Jin Yaxing Cai Ziheng Jiang Yong Wu Sunghyun Park Prakalp Srivastava Jared G. Roesch Todd C. Mowry Tianqi Chen 10.1145/3676641.3716249 http://arxiv.org/abs/2310.05551v3 Logic-Q: Improving Deep Reinforcement Learning-based Quantitative Trading via Program Sketch-based Tuning 2025-02-06T13:40:44Z

Deep reinforcement learning (DRL) has revolutionized quantitative trading (Q-trading) by achieving decent performance without significant human expert knowledge. Despite its achievements, we observe that the current state-of-the-art DRL models are still ineffective in identifying the market trends, causing them to miss good trading opportunities or suffer from large drawdowns when encountering market crashes. To address this limitation, a natural approach is to incorporate human expert knowledge in identifying market trends. Whereas, such knowledge is abstract and hard to be quantified. In order to effectively leverage abstract human expert knowledge, in this paper, we propose a universal logic-guided deep reinforcement learning framework for Q-trading, called Logic-Q. In particular, Logic-Q adopts the program synthesis by sketching paradigm and introduces a logic-guided model design that leverages a lightweight, plug-and-play market trend-aware program sketch to determine the market trend and correspondingly adjusts the DRL policy in a post-hoc manner. Extensive evaluations of two popular quantitative trading tasks demonstrate that Logic-Q can significantly improve the performance of previous state-of-the-art DRL trading strategies.

2023-10-09T09:20:13Z accepted by AAAI 2025 Zhiming Li Junzhe Jiang Yushi Cao Aixin Cui Bozhi Wu Bo Li Yang Liu Danny Dongning Sun http://arxiv.org/abs/2502.04063v1 A Multi-level Compiler Backend for Accelerated Micro-kernels Targeting RISC-V ISA Extensions 2025-02-06T13:20:29Z

High-performance micro-kernels must fully exploit today's diverse and specialized hardware to deliver peak performance to DNNs. While higher-level optimizations for DNNs are offered by numerous compilers (e.g., MLIR, TVM, OpenXLA), performance-critical micro-kernels are left to specialized code generators or handwritten assembly. Even though widely-adopted compilers (e.g., LLVM, GCC) offer tuned backends, their CPU-focused input abstraction, unstructured IR, and general-purpose best-effort design inhibit tailored code generation for innovative hardware. We think it is time to widen the classical hourglass backend and embrace progressive lowering across a diverse set of structured abstractions to bring domain-specific code generation to compiler backends. We demonstrate this concept by implementing a custom backend for a RISC-V-based accelerator with hardware loops and streaming registers, leveraging knowledge about the hardware at levels of abstraction that match its custom ISA. We use incremental register allocation over structured IRs, while dropping classical spilling heuristics, and show up to 90% FPU utilization across key DNN kernels. By breaking the backend hourglass model, we reopen the path from domain-specific abstractions to specialized hardware.

2025-02-06T13:20:29Z Alexandre Lopoukhine Federico Ficarelli Christos Vasiladiotis Anton Lydike Josse Van Delm Alban Dutilleul Luca Benini Marian Verhelst Tobias Grosser 10.1145/3696443.3708952 http://arxiv.org/abs/2404.03515v3 Model Checking Probabilistic Operator Precedence Automata 2025-02-05T17:40:42Z

We address the problem of model checking context-free specifications for probabilistic pushdown automata, which has relevant applications in the verification of recursive probabilistic programs. Operator Precedence Languages (OPLs) are an expressive subclass of context-free languages suitable for model checking recursive programs. The derived Precedence Oriented Temporal Logic (POTL) can express fundamental OPL specifications such as pre/post-conditions and exception safety. We introduce probabilistic Operator Precedence Automata (pOPA), a class of probabilistic pushdown automata whose traces are OPLs, and study their model checking problem against POTL specifications. We identify a fragment of POTL, called POTLf$χ$, for which we develop an EXPTIME algorithm for qualitative probabilistic model checking, and an EXPSPACE algorithm for the quantitative variant. The algorithms rely on the property of separation of automata generated from POTLf$χ$ formulas. The same property allows us to employ these algorithms for model checking pOPA against Linear Temporal Logic (LTL) specifications. POTLf$χ$ is then the first context-free logic for which an optimal probabilistic model checking algorithm has been developed, matching its EXPTIME lower bound in complexity. In comparison, the best known algorithm for probabilistic model checking of CaRet, a prominent temporal logic based on Visibly Pushdown Languages (VPL), is doubly exponential.

2024-04-04T15:21:40Z 37 pages, 9 figures Francesco Pontiggia Ezio Bartocci Michele Chiari http://arxiv.org/abs/2502.03271v1 TYPEPULSE: Detecting Type Confusion Bugs in Rust Programs 2025-02-05T15:27:08Z

Rust supports type conversions and safe Rust guarantees the security of these conversions through robust static type checking and strict ownership guidelines. However, there are instances where programmers need to use unsafe Rust for certain type conversions, especially those involving pointers. Consequently, these conversions may cause severe memory corruption problems. Despite extensive research on type confusion bugs in C/C++, studies on type confusion bugs in Rust are still lacking. Also, due to Rust's new features in the type system, existing solutions in C/C++ cannot be directly applied to Rust. In this paper, we develop a static analysis tool called TYPEPULSE to detect three main categories of type confusion bugs in Rust including misalignment, inconsistent layout, and mismatched scope. TYPEPULSE first performs a type conversion analysis to collect and determine trait bounds for type pairs. Moreover, it performs a pointer alias analysis to resolve the alias relationship of pointers. Following the integration of information into the property graph, it constructs type patterns and detects each type of bug in various conversion scenarios. We run TYPEPULSE on the top 3,000 Rust packages and uncover 71 new type confusion bugs, exceeding the total number of type confusion bugs reported in RUSTSEC over the past five years. We have received 32 confirmations from developers, along with one CVE ID and six RUSTSEC IDs.

2025-02-05T15:27:08Z To Appear in the 34th USENIX Security Symposium, August 13-15, 2025 Hung-Mao Chen Xu He Shu Wang Xiaokuan Zhang Kun Sun http://arxiv.org/abs/2502.00949v1 A domain-theoretic framework for conditional probability and Bayesian updating in programming 2025-02-02T22:46:26Z

We present a domain-theoretic framework for probabilistic programming that provides a constructive definition of conditional probability and addresses computability challenges previously identified in the literature. We introduce a novel approach based on an observable notion of events that enables computability. We examine two methods for computing conditional probabilities -- one using conditional density functions and another using trace sampling with rejection -- and prove they yield consistent results within our framework. We implement these ideas in a simple probabilistic functional language with primitives for sampling and evaluation, providing both operational and denotational semantics and proving their consistency. Our work provides a rigorous foundation for implementing conditional probability in probabilistic programming languages.

2025-02-02T22:46:26Z 17 pages Pietro Di Gianantonio Abbas Edalat http://arxiv.org/abs/2409.12013v2 Memory Consistency and Program Transformations 2025-02-01T22:08:32Z

A memory consistency model specifies the allowed behaviors of shared memory concurrent programs. At the language level, these models are known to have a non-trivial impact on the safety of program optimizations, limiting the ability to rearrange/refactor code without introducing new behaviors. Existing programming language memory models try to address this by permitting more (relaxed/weak) concurrent behaviors but are still unable to allow all the desired optimizations. A core problem is that weaker consistency models may also render optimizations unsafe, a conclusion that goes against the intuition of them allowing more behaviors. This exposes an open problem of the compositional interaction between memory consistency semantics and optimizations: which parts of the semantics correspond to allowing/disallowing which set of optimizations is unclear. In this work, we establish a formal foundation suitable enough to understand this compositional nature, decomposing optimizations into a finite set of elementary effects on program execution traces, over which aspects of safety can be assessed. We use this decomposition to identify a desirable compositional property (complete) that would guarantee the safety of optimizations from one memory model to another. We showcase its practicality by proving such a property between Sequential Consistency (SC) and $SC_{RR}$, the latter allowing independent read-read reordering over $SC$. Our work potentially paves way to a new design methodology of programming-language memory models, one that places emphasis on the optimizations desired to be performed.

2024-09-18T14:28:19Z Akshay Gopalakrishnan McGill University Clark Verbrugge McGill University Mark Batty University of Kent 10.1145/3721143 http://arxiv.org/abs/2502.00222v1 The Free Termination Property of Queries Over Time 2025-01-31T23:28:34Z

Building on prior work on distributed databases and the CALM Theorem, we define and study the question of free termination: in the absence of distributed coordination, what query properties allow nodes in a distributed (database) system to unilaterally terminate execution even though they may receive additional data or messages in the future? This completeness question is complementary to the soundness questions studied in the CALM literature. We also develop a new model based on semiautomata that allows us to bridge from the relational transducer model of the CALM papers to algebraic models that are popular among software engineers (e.g. CRDTs) and of increasing interest to database theory for datalog extensions and incremental view maintenance.

2025-01-31T23:28:34Z Conor Power Paraschos Koutris Joseph M Hellerstein http://arxiv.org/abs/2501.19150v1 SkipFlow: Improving the Precision of Points-to Analysis using Primitive Values and Predicate Edges 2025-01-31T14:07:05Z

A typical points-to analysis such as Andersen's or Steensgaard's may lose precision because it ignores the branching structure of the analyzed program. Moreover, points-to analysis typically focuses on objects only, not considering instructions manipulating primitive values. We argue that such an approach leads to an unnecessary precision loss, for example, when primitive constants true and false flow out of method calls. We propose a novel lightweight points-to analysis called SkipFlow that interprocedurally tracks the flow of both primitives and objects, and explicitly captures the branching structure of the code using predicate edges. At the same time, however, SkipFlow is as lightweight and scalable as possible, unlike a traditional flow-sensitive analysis. We apply SkipFlow to GraalVM Native Image, a closed-world solution to building standalone binaries for Java applications. We evaluate the implementation using a set of microservice applications as well as well-known benchmark suites. We show that SkipFlow reduces the size of the application in terms of reachable methods by 9% on average without significantly increasing the analysis time.

2025-01-31T14:07:05Z 19 pages including appendix, published at CGO'25 David Kozak Codrut Stancu Tomáš Vojnar Christian Wimmer