https://arxiv.org/api/ZEa9QUoWQwPyzmgkEvzHbANDVM82026-06-30T09:40:50Z9958184515http://arxiv.org/abs/2502.06988v1A Compiler for Operations on Relations with Bag Semantics2025-02-10T19:29:15ZWe describe an abstract loop-based intermediate representation that can express fused implementations of relational algebra expressions on sets and bags (multisets). The loops are abstracted away from physical data structures thus making it easier to generate, reason about, and perform optimization like fusion on. The IR supports the natural relational algebra as well as complex operators that are used in production database systems, including outer joins, non-equi joins, and differences. We then show how to compile this IR to efficient C++ code that co-iterates over the physical data structures present in the relational algebra expression. Our approach lets us express fusion across disparate operators, leading to a 3.87x speedup (0.77--12.23x) on selected LSQB benchmarks and worst-case optimal triangle queries. We also demonstrate that our compiler generates code of high quality: it has similar sequential performance to Hyper on TPC-H with a 1.00x speedup (0.38--4.34x) and competitive parallel performance with a 0.61x speedup (0.23--1.80x). Finally, our approach is portable across data structures.2025-02-10T19:29:15ZJames DongFredrik Kjolstadhttp://arxiv.org/abs/2402.09087v2The Vienna Architecture Description Language2025-02-10T17:04:54ZThe Vienna Architecture Description Language (VADL) is a powerful processor description language (PDL) that enables the concise formal specification of processor architectures. By utilizing a single VADL processor specification, the VADL system exhibits the capability to automatically generate a range of artifacts necessary for rapid design space exploration. These include assemblers, compilers, linkers, functional instruction set simulators, cycle-accurate instruction set simulators, synthesizable specifications in a hardware description language, as well as test cases and documentation. One distinctive feature of VADL lies in its separation of the instruction set architecture (ISA) specification and the microarchitecture (MiA) specification. This segregation allows users the flexibility to combine various ISAs with different MiAs, providing a versatile approach to processor design. In contrast to existing PDLs, VADL's MiA specification operates at a higher level of abstraction, enhancing the clarity and simplicity of the design process. Notably, with a single ISA specification, VADL streamlines compiler generation and maintenance by eliminating the need for intricate compiler-specific knowledge. The original VADL implementation has a restricted copyright. Therefore, the open source implementation OpenVADL was started. This article introduces VADL, compares the original VADL implementation with the ongoing OpenVADL implementation, describes the generator techniques in detail and demonstrates the power of the language and the performance of the generators in an empirical evaluation. The evaluation shows the expressiveness and conciseness of VADL and the efficiency of the generated artifacts.2024-02-14T11:08:04ZFlorian FreitagLinus HalderSimon HimmelbauerChristoph HochrainerBenedikt HuberBenjamin KasperNiklas MischkulnigMichael NestlerPhilipp PaulweberKevin PerMatthias RaschhoferAlexander RiparTobias SchwarzingerJohannes ZotteleAndreas Krallhttp://arxiv.org/abs/2503.05703v1What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces2025-02-10T14:42:13ZCode generation and understanding are critical capabilities for large language models (LLMs). Thus, most LLMs are pretrained and fine-tuned on code data. However, these datasets typically treat code as static strings and rarely exploit the dynamic information about their execution. Building upon previous work on trace modeling, we study Execution Tuning (E.T.), a training procedure in which we explicitly model real-world program execution traces without requiring manual test annotations. We train and evaluate models on different execution trace granularities (line and instruction-level) and strategies on the task of output prediction, obtaining around 80% accuracy on CruxEval and MBPP, and showing the advantages of dynamic scratchpads (i.e., self-contained intermediate computations updated by the model rather than accumulated as a history of past computations) on long executions (up to 14k steps). Finally, we discuss E.T.'s practical applications.2025-02-10T14:42:13ZJordi Armengol-EstapéQuentin CarbonneauxTianjun ZhangAram H. MarkosyanVolker SeekerChris CumminsMelanie KambadurMichael F. P. O'BoyleSida WangGabriel SynnaeveHugh James Leatherhttp://arxiv.org/abs/2502.06293v1RustMC: Extending the GenMC stateless model checker to Rust2025-02-10T09:33:24ZRustMC is a stateless model checker that enables verification of concurrent Rust programs. As both Rust and C/C++ compile to LLVM IR, RustMC builds on GenMC which provides a verification framework for LLVM IR. This enables the automatic verification of Rust code and any C/C++ dependencies. This tool paper presents the key challenges we addressed to extend GenMC. These challenges arise from Rust's unique compilation strategy and include intercepting threading operations, handling memory intrinsics and uninitialized accesses. Through case studies adapted from real-world programs, we demonstrate RustMC's effectiveness at finding concurrency bugs stemming from unsafe Rust code, FFI calls to C/C++, and incorrect use of atomic operations.2025-02-10T09:33:24Z10 pages with a 2 page abstract, 5 figures with 2 figures in the abstractOliver PearceJulien LangeDan O'Keeffehttp://arxiv.org/abs/2411.06086v2On Decidable and Undecidable Extensions of Simply Typed Lambda Calculus2025-02-10T06:23:31ZThe decidability of the reachability problem for finitary PCF has been used as a theoretical basis for fully automated verification tools for functional programs. The reachability problem, however, often becomes undecidable for a slight extension of finitary PCF with side effects, such as exceptions, algebraic effects, and references, which hindered the extension of the above verification tools for supporting functional programs with side effects. In this paper, we first give simple proofs of the undecidability of four extensions of finitary PCF, which would help us understand and analyze the source of undecidability. We then focus on an extension with references, and give a decidable fragment using a type system. To our knowledge, this is the first non-trivial decidable fragment that features higher-order recursive functions containing reference cells.2024-11-09T06:37:05ZProc. ACM Program. Lang. 9(POPL): 1136-1166 (2025)Naoki Kobayashi10.1145/3704875http://arxiv.org/abs/2502.03402v2Tensor Evolution: A Framework for Fast Evaluation of Tensor Computations using Recurrences2025-02-07T16:07:14ZThis paper introduces a new mathematical framework for analysis and optimization of tensor expressions within an enclosing loop. Tensors are multi-dimensional arrays of values. They are common in high performance computing (HPC) and machine learning domains. Our framework extends Scalar Evolution - an important optimization pass implemented in both LLVM and GCC - to tensors. Scalar Evolution (SCEV) relies on the theory of `Chain of Recurrences' for its mathematical underpinnings. We use the same theory for Tensor Evolution (TeV). While some concepts from SCEV map easily to TeV -- e.g. element-wise operations; tensors introduce new operations such as concatenation, slicing, broadcast, reduction, and reshape which have no equivalent in scalars and SCEV. Not all computations are amenable to TeV analysis but it can play a part in the optimization and analysis parts of ML and HPC compilers. Also, for many mathematical/compiler ideas, applications may go beyond what was initially envisioned, once others build on it and take it further. We hope for a similar trajectory for the tensor-evolution concept.2025-02-05T17:43:17ZJaved AbsarSamarth NarangMuthu Baskaranhttp://arxiv.org/abs/2311.02103v2Relax: Composable Abstractions for End-to-End Dynamic Machine Learning2025-02-07T00:48:38ZDynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven the demand for their universal deployment across a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and external library calls in a single representation. Relax also introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program, enabling dynamic shape-aware cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on LLMs show that Relax delivers performance competitive with state-of-the-art systems across various GPUs and enables deployment of emerging models to a broader set of emerging environments, including mobile phones, embedded devices, and web browsers.2023-11-01T23:03:59ZTo appear at ASPLOS 2025 (16 pages, 20 figures)Ruihang LaiJunru ShaoSiyuan FengSteven S. LyubomirskyBohan HouWuwei LinZihao YeHongyi JinYuchen JinJiawei LiuLesheng JinYaxing CaiZiheng JiangYong WuSunghyun ParkPrakalp SrivastavaJared G. RoeschTodd C. MowryTianqi Chen10.1145/3676641.3716249http://arxiv.org/abs/2310.05551v3Logic-Q: Improving Deep Reinforcement Learning-based Quantitative Trading via Program Sketch-based Tuning2025-02-06T13:40:44ZDeep reinforcement learning (DRL) has revolutionized quantitative trading (Q-trading) by achieving decent performance without significant human expert knowledge. Despite its achievements, we observe that the current state-of-the-art DRL models are still ineffective in identifying the market trends, causing them to miss good trading opportunities or suffer from large drawdowns when encountering market crashes. To address this limitation, a natural approach is to incorporate human expert knowledge in identifying market trends. Whereas, such knowledge is abstract and hard to be quantified. In order to effectively leverage abstract human expert knowledge, in this paper, we propose a universal logic-guided deep reinforcement learning framework for Q-trading, called Logic-Q. In particular, Logic-Q adopts the program synthesis by sketching paradigm and introduces a logic-guided model design that leverages a lightweight, plug-and-play market trend-aware program sketch to determine the market trend and correspondingly adjusts the DRL policy in a post-hoc manner. Extensive evaluations of two popular quantitative trading tasks demonstrate that Logic-Q can significantly improve the performance of previous state-of-the-art DRL trading strategies.2023-10-09T09:20:13Zaccepted by AAAI 2025Zhiming LiJunzhe JiangYushi CaoAixin CuiBozhi WuBo LiYang LiuDanny Dongning Sunhttp://arxiv.org/abs/2502.04063v1A Multi-level Compiler Backend for Accelerated Micro-kernels Targeting RISC-V ISA Extensions2025-02-06T13:20:29ZHigh-performance micro-kernels must fully exploit today's diverse and specialized hardware to deliver peak performance to DNNs. While higher-level optimizations for DNNs are offered by numerous compilers (e.g., MLIR, TVM, OpenXLA), performance-critical micro-kernels are left to specialized code generators or handwritten assembly. Even though widely-adopted compilers (e.g., LLVM, GCC) offer tuned backends, their CPU-focused input abstraction, unstructured IR, and general-purpose best-effort design inhibit tailored code generation for innovative hardware. We think it is time to widen the classical hourglass backend and embrace progressive lowering across a diverse set of structured abstractions to bring domain-specific code generation to compiler backends. We demonstrate this concept by implementing a custom backend for a RISC-V-based accelerator with hardware loops and streaming registers, leveraging knowledge about the hardware at levels of abstraction that match its custom ISA. We use incremental register allocation over structured IRs, while dropping classical spilling heuristics, and show up to 90% FPU utilization across key DNN kernels. By breaking the backend hourglass model, we reopen the path from domain-specific abstractions to specialized hardware.2025-02-06T13:20:29ZAlexandre LopoukhineFederico FicarelliChristos VasiladiotisAnton LydikeJosse Van DelmAlban DutilleulLuca BeniniMarian VerhelstTobias Grosser10.1145/3696443.3708952http://arxiv.org/abs/2404.03515v3Model Checking Probabilistic Operator Precedence Automata2025-02-05T17:40:42ZWe address the problem of model checking context-free specifications for probabilistic pushdown automata, which has relevant applications in the verification of recursive probabilistic programs. Operator Precedence Languages (OPLs) are an expressive subclass of context-free languages suitable for model checking recursive programs. The derived Precedence Oriented Temporal Logic (POTL) can express fundamental OPL specifications such as pre/post-conditions and exception safety.
We introduce probabilistic Operator Precedence Automata (pOPA), a class of probabilistic pushdown automata whose traces are OPLs, and study their model checking problem against POTL specifications. We identify a fragment of POTL, called POTLf$χ$, for which we develop an EXPTIME algorithm for qualitative probabilistic model checking, and an EXPSPACE algorithm for the quantitative variant. The algorithms rely on the property of separation of automata generated from POTLf$χ$ formulas. The same property allows us to employ these algorithms for model checking pOPA against Linear Temporal Logic (LTL) specifications. POTLf$χ$ is then the first context-free logic for which an optimal probabilistic model checking algorithm has been developed, matching its EXPTIME lower bound in complexity. In comparison, the best known algorithm for probabilistic model checking of CaRet, a prominent temporal logic based on Visibly Pushdown Languages (VPL), is doubly exponential.2024-04-04T15:21:40Z37 pages, 9 figuresFrancesco PontiggiaEzio BartocciMichele Chiarihttp://arxiv.org/abs/2502.03271v1TYPEPULSE: Detecting Type Confusion Bugs in Rust Programs2025-02-05T15:27:08ZRust supports type conversions and safe Rust guarantees the security of these conversions through robust static type checking and strict ownership guidelines. However, there are instances where programmers need to use unsafe Rust for certain type conversions, especially those involving pointers. Consequently, these conversions may cause severe memory corruption problems. Despite extensive research on type confusion bugs in C/C++, studies on type confusion bugs in Rust are still lacking. Also, due to Rust's new features in the type system, existing solutions in C/C++ cannot be directly applied to Rust. In this paper, we develop a static analysis tool called TYPEPULSE to detect three main categories of type confusion bugs in Rust including misalignment, inconsistent layout, and mismatched scope. TYPEPULSE first performs a type conversion analysis to collect and determine trait bounds for type pairs. Moreover, it performs a pointer alias analysis to resolve the alias relationship of pointers. Following the integration of information into the property graph, it constructs type patterns and detects each type of bug in various conversion scenarios. We run TYPEPULSE on the top 3,000 Rust packages and uncover 71 new type confusion bugs, exceeding the total number of type confusion bugs reported in RUSTSEC over the past five years. We have received 32 confirmations from developers, along with one CVE ID and six RUSTSEC IDs.2025-02-05T15:27:08ZTo Appear in the 34th USENIX Security Symposium, August 13-15, 2025Hung-Mao ChenXu HeShu WangXiaokuan ZhangKun Sunhttp://arxiv.org/abs/2502.00949v1A domain-theoretic framework for conditional probability and Bayesian updating in programming2025-02-02T22:46:26ZWe present a domain-theoretic framework for probabilistic programming that provides a constructive definition of conditional probability and addresses computability challenges previously identified in the literature. We introduce a novel approach based on an observable notion of events that enables computability. We examine two methods for computing conditional probabilities -- one using conditional density functions and another using trace sampling with rejection -- and prove they yield consistent results within our framework. We implement these ideas in a simple probabilistic functional language with primitives for sampling and evaluation, providing both operational and denotational semantics and proving their consistency. Our work provides a rigorous foundation for implementing conditional probability in probabilistic programming languages.2025-02-02T22:46:26Z17 pagesPietro Di GianantonioAbbas Edalathttp://arxiv.org/abs/2409.12013v2Memory Consistency and Program Transformations2025-02-01T22:08:32ZA memory consistency model specifies the allowed behaviors of shared memory concurrent programs. At the language level, these models are known to have a non-trivial impact on the safety of program optimizations, limiting the ability to rearrange/refactor code without introducing new behaviors. Existing programming language memory models try to address this by permitting more (relaxed/weak) concurrent behaviors but are still unable to allow all the desired optimizations. A core problem is that weaker consistency models may also render optimizations unsafe, a conclusion that goes against the intuition of them allowing more behaviors. This exposes an open problem of the compositional interaction between memory consistency semantics and optimizations: which parts of the semantics correspond to allowing/disallowing which set of optimizations is unclear. In this work, we establish a formal foundation suitable enough to understand this compositional nature, decomposing optimizations into a finite set of elementary effects on program execution traces, over which aspects of safety can be assessed. We use this decomposition to identify a desirable compositional property (complete) that would guarantee the safety of optimizations from one memory model to another. We showcase its practicality by proving such a property between Sequential Consistency (SC) and $SC_{RR}$, the latter allowing independent read-read reordering over $SC$. Our work potentially paves way to a new design methodology of programming-language memory models, one that places emphasis on the optimizations desired to be performed.2024-09-18T14:28:19ZAkshay GopalakrishnanMcGill UniversityClark VerbruggeMcGill UniversityMark BattyUniversity of Kent10.1145/3721143http://arxiv.org/abs/2502.00222v1The Free Termination Property of Queries Over Time2025-01-31T23:28:34ZBuilding on prior work on distributed databases and the CALM Theorem, we define and study the question of free termination: in the absence of distributed coordination, what query properties allow nodes in a distributed (database) system to unilaterally terminate execution even though they may receive additional data or messages in the future? This completeness question is complementary to the soundness questions studied in the CALM literature. We also develop a new model based on semiautomata that allows us to bridge from the relational transducer model of the CALM papers to algebraic models that are popular among software engineers (e.g. CRDTs) and of increasing interest to database theory for datalog extensions and incremental view maintenance.2025-01-31T23:28:34ZConor PowerParaschos KoutrisJoseph M Hellersteinhttp://arxiv.org/abs/2501.19150v1SkipFlow: Improving the Precision of Points-to Analysis using Primitive Values and Predicate Edges2025-01-31T14:07:05ZA typical points-to analysis such as Andersen's or Steensgaard's may lose precision because it ignores the branching structure of the analyzed program. Moreover, points-to analysis typically focuses on objects only, not considering instructions manipulating primitive values. We argue that such an approach leads to an unnecessary precision loss, for example, when primitive constants true and false flow out of method calls. We propose a novel lightweight points-to analysis called SkipFlow that interprocedurally tracks the flow of both primitives and objects, and explicitly captures the branching structure of the code using predicate edges. At the same time, however, SkipFlow is as lightweight and scalable as possible, unlike a traditional flow-sensitive analysis. We apply SkipFlow to GraalVM Native Image, a closed-world solution to building standalone binaries for Java applications. We evaluate the implementation using a set of microservice applications as well as well-known benchmark suites. We show that SkipFlow reduces the size of the application in terms of reachable methods by 9% on average without significantly increasing the analysis time.2025-01-31T14:07:05Z19 pages including appendix, published at CGO'25David KozakCodrut StancuTomáš VojnarChristian Wimmer