https://arxiv.org/api/TZJUeh4i8y9Pq6bvO832F90cz5Y2026-06-27T00:10:50Z9951142515http://arxiv.org/abs/2502.17075v2Equality Saturation for Optimizing High-Level Julia IR2025-07-13T11:55:53ZCompilers are indispensable for transforming code written in high-level languages into performant machine code, but their general-purpose optimizations sometimes fall short. Domain experts might be aware of certain optimizations that the compiler is unable to apply or that are only valid in a particular domain. We have developed a system that allows domain experts to express rewrite rules to optimize code in the Julia programming language. Our system builds on e-graphs and equality saturation. It can apply optimizations in the presence of control flow and side effects. As Julia uses multiple dispatch, we allow users to constrain rewrite rules by argument types, and propagate type information through the e-graph representation. We propose an ILP formulation for optimal e-graph extraction taking into account dominance properties for code reuse and introduce CFG skeleton relaxation to rewrite calls to pure functions as well as those with side effects. Use cases demonstrate that our system can perform rewrites on high-level, domain-specific code, as well as on lower-level code such as Julia's broadcasting mechanism. Finally, we analyze the required compilation time.2025-02-24T11:40:49ZSubmitted to ACM Transactions on Architecture and Code Optimization (TACO)Jules MerckxTim BesardBjorn De Sutterhttp://arxiv.org/abs/2507.09539v1Bounded Model Checking of RISC-V Machine Code with Context-Free-Language Ordered Binary Decision Diagrams2025-07-13T08:43:37ZSymbolic execution is a powerful technique for analyzing the behavior of software yet scalability remains a challenge due to state explosion in control and data flow. Existing tools typically aim at managing control flow internally, often at the expense of completeness, while offloading reasoning over data flow to SMT solvers. Moreover, reasoning typically happens on source code or intermediate representation level to leverage structural information, making machine code generation part of the trust base. We are interested in changing the equation in two non-trivial ways: pushing reasoning down to machine code level, and then offloading reasoning entirely into SMT solvers and other, possibly more efficient solver technology. In more abstract terms, we are asking if bit-precise reasoning technology can be made scalable on software, and not just hardware. For this purpose, we developed two tools called rotor and bitme for model generation and bounded model checking, respectively. We chose RISC-V restricted to integer arithmetic as modeling target for rotor since RISC-V integer semantics is essentially equivalent to established SMT semantics over bitvectors and arrays of bitvectors. While state-of-the-art SMT solvers struggle in our experiments, we have evidence that there is potential for improvement. To show the potential, we have slightly generalized and then implemented in bitme two types of binary decision diagrams (BDDs): algebraic decision diagrams (ADDs) and context-free-language ordered binary decision diagrams (CFLOBDDs). Bitme uses BDDs to propagate program input through models, essentially generalizing constant propagation to domain propagation. SMT solvers only get involved when model input cannot be propagated, significanly speeding up SMT solving. We then study the impact on state explosion of CFLOBDDs, which are potentially more scalable than ADDs.2025-07-13T08:43:37ZAnna BolotinaChristoph M. KirschStefanie Muroya LeiMatthias Pleschingerhttp://arxiv.org/abs/2504.03529v5PHOENIX: Pauli-Based High-Level Optimization Engine for Instruction Execution on NISQ Devices2025-07-13T04:26:35ZVariational quantum algorithms (VQA) based on Hamiltonian simulation represent a specialized class of quantum programs well-suited for near-term quantum computing applications due to its modest resource requirements in terms of qubits and circuit depth. Unlike the conventional single-qubit (1Q) and two-qubit (2Q) gate sequence representation, Hamiltonian simulation programs are essentially composed of disciplined subroutines known as Pauli exponentiations (Pauli strings with coefficients) that are variably arranged. To capitalize on these distinct program features, this study introduces PHOENIX, a highly effective compilation framework that primarily operates at the high-level Pauli-based intermediate representation (IR) for generic Hamiltonian simulation programs. PHOENIX exploits global program optimization opportunities to the greatest extent, compared to existing SOTA methods despite some of them also utilizing similar IRs. Experimental results demonstrate that PHOENIX outperforms SOTA VQA compilers across diverse program categories, backend ISAs, and hardware topologies.2025-04-04T15:29:18Z6 pages, 8 figures; Open-sourced on GitHub; A conference paper at DAC 2025Proc. 62nd ACM/IEEE Design Automation Conference (DAC 2025), San Francisco, CA, USA, June 22-25, 2025, pp. 1-7Zhaohui YangDawei DingChenghong ZhuJianxin ChenYuan Xie10.1109/DAC63849.2025.11133028http://arxiv.org/abs/2507.08992v1Semantic Source Code Segmentation using Small and Large Language Models2025-07-11T19:49:59ZSource code segmentation, dividing code into functionally coherent segments, is crucial for knowledge retrieval and maintenance in software development. While enabling efficient navigation and comprehension of large codebases, manual and syntactic analysis approaches have become impractical as repositories grow, especially for low-resource languages like R and their research domains (e.g., social sciences, psychology).This paper introduces an automated, domain-specific approach for research R code segmentation using Large and Small Language Models (LLMs/SLMs). It presents two novel approaches and a human-annotated dataset, StatCodeSeg. We explore two distinct approaches: line-by-line analysis with context and range-based segment determination. We experiment with LLMs and fine-tuned SLMs. To support the generalizability of our approaches, we also include experiments on Python code from the computer science domain.Our results show that context-based line-by-line analysis is superior over range-based segmentation.Using smaller language models like CodeBERT and an encoder-only version of CodeT5+ are better than their LLM counterparts. Most notably, these two best-performing models did not see R code during pre-training versus the LLMs but were only fine-tuned on 4,130 lines of manually annotated code.2025-07-11T19:49:59Z18 pages, 4 figuresAbdelhalim DahouAnsgar ScherpSebastian KurtenBrigitte MathiakMadhu Chauhanhttp://arxiv.org/abs/2507.08796v1Filter Equivariant Functions: A symmetric account of length-general extrapolation on lists2025-07-11T17:57:16ZWhat should a function that extrapolates beyond known input/output examples look like? This is a tricky question to answer in general, as any function matching the outputs on those examples can in principle be a correct extrapolant. We argue that a "good" extrapolant should follow certain kinds of rules, and here we study a particularly appealing criterion for rule-following in list functions: that the function should behave predictably even when certain elements are removed. In functional programming, a standard way to express such removal operations is by using a filter function. Accordingly, our paper introduces a new semantic class of functions -- the filter equivariant functions. We show that this class contains interesting examples, prove some basic theorems about it, and relate it to the well-known class of map equivariant functions. We also present a geometric account of filter equivariants, showing how they correspond naturally to certain simplicial structures. Our highlight result is the amalgamation algorithm, which constructs any filter-equivariant function's output by first studying how it behaves on sublists of the input, in a way that extrapolates perfectly.2025-07-11T17:57:16Z18 pages, 2 figuresOwen LewisNeil GhaniAndrew DudzikChristos PerivolaropoulosRazvan PascanuPetar Veličkovićhttp://arxiv.org/abs/2507.04298v2CCR 2.0: High-level Reasoning for Conditional Refinements2025-07-11T16:06:29ZIn recent years, great progress has been made in the field of formal verification for low-level systems. Many of them are based on one of two popular approaches: refinement or unary separation logic. These two approaches are very different in nature and offer complementary benefits in compositionality.
Recently, to fuse these benefits into a single unified mechanism, a new approach called Conditional Contextual Refinement (CCR 1.0 for short) was proposed. In this paper, we advance CCR 1.0 and provide novel and intuitive reasoning principles, resulting in CCR 2.0. Achieving this goal was challenging due to non-trivial counterexamples which necessitated elegant changes to the model of CCR 1.0. On top of CCR 2.0, we show how to fuse the benefits of refinement, unary separation logic, and also relational separation logic.
Our results are formalized in Rocq.2025-07-06T08:51:48ZYoungju SongMinki Chohttp://arxiv.org/abs/2503.00404v2SecRef*: Securely Sharing Mutable References Between Verified and Unverified Code in F*2025-07-11T06:17:13ZWe introduce SecRef*, a secure compilation framework protecting stateful programs verified in F* against linked unverified code, with which the program dynamically shares ML-style mutable references. To ease program verification in this setting, we propose a way of tracking which references are shareable with the unverified code, and which ones are not shareable and whose contents are thus guaranteed to be unchanged after calling into unverified code. This universal property of non-shareable references is exposed in the interface on which the verified program can rely when calling into unverified code. The remaining refinement types and pre- and post-conditions that the verified code expects from the unverified code are converted into dynamic checks about the shared references by using higher-order contracts. We prove formally in F* that this strategy ensures sound and secure interoperability with unverified code. Since SecRef* is built on top of the Monotonic State effect of F*, these proofs rely on the first monadic representation for this effect, which is a contribution of our work that can be of independent interest. Finally, we use SecRef* to build a simple cooperative multi-threading scheduler that is verified and that securely interacts with unverified threads.2025-03-01T08:48:39ZICFP'25 preprintCezar-Constantin AndriciDanel AhmanCatalin HritcuRuxandra IcleanuGuido MartínezExequiel RivasThéo Winterhalterhttp://arxiv.org/abs/2411.12822v2Denotational Semantics of Gradual Typing using Synthetic Guarded Domain Theory (Extended Version)2025-07-10T22:01:07ZGradually typed programming languages, which allow for soundly mixing static and dynamically typed programming styles, present a strong challenge for metatheorists. Even the simplest sound gradually typed languages feature at least recursion and errors, with realistic languages featuring furthermore runtime allocation of memory locations and dynamic type tags. Further, the desired metatheoretic properties of gradually typed languages have become increasingly sophisticated: validity of type-based equational reasoning as well as the relational property known as graduality. Many recent works have tackled verifying these properties, but the resulting mathematical developments are highly repetitive and tedious, with few reusable theorems persisting across different developments.
In this work, we present a new denotational semantics for gradual typing developed using guarded domain theory. Guarded domain theory combines the generality of step-indexed logical relations for modeling advanced programming features with the modularity and reusability of denotational semantics. We demonstrate the feasibility of this approach with a model of a simple gradually typed lambda calculus and prove the validity of beta-eta equality and the graduality theorem for the denotational model. This model should provide the basis for a reusable mathematical theory of gradually typed program semantics. Finally, we have mechanized most of the core theorems of our development in Guarded Cubical Agda, a recent extension of Agda with support for the guarded recursive constructions we use.2024-11-19T19:18:31ZExtended version of paper accepted to POPL 2025Eric GiovanniniTingting DingMax S. New10.1145/3704863http://arxiv.org/abs/2508.00005v1Modelling Program Spaces in Program Synthesis with Constraints2025-07-10T14:00:53ZA core challenge in program synthesis is taming the large space of possible programs. Since program synthesis is essentially a combinatorial search, the community has sought to leverage powerful combinatorial constraint solvers. Here, constraints are used to express the program semantics, but not as a potentially potent tool to remove unwanted programs. Recent inductive logic programming approaches introduce constraints on the program's syntax to be synthesized. These syntactic constraints allow for checking and propagating a constraint without executing the program, and thus for arbitrary operators. In this work, we leverage syntactic constraints to model program spaces, defining not just solutions that are feasible, but also ones that are likely useful. To demonstrate this idea, we introduce BART, a solver that efficiently propagates and solves these constraints. We evaluate BART on program space enumeration tasks, finding that the constraints eliminate up to 99 percent of the program space, and that modeling program spaces significantly reduces enumeration time.2025-07-10T14:00:53ZTilman HinnerichsBart SwinkelsJaap de JongReuben Gardos ReidTudor MagirescuNeil Yorke-SmithSebastijan Dumancichttp://arxiv.org/abs/2507.07480v1On Propositional Program Equivalence (extended abstract)2025-07-10T07:06:47ZGeneral program equivalence is undecidable. However, if we abstract away the semantics of statements, then this problem becomes not just decidable, but practically feasible. For instance, a program of the form "if $b$ then $e$ else $f$" should be equivalent to "if not $b$ then $f$ else $e$" - no matter what $b$, $e$ and $f$ are. This kind of equivalence is known as propositional equivalence. In this extended abstract, we discuss recent developments in propositional program equivalence from the perspective of (Guarded) Kleene Algebra with Tests, or (G)KAT.2025-07-10T07:06:47ZTobias Kappéhttp://arxiv.org/abs/2507.06939v1Sound Interval-Based Synthesis for Probabilistic Programs2025-07-09T15:21:22ZProbabilistic programming has become a standard practice to model stochastic events and learn about the behavior of nature in different scientific contexts, ranging from Genetics and Ecology to Linguistics and Psychology. However, domain practitioners (such as biologists) also need to be experts in statistics in order to select which probabilistic model is suitable for a given particular problem, relying then on probabilistic inference engines such as Stan, Pyro or Edward to fine-tune the parameters of that particular model. Probabilistic Programming would be more useful if the model selection is made automatic, without requiring statistics expertise from the end user. Automatically selecting the model is challenging because of the large search space of probabilistic programs needed to be explored, because the fact that most of that search space contains invalid programs, and because invalid programs may only be detected in some executions, due to its probabilistic nature. We propose a type system to statically reject invalid probabilistic programs, a type-directed synthesis algorithm that guarantees that generated programs are type-safe by construction, and an heuristic search procedure to handle the vast search space. We collect a number of probabilistic programs from the literature, and use them to compare our method with both a type-agnostic random search, and a data-guided method from the literature (DaPPer). Our results show that our technique both outperforms random search and DaPPer, specially on more complex programs. This drastic performance difference in synthesis allows for fast sampling of programs and enables techniques that previously suffered from the complexity of synthesis, such as Genetic Programming, to be applied.2025-07-09T15:21:22ZGuilherme EspadaAlcides Fonsecahttp://arxiv.org/abs/2501.13603v2Verifying Graph Algorithms in Separation Logic: A Case for an Algebraic Approach (Extended Version)2025-07-09T13:04:15ZVerifying graph algorithms has long been considered challenging in separation logic, mainly due to structural sharing between graph subcomponents. We show that these challenges can be effectively addressed by representing graphs as a partial commutative monoid (PCM), and by leveraging structure-preserving functions (PCM morphisms), including higher-order combinators.
PCM morphisms are important because they generalize separation logic's principle of local reasoning. While traditional framing isolates relevant portions of the heap only at the top level of a specification, morphisms enable contextual localization: they distribute over monoid operations to isolate relevant subgraphs, even when nested deeply within a specification.
We demonstrate the morphisms' effectiveness with novel and concise verifications of two canonical graph benchmarks: the Schorr-Waite graph marking algorithm and the union-find data structure.2025-01-23T12:11:57ZProc. ACM Program. Lang. 9, ICFP, Article 241 (August 2025)Marcos GranduryAleksandar NanevskiAlexander Gryzlov10.1145/3747510http://arxiv.org/abs/2507.06584v1Finding Compiler Bugs through Cross-Language Code Generator and Differential Testing2025-07-09T06:33:06ZCompilers play a central role in translating high-level code into executable programs, making their correctness essential for ensuring code safety and reliability. While extensive research has focused on verifying the correctness of compilers for single-language compilation, the correctness of cross-language compilation - which involves the interaction between two languages and their respective compilers - remains largely unexplored. To fill this research gap, we propose CrossLangFuzzer, a novel framework that introduces a universal intermediate representation (IR) for JVM-based languages and automatically generates cross-language test programs with diverse type parameters and complex inheritance structures. After generating the initial IR, CrossLangFuzzer applies three mutation techniques - LangShuffler, FunctionRemoval, and TypeChanger - to enhance program diversity. By evaluating both the original and mutated programs across multiple compiler versions, CrossLangFuzzer successfully uncovered 10 confirmed bugs in the Kotlin compiler, 4 confirmed bugs in the Groovy compiler, 7 confirmed bugs in the Scala 3 compiler, 2 confirmed bugs in the Scala 2 compiler, and 1 confirmed bug in the Java compiler. Among all mutators, TypeChanger is the most effective, detecting 11 of the 24 compiler bugs. Furthermore, we analyze the symptoms and root causes of cross-compilation bugs, examining the respective responsibilities of language compilers when incorrect behavior occurs during cross-language compilation. To the best of our knowledge, this is the firstwork specifically focused on identifying and diagnosing compiler bugs in cross-language compilation scenarios. Our research helps to understand these challenges and contributes to improving compiler correctness in multi-language environments.2025-07-09T06:33:06ZThe 40th ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)Qiong FengXiaotian MaZiyuan FengMarat AkhinWei SongPeng Lianghttp://arxiv.org/abs/2507.06456v1Fast Collection Operations from Indexed Stream Fusion2025-07-08T23:52:45ZWe present a system of efficient methods for traversing and combining associative collection data structures. A distinguishing feature of the system is that, like traditional sequential iterator libraries, it does not require specialized compiler infrastructure or staged compilation for efficiency and composability. By using a representation based on indexed streams, the library can express complex joins over input collections while using no intermediate allocations. We implement the library for the Lean, Morphic, and Rust programming languages and provide a mechanized proof of functional correctness in Lean.2025-07-08T23:52:45ZScott KovachPraneeth KolichalaKyle A. MillerDavid BromanFredrik Kjolstadhttp://arxiv.org/abs/2507.06396v1Representing Prompting Patterns with PDL: Compliance Agent Case Study2025-07-08T21:03:22ZPrompt engineering for LLMs remains complex, with existing frameworks either hiding complexity behind restrictive APIs or providing inflexible canned patterns that resist customization -- making sophisticated agentic programming challenging. We present the Prompt Declaration Language (PDL), a novel approach to prompt representation that tackles this fundamental complexity by bringing prompts to the forefront, enabling manual and automatic prompt tuning while capturing the composition of LLM calls together with rule-based code and external tools. By abstracting away the plumbing for such compositions, PDL aims at improving programmer productivity while providing a declarative representation that is amenable to optimization. This paper demonstrates PDL's utility through a real-world case study of a compliance agent. Tuning the prompting pattern of this agent yielded up to 4x performance improvement compared to using a canned agent and prompt pattern.2025-07-08T21:03:22ZICML 2025 Workshop on Programmatic Representations for Agent LearningMandana VaziriLouis MandelYuji WatanabeHirokuni KitaharaMartin HirzelAnca Sailer