https://arxiv.org/api/YUB0o7voY51xp5tlOdAvaz1g4+0 2026-06-28T16:50:04Z 9951 1620 15 http://arxiv.org/abs/2504.18529v2 Practical Type-Based Taint Checking and Inference (Extended Version) 2025-04-30T21:16:41Z

Many important security properties can be formulated in terms of flows of tainted data, and improved taint analysis tools to prevent such flows are of critical need. Most existing taint analyses use whole-program static analysis, leading to scalability challenges. Type-based checking is a promising alternative, as it enables modular and incremental checking for fast performance. However, type-based approaches have not been widely adopted in practice, due to challenges with false positives and annotating existing codebases. In this paper, we present a new approach to type-based checking of taint properties that addresses these challenges, based on two key techniques. First, we present a new type-based tainting checker with significantly reduced false positives, via more practical handling of third-party libraries and other language constructs. Second, we present a novel technique to automatically infer tainting type qualifiers for existing code. Our technique supports inference of generic type argument annotations, crucial for tainting properties. We implemented our techniques in a tool TaintTyper and evaluated it on real-world benchmarks. TaintTyper exceeds the recall of a state-of-the-art whole-program taint analyzer, with comparable precision, and 2.93X-22.9X faster checking time. Further, TaintTyper infers annotations comparable to those written by hand, suitable for insertion into source code. TaintTyper is a promising new approach to efficient and practical taint checking.

2025-04-25T17:53:52Z Extended version of ECOOP 2025 paper Nima Karimipour Kanak Das Manu Sridharan Behnaz Hassanshahi http://arxiv.org/abs/2206.14175v3 InvAASTCluster: On Applying Invariant-Based Program Clustering to Introductory Programming Assignments 2025-04-30T15:39:53Z

Due to the vast number of students enrolled in programming courses, there has been an increasing number of automated program repair techniques focused on introductory programming assignments (IPAs). Typically, such techniques use program clustering to take advantage of previous correct student implementations to repair a new incorrect submission. These repair techniques use clustering methods since analyzing all available correct submissions to repair a program is not feasible. However, conventional clustering methods rely on program representations based on features such as abstract syntax trees (ASTs), syntax, control flow, and data flow. This paper proposes InvAASTCluster, a novel approach for program clustering that uses dynamically generated program invariants to cluster semantically equivalent IPAs. InvAASTCluster's program representation uses a combination of the program's semantics, through its invariants, and its structure through its anonymized abstract syntax tree (AASTs). Invariants denote conditions that must remain true during program execution, while AASTs are ASTs devoid of variable and function names, retaining only their types. Our experiments show that the proposed program representation outperforms syntax-based representations when clustering a set of correct IPAs. Furthermore, we integrate InvAASTCluster into a state-of-the-art clustering-based program repair tool. Our results show that InvAASTCluster advances the current state-of-the-art when used by clustering-based repair tools by repairing around 13% more students' programs, in a shorter amount of time.

2022-06-28T17:42:28Z 31 pages, 21 Figures, 5 Tables. Accepted for publication at the Journal of Systems and Software. GitHub repo: https://github.com/pmorvalho/InvAASTCluster Pedro Orvalho Mikoláš Janota Vasco Manquinho http://arxiv.org/abs/2504.19852v2 A Formal Framework for Naturally Specifying and Verifying Sequential Algorithms 2025-04-30T08:39:51Z

Current approaches for formal verification of algorithms face important limitations. For specification, they cannot express algorithms naturally and concisely, especially for algorithms with states and flexible control flow. For verification, formal proof based on Hoare logic cannot reflect the logical structure of natural proof. To address these challenges, we introduce a formal framework for naturally specifying and verifying sequential algorithms in Coq. We use the state relation monad to integrate Coq's expressive type system with the flexible control flow of imperative languages. It supports nondeterministic operations and customizable program states, enabling specifying algorithms at an appropriate level of abstraction. For verification, we build a Hoare logic for the monad and propose a novel two-stage proof approach that separates natural logical reasoning from mechanical composition. It reflects the logical structure of natural proof, enhancing modularity and readability. We evaluate the framework by formalizing the Depth-First Search (DFS) algorithm and verifying the Knuth-Morris-Pratt (KMP) algorithm.

2025-04-28T14:47:20Z To appear at TASE 2025 (The 19th International Symposium on Theoretical Aspects of Software Engineering) Chengxi Yang Shushu Wu Qinxiang Cao http://arxiv.org/abs/2504.03995v3 Intrinsic Verification of Parsers and Formal Grammar Theory in Dependent Lambek Calculus (Extended Version) 2025-04-29T21:41:05Z

We present Dependent Lambek Calculus, a domain-specific dependent type theory for verified parsing and formal grammar theory. In $\textrm{Lambek}^D$, linear types are used as a syntax for formal grammars,and parsers can be written as linear terms. The linear typing restriction provides a form of intrinsic verification that a parser yields only valid parse trees for the input string. We demonstrate the expressivity of this system by showing that the combination of inductive linear types and dependency on non-linear data can be used to encode commonly used grammar formalisms such as regular and context-free grammars as well as traces of various types of automata. Using these encodings, we define parsers for regular expressions using deterministic automata, as well as examples of verified parsers of context-free grammars. We present a denotational semantics of our type theory that interprets the linear types as functions from strings to sets of abstract parse trees and terms as parse transformers. Based on this denotational semantics, we have made a prototype implementation of $\textrm{Lambek}^D$ using a shallow embedding in the Agda proof assistant. All of our examples parsers have been implemented in this prototype implementation.

2025-04-04T23:39:38Z 37 pages, 24 figures; Replaced to keep consistency with non-extended version of the paper to appear at PLDI 25 Steven Schaefer Nathan Varner Pedro H. Azevedo de Amorim Max S. New http://arxiv.org/abs/2504.20385v1 Weighted GKAT: Completeness and Complexity 2025-04-29T03:11:55Z

We propose Weighted Guarded Kleene Algebra with Tests (wGKAT), an uninterpreted weighted programming language equipped with branching, conditionals, and loops. We provide an operational semantics for wGKAT using a variant of weighted automata and introduce a sound and complete axiomatization. We also provide a polynomial time decision procedure for bisimulation equivalence.

2025-04-29T03:11:55Z ICALP 2025. 51 pages, 2 figures Spencer Van Koevering Wojciech Różowski Alexandra Silva http://arxiv.org/abs/2504.20166v1 Type-safe and portable support for packed data 2025-04-28T18:05:40Z

When components of a system exchange data, they need to serialise the data so that it can be sent over the network. Then, the recipient has to deserialise the data in order to be able to process it. These steps take time and have an impact on the overall system's performance. A solution to this is to use packed data, which has a unified representation between the memory and the network, removing the need for any marshalling steps. Additionally, using this data representation can improve the program's performance thanks to the data locality enabled by the compact representation of the data in memory. Unfortunately, no mainstream programming languages support packed data, whether it's out-of-the-box or through a compiler extension. We present packed-data, a Haskell library that allows for type safe building and reading of packed data in a functional style. The library does not rely on compiler modifications, making it portable, and leverages meta-programming to allow programmers to pack their own data types effortlessly. We evaluate the usability and performance of the library, and conclude that it allows traversing packed data up to 60% faster than unpacked data. We also reflect on how to enhance the performance of library-based support for packed data. Our implementation approach is general and can easily be used with any programming languages that support higher-kinded types.

2025-04-28T18:05:40Z To be published in European Conference on Object-Oriented Programming (ECOOP) 2025 Arthur Jamet Michael Vollmer http://arxiv.org/abs/2311.18042v3 Dependency-Aware Compilation for Surface Code Quantum Architectures 2025-04-28T17:15:31Z

Practical applications of quantum computing depend on fault-tolerant devices with error correction. Today, the most promising approach is a class of error-correcting codes called surface codes. We study the problem of compiling quantum circuits for quantum computers implementing surface codes. Optimal or near-optimal compilation is critical for both efficiency and correctness. The compilation problem requires (1) mapping circuit qubits to the device qubits and (2) routing execution paths between interacting qubits. We solve this problem efficiently and near-optimally with a novel algorithm that exploits the dependency structure of circuit operations to formulate discrete optimization problems that can be approximated via simulated annealing, a classic and simple algorithm. Our extensive evaluation shows that our approach is powerful and flexible for compiling realistic workloads.

2023-11-29T19:36:19Z Full version of OOPSLA 2025 paper Abtin Molavi Amanda Xu Swamit Tannu Aws Albarghouthi http://arxiv.org/abs/2504.19625v1 Rulebook: bringing co-routines to reinforcement learning environments 2025-04-28T09:34:34Z

Reinforcement learning (RL) algorithms, due to their reliance on external systems to learn from, require digital environments (e.g., simulators) with very simple interfaces, which in turn constrain significantly the implementation of such environments. In particular, these environments are implemented either as separate processes or as state machines, leading to synchronization and communication overheads in the first case, and to unstructured programming in the second. We propose a new domain-specific, co-routine-based, compiled language, called Rulebook, designed to automatically generate the state machine required to interact with machine learning (ML) algorithms and similar applications, with no performance overhead. Rulebook allows users to express programs without needing to be aware of the specific interface required by the ML components. By decoupling the execution model of the program from the syntactical encoding of the program, and thus without the need for manual state management, Rulebook allows to create larger and more sophisticated environments at a lower development cost.

2025-04-28T09:34:34Z Massimo Fioravanti Samuele Pasini Giovanni Agosta http://arxiv.org/abs/2502.18832v2 Safe and usable kernel extensions with Rex 2025-04-28T08:31:36Z

Safe kernel extensions have gained significant traction, evolving from simple packet filters to large, complex programs that customize storage, networking, and scheduling. Existing kernel extension mechanisms like eBPF rely on in-kernel verifiers to ensure safety of kernel extensions by static verification using symbolic execution. We identify significant usability issues -- safe extensions being rejected by the verifier -- due to the language-verifier gap, a mismatch between developers' expectation of program safety provided by a contract with the programming language, and the verifier's expectation. We present Rex, a new kernel extension framework that closes the language-verifier gap and improves the usability of kernel extensions in terms of programming experience and maintainability. Rex builds upon language-based safety to provide safety properties desired by kernel extensions, along with a lightweight extralingual runtime for properties that are unsuitable for static analysis, including safe exception handling, stack safety, and termination. With Rex, kernel extensions are written in safe Rust and interact with the kernel via a safe interface provided by Rex's kernel crate. No separate static verification is needed. Rex addresses usability issues of eBPF kernel extensions without compromising performance.

2025-02-26T05:16:06Z Jinghao Jia Ruowen Qin Milo Craun Egor Lukiyanov Ayush Bansal Michael V. Le Hubertus Franke Hani Jamjoom Tianyin Xu Dan Williams http://arxiv.org/abs/2504.19129v1 Automatic Goal Clone Detection in Rocq 2025-04-27T07:05:05Z

Proof engineering in Rocq is a labor-intensive process, and as proof developments grow in size, redundancy and maintainability become challenges. One such redundancy is goal cloning, i.e., proving α-equivalent goals multiple times, leading to wasted effort and bloated proof scripts. In this paper, we introduce clone-finder, a novel technique for detecting goal clones in Rocq proofs. By leveraging the formal notion of α-equivalence for Gallina terms, clone-finder systematically identifies duplicated proof goals across large Rocq codebases. We evaluate clone-finder on 40 real-world Rocq projects from the CoqGym dataset. Our results reveal that each project contains an average of 27.73 instances of goal clone. We observed that the clones can be categorized as either exact goal duplication, generalization, or α-equivalent goals with different proofs, each signifying varying levels duplicate effort. Our findings highlight significant untapped potential for proof reuse in Rocq-based formal verification projects, paving the way for future improvements in automated proof engineering.

2025-04-27T07:05:05Z 39th European Conference on Object-Oriented Programming (ECOOP 2025) Ali Ghanbari 10.4230/LIPIcs.ECOOP.2025.23 http://arxiv.org/abs/2504.18943v1 GPU accelerated program synthesis: Enumerate semantics, not syntax! 2025-04-26T15:06:37Z

Program synthesis is an umbrella term for generating programs and logical formulae from specifications. With the remarkable performance improvements that GPUs enable for deep learning, a natural question arose: can we also implement a search-based program synthesiser on GPUs to achieve similar performance improvements? In this article we discuss our insights on this question, based on recent works~. The goal is to build a synthesiser running on GPUs which takes as input positive and negative example traces and returns a logical formula accepting the positive and rejecting the negative traces. With GPU-friendly programming techniques -- using the semantics of formulae to minimise data movement and reduce data-dependent branching -- our synthesiser scales to significantly larger synthesis problems, and operates much faster than the previous CPU-based state-of-the-art. We believe the insights that make our approach GPU-friendly have wide potential for enhancing the performance of other formal methods (FM) workloads.

2025-04-26T15:06:37Z 10 pages Martin Berger Nathanaël Fijalkow Mojtaba Valizadeh http://arxiv.org/abs/2504.18704v1 An Interactive Debugger for Rust Trait Errors 2025-04-25T21:33:43Z

Compiler diagnostics for type inference failures are notoriously bad, and type classes only make the problem worse. By introducing a complex search process during inference, type classes can lead to wholly inscrutable or useless errors. We describe a system, Argus, for interactively visualizing type class inferences to help programmers debug inference failures, applied specifically to Rust's trait system. The core insight of Argus is to avoid the traditional model of compiler diagnostics as one-size-fits-all, instead providing the programmer with different views on the search tree corresponding to different debugging goals. Argus carefully uses defaults to improve debugging productivity, including interface design (e.g., not showing full paths of types by default) and heuristics (e.g., sorting obligations based on the expected complexity of fixing them). We evaluated Argus in a user study where $N = 25$ participants debugged type inference failures in realistic Rust programs, finding that participants using Argus correctly localized $2.2\times$ as many faults and localized $3.3\times$ faster compared to not using Argus.

2025-04-25T21:33:43Z Gavin Gray Brown University Will Crichton Brown University Shriram Krishnamurthi Brown University 10.1145/3729302 http://arxiv.org/abs/2505.14694v1 Prime Path Coverage in the GNU Compiler Collection 2025-04-25T18:18:53Z

We describe the implementation of the prime path coverage support introduced the GNU Compiler Collection 15, a structural coverage metric that focuses on paths of execution through the program. Prime path coverage strikes a good balance between the number of tests and coverage, and requires that loops are taken, taken more than once, and skipped. We show that prime path coverage subsumes modified condition/decision coverage (MC/DC). We improve on the current state-of-the-art algorithms for enumerating prime paths by using a suffix tree for efficient pruning of duplicated and redundant subpaths, reducing it to $O(n^2m)$ from $O(n^2m^2)$, where $n$ is the length of the longest path and $m$ is the number of candidate paths. We can efficiently track candidate paths using a few bitwise operations based on a compact representation of the indices of the ordered prime paths. By analyzing the control flow graph, GCC can observe and instrument paths in a language-agnostic manner, and accurately report what code must be run in what order to achieve coverage.

2025-04-25T18:18:53Z 11 pages, 12 figures Jørgen Kvalsvik http://arxiv.org/abs/2504.15845v2 Contrasting Deadlock-Free Session Processes (Extended Version) 2025-04-25T06:16:31Z

Deadlock freedom is a crucial property for message-passing programs. Over the years, several different type systems for concurrent processes that ensure deadlock freedom have been proposed; this diversity raises the question of how they compare. We address this question, considering two type systems not covered in prior work: Kokke etal's HCP, a type system based on a linear logic with hypersequents, and Padovani's priority-based type system for asynchronous processes, dubbed P. Their distinctive features make formal comparisons relevant and challenging. Our findings are two-fold: (1) the hypersequent setting does not drastically change the class of deadlock-free processes induced by linear logic, and (2) we relate the classes of deadlock-free processes induced by HCP and P. We prove that our results hold under both synchronous and asynchronous communication. Our results provide new insights into the essential mechanisms involved in statically avoiding deadlocks in concurrency.

2025-04-22T12:42:37Z Full version of an ECOOP 25 paper Juan C. Jaramillo Jorge A. Pérez http://arxiv.org/abs/2504.17646v1 Portability of Optimizations from SC to TSO 2025-04-24T15:16:17Z

It is well recognized that the safety of compiler optimizations is at risk in a concurrent context. Existing approaches primarily rely on context-free thread-local guarantees, and prohibit optimizations that introduce a data-race. However, compilers utilize global context-specific information, exposing safe optimizations that may violate such guarantees as well as introduce a race. Such optimizations need to individually be proven safe for each language model. An alternate approach to this would be proving them safe for an intuitive model (like interleaving semantics), and then determine their portability across other concurrent models. In this paper, we address this problem of porting across models of concurrency. We first identify a global guarantee on optimizations portable from Sequential Consistency (SC) to Total Store Order (TSO). Our guarantee is in the form of constraints specifying the syntactic changes an optimization must not incur. We then show these constraints correlate to prohibiting the introduction of triangular races, a subset of data-race relevant to TSO. We conclude by showing how such race inducing optimizations relate to porting across Strong Release Acquire (SRA), a known causally consistent memory model.

2025-04-24T15:16:17Z Submitted Manuscript. This pre-print has not undergone any post-review modifications/improvements Akshay Gopalakrishnan Clark Verbrugge 10.1007/978-3-031-98208-8_8