https://arxiv.org/api/mo61YDo2LEW64GGAXpQyVhc/hyM 2026-06-26T23:07:22Z 9951 1410 15 http://arxiv.org/abs/2507.11873v2 Syntax Repair as Language Intersection 2025-07-17T02:26:31Z

We introduce a new technique for repairing syntax errors in arbitrary context-free languages. This technique models syntax repair as a language intersection problem by defining a finite language that provably generates every syntactically valid repair within a given edit distance. Leveraging a theoretical connection between the Bar-Hillel construction from formal language theory and CFL reachability from program analysis, we show that repairability in a finite number of typographic edits is polylogarithmic parallel time decidable and provide an enumeration algorithm based on the Brzozowski derivative. Finally, we evaluate this algorithm and its implementation, demonstrating state-of-the-art results on a Python syntax repair benchmark.

2025-07-16T03:35:32Z Breandan Considine http://arxiv.org/abs/2506.23407v2 Compiling a Q# Subset to QASM 3.0 in TypeScript via a JSON Based IR 2025-07-16T22:46:09Z

We implement a compile toolchain from Q# to QASM 3.0 including a full-featured lexer and parser implementation, as well as a compiler that supports a subset of Q# features. The lexer, parser and compiler are shown to work with various input Q# programs and the implementation is compared against existing Q# compile tools. Unlike the Microsoft implementation of the official Q# compile toolchain, our implementation is written in TypeScript in order to port functionality to web environments.

2025-06-29T22:02:57Z Marcus Edwards http://arxiv.org/abs/2504.20432v2 An Algebraic Approach to Asymmetric Delegation and Polymorphic Label Inference (Technical Report) 2025-07-16T21:38:43Z

Language-based information flow control (IFC) enables reasoning about and enforcing security policies in decentralized applications. While information flow properties are relatively extensional and compositional, designing expressive systems that enforce such properties remains challenging. In particular, it can be difficult to use IFC labels to model certain security assumptions, such as semi-honest agents. Motivated by these modeling limitations, we study the algebraic semantics of lattice-based IFC label models, and propose a semantic framework that allows formalizing asymmetric delegation, which is partial delegation of confidentiality or integrity. Our framework supports downgrading of information and ensures their safety through nonmalleable information flow (NMIF). To demonstrate the practicality of our framework, we design and implement a novel algorithm that statically checks NMIF and a label inference procedure that efficiently supports bounded label polymorphism, allowing users to write code generic with respect to labels.

2025-04-29T05:00:17Z Silei Ren Coşku Acay Andrew C. Myers http://arxiv.org/abs/2507.12640v1 Dual-Numbers Reverse AD for Functional Array Languages 2025-07-16T21:21:52Z

The standard dual-numbers construction works well for forward-mode automatic differentiation (AD) and is attractive due to its simplicity; recently, it also has been adapted to reverse-mode AD, but practical performance, especially on array programs, leaves a lot to be desired. In this paper we introduce first-class support for multidimensional arrays in dual-numbers reverse-mode AD with little to no performance overhead. The algorithm consists of three loosely-coupled components: a semantics-preserving vectorisation code transformation (the bulk-operation transform or BOT), a fairly straightforward lifting of the basic dual-numbers reverse AD algorithm to a mostly first-order array language, and symbolic interpretation to achieve an end-to-end compilation pipeline. Unfortunately, we lose some of the nice generalisable aspects of dual-numbers AD in the process, most importantly support for higher-order code. We do support some higher-order array combinators, but only a carefully-chosen set: 'build' (elementwise array construction), 'gather' and 'scatter'. In return, the BOT can eliminate the essential (for AD) higher-orderness of the input program, meaning that AD gets essentially presented with a first-order program. This allows the naive trick of lifting dual numbers to "dual arrays" to work without much modification.

2025-07-16T21:21:52Z Tom Smeding Mikołaj Konarski Simon Peyton Jones Andrew Fitzgibbon http://arxiv.org/abs/2507.12443v1 LLM-Based Config Synthesis requires Disambiguation 2025-07-16T17:29:15Z

Beyond hallucinations, another problem in program synthesis using LLMs is ambiguity in user intent. We illustrate the ambiguity problem in a networking context for LLM-based incremental configuration synthesis of route-maps and ACLs. These structures frequently overlap in header space, making the relative priority of actions impossible for the LLM to infer without user interaction. Measurements in a large cloud identify complex ACLs with 100's of overlaps, showing ambiguity is a real problem. We propose a prototype system, Clarify, which uses an LLM augmented with a new module called a Disambiguator that helps elicit user intent. On a small synthetic workload, Clarify incrementally synthesizes routing policies after disambiguation and then verifies them. Our treatment of ambiguities is useful more generally when the intent of updates can be correctly synthesized by LLMs, but their integration is ambiguous and can lead to different global behaviors.

2025-07-16T17:29:15Z Rajdeep Mondal Nikolaj Bjorner Todd Millstein Alan Tang George Varghese http://arxiv.org/abs/2503.19609v3 Nanopass Back-Translation of Call-Return Trees for Mechanized Secure Compilation Proofs 2025-07-16T12:54:33Z

Researchers aim to build secure compilation chains enforcing that if there is no attack a source context can mount against a source program then there is also no attack an adversarial target context can mount against the compiled program. Proving that these compilation chains are secure is, however, challenging, and involves a non-trivial back-translation step: for any attack a target context mounts against the compiled program one has to exhibit a source context mounting the same attack against the source program. We describe a novel back-translation technique, which results in simpler proofs that can be more easily mechanized in a proof assistant. Given a finite set of finite trace prefixes, capturing the interaction recorded during an attack between a target context and the compiled program, we build a call-return tree that we back-translate into a source context producing the same trace prefixes. We use state in the generated source context to record the current location in the call-return tree. The back-translation is done in several small steps, each adding to the tree new information describing how the location should change depending on how the context regains control. To prove this back-translation correct we give semantics to every intermediate call-return tree language, using ghost state to store information and explicitly enforce execution invariants. We prove several small forward simulations, basically seeing the back-translation as a verified nanopass compiler. Thanks to this modular structure, we are able to mechanize this complex back-translation and its correctness proof in the Rocq prover without too much effort.

2025-03-25T12:50:35Z ITP'25 final version Jérémy Thibault Joseph Lenormand Catalin Hritcu http://arxiv.org/abs/2507.22070v1 Automated Test Data Generation for Enterprise Protobuf Systems: A Metaclass-Enhanced Statistical Approach 2025-07-16T09:49:04Z

Large-scale enterprise systems utilizing Protocol Buffers (protobuf) present significant challenges for performance testing, particularly when targeting intermediate business interfaces with complex nested data structures. Traditional test data generation approaches are inadequate for handling the intricate hierarchical and graph-like structures inherent in enterprise protobuf schemas. This paper presents a novel test data generation framework that leverages Python's metaclass system for dynamic type enhancement and statistical analysis of production logs for realistic value domain extraction. Our approach combines automatic schema introspection, statistical value distribution analysis, and recursive descent algorithms for handling deeply nested structures. Experimental evaluation on three real-world enterprise systems demonstrates up to 95\% reduction in test data preparation time and 80\% improvement in test coverage compared to existing approaches. The framework successfully handles protobuf structures with up to 15 levels of nesting and generates comprehensive test suites containing over 100,000 test cases within seconds.

2025-07-16T09:49:04Z 7 pages Y. Du http://arxiv.org/abs/2507.11897v1 Towards Relational Contextual Equality Saturation 2025-07-16T04:24:42Z

Equality saturation is a powerful technique for program optimization. Contextual equality saturation extends this to support rewrite rules that are conditioned on where a term appears in an expression. Existing work has brought contextual reasoning to egg; in this paper, we share our ongoing work to extend this to relational equality saturation in egglog. We summarize the existing approaches to contextual equality saturation, outline its main applications, and identify key challenges in combining this approach with relational models.

2025-07-16T04:24:42Z Appeared at EGRAPHS 2024 Tyler Hou Shadaj Laddad Joseph M. Hellerstein http://arxiv.org/abs/2505.18409v3 On the Complexity of Checking Mixed Isolation Levels for SQL Transactions 2025-07-15T17:50:19Z

Concurrent accesses to databases are typically grouped in transactions which define units of work that should be isolated from other concurrent computations and resilient to failures. Modern databases provide different levels of isolation for transactions that correspond to different trade-offs between consistency and throughput. Quite often, an application can use transactions with different isolation levels at the same time. In this work, we investigate the problem of testing isolation level implementations in databases, i.e., checking whether a given execution composed of multiple transactions adheres to the prescribed isolation level semantics. We particularly focus on transactions formed of SQL queries and the use of multiple isolation levels at the same time. We show that many restrictions of this problem are NP-complete and provide an algorithm which is exponential-time in the worst-case, polynomial-time in relevant cases, and practically efficient.

2025-05-23T22:33:03Z CAV 2025 Full Version Ahmed Bouajjani Constantin Enea Enrique Román-Calvo http://arxiv.org/abs/2507.10799v1 Stream programs are monoid homomorphisms with state 2025-07-14T20:51:53Z

We define a broad class of deterministic stream functions and show they can be implemented as homomorphisms into a "state" monoid. The homomorphism laws are simpler than the conditions of previous semantic frameworks for stream program optimization, yet retain support for rich equational reasoning over expressive dataflow programs, including sequential composition, parallel composition, and feedback. We demonstrate this using examples of partitioned database joins, stratified negation, and a simplified model of TCP.

2025-07-14T20:51:53Z Tyler Hou Michael Arntzenius Max Willsey http://arxiv.org/abs/2507.10482v1 Orthologic Type Systems 2025-07-14T17:01:18Z

We propose to use orthologic as the basis for designing type systems supporting intersection, union, and negation types in the presence of subtyping assumptions. We show how to extend orthologic to support monotonic and antimonotonic functions, supporting the use of type constructors in such type systems. We present a proof system for orthologic with function symbols, showing that it admits partial cut elimination. Using these insights, we present an $\mathcal O(n^2(1+m))$ algorithm for deciding the subtyping relation under $m$ assumptions. We also show $O(n^2)$ polynomial-time normalization algorithm, allowing simplification of types to their minimal canonical form.

2025-07-14T17:01:18Z Simon Guilloud Viktor Kunčak http://arxiv.org/abs/2507.10324v1 Toolsuite for Implementing Multiagent Systems Based on Communication Protocols 2025-07-14T14:32:09Z

Interaction-Oriented Programming (IOP) is an approach to building a multiagent system by modeling the interactions between its roles via a flexible interaction protocol and implementing agents to realize the interactions of the roles they play in the protocol. In recent years, we have developed an extensive suite of software that enables multiagent system developers to apply IOP. These include tools for efficiently verifying protocols for properties such as liveness and safety and middleware that simplifies the implementation of agents. This paper presents some of that software suite.

2025-07-14T14:32:09Z Amit K. Chopra Samuel H. Christie Munindar P. Singh http://arxiv.org/abs/2507.10635v1 Formal Verification of Variational Quantum Circuits 2025-07-14T12:28:32Z

Variational quantum circuits (VQCs) are a central component of many quantum machine learning algorithms, offering a hybrid quantum-classical framework that, under certain aspects, can be considered similar to classical deep neural networks. A shared aspect is, for instance, their vulnerability to adversarial inputs, small perturbations that can lead to incorrect predictions. While formal verification techniques have been extensively developed for classical models, no comparable framework exists for certifying the robustness of VQCs. Here, we present the first in-depth theoretical and practical study of the formal verification problem for VQCs. Inspired by abstract interpretation methods used in deep learning, we analyze the applicability and limitations of interval-based reachability techniques in the quantum setting. We show that quantum-specific aspects, such as state normalization, introduce inter-variable dependencies that challenge existing approaches. We investigate these issues by introducing a novel semantic framework based on abstract interpretation, where the verification problem for VQCs can be formally defined, and its complexity analyzed. Finally, we demonstrate our approach on standard verification benchmarks.

2025-07-14T12:28:32Z Assolini and Marzari contributed equally to the paper Nicola Assolini Luca Marzari Isabella Mastroeni Alessandra di Pierro http://arxiv.org/abs/2507.10099v1 ReDemon UI: Reactive Synthesis by Demonstration for Web UI 2025-07-14T09:34:33Z

ReDemon UI synthesizes React applications from user demonstrations, enabling designers and non-expert programmers to create UIs that integrate with standard UI prototyping workflows. Users provide a static mockup sketch with event handler holes and demonstrate desired runtime behaviors by interacting with the rendered mockup and editing the sketch. ReDemon UI identifies reactive data and synthesizes a React program with correct state update logic. We utilize enumerative synthesis for simple UIs and LLMs for more complex UIs.

2025-07-14T09:34:33Z Submitted to UIST 2025 Posters Jay Lee Gyuhyeok Oh Joongwon Ahn Xiaokang Qiu 10.1145/3746058.3758454 http://arxiv.org/abs/2507.09883v1 BeePL: Correct-by-compilation kernel extensions 2025-07-14T03:44:25Z

eBPF is a technology that allows developers to safely extend kernel functionality without modifying kernel source code or developing loadable kernel modules. Since the kernel governs critical system operations and enforces isolation boundaries between user space and privileged data, any mechanism that modifies its behavior must meet the highest standards of safety and correctness. To this end, the eBPF toolchain includes a verifier, which statically checks safety properties such as memory access validity, bounded loops, and type correctness before loading the program into the kernel. However, the existing verifier is both overly conservative in some cases-rejecting valid programs-and unsound in others, permitting unsafe behavior that violates the intended semantics of the kernel interface. To address these challenges, we introduce BeePL, a domain-specific language for eBPF with a formally verified type system. The BeePL type system, along with the language design, statically enforces key safety properties such as type-correct memory access, safe pointer usage, absence of unbounded loops, and structured control flow. These guarantees are backed by formal type soundness proofs, ensuring that well-typed programs satisfy the safety invariants required by the eBPF execution environment. BeePL also proves that well-typed source programs meet critical eBPF-specific properties related to memory safety, termination, and control flow, enabling high-level reasoning prior to compilation. For properties not fully enforceable statically-such as dynamic bounds and undefined behavior-BeePL inserts semantics-preserving runtime checks during compilation. We develop a verified compilation strategy that extends CompCert to generate BPF bytecode from BeePL programs, establishing a principled foundation for an end-to-end verifiable toolchain for safe kernel extensions.

2025-07-14T03:44:25Z 45 pages, 18 figures Swarn Priya Frédéric Besson Connor Sughrue Tim Steenvoorden Jamie Fulford Freek Verbeek Binoy Ravindran