http://arxiv.org/abs/2508.10111v1 Constrained Decoding of Diffusion LLMs with Context-Free Grammars 2025-08-13T18:09:09Z

Large language models (LLMs) have shown promising performance across diverse domains. Many practical applications of LLMs, such as code completion and structured data extraction, require adherence to syntactic constraints specified by a formal language. Yet, due to their probabilistic nature, LLM output is not guaranteed to adhere to such formal languages. Prior work has proposed constrained decoding as a means to restrict LLM generation to particular formal languages. However, existing works are not applicable to the emerging paradigm of diffusion LLMs, when used in practical scenarios such as the generation of formally correct C++ or JSON output. In this paper we address this challenge and present the first constrained decoding method for diffusion models, one that can handle formal languages captured by context-free grammars. We begin by reducing constrained decoding to the more general additive infilling problem, which asks whether a partial output can be completed to a valid word in the target language. This problem also naturally subsumes the previously unaddressed multi-region infilling constrained decoding. We then reduce this problem to the task of deciding whether the intersection of the target language and a regular language is empty and present an efficient algorithm to solve it for context-free languages. Empirical results on various applications, such as C++ code infilling and structured data extraction in JSON, demonstrate that our method achieves near-perfect syntactic correctness while consistently preserving or improving functional correctness. Importantly, our efficiency optimizations ensure that the computational overhead remains practical.
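
The emptiness check named in the abstract can be illustrated with a textbook fixed-point construction. The sketch below is not the paper's algorithm: it assumes a grammar in Chomsky normal form that does not derive the empty word, and an NFA without epsilon transitions. All names (`intersection_nonempty`, `DYCK_UNARY`, `OPEN_GAP_CLOSE`, and so on) are hypothetical and chosen only for illustration.

```python
def intersection_nonempty(unary, binary, start, nfa, nfa_start, nfa_accept):
    """Decide whether L(grammar) and L(nfa) share at least one word.

    unary:  set of (A, a) for rules A -> a (a is a terminal)
    binary: set of (A, B, C) for rules A -> B C
    nfa:    set of (p, a, q) transitions
    """
    # A fact (p, A, q) means: A derives some word that moves the NFA from p to q.
    facts = {(p, A, q) for (A, a) in unary for (p, b, q) in nfa if a == b}
    changed = True
    while changed:  # iterate until no new fact can be derived
        changed = False
        snapshot = list(facts)
        for (A, B, C) in binary:
            for (p, X, r) in snapshot:
                if X != B:
                    continue
                for (r2, Y, q) in snapshot:
                    if Y == C and r2 == r and (p, A, q) not in facts:
                        facts.add((p, A, q))
                        changed = True
    return any((nfa_start, start, f) in facts for f in nfa_accept)


# Non-empty balanced parentheses: S -> L R | L X | S S, X -> S R, L -> "(", R -> ")"
DYCK_UNARY = {("L", "("), ("R", ")")}
DYCK_BINARY = {("S", "L", "R"), ("S", "L", "X"), ("S", "S", "S"), ("X", "S", "R")}

# Partial output "(" + unfilled gap + "))": the gap may become any string.
OPEN_GAP_CLOSE = {(0, "(", 1), (1, "(", 1), (1, ")", 1), (1, ")", 2), (2, ")", 3)}
# Partial output ")" + unfilled gap: no completion can be balanced.
CLOSE_FIRST = {(0, ")", 1), (1, "(", 1), (1, ")", 1)}

completable = intersection_nonempty(DYCK_UNARY, DYCK_BINARY, "S", OPEN_GAP_CLOSE, 0, {3})
dead_end = intersection_nonempty(DYCK_UNARY, DYCK_BINARY, "S", CLOSE_FIRST, 0, {1})
```

Here the regular language encodes the partial output with its unfilled regions, so a non-empty intersection means the partial output can still be completed into a valid word. A decoder could, under these assumptions, reject any token whose placement makes the check fail.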

2025-08-13T18:09:09Z Niels Mündler Jasper Dekoninck Martin Vechev http://arxiv.org/abs/2504.05398v2 CRDT Emulation, Simulation, and Representation Independence 2025-08-12T21:34:53Z

Conflict-free replicated data types (CRDTs) are distributed data structures designed for fault tolerance and high availability. CRDTs have historically been taxonomized into state-based CRDTs, in which replicas apply updates locally and periodically broadcast their state to other replicas over the network, and operation-based (or op-based) CRDTs, in which every state-updating operation is individually broadcast. In the literature, state-based and op-based CRDTs are considered equivalent due to the existence of algorithms that let them emulate each other, and verification techniques and results that apply to one kind of CRDT are said to apply to the other thanks to this equivalence. However, what it means for state-based and op-based CRDTs to emulate each other has never been made fully precise. Emulation is nontrivial since state-based and op-based CRDTs place different requirements on the underlying network with regard to both the causal ordering of message delivery, and the granularity of the messages themselves. We specify and formalize CRDT emulation in terms of simulation by modeling CRDTs and their interactions with the network as transition systems. We show that emulation can be understood as weak simulations between the transition systems of the original and emulating CRDT systems, thus closing a gap in the CRDT literature. We precisely characterize which properties of CRDT systems are preserved by our weak simulations, and therefore which properties can be said to be preserved by emulation algorithms. Finally, we leverage our emulation results to obtain a general representation independence result for CRDTs: intuitively, clients of a CRDT cannot tell whether they are interacting with a state-based or op-based CRDT in particular.
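
The distinction between the two CRDT styles can be made concrete with a grow-only counter. The following is a minimal sketch, not taken from the paper: the class names `StateGCounter` and `OpGCounter` and the `demo` helper are hypothetical, and the op-based variant assumes the network delivers each operation exactly once, whereas the state-based variant tolerates duplicated or reordered state messages.

```python
class StateGCounter:
    """State-based: replicas update locally and periodically ship their whole state."""

    def __init__(self, replica_id, replica_count):
        self.replica_id = replica_id
        self.counts = [0] * replica_count

    def increment(self):
        self.counts[self.replica_id] += 1

    def merge(self, other_counts):
        # Element-wise maximum is idempotent, commutative, and associative.
        self.counts = [max(a, b) for a, b in zip(self.counts, other_counts)]

    def value(self):
        return sum(self.counts)


class OpGCounter:
    """Op-based: every update is broadcast individually as an operation."""

    def __init__(self):
        self.total = 0

    def increment(self):
        self.total += 1
        return ("inc", 1)  # operation to broadcast to the other replicas

    def apply(self, op):
        _kind, amount = op
        self.total += amount  # assumes exactly-once delivery

    def value(self):
        return self.total


def demo():
    # State-based replicas: a increments twice, b once, then they exchange state.
    sa, sb = StateGCounter(0, 2), StateGCounter(1, 2)
    sa.increment(); sa.increment(); sb.increment()
    sb.merge(list(sa.counts)); sa.merge(list(sb.counts))
    sa.merge(list(sb.counts))  # a duplicated state message is harmless

    # Op-based replicas: the same updates, delivered once each to the peer.
    oa, ob = OpGCounter(), OpGCounter()
    ops_from_a = [oa.increment(), oa.increment()]
    ops_from_b = [ob.increment()]
    for op in ops_from_a:
        ob.apply(op)
    for op in ops_from_b:
        oa.apply(op)
    return sa.value(), sb.value(), oa.value(), ob.value()
```

All four replicas end up reporting the same count, which is the client-visible behavior that the representation independence result concerns; the different message granularity and delivery requirements are what make emulation nontrivial.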

2025-04-07T18:11:53Z This is an extended version of the ICFP '25 paper Proc. ACM Program. Lang. 9, ICFP, Article 259 (August 2025) Nathan Liittschwager Jonathan Castello Stelios Tsampas Lindsey Kuper 10.1145/3747528 http://arxiv.org/abs/2508.06813v2 Technical Report: Full-Stack Fine-Tuning for the Q Programming Language 2025-08-12T15:49:05Z

Even though large language models are becoming increasingly capable, it is still unreasonable to expect them to excel at tasks that are under-represented on the Internet. Leveraging LLMs for specialized applications, particularly in niche programming languages and private domains, remains challenging and largely unsolved. In this work, we address this gap by presenting a comprehensive, open-source approach for adapting LLMs to the Q programming language, a popular tool in quantitative finance that is much less present on the Internet compared to Python, C, Java, and other "mainstream" languages and is therefore not a strong suit of general-purpose AI models. We introduce a new LeetCode-style evaluation dataset for Q, benchmark major frontier models on the dataset, then do pretraining, supervised fine-tuning, and reinforcement learning to train a suite of reasoning and non-reasoning models based on the Qwen-2.5 series, spanning five parameter sizes (1.5B, 3B, 7B, 14B, 32B). Our best model achieves a pass@1 accuracy of 59 percent on our Q benchmark, surpassing the best-performing frontier model, Claude Opus-4, by 29.5 percent. Additionally, all models, even our 1.5B model, outperform GPT-4.1 on this task. In addition to releasing models, code, and data, we provide a detailed blueprint for dataset construction, model pretraining, supervised fine-tuning, and reinforcement learning. Our methodology is broadly applicable, and we discuss how these techniques can be extended to other tasks, including those where evaluation may rely on soft or subjective signals.

2025-08-09T04:22:07Z 40 pages Brendan R. Hogan Will Brown Adel Boyarsky Anderson Schneider Yuriy Nevmyvaka http://arxiv.org/abs/2405.17681v2 Synthesizing JSON Schema Transformers 2025-08-12T05:33:08Z

JSON (JavaScript Object Notation) is a data encoding that allows structured data to be used in a standardized and straightforward manner across systems. Schemas for JSON-formatted data can be constructed using the JSON Schema standard, which describes the data types, structure, and meaning of JSON-formatted data. JSON is commonly used for storing and transmitting information such as program configurations, web API requests and responses, or remote procedure calls; or data records, such as healthcare information or other structured documents. Since JSON is a plaintext format with potentially highly complex definitions, it can be an arduous process to change code which handles structured JSON data when its storage or transmission schemas are modified. Our work describes a program synthesis method to generate a program that accepts data conforming to a given input JSON Schema and automatically converts it to conform to a resulting, target JSON Schema. We use a top-down, type-directed approach to search for programs using a set of rewrite rules which constrain the ways in which a schema can be modified without unintended data loss or corruption. Once a satisfying sequence of rewrites has been found, we pass an intermediate representation of the rewrite sequence to a code generation backend, which synthesizes a program which executes the data transformation. This system allows users to quickly and efficiently modify or augment their existing systems in safe ways at their interfaces.

2024-05-27T22:17:50Z Jack Stanek Daniel Killough http://arxiv.org/abs/2508.08467v1 Empowering Children to Create AI-Enabled Augmented Reality Experiences 2025-08-11T20:57:39Z

Despite their potential to enhance children's learning experiences, AI-enabled AR technologies are predominantly used in ways that position children as consumers rather than creators. We introduce Capybara, an AR-based and AI-powered visual programming environment that empowers children to create, customize, and program 3D characters overlaid onto the physical world. Capybara enables children to create virtual characters and accessories using text-to-3D generative AI models, and to animate these characters through auto-rigging and body tracking. In addition, our system employs vision-based AI models to recognize physical objects, allowing children to program interactive behaviors between virtual characters and their physical surroundings. We demonstrate the expressiveness of Capybara through a set of novel AR experiences. We conducted user studies with 20 children in the United States and Argentina. Our findings suggest that Capybara can empower children to harness AI in authoring personalized and engaging AR experiences that seamlessly bridge the virtual and physical worlds.

2025-08-11T20:57:39Z Accepted to ACM UIST 2025 Lei Zhang Shuyao Zhou Amna Liaqat Tinney Mak Brian Berengard Emily Qian Andrés Monroy-Hernández http://arxiv.org/abs/2206.02585v5 On the Origins of Objects by Means of Careful Selection 2025-08-11T15:59:50Z

We introduce a taxonomy of objects for the EO programming language. This taxonomy is designed with a few principles in mind: non-redundancy, simplicity, and so on. The taxonomy is supposed to be used as a navigation map by EO programmers. It may also be helpful as a guideline for designers of other object-oriented languages or libraries for them.

2022-06-06T12:49:00Z Yegor Bugayenko Maxim Trunnikov http://arxiv.org/abs/2508.08074v1 Towards General-Purpose Data Discovery: A Programming Languages Approach 2025-08-11T15:18:20Z

Efficient and effective data discovery is critical for many modern applications in machine learning and data science. One major bottleneck to the development of a general-purpose data discovery tool is the absence of an expressive formal language, and corresponding implementation, for characterizing and solving generic discovery queries. To this end, we present TQL, a domain-specific language for data discovery designed to leverage the results of programming languages research in both its syntax and semantics. In this paper, we fully and formally characterize the core language through an algebraic model, Imperative Relational Algebra with Types (ImpRAT), and implement a modular proof-of-concept system prototype.

2025-08-11T15:18:20Z Andrew Kang Yashnil Saha Sainyam Galhotra http://arxiv.org/abs/2508.08054v1 TQL: Towards Type-Driven Data Discovery 2025-08-11T14:55:49Z

Existing query languages for data discovery exhibit system-driven designs that emphasize database features and functionality over user needs. We propose a re-prioritization of the client through an introduction of a language-driven approach to data discovery systems that can leverage powerful results from programming languages research. In this paper, we describe TQL, a flexible and practical query language which incorporates a type-like system to encompass downstream transformation-context in its discovery queries. The syntax and semantics of TQL (including the underlying evaluation model) are formally defined, and a sketch of its implementation is also provided. Additionally, we provide comparisons to existing languages for data retrieval and data discovery to examine the advantages of TQL's expanded expressive power in real-life settings.

2025-08-11T14:55:49Z 2024 IEEE BigData paper Andrew Kang Sainyam Galhotra 10.1109/BigData62323.2024.10825227 http://arxiv.org/abs/2508.07855v1 Checking Consistency of Event-driven Traces 2025-08-11T11:11:44Z

Event-driven programming is a popular paradigm where the flow of execution is controlled by two features: (1) shared memory and (2) sending and receiving of messages between multiple handler threads (simply called handlers). Each handler has a mailbox (modelled as a queue) for receiving messages, with the constraint that the handler processes its messages sequentially. Executions of messages by different handlers may be interleaved. A central problem in this setting is checking whether a candidate execution is consistent with the semantics of event-driven programs. In this paper, we propose an axiomatic semantics for event-driven programs based on the standard notion of traces (also known as execution graphs). We prove the equivalence of axiomatic and operational semantics. This allows us to rephrase the consistency problem axiomatically, resulting in the event-driven consistency problem: checking whether a given trace is consistent. We analyze the computational complexity of this problem and show that it is NP-complete, even when the number of handler threads is bounded. We then identify a tractable fragment: in the absence of nested posting, where handlers do not post new messages while processing a message, consistency checking can be performed in polynomial time. Finally, we implement our approach in a prototype tool and report on experimental results on a wide range of benchmarks.

2025-08-11T11:11:44Z Parosh Aziz Abdulla Mohamed Faouzi Atig R. Govind Samuel Grahn Ramanathan S. Thinniyam http://arxiv.org/abs/2503.23145v2 CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis 2025-08-08T07:13:11Z

Inductive program synthesis, or programming by example, requires synthesizing functions from input-output examples that generalize to unseen inputs. While large language model agents have shown promise in programming tasks guided by natural language, their ability to perform inductive program synthesis is underexplored. Existing evaluation protocols rely on static sets of examples and held-out tests, offering no feedback when synthesized functions are incorrect and failing to reflect real-world scenarios such as reverse engineering. We propose CodeARC, the Code Abstraction and Reasoning Challenge, a new evaluation framework where agents interact with a hidden target function by querying it with new inputs, synthesizing candidate functions, and iteratively refining their solutions using a differential testing oracle. This interactive setting encourages agents to perform function calls and self-correction based on feedback. We construct the first large-scale benchmark for general-purpose inductive program synthesis, featuring 1114 functions. Among 18 models evaluated, o3-mini performs best with a success rate of 52.7%, highlighting the difficulty of this task. Fine-tuning LLaMA-3.1-8B-Instruct on curated synthesis traces yields up to a 31% relative performance gain. CodeARC provides a more realistic and challenging testbed for evaluating LLM-based program synthesis and inductive reasoning. Our code, data, and models are publicly available at https://github.com/Anjiang-Wei/CodeARC
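
The interactive protocol described in the abstract (query the hidden function, propose a candidate, refine on counterexamples from a differential testing oracle) can be sketched as a small loop. This is an illustrative toy, not the CodeARC implementation: the names `differential_oracle`, `refine_loop`, `propose_from_hypotheses`, and `HYPOTHESES` are hypothetical, and the "agent" is a stand-in that picks the first function from a fixed hypothesis list that agrees with all observed examples.

```python
def differential_oracle(target, candidate, probe_inputs):
    """Return an input on which candidate and target disagree, or None."""
    for x in probe_inputs:
        if target(x) != candidate(x):
            return x
    return None


def refine_loop(target, propose, probe_inputs, initial_queries, max_rounds=5):
    """Alternate between proposing a candidate and learning from counterexamples."""
    examples = {x: target(x) for x in initial_queries}  # queries to the hidden function
    for round_number in range(1, max_rounds + 1):
        candidate = propose(examples)
        counterexample = differential_oracle(target, candidate, probe_inputs)
        if counterexample is None:
            return candidate, round_number
        examples[counterexample] = target(counterexample)  # feedback for the next round
    return None, max_rounds


# Toy stand-in for an LLM agent: a fixed, ordered hypothesis space.
HYPOTHESES = [lambda x: x, lambda x: abs(x), lambda x: x * x]


def propose_from_hypotheses(examples):
    for h in HYPOTHESES:
        if all(h(x) == y for x, y in examples.items()):
            return h
    raise ValueError("no hypothesis is consistent with the examples")


hidden_target = lambda x: x * x
found, rounds_used = refine_loop(
    hidden_target, propose_from_hypotheses, range(-3, 4), initial_queries=[0, 1]
)
```

With the initial queries 0 and 1, the identity function is consistent, so the first candidate is wrong and the oracle supplies a counterexample; the second round then settles on squaring. A static example set would have offered no such feedback.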

2025-03-29T16:50:39Z Anjiang Wei Tarun Suresh Jiannan Cao Naveen Kannan Yuheng Wu Kai Yan Thiago S. F. X. Teixeira Ke Wang Alex Aiken http://arxiv.org/abs/2508.05997v1 Hybrid Game Control Envelope Synthesis 2025-08-08T04:02:15Z

Control problems for embedded systems like cars and trains can be modeled by two-player hybrid games. Control envelopes, which are families of safe control solutions, correspond to nondeterministic winning policies of hybrid games, where each deterministic specialization of the policy is a control solution. This paper synthesizes nondeterministic winning policies for hybrid games that are as permissive as possible. It introduces subvalue maps, a compositional representation of such policies that enables verification and synthesis along the structure of the game. An inductive logical characterization in differential game logic (dGL) checks whether a subvalue map induces a sound control envelope which always induces a winning play. A policy is said to win if it always achieves the desirable outcome when the player follows it, no matter what actions the opponent plays. The maximal subvalue map, which allows the most action options while still winning, is shown to exist and satisfy a logical characterization. A family of algorithms for nondeterministic policy synthesis can be obtained from the inductive subvalue map soundness characterization. An implementation of these findings is evaluated on examples that use the expressivity of dGL to model a range of diverse control challenges.

2025-08-08T04:02:15Z Aditi Kabra Jonathan Laurent Stefan Mitsch André Platzer http://arxiv.org/abs/2204.12384v5 Qunity: A Unified Language for Quantum and Classical Computing (Extended Version) 2025-08-07T14:53:50Z

We introduce Qunity, a new quantum programming language designed to treat quantum computing as a natural generalization of classical computing. Qunity presents a unified syntax where familiar programming constructs can have both quantum and classical effects. For example, one can use sum types to implement the direct sum of linear operators, exception-handling syntax to implement projective measurements, and aliasing to induce entanglement. Further, Qunity takes advantage of the overlooked BQP subroutine theorem, allowing one to construct reversible subroutines from irreversible quantum algorithms through the uncomputation of "garbage" outputs. Unlike existing languages that enable quantum aspects with separate add-ons (like a classical language with quantum gates bolted on), Qunity provides a unified syntax and a novel denotational semantics that guarantees that programs are quantum mechanically valid. We present Qunity's syntax, type system, and denotational semantics, showing how it can cleanly express several quantum algorithms. We also detail how Qunity can be compiled into a low-level qubit circuit language like OpenQASM, proving the realizability of our design.

2022-04-26T15:34:22Z 79 pages, 37 figures. Presented at POPL 2023. Corrected to fix mistakes found by Mikhail Mints Finn Voichick Liyi Li Robert Rand Michael Hicks 10.1145/3571225 http://arxiv.org/abs/2503.04512v3 Modular Reasoning about Error Bounds for Concurrent Probabilistic Programs (Extended Version) 2025-08-07T07:03:47Z

We present Coneris, the first higher-order concurrent separation logic for reasoning about error probability bounds of higher-order concurrent probabilistic programs with higher-order state. To support modular reasoning about concurrent (non-probabilistic) program modules, state-of-the-art program logics internalize the classic notion of linearizability within the logic through the concept of logical atomicity. Coneris extends this idea to probabilistic concurrent program modules. Thus Coneris supports modular reasoning about probabilistic concurrent modules by capturing a novel notion of randomized logical atomicity within the logic. To do so, Coneris utilizes presampling tapes and a novel probabilistic update modality to describe how state is changed probabilistically at linearization points. We demonstrate this approach by means of smaller synthetic examples and larger case studies. All of the presented results, including the meta-theory, have been mechanized in the Rocq proof assistant and the Iris separation logic framework. This is the extended version of the same paper accepted at ICFP 2025, where more details of proofs and case studies are included in the Appendix.

2025-03-06T14:59:30Z Kwing Hei Li Alejandro Aguirre Simon Oddershede Gregersen Philipp G. Haselwarter Joseph Tassarotti Lars Birkedal http://arxiv.org/abs/2408.06478v2 Theorem-Carrying Transactions: Runtime Verification to Ensure Interface Specifications for Smart Contract Safety 2025-08-06T22:31:48Z

Security bugs and trapdoors in smart contracts have been impacting the Ethereum community since its inception. Conceptually, Ethereum's 1.45 million contracts form a single "gigantic program" whose behaviors are determined by the complex compositions of contracts. Can programmers be assured that this gigantic program conforms to high-level safety specifications, despite unforeseeable code-level intricacies? Static code verification cannot be faithful to this gigantic program due to its scale and high polymorphism. In this paper, we present a viable approach to achieve this goal. Our technology, called Theorem-Carrying Transactions (TCT), combines the benefits of concrete execution and symbolic proofs. Under the TCT protocol, every transaction carries a theorem that proves its adherence to the specified properties in the invoked contracts, and the runtime system checks the theorem before executing the transaction. Once a theorem is proven, it will be reused for future transactions, so TCT's runtime overhead is minimal. As case studies, we demonstrate that TCT secures token contracts without foreseeing code-level intricacies, such as integer overflow and reentrancy. TCT is also successfully applied to a Uniswap codebase, showcasing a complex decentralized finance (DeFi) scenario. Our evaluation shows a negligible runtime overhead, two orders of magnitude lower than a state-of-the-art approach for runtime checking of contract code safety.

2024-08-12T20:27:41Z Thomas Ball Microsoft Research Nikolaj S. Bjørner Microsoft Research Ashley J. Chen New York University Shanghai Shuo Chen Microsoft Research Yang Chen Microsoft Research Zhongxin Guo Microsoft Research Tzu-Han Hsu Michigan State University Peng Liu Pennsylvania State University Nanqing Luo Pennsylvania State University http://arxiv.org/abs/2508.06563v1 Assessing Engineering Student Perceptions of Introductory CS Courses in an Indian Context 2025-08-06T19:04:19Z

Understanding student perceptions of assessment is vital for designing inclusive and effective learning environments, especially in technical education. This study explores engineering students' perceptions of assessment practices in an introductory computer science/programming course and its associated laboratory within an Indian engineering institute context. A total of 318 first-year Bachelor of Technology students participated in a weekly 25-statement Likert-scale survey conducted over nine weeks. Using descriptive statistics and non-parametric tests (Mann-Whitney U and Kruskal-Wallis), the analysis reveals that students largely perceive lab assignments as effective learning activities and view exams and projects as authentic and skill-enhancing. Students appreciated the role of instructors in shaping course content and found teaching assistants to be approachable and helpful, despite some inconsistencies. The study also finds significant variations in students' academic performance and assessment perceptions based on prior programming experience, technology familiarity, gender, and academic branch. Notably, the performance data did not follow a Gaussian distribution, challenging common assumptions in grade modeling. A comparative analysis with European cohorts highlights both universal patterns and contextual differences, offering valuable insights for designing inclusive and equitable assessment strategies in programming education.

2025-08-06T19:04:19Z Utsav Kumar Nareti Divyansh Gupta Chandranath Adak Soumi Chattopadhyay Emma Riese Tanujit Chakraborty Mayank Agarwal Satendra Kumar