https://arxiv.org/api/6M9tFeYwrIblD3UL1ZHCU/R2Shs2026-06-26T17:02:06Z9951132015http://arxiv.org/abs/2508.10111v1Constrained Decoding of Diffusion LLMs with Context-Free Grammars2025-08-13T18:09:09ZLarge language models (LLMs) have shown promising performance across diverse domains. Many practical applications of LLMs, such as code completion and structured data extraction, require adherence to syntactic constraints specified by a formal language. Yet, due to their probabilistic nature, LLM output is not guaranteed to adhere to such formal languages. Prior work has proposed constrained decoding as a means to restrict LLM generation to particular formal languages. However, existing works are not applicable to the emerging paradigm of diffusion LLMs, when used in practical scenarios such as the generation of formally correct C++ or JSON output. In this paper we address this challenge and present the first constrained decoding method for diffusion models, one that can handle formal languages captured by context-free grammars. We begin by reducing constrained decoding to the more general additive infilling problem, which asks whether a partial output can be completed to a valid word in the target language. This problem also naturally subsumes the previously unaddressed multi-region infilling constrained decoding. We then reduce this problem to the task of deciding whether the intersection of the target language and a regular language is empty and present an efficient algorithm to solve it for context-free languages. Empirical results on various applications, such as C++ code infilling and structured data extraction in JSON, demonstrate that our method achieves near-perfect syntactic correctness while consistently preserving or improving functional correctness. Importantly, our efficiency optimizations ensure that the computational overhead remains practical.2025-08-13T18:09:09ZNiels MündlerJasper DekoninckMartin Vechevhttp://arxiv.org/abs/2504.05398v2CRDT Emulation, Simulation, and Representation Independence2025-08-12T21:34:53ZConflict-free replicated data types (CRDTs) are distributed data structures designed for fault tolerance and high availability. CRDTs have historically been taxonomized into state-based CRDTs, in which replicas apply updates locally and periodically broadcast their state to other replicas over the network, and operation-based (or op-based) CRDTs, in which every state-updating operation is individually broadcast. In the literature, state-based and op-based CRDTs are considered equivalent due to the existence of algorithms that let them emulate each other, and verification techniques and results that apply to one kind of CRDT are said to apply to the other thanks to this equivalence. However, what it means for state-based and op-based CRDTs to emulate each other has never been made fully precise. Emulation is nontrivial since state-based and op-based CRDTs place different requirements on the underlying network with regard to both the causal ordering of message delivery, and the granularity of the messages themselves.
We specify and formalize CRDT emulation in terms of simulation by modeling CRDTs and their interactions with the network as transition systems. We show that emulation can be understood as weak simulations between the transition systems of the original and emulating CRDT systems, thus closing a gap in the CRDT literature. We precisely characterize which properties of CRDT systems are preserved by our weak simulations, and therefore which properties can be said to be preserved by emulation algorithms. Finally, we leverage our emulation results to obtain a general representation independence result for CRDTs: intuitively, clients of a CRDT cannot tell whether they are interacting with a state-based or op-based CRDT in particular.2025-04-07T18:11:53ZThis is an extended version of the ICFP '25 paperProc. ACM Program. Lang. 9, ICFP, Article 259 (August 2025)Nathan LiittschwagerJonathan CastelloStelios TsampasLindsey Kuper10.1145/3747528http://arxiv.org/abs/2508.06813v2Technical Report: Full-Stack Fine-Tuning for the Q Programming Language2025-08-12T15:49:05ZEven though large language models are becoming increasingly capable, it is still unreasonable to expect them to excel at tasks that are under-represented on the Internet. Leveraging LLMs for specialized applications, particularly in niche programming languages and private domains, remains challenging and largely unsolved. In this work, we address this gap by presenting a comprehensive, open-source approach for adapting LLMs to the Q programming language, a popular tool in quantitative finance that is much less present on the Internet compared to Python, C, Java, and other ``mainstream" languages and is therefore not a strong suit of general-purpose AI models. We introduce a new Leetcode style evaluation dataset for Q, benchmark major frontier models on the dataset, then do pretraining, supervised fine tuning, and reinforcement learning to train a suite of reasoning and non-reasoning models based on the Qwen-2.5 series, spanning five parameter sizes (1.5B, 3B, 7B, 14B, 32B). Our best model achieves a pass@1 accuracy of 59 percent on our Q benchmark, surpassing the best-performing frontier model, Claude Opus-4 by 29.5 percent. Additionally, all models, even our 1.5B model, outperform GPT-4.1 on this task. In addition to releasing models, code, and data, we provide a detailed blueprint for dataset construction, model pretraining, supervised fine-tuning, and reinforcement learning. Our methodology is broadly applicable, and we discuss how these techniques can be extended to other tasks, including those where evaluation may rely on soft or subjective signals.2025-08-09T04:22:07Z40 pagesBrendan R. HoganWill BrownAdel BoyarskyAnderson SchneiderYuriy Nevmyvakahttp://arxiv.org/abs/2405.17681v2Synthesizing JSON Schema Transformers2025-08-12T05:33:08ZJSON (JavaScript Object Notation) is a data encoding that allows structured data to be used in a standardized and straightforward manner across systems. Schemas for JSON-formatted data can be constructed using the JSON Schema standard, which describes the data types, structure, and meaning of JSON-formatted data. JSON is commonly used for storing and transmitting information such as program configurations, web API requests and responses, or remote procedure calls; or data records, such as healthcare information or other structured documents. Since JSON is a plaintext format with potentially highly complex definitions, it can be an arduous process to change code which handles structured JSON data when its storage or transmission schemas are modified. Our work describes a program synthesis method to generate a program that accepts data conforming to a given input JSON Schema and automatically converts it to conform to a resulting, target JSON Schema. We use a top-down, type-directed approach to search for programs using a set of rewrite rules which constrain the ways in which a schema can be modified without unintended data loss or corruption. Once a satisfying sequence of rewrites has been found, we pass an intermediate representation of the rewrite sequence to a code generation backend, which synthesizes a program which executes the data transformation. This system allows users to quickly and efficiently modify or augment their existing systems in safe ways at their interfaces.2024-05-27T22:17:50ZJack StanekDaniel Killoughhttp://arxiv.org/abs/2508.08467v1Empowering Children to Create AI-Enabled Augmented Reality Experiences2025-08-11T20:57:39ZDespite their potential to enhance children's learning experiences, AI-enabled AR technologies are predominantly used in ways that position children as consumers rather than creators. We introduce Capybara, an AR-based and AI-powered visual programming environment that empowers children to create, customize, and program 3D characters overlaid onto the physical world. Capybara enables children to create virtual characters and accessories using text-to-3D generative AI models, and to animate these characters through auto-rigging and body tracking. In addition, our system employs vision-based AI models to recognize physical objects, allowing children to program interactive behaviors between virtual characters and their physical surroundings. We demonstrate the expressiveness of Capybara through a set of novel AR experiences. We conducted user studies with 20 children in the United States and Argentina. Our findings suggest that Capybara can empower children to harness AI in authoring personalized and engaging AR experiences that seamlessly bridge the virtual and physical worlds.2025-08-11T20:57:39ZAccepted to ACM UIST 2025Lei ZhangShuyao ZhouAmna LiaqatTinney MakBrian BerengardEmily QianAndrés Monroy-Hernándezhttp://arxiv.org/abs/2206.02585v5On the Origins of Objects by Means of Careful Selection2025-08-11T15:59:50ZWe introduce a taxonomy of objects for EO programming language. This taxonomy is designed with a few principles in mind: non-redundancy, simplicity, and so on. The taxonomy is supposed to be used as a navigation map by EO programmers. It may also be helpful as a guideline for designers of other object-oriented languages or libraries for them.2022-06-06T12:49:00ZYegor BugayenkoMaxim Trunnikovhttp://arxiv.org/abs/2508.08074v1Towards General-Purpose Data Discovery: A Programming Languages Approach2025-08-11T15:18:20ZEfficient and effective data discovery is critical for many modern applications in machine learning and data science. One major bottleneck to the development of a general-purpose data discovery tool is the absence of an expressive formal language, and corresponding implementation, for characterizing and solving generic discovery queries. To this end, we present TQL, a domain-specific language for data discovery well-designed to leverage and exploit the results of programming languages research in both its syntax and semantics. In this paper, we fully and formally characterize the core language through an algebraic model, Imperative Relational Algebra with Types (ImpRAT), and implement a modular proof-of-concept system prototype.2025-08-11T15:18:20ZAndrew KangYashnil SahaSainyam Galhotrahttp://arxiv.org/abs/2508.08054v1TQL: Towards Type-Driven Data Discovery2025-08-11T14:55:49ZExisting query languages for data discovery exhibit system-driven designs that emphasize database features and functionality over user needs. We propose a re-prioritization of the client through an introduction of a language-driven approach to data discovery systems that can leverage powerful results from programming languages research. In this paper, we describe TQL, a flexible and practical query language which incorporates a type-like system to encompass downstream transformation-context in its discovery queries. The syntax and semantics of TQL (including the underlying evaluation model), are formally defined, and a sketch of its implementation is also provided. Additionally, we provide comparisons to existing languages for data retrieval and data discovery to examine the advantages of TQL's expanded expressive power in real-life settings.2025-08-11T14:55:49Z2024 IEEE BigData paperAndrew KangSainyam Galhotra10.1109/BigData62323.2024.10825227http://arxiv.org/abs/2508.07855v1Checking Consistency of Event-driven Traces2025-08-11T11:11:44ZEvent-driven programming is a popular paradigm where the flow of execution is controlled by two features: (1) shared memory and (2) sending and receiving of messages between multiple handler threads (just called handler). Each handler has a mailbox (modelled as a queue) for receiving messages, with the constraint that the handler processes its messages sequentially. Executions of messages by different handlers may be interleaved. A central problem in this setting is checking whether a candidate execution is consistent with the semantics of event-driven programs. In this paper, we propose an axiomatic semantics for eventdriven programs based on the standard notion of traces (also known as execution graphs). We prove the equivalence of axiomatic and operational semantics. This allows us to rephrase the consistency problem axiomatically, resulting in the event-driven consistency problem: checking whether a given trace is consistent. We analyze the computational complexity of this problem and show that it is NP-complete, even when the number of handler threads is bounded. We then identify a tractable fragment: in the absence of nested posting, where handlers do not post new messages while processing a message, consistency checking can be performed in polynomial time. Finally, we implement our approach in a prototype tool and report on experimental results on a wide range of benchmarks.2025-08-11T11:11:44ZParosh Aziz AbdullaMohamed Faouzi AtigR. GovindSamuel GrahnRamanathan S. Thinniyamhttp://arxiv.org/abs/2503.23145v2CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis2025-08-08T07:13:11ZInductive program synthesis, or programming by example, requires synthesizing functions from input-output examples that generalize to unseen inputs. While large language model agents have shown promise in programming tasks guided by natural language, their ability to perform inductive program synthesis is underexplored. Existing evaluation protocols rely on static sets of examples and held-out tests, offering no feedback when synthesized functions are incorrect and failing to reflect real-world scenarios such as reverse engineering. We propose CodeARC, the Code Abstraction and Reasoning Challenge, a new evaluation framework where agents interact with a hidden target function by querying it with new inputs, synthesizing candidate functions, and iteratively refining their solutions using a differential testing oracle. This interactive setting encourages agents to perform function calls and self-correction based on feedback. We construct the first large-scale benchmark for general-purpose inductive program synthesis, featuring 1114 functions. Among 18 models evaluated, o3-mini performs best with a success rate of 52.7%, highlighting the difficulty of this task. Fine-tuning LLaMA-3.1-8B-Instruct on curated synthesis traces yields up to a 31% relative performance gain. CodeARC provides a more realistic and challenging testbed for evaluating LLM-based program synthesis and inductive reasoning. Our code, data, and models are publicly available at https://github.com/Anjiang-Wei/CodeARC2025-03-29T16:50:39ZAnjiang WeiTarun SureshJiannan CaoNaveen KannanYuheng WuKai YanThiago S. F. X. TeixeiraKe WangAlex Aikenhttp://arxiv.org/abs/2508.05997v1Hybrid Game Control Envelope Synthesis2025-08-08T04:02:15ZControl problems for embedded systems like cars and trains can be modeled by two-player hybrid games. Control envelopes, which are families of safe control solutions, correspond to nondeterministic winning policies of hybrid games, where each deterministic specialization of the policy is a control solution. This paper synthesizes nondeterministic winning policies for hybrid games that are as permissive as possible. It introduces subvalue maps, a compositional representation of such policies that enables verification and synthesis along the structure of the game. An inductive logical characterization in differential game logic (dGL) checks whether a subvalue map induces a sound control envelope which always induces a winning play. A policy is said to win if it always achieves the desirable outcome when the player follows it, no matter what actions the opponent plays. The maximal subvalue map, which allows the most action options while still winning, is shown to exist and satisfy a logical characterization. A family of algorithms for nondeterministic policy synthesis can be obtained from the inductive subvalue map soundness characterization. An implementation of these findings is evaluated on examples that use the expressivity of dGL to model a range of diverse control challenges.2025-08-08T04:02:15ZAditi KabraJonathan LaurentStefan MitschAndré Platzerhttp://arxiv.org/abs/2204.12384v5Qunity: A Unified Language for Quantum and Classical Computing (Extended Version)2025-08-07T14:53:50ZWe introduce Qunity, a new quantum programming language designed to treat quantum computing as a natural generalization of classical computing. Qunity presents a unified syntax where familiar programming constructs can have both quantum and classical effects. For example, one can use sum types to implement the direct sum of linear operators, exception-handling syntax to implement projective measurements, and aliasing to induce entanglement. Further, Qunity takes advantage of the overlooked BQP subroutine theorem, allowing one to construct reversible subroutines from irreversible quantum algorithms through the uncomputation of "garbage" outputs. Unlike existing languages that enable quantum aspects with separate add-ons (like a classical language with quantum gates bolted on), Qunity provides a unified syntax and a novel denotational semantics that guarantees that programs are quantum mechanically valid. We present Qunity's syntax, type system, and denotational semantics, showing how it can cleanly express several quantum algorithms. We also detail how Qunity can be compiled into a low-level qubit circuit language like OpenQASM, proving the realizability of our design.2022-04-26T15:34:22Z79 pages, 37 figures. Presented at POPL 2023. Corrected to fix mistakes found by Mikhail MintsFinn VoichickLiyi LiRobert RandMichael Hicks10.1145/3571225http://arxiv.org/abs/2503.04512v3Modular Reasoning about Error Bounds for Concurrent Probabilistic Programs (Extended Version)2025-08-07T07:03:47ZWe present Coneris, the first higher-order concurrent separation logic for reasoning about error probability bounds of higher-order concurrent probabilistic programs with higher-order state. To support modular reasoning about concurrent (non-probabilistic) program modules, state-of-the-art program logics internalize the classic notion of linearizability within the logic through the concept of logical atomicity.
Coneris extends this idea to probabilistic concurrent program modules. Thus Coneris supports modular reasoning about probabilistic concurrent modules by capturing a novel notion of randomized logical atomicity within the logic. To do so, Coneris utilizes presampling tapes and a novel probabilistic update modality to describe how state is changed probabilistically at linearization points. We demonstrate this approach by means of smaller synthetic examples and larger case studies.
All of the presented results, including the meta-theory, have been mechanized in the Rocq proof assistant and the Iris separation logic framework
This is the extended version of the same paper accepted at ICFP 2025, where more details of proofs and case studies are included in the Appendix.2025-03-06T14:59:30ZKwing Hei LiAlejandro AguirreSimon Oddershede GregersenPhilipp G. HaselwarterJoseph TassarottiLars Birkedalhttp://arxiv.org/abs/2408.06478v2Theorem-Carrying Transactions: Runtime Verification to Ensure Interface Specifications for Smart Contract Safety2025-08-06T22:31:48ZSecurity bugs and trapdoors in smart contracts have been impacting the Ethereum community since its inception. Conceptually, the 1.45-million Ethereum's contracts form a single "gigantic program" whose behaviors are determined by the complex compositions of contracts. Can programmers be assured that this gigantic program conforms to high-level safety specifications, despite unforeseeable code-level intricacies? Static code verification cannot be faithful to this gigantic program due to its scale and high polymorphism. In this paper, we present a viable approach to achieve this goal. Our technology, called Theorem-Carrying Transactions (TCT), combines the benefits of concrete execution and symbolic proofs. Under the TCT protocol, every transaction carries a theorem that proves its adherence to the specified properties in the invoked contracts, and the runtime system checks the theorem before executing the transaction. Once a theorem is proven, it will be reused for future transactions, so TCT's runtime overhead is minimal. As case studies, we demonstrate that TCT secures token contracts without foreseeing code-level intricacies, such as integer overflow and reentrancy. TCT is also successfully applied to a Uniswap codebase, showcasing a complex decentralized finance (DeFi) scenario. Our evaluation shows a negligible runtime overhead, two orders of magnitude lower than a state-of-the-art approach for runtime checking of contract code safety.2024-08-12T20:27:41ZThomas BallMicrosoft ResearchNikolaj S. BjørnerMicrosoft ResearchAshley J. ChenNew York University ShanghaiShuo ChenMicrosoft ResearchYang ChenMicrosoft ResearchZhongxin GuoMicrosoft ResearchTzu-Han HsuMichigan State UniversityPeng LiuPennsylvania State UniversityNanqing LuoPennsylvania State Universityhttp://arxiv.org/abs/2508.06563v1Assessing Engineering Student Perceptions of Introductory CS Courses in an Indian Context2025-08-06T19:04:19ZUnderstanding student perceptions of assessment is vital for designing inclusive and effective learning environments, especially in technical education. This study explores engineering students' perceptions of assessment practices in an introductory computer science/ programming course, and its associated laboratory within an Indian engineering institute context. A total of 318 first-year Bachelor of Technology students participated in a weekly 25-statement Likert-scale survey conducted over nine weeks. Using descriptive statistics and non-parametric tests (Mann-Whitney U and Kruskal-Wallis), the analysis reveals that students largely perceive lab assignments as effective learning activities and view exams and projects as authentic and skill-enhancing. Students appreciated the role of instructors in shaping course content and found teaching assistants to be approachable and helpful, despite some inconsistencies. The study also finds significant variations in students' academic performance and assessment perceptions based on prior programming experience, technology familiarity, gender, and academic branch. Notably, the performance data did not follow a Gaussian distribution, challenging common assumptions in grade modeling. A comparative analysis with European cohorts highlights both universal patterns and contextual differences, offering valuable insights for designing inclusive and equitable assessment strategies in programming education.2025-08-06T19:04:19ZUtsav Kumar NaretiDivyansh GuptaChandranath AdakSoumi ChattopadhyayEmma RieseTanujit ChakrabortyMayank AgarwalSatendra Kumar