https://arxiv.org/api/ZAitLelodi3XPJiMkj1jYm3E6BA2026-06-13T19:09:25Z988512015http://arxiv.org/abs/2605.23109v1Inductive Deductive Synthesis: Enabling AI to Generate Formally Verified Systems2026-05-22T00:05:36ZAI agents increasingly excel at generating, testing, and refining code. However, they fall short on tasks requiring formal guarantees of full coverage that testing alone cannot provide. Distributed systems are a prime example: properties such as consistency between reads and writes must hold under every possible interleaving of events. Mechanized formal verification can guarantee such correctness, but typically demands months to years of expert effort. As evidence, even SOTA coding agents (Codex with GPT-5.4 and Claude Code with Opus 4.6) succeed on only 2/7 distributed key-value-store specifications. In this paper, we present the first effective approach to addressing this gap, Inductive Deductive Synthesis (IDS), which jointly and incrementally synthesizes implementation and proof, and learns from failed attempts to systematically try promising strategies. Built as an agentic LLM system, IDS achieves 7/7 in about 6.8 hours and $106 per spec on average, roughly 200x faster than expert effort and 17% cheaper than SOTA agents. IDS further incorporates performance feedback into the same loop, yielding implementations up to 3x faster than published verified systems.2026-05-22T00:05:36ZShubham AgarwalAlexander KrentselShu LiuMert CemriAudrey ChengRui MengTomas PfisterChun-Liang LiSylvia RatnasamyAditya ParameswaranMatei ZahariaIon StoicaMohsen Lesanihttp://arxiv.org/abs/2605.23088v1YASPS: A Symbolic Framework for Extensible, High-Performance IPC Simulation2026-05-21T22:39:21ZIncremental Potential Contact (IPC) enables robust, contact-rich simulation by casting elasticity and contact as a single energy minimization problem, but high-performance IPC pipelines are typically built from specialized kernels and assembly logic tied to fixed energies, primitive types, and parameterizations, making extensions costly and combinatorial. We present YASPS, a GPU-oriented framework that removes this extensibility bottleneck by making structure explicit in a differentiable intermediate representation. YASPS introduces two first-class relational operators: JOIN, which composes dependent quantities across user-declared relations (e.g., element-to-vertex connectivity), and UNION, which represents alternative parameterizations within a relation (e.g., mixing free vertices with affine-body or other parameterizations without fragmenting the program). Because JOIN and UNION are part of the symbolic program, YASPS differentiates through them using dedicated rules and an efficient second-order procedure that reuses intermediate Jacobians and reduces Hessian-projection cost. From the same relational description, YASPS derives the global gradient/Hessian sparsity and block layout, enabling structure-aware block-sparse storage and compression, and JIT-compiles CUDA kernels for evaluation, derivatives, assembly, and solving. Across IPC-style examples, including layered cloth-on-bunny, mixed rigid/deformable bunnies, and a caged deformation model, YASPS supports rapid front-end extensions with minimal back-end changes while achieving competitive end-to-end performance; its Hessian compression yields near 10x faster CG iterations in our benchmarks.2026-05-21T22:39:21ZAccepted to Siggraph 2026Xuan TangKemeng HuangGilbert BernsteinMinchen LITzumao Li10.1145/3811327http://arxiv.org/abs/2605.20919v2Sutra: Tensor-Op RNNs as a Compilation Target for Vector Symbolic Architectures2026-05-21T21:10:07ZSutra is a typed, purely functional programming language whose compiled forward pass is a PyTorch neural network. The compiler beta-reduces the whole program -- primitives, control flow, string I/O -- to one fused tensor-op graph over a frozen embedding substrate. Rotation binding, unbind, bundle, polynomial Kleene three-valued logic, and tail-recursive loops all lower to tensor operations; the Kleene connectives are Lagrange-interpolated polynomials exact on the {-1, 0, +1} truth grid. Validation is one fact tested two ways. (1) The same program runs on four frozen embeddings spanning two modalities -- three text encoders (nomic-embed-text, all-minilm, mxbai-embed-large) and one protein language model (ESM-2) -- and decodes bundles at 100% accuracy through width k=8 on every substrate, where the textbook Hadamard product has already collapsed (2.5% on mxbai-embed-large, 7.5% on all-minilm). (2) PyTorch autograd flows through the actually compiled graph: a fuzzy-rule classifier written in .su trains from random init (18.7 +/- 9.5%; chance = 20%, five classes) to 100.0 +/- 0.0% (three seeds) by backpropagating through the emitted graph, the symbolic source unmodified. A weighted variant additionally trains a scalar cosine gain and writes it back into the .su source as a numeric literal; recompiling reproduces the trained behaviour to ~2e-7 per logit, so the trained model is itself legible, recompilable code. The same artifact is therefore both a logic program and a trainable neural network.2026-05-20T09:04:36ZModified NeurIPS submission, see AI declaration and replication materials at end of paperEmma Leonharthttp://arxiv.org/abs/2605.23031v1Step in Tine: Forking Processes in Functional Choreographies2026-05-21T20:56:27ZTraditional concurrent-programming techniques require programmers to painstakingly write programs for each participant in a concurrent system. Choreographic programming, in contrast, allows a programmer to write one centralized program and compile it to individual programs. This approach simplifies critical properties like deadlock freedom, but it complicates forking new processes, a core primitive in concurrent programming. This work addresses that gap with the choreographic fork calculus $λ{\pitchfork}$, the first functional choreographic language with process forking. $λ{\pitchfork}$ provides a deadlock-freedom guarantee while allowing programs to dynamically determine when to spawn new processes, what they will do, and who will communicate with them. In doing so, it supports practical applications like load balancers and parallel divide-and-conquer.2026-05-21T20:56:27Z81 pages, 10 figuresAshley SamuelsonAndrew K. HirschEthan Cecchettihttp://arxiv.org/abs/2605.23022v1Complete first-order reasoning for functional programs2026-05-21T20:47:07ZSeveral practical tools for automatically verifying functional programs (e.g., Liquid Haskell and Leon for Scala programs) rely on a heuristic based on unrolling recursive function definitions followed by quantifier-free reasoning using SMT solvers. We uncover foundational theoretical properties of this heuristic, revealing that it can be generalized and formalized as a technique that is in fact complete for reasoning with combined First-Order theories of algebraic datatypes and background theories, where background theories support decidable quantifier-free reasoning. The theory developed in this paper explains the efficacy of these heuristics when they succeed, explains why they fail when they fail, and the precise role that user help plays in making proofs succeed.2026-05-21T20:47:07ZAdithya MuraliLucas PeñaRanjit JhalaP. Madhusudanhttp://arxiv.org/abs/2201.10456v9A Complete Theory of Sequential Digital Circuits: Denotational, Operational and Algebraic Semantics2026-05-21T20:25:25ZDigital circuits, despite having been studied for nearly a century and used at scale for about half that time, have until recently evaded a fully compositional theoretical in which arbitrary circuits may be freely composed together without consulting their internals. Recent work remedied this theoretical shortcoming by showing how digital circuits can be presented compositionally as morphisms in a freely generated symmetric traced category. However, this was done informally; in this paper we refine and expand the previous work in several ways, culminating in the presentation of three sound and complete semantics for digital circuits: denotational, operational and algebraic. For the denotational semantics, we establish a correspondence between stream functions with certain properties and circuits constructed syntactically. For the operational semantics, we present the reductions required to model how a circuit processes a value, including the addition of a new reduction for eliminating non-delay-guarded feedback; this leads to an adequate notion of observational equivalence for digital circuits. Finally, we define a new family of equations for translating circuits into bisimilar circuits of a 'normal form', leading to a complete algebraic semantics for sequential circuits.2022-01-25T16:58:10ZExtended version, 67 pagesDan R. GhicaGeorge KayeDavid Sprungerhttp://arxiv.org/abs/2605.22433v1QuCtrl-BELL: A Compiler-Driven Sub-Microsecond Feedback Control Stack for Scalable Trapped-Ion Quantum Experiments2026-05-21T12:59:24ZAs trapped-ion quantum computing scales to larger qubit registers and more complex control protocols, classical control systems face a fundamental tradeoff: sub-microsecond board-level feedback requires tight hardware coupling, whereas maintainability and extensibility require clean, modular software abstractions. This paper presents QuCtrl-BELL (Bell), a compiler-driven software stack for trapped-ion quantum control. The design resolves this tradeoff by decoupling control flow -- including loops, branches, and synchronization -- from hardware state data. A Python-embedded domain-specific language (DSL) is lowered through a six-stage transpilation pipeline covering control flow graph (CFG) construction, static single-assignment (SSA) conversion, liveness analysis, and graph-coloring register allocation. The compiler generates deterministic distributed board-level programs and compact step-table data. A cross-board synchronization protocol supports feedback loops with latency below 700~ns without host intervention. Bell is deployed and evaluated on the QuCtrl-BELL platform (RISC-V + PXIe), demonstrating that a compiler-based infrastructure can provide programmability, deterministic timing, and modularity for scalable trapped-ion quantum control.2026-05-21T12:59:24Z7 pages, 6 figuresJunpeng SheRuoyu YanZhizhen QinZhanyu LiZhongtao ShenZichao ZhouBinxiang QiLuming Duanhttp://arxiv.org/abs/2510.18479v3Formally Verified Linear-Time Invertible Lexing2026-05-21T12:14:38ZWe present ZipLex, a verified framework for invertible linear-time lexical analysis following the longest match (maximal munch) semantics. Unlike past verified lexers that focus only on satisfying the semantics of regular expressions and the longest match property, ZipLex also guarantees that lexing and printing are mutual inverses. Thanks to verified memoization, it also ensures that the lexical analysis of a string is linear in the size of the string. Our design and implementation rely on two sets of ideas: (1) a new abstraction of token sequences that captures the separability of tokens in a sequence while supporting their efficient manipulation, and (2) a combination of verified data structures and optimizations, including Huet's zippers and memoization with a standalone verified imperative hash table. Our hash table offers competitive performance as shown by our evaluation. We implemented and verified ZipLex using the Stainless deductive verifier for Scala. Our evaluation demonstrates that ZipLex supports realistic applications such as JSON processing and lexers of programming languages, and behaves linearly even in cases that make flex-style approaches quadratic. ZipLex is two orders of magnitude faster than Verbatim++, showing that verified invertibility and linear-time algorithms can be developed without prohibitive cost. Compared to Coqlex, ZipLex also offers linear (instead of quadratic) time lexing, and is the first lexer that comes with invertibility proofs for printing token sequences.2025-10-21T09:58:08ZSamuel ChassotViktor Kunčakhttp://arxiv.org/abs/2512.21132v2AutoBaxBuilder: Bootstrapping Code Security Benchmarking2026-05-21T11:45:45ZAs large language models (LLMs) see wide adoption in software engineering, the reliable assessment of the correctness and security of LLM-generated code is crucial. Notably, prior work showed that LLMs are prone to generating code with security vulnerabilities, highlighting that security is often overlooked. These insights were enabled by specialized benchmarks crafted by security experts through significant manual effort. However, benchmarks (i) inevitably end up contaminating training data, (ii) must extend to new tasks to provide a more complete picture, and (iii) must increase in difficulty to challenge more capable LLMs. In this work, we address these challenges and present AutoBaxBuilder, an automated pipeline that generates code security benchmarking tasks from scratch. It leverages the code-understanding capabilities of LLMs combined with robust reliability checks to construct functional tests and end-to-end security-probing exploits. The quality of the pipeline is quantitatively confirmed by aligning its predictions with an expert-written baseline and qualitatively validated through manual soundness verification. We use AutoBaxBuilder to construct a new benchmark and release it to the public as AutoBaxBench, together with a thorough evaluation on contemporary LLMs. AutoBaxBuilder generates new tasks in under 2 hours, for less than USD 4. Including a manual verification, this reduces the required human effort for benchmark construction by a factor of 12.2025-12-24T12:02:00ZICML 2026Tobias von ArxNiels MündlerMark VeroMaximilian BaaderMartin Vechevhttp://arxiv.org/abs/2605.24036v1Intent-Driven Computing: A Computational Model for Governed Autonomous Systems2026-05-21T09:54:52ZProgramming languages assume programs directly execute effects. When autonomous systems generate behavior dynamically, this assumption becomes problematic: there is no structural mediation point between deciding to act and acting. We define intent-driven computing: a programming model where programs produce intents (finite data values describing proposed actions) rather than directly executing effects. A governed runtime examines each intent against a decidable policy language, records every decision in a tamper-evident ledger, and only then realizes the effect. The language provides no alternative path to effects. The model does not decide arbitrary behavioral properties of programs (which Rice's theorem shows is impossible). Instead, it constrains the language so that all effectful interaction is reified as finite intent values, shifting governance from the undecidable domain of program semantics to the decidable domain of intent data. This yields emergent properties: event sourcing by construction, governance simulation via intent replay, structural audit completeness, and improved human comprehensibility. We specify the model formally, implement it in a concrete language compiling to the BEAM virtual machine, and verify key properties in Rocq (454 theorems, 36 modules, zero admitted lemmas). Property-based testing (70,000+ random inputs, zero disagreements) validates that the implementation matches the specification.2026-05-21T09:54:52Z18 pages, 6 theorems, 2 tables, 3 figures. Companion proofs: https://github.com/mashin-live/governance-proofs. Project: https://mashin.liveAlan L. McCannhttp://arxiv.org/abs/2602.01935v2LiteCoOp: Lightweight Multi-LLM Shared-Tree Reasoning for Model-Serving Compiler Optimizations2026-05-21T06:25:00ZLLM-guided compiler optimization has recently shown promise, but existing approaches rely on a single large LLM throughout search, making them expensive and excluding smaller models. We pose the research question: whether heterogeneous LLMs can collaborate during compiler optimization while reducing compilation cost below optimization guided by a single large LLM. Crucially, this must be achieved without introducing overhead from agentic frameworks, which would run counter to the goal of lower compilation cost. To achieve these competing objectives, we introduce LiteCoOp, a lightweight framework that turns the optimization search tree itself into the mechanism for multi-LLM collaboration, enabling heterogeneous models to share progress without external agentic coordination. At each optimization step, LiteCoOp queries one LLM to propose both a compiler transformation and select the LLM to query at the next step. These LLM proposals are recorded in a shared MCTS tree, so all models are invoked serially and yet are informed by each other's decisions. The shared MCTS backpropagates the rewards, allowing progress made by one model to influence later decisions by others. This makes the MCTS tree the collaborative reasoning mechanism itself, avoiding inter-model communication, heavy reasoning traces, or agentic infrastructure. We instantiate this idea with an LLM-aware UCT that biases model selection toward smaller LLMs to reduce cost while still preserving the compiler performance objective. Across diverse GPU and (CPU) benchmarks, LiteCoOp consistently outperforms single-model baselines, with the best results obtained when scaling collaboration to eight heterogeneous LLMs. This eight-model config reduces total compilation time by 1.95x (1.74x), reduces API cost by 4.47x (4.32x), and invokes the largest model for only 23.1% (23.9%) of total calls while demonstrating collaboration scalability.2026-02-02T10:37:05ZAnnabelle Sujun TangChristopher PriebeLianhui QinHadi Esmaeilzadehhttp://arxiv.org/abs/2307.05938v4Decalf: A Directed, Effectful Cost-Aware Logical Framework2026-05-20T23:34:03ZWe present Decalf, a directed, effectful cost-aware logical framework for studying quantitative aspects of functional programs with effects. Like Calf, the language is based on an internal phase distinction between the behavior of a program and its cost measured by an effect. Decalf extends Calf by accommodating other effects, such as probabilistic choice, which requires a reformulation of Calf's approach to cost analysis: rather than rely on a separable notion of cost, here a cost bound is simply another program. Formally, every type is equipped with an intrinsic preorder, allowing effectful programs to be compared for cost inequality. This approach serves as a streamlined alternative to the standard method of isolating a cost recurrence and readily extends to higher-order, effectful programs.
The development proceeds by first introducing the Decalf type system, which is based on an intrinsic cost ordering among terms that restricts in the behavioral phase to extensional equality. This formulation is then applied to illustrative examples, including pure and effectful sorting algorithms. Finally, Decalf is semantically justified via a model in the topos of augmented simplicial sets.2023-07-12T06:05:53ZHarrison GrodinCarnegie Mellon UniversityYue NiuNational Institute of InformaticsJonathan SterlingUniversity of CambridgeRobert HarperCarnegie Mellon University10.1145/3632852http://arxiv.org/abs/2604.26727v2Comparing Smart Contract Paradigms: A Preliminary Study of Security and Developer Experience2026-05-20T20:50:39ZSmart contract vulnerabilities have caused billions in financial losses, raising questions about whether programming language paradigms can reduce security overhead. While imperative languages like Solidity require developers to manually implement security checks, resource-oriented languages like Move encode safety guarantees in type systems. We present a preliminary mixed-methods study analyzing 12 functionally-equivalent contract pairs implemented in both Solidity and Move by the same development team, complemented by a survey of 11 developers experienced in both languages. Quantitative analysis reveals that Move reduces explicit security overhead by 60\% (security check density: 6.7% vs. 16.8%, p=0.002, Cohen's d=-1.75) at the cost of 47% larger code size (p=0.002, d=1.90), while maintaining identical cyclomatic complexity. Developer surveys show moderate learning difficulty but higher safety confidence in Move (Median=6/7, 10 of 11 above neutral), with 55% preferring Move for security-critical applications despite ecosystem maturity gaps. These preliminary findings suggest resource-oriented paradigms shift security from runtime validation to compile-time guarantees, though adoption requires investment in learning and tooling. The controlled comparison provides initial evidence for paradigm effects on smart contract development, informing language selection decisions and identifying opportunities for improved developer resources.2026-04-29T14:28:47ZThe 30th International Conference on Evaluation and Assessment in Software Engineering (EASE 2026), 9-12 June, 2026, Glasgow, Scotland, United KingdomMatteo VaccargiuAndrea PinnaMaria Ilaria LunesuGiuseppe Destefanis10.1145/3816483.3816514http://arxiv.org/abs/2605.21405v1Stdlib or Third-Party? Empirical Performance and Correctness of LLM-Assisted Zero-Dependency Python Libraries2026-05-20T17:02:54ZThird-party Python libraries introduce dependency management overhead, supply chain risk, and deployment friction in constrained environments. A natural question is how much of this ecosystem can be replicated using only Python's standard library -- and at what correctness and performance cost. We address this empirically through zerodep, a growing collection of single-file Python modules, each a stdlib-only reimplementation of a popular third-party library, developed with LLM assistance under strict constraints: no external imports, single file, drop-in API compatibility, and mandatory correctness validation against the reference library. Spanning over 40 modules across 12 categories -- including serialization, networking, cryptography, agent protocols, and text processing -- zerodep provides a controlled testbed for two interrelated questions: (1) Where does the stdlib suffice? and (2) Can LLMs effectively generate correct, performant code under tight symbolic constraints? Systematic benchmarking shows that stdlib-only implementations achieve performance parity (within 2x of the reference) in the majority of cases. The primary performance cliff is C-extension-backed computation (image processing, binary serialization, low-level crypto), not the inherent overhead of pure-Python third-party libraries. Conversely, many widely-used libraries carry architectural overhead that LLM-generated stdlib reimplementations avoid, yielding 5--115x speedups in several categories. We characterize the stdlib capability boundary across complexity tiers and library categories, discuss where LLM-assisted development succeeds and where it requires iterative human correction, and examine implications for dependency-free software engineering at scale. zerodep is open-source at https://github.com/Oaklight/zerodep.2026-05-20T17:02:54Z12 pagesPeng DingRick Stevenshttp://arxiv.org/abs/2605.21337v1Multicategorical Semantics for Untyped Effects2026-05-20T16:03:31ZCompleteness proofs in categorical semantics usually proceed by building a syntactic category whose composition is given by substitution. For untyped effectful Call-by-Value languages, this runs into a basic obstacle: there is no canonical notion of simultaneous substitution of computations, since evaluation order is semantically meaningful. We address this by taking single computation substitutions, that is, binding steps, as primitive, and representing computation substitution by finite sequential lists composed by concatenation. We formalize this idea in a one-object Freyd-multicategorical setting. We introduce Freyd operads, separating a cartesian operad of values from a symmetric Ren-cartesian preoperad of computations, connected by a Freyd functor, and from any Freyd operad we construct a corresponding Freyd PROP of substitutions. We prove that this construction is representable and, in the strict one-object setting, left adjoint to restriction to codomain 1. Using the induced term model, we interpret untyped computational lambda-calculus with procedures and higher-order functions in weakly closed Freyd operads, and prove soundness, initiality, and completeness. This yields a categorical semantics tailored to untyped effectful computation and broad enough to encompass realizability-oriented models such as monadic combinatory algebras.2026-05-20T16:03:31ZTo be published in MFPS 2026Ariel GrunfeldLiron Cohen