http://arxiv.org/abs/2505.13453v2 Pel, A Programming Language for Orchestrating AI Agents 2025-06-09T03:05:20Z

The proliferation of Large Language Models (LLMs) has opened new frontiers in computing, yet controlling and orchestrating their capabilities beyond simple text generation remains a challenge. Current methods, such as function/tool calling and direct code generation, suffer from limitations in expressiveness, scalability, cost, security, and the ability to enforce fine-grained control. This paper introduces Pel, a novel programming language specifically designed to bridge this gap. Inspired by the strengths of Lisp, Elixir, Gleam, and Haskell, Pel provides a syntactically simple, homoiconic, and semantically rich platform for LLMs to express complex actions, control flow, and inter-agent communication safely and efficiently. Pel's design emphasizes a minimal, easily modifiable grammar suitable for constrained LLM generation, eliminating the need for complex sandboxing by enabling capability control at the syntax level. Key features include a powerful piping mechanism for linear composition, first-class closures enabling easy partial application and functional patterns, built-in support for natural language conditions evaluated by LLMs, and an advanced Read-Eval-Print-Loop (REPeL) with Common Lisp-style restarts and LLM-powered helper agents for automated error correction. Furthermore, Pel incorporates automatic parallelization of independent operations via static dependency analysis, crucial for performant agentic systems. We argue that Pel offers a more robust, secure, and expressive paradigm for LLM orchestration, paving the way for more sophisticated and reliable AI agentic frameworks.
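
The idea of parallelizing independent operations through static dependency analysis can be sketched in a few lines of Python. This is an illustration only: it is not Pel code, and the abstract does not describe Pel's actual algorithm. The names `parallel_layers` and `run`, and the step names in the usage note, are hypothetical. The sketch assumes each step declares the steps it depends on, and that every dependency is itself a key of `deps`.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_layers(deps):
    """Group step names into layers; steps within one layer do not depend on each other."""
    remaining = {name: set(d) for name, d in deps.items()}
    layers = []
    while remaining:
        # A step is ready once all of its dependencies have been scheduled.
        ready = sorted(name for name, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic or unknown dependency")
        layers.append(ready)
        for name in ready:
            del remaining[name]
        for d in remaining.values():
            d.difference_update(ready)
    return layers


def run(steps, deps):
    """Run each layer concurrently; every step receives a snapshot of earlier results."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for layer in parallel_layers(deps):
            futures = {name: pool.submit(steps[name], dict(results)) for name in layer}
            for name, future in futures.items():
                results[name] = future.result()
    return results
```

For example, with `deps = {"fetch_a": [], "fetch_b": [], "merge": ["fetch_a", "fetch_b"]}`, the two fetch steps land in the first layer and can run concurrently, while `merge` waits for both.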

2025-04-03T18:46:53Z 1. Updated author email address (I graduated so I added my alumni email). 2. Changed mono-font color to blue for better readability Behnam Mohammadi http://arxiv.org/abs/2506.10021v1 From Tool Calling to Symbolic Thinking: LLMs in a Persistent Lisp Metaprogramming Loop 2025-06-08T20:12:06Z

We propose a novel architecture for integrating large language models (LLMs) with a persistent, interactive Lisp environment. This setup enables LLMs to define, invoke, and evolve their own tools through programmatic interaction with a live REPL. By embedding Lisp expressions within generation and intercepting them via a middleware layer, the system allows for stateful external memory, reflective programming, and dynamic tool creation. We present a design framework and architectural principles to guide future implementations of interactive AI systems that integrate symbolic programming with neural language generation.

2025-06-08T20:12:06Z Jordi de la Torre http://arxiv.org/abs/2501.16207v4 From Informal to Formal -- Incorporating and Evaluating LLMs on Natural Language Requirements to Verifiable Formal Proofs 2025-06-08T10:02:25Z

The research in AI-based formal mathematical reasoning has shown an unstoppable growth trend. These studies have excelled in mathematical competitions like IMO and have made significant progress. This paper focuses on formal verification, an immediate application scenario of formal reasoning, and breaks it down into sub-tasks. We constructed 18k high-quality instruction-response pairs across five formal specification languages (Coq, Lean4, Dafny, ACSL, and TLA+) by distilling gpt-4o and evaluated against ten open-sourced LLMs, including the recently popular DeepSeek-R1. We also fine-tuned several 7~8B small models to achieve comparable performance with DeepSeek-R1-671B. Interestingly, we observed that fine-tuning with formal data also enhances mathematics, reasoning, and coding capabilities. Fine-tuned models are released at https://huggingface.co/fm-universe.

2025-01-27T17:00:56Z 20 pages Jialun Cao Yaojie Lu Meiziniu Li Haoyang Ma Haokun Li Mengda He Cheng Wen Le Sun Hongyu Zhang Shengchao Qin Shing-Chi Cheung Cong Tian http://arxiv.org/abs/2501.15104v3 Two-sorted algebraic decompositions of Brookes's shared-state denotational semantics 2025-06-08T06:38:08Z

We use a two-sorted equational theory of algebraic effects to model concurrent shared state with preemptive interleaving, recovering Brookes's seminal 1996 trace-based model precisely. The decomposition allows us to analyse Brookes's model algebraically in terms of separate but interacting components. The multiple sorts partition terms into layers. We use two sorts: a "hold" sort for layers that disallow interleaving of environment memory accesses, analogous to holding a global lock on the memory; and a "cede" sort for the opposite. The algebraic signature comprises independent interlocking components: two new operators that switch between these sorts, delimiting the atomic layers, thought of as acquiring and releasing the global lock; non-deterministic choice; and state-accessing operators. The axioms similarly divide cleanly: the delimiters behave as a closure pair; all operators are strict, and distribute over non-empty non-deterministic choice; and non-deterministic global state obeys Plotkin and Power's presentation of global state. Our representation theorem expresses the free algebras over a two-sorted family of variables as sets of traces with suitable closure conditions. When the held sort has no variables, we recover Brookes's trace semantics.

2025-01-25T07:04:45Z FoSSaCS 2025. Lecture Notes in Computer Science, vol 15691. Springer, Cham Yotam Dvir Ohad Kammar Ori Lahav Gordon Plotkin 10.1007/978-3-031-90897-2_18 http://arxiv.org/abs/2503.15812v6 Object-Spatial Programming 2025-06-07T19:08:16Z

The evolution of programming languages from low-level assembly to high-level abstractions demonstrates a fundamental principle: by constraining how programmers express computation and enriching semantic information at the language level, we can make previously undecidable program properties tractable for optimization. Building on the insight of this undecidability-lessening effect, we introduce Object-Spatial Programming (OSP), a novel programming model that extends Object-Oriented Programming by introducing topologically-aware class constructs called archetypes. OSP fundamentally inverts the traditional relationship between data and computation, enabling computation to move to data through four specialized archetypes: object classes, node classes (discrete data locations), edge classes (first-class relationships), and walker classes (mobile computational entities). By making topological relationships and traversal patterns explicit at the language level, OSP transforms previously opaque program behaviors into observable, optimizable patterns. This semantic enhancement enables runtime systems to make informed decisions about data locality, parallel execution, and distribution strategies based on explicit topology, while providing programmers with intuitive abstractions for modeling complex systems where connection topology is central to the computational model. The paradigm addresses fundamental limitations in traditional programming models when representing agent-based systems, social networks, neural networks, distributed systems, finite state machines, and other spatially-oriented computational problems, demonstrating how thoughtful abstraction design can simultaneously enhance programmer expressiveness and enable sophisticated system-level optimizations across the computing stack.

2025-03-20T02:55:40Z 31 pages, 45 pages with appendix Jason Mars http://arxiv.org/abs/2503.02768v2 Denotational Semantics for Probabilistic and Concurrent Programs 2025-06-07T11:43:51Z

We develop a denotational model for probabilistic and concurrent imperative programs, a class of programs with standard control flow via conditionals and while-loops, as well as probabilistic actions and parallel composition. Whereas semantics for concurrent or randomized programs in isolation is well studied, their combination has not been thoroughly explored and presents unique challenges. The crux of the problem is that interactions between control flow, probabilistic actions, and concurrent execution cannot be captured by straightforward generalizations of prior work on pomsets and convex languages, prominent models for those effects, individually. Our model has good domain theoretic properties, important for semantics of unbounded loops. We also prove two adequacy theorems, showing that the model subsumes typical powerdomain semantics for concurrency and convex powerdomain semantics for probabilistic nondeterminism.

2025-03-04T16:33:59Z 36th International Conference on Concurrency Theory (CONCUR 2025) Noam Zilberstein Daniele Gorla Alexandra Silva 10.4230/LIPIcs.CONCUR.2025.4 http://arxiv.org/abs/2501.14380v3 Verifying Fault-Tolerance of Quantum Error Correction Codes 2025-06-07T07:31:08Z

Quantum computers have advanced rapidly in qubit count and gate fidelity. However, large-scale fault-tolerant quantum computing still relies on quantum error correction code (QECC) to suppress noise. Manually or experimentally verifying the fault-tolerance property of complex QECC implementation is impractical due to the vast error combinations. This paper formalizes the fault-tolerance of QECC implementations within the language of quantum programs. By incorporating the techniques of quantum symbolic execution, we provide an automatic verification tool for quantum fault-tolerance. We evaluate and demonstrate the effectiveness of our tool on a universal set of logical operations across different QECCs.

2025-01-24T10:28:24Z 54 pages, 8 figures, extended version of the paper accepted by CAV 2025 International Conference on Computer Aided Verification (CAV 2025), pp. 3-27 Kean Chen Yuhao Liu Wang Fang Jennifer Paykin Xin-Chuan Wu Albert Schmitz Steve Zdancewic Gushu Li 10.1007/978-3-031-98685-7_1 http://arxiv.org/abs/2506.02290v2 HEC: Equivalence Verification Checking for Code Transformation via Equality Saturation 2025-06-06T14:53:51Z

In modern computing systems, compilation employs numerous optimization techniques to enhance code performance. Source-to-source code transformations, which include control flow and datapath transformations, have been widely used in High-Level Synthesis (HLS) and compiler optimization. While researchers actively investigate methods to improve performance with source-to-source code transformations, they often overlook the significance of verifying their correctness. Current tools cannot provide a holistic verification of these transformations. This paper introduces HEC, a framework for equivalence checking that leverages the e-graph data structure to comprehensively verify functional equivalence between programs. HEC uses MLIR as its frontend and integrates MLIR into the e-graph framework. Through the combination of dynamic and static e-graph rewriting, HEC facilitates the validation of comprehensive code transformations. We demonstrate the effectiveness of HEC on PolyBenchC benchmarks, successfully verifying loop unrolling, tiling, and fusion transformations. HEC processes over 100,000 lines of MLIR code in 40 minutes with predictable runtime scaling. Importantly, HEC identified two critical compilation errors in mlir-opt: loop boundary check errors causing unintended executions during unrolling, and memory read-after-write violations in loop fusion that alter program semantics. These findings demonstrate HEC's practical value in detecting real-world compiler bugs and highlight the importance of formal verification in optimization pipelines.

2025-06-02T21:59:17Z Accepted by USENIX ATC 2025 Jiaqi Yin Zhan Song Nicolas Bohm Agostini Antonino Tumeo Cunxi Yu http://arxiv.org/abs/2506.06078v1 A Sound and Complete Characterization of Fair Asynchronous Session Subtyping 2025-06-06T13:34:23Z

Session types are abstractions of communication protocols enabling the static analysis of message-passing processes. Refinement notions for session types are key to support safe forms of process substitution while preserving their compatibility with the rest of the system. Recently, a fair refinement relation for asynchronous session types has been defined allowing the anticipation of message outputs with respect to an unbounded number of message inputs. This refinement is useful to capture common patterns in communication protocols that take advantage of asynchrony. However, while the semantic (à la testing) definition of such refinement is straightforward, its characterization has proved to be quite challenging. In fact, only a sound but not complete characterization is known so far. In this paper we close this open problem by presenting a sound and complete characterization of asynchronous fair refinement for session types. We relate this characterization to those given in the literature for synchronous session types by leveraging a novel labelled transition system of session types that embeds their asynchronous semantics.

2025-06-06T13:34:23Z Mario Bravetti Luca Padovani Gianluigi Zavattaro http://arxiv.org/abs/2506.05839v1 An Execution Model for RICE 2025-06-06T07:58:50Z

In this paper, we build on the previous work of the RICE compiler by giving its execution model. We show the restrictions to the FlatCurry language that were made to produce executable code, and present the execution model using operational semantics similar to Launchbury. Finally, we show that the execution model conforms with the standard operational semantics for Curry.

2025-06-06T07:58:50Z In Proceedings LSFA 2024, arXiv:2506.05219 EPTCS 421, 2025, pp. 112-129 Steven Libby 10.4204/EPTCS.421.7 http://arxiv.org/abs/2405.05751v3 Mirage: A Multi-Level Superoptimizer for Tensor Programs 2025-06-06T03:35:15Z

We introduce Mirage, the first multi-level superoptimizer for tensor programs. A key idea in Mirage is $μ$Graphs, a uniform representation of tensor programs at the kernel, thread block, and thread levels of the GPU compute hierarchy. $μ$Graphs enable Mirage to discover novel optimizations that combine algebraic transformations, schedule transformations, and generation of new custom kernels. To navigate the large search space, Mirage introduces a pruning technique based on abstraction that significantly reduces the search space and provides a certain optimality guarantee. To ensure that the optimized $μ$Graph is equivalent to the input program, Mirage introduces a probabilistic equivalence verification procedure with strong theoretical guarantees. Our evaluation shows that Mirage outperforms existing approaches by up to 3.3$\times$ even for DNNs that are widely used and heavily optimized. Mirage is publicly available at https://github.com/mirage-project/mirage.
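
The general idea behind probabilistic equivalence checking can be illustrated with random testing over a finite field. The sketch below is a generic technique under stated assumptions, not Mirage's verification procedure, which the abstract does not detail. It assumes both programs compute polynomial expressions of integer inputs; the modulus `P`, the function `probably_equivalent`, and the three sample programs are all hypothetical.

```python
import random

P = 2_147_483_647  # a prime modulus (2**31 - 1), chosen for illustration


def probably_equivalent(f, g, arity, trials=20, seed=0):
    """Compare f and g on random inputs modulo P.

    A mismatch proves the functions differ. Agreement on every trial only
    makes equivalence very likely for polynomial functions; it is not a proof.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        args = [rng.randrange(P) for _ in range(arity)]
        if f(*args) % P != g(*args) % P:
            return False
    return True


def original(a, b, c):
    return a * b + a * c


def rewritten(a, b, c):
    # Algebraic rewrite of `original` using distributivity.
    return a * (b + c)


def buggy(a, b, c):
    # An incorrect rewrite that drops a factor.
    return a * b + c
```

Here `probably_equivalent(original, rewritten, 3)` accepts the rewrite because the two expressions are the same polynomial, while the check against `buggy` fails as soon as a random input exposes the difference.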

2024-05-09T13:15:40Z OSDI'25 Mengdi Wu Xinhao Cheng Shengyu Liu Chunan Shi Jianan Ji Kit Ao Praveen Velliengiri Xupeng Miao Oded Padon Zhihao Jia http://arxiv.org/abs/2412.11014v2 CoopetitiveV: Leveraging LLM-powered Coopetitive Multi-Agent Prompting for High-quality Verilog Generation 2025-06-06T01:25:37Z

Recent advances in agentic LLMs have demonstrated great capabilities in Verilog code generation. However, existing approaches either use LLM-assisted single-agent prompting or cooperation-only multi-agent learning, which leads to: (i) Degeneration issue for single-agent learning: characterized by diminished error detection and correction capabilities; (ii) Error propagation in cooperation-only multi-agent learning: erroneous information from the former agent will be propagated to the latter through prompts, which can make the latter agents generate buggy code. In this paper, we propose an LLM-based coopetitive multi-agent prompting framework, in which the agents not only collaborate with each other to form the generation pipeline, but also create a healthy competitive mechanism to improve the generating quality. Our experimental results show that the coopetitive multi-agent framework can effectively mitigate the degeneration risk and reduce the error propagation while improving code error correction capabilities, resulting in higher quality Verilog code generation. The effectiveness of our approach is validated through extensive experiments. On VerilogEval Machine and Human dataset, CoopetitiveV+GPT-4 achieves 99.2% and 99.1% pass@10 scores, respectively. On RTLLM, CoopetitiveV+GPT-4 obtains 100% syntax and 99.9% functionality pass@5 scores.
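
For readers unfamiliar with pass@k scores, the sketch below computes the commonly used unbiased estimate: given n generated samples for a problem, of which c pass, it returns the probability that at least one of k samples drawn without replacement passes. The abstract does not state which estimator the authors used, so treating this as their metric is an assumption. The function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate pass@k as 1 - C(n - c, k) / C(n, k) for n samples with c passing."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any draw of k includes a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With 10 samples and 3 passing, `pass_at_k(10, 3, 5)` equals 1 - 21/252, roughly 0.917. A benchmark-level score is usually the mean of this value over all problems.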

2024-12-15T01:58:10Z Zhendong Mi Renming Zheng Haowen Zhong Yue Sun Seth Kneeland Sayan Moitra Ken Kutzer Zhaozhuo Xu Shaoyi Huang http://arxiv.org/abs/2506.04019v1 CETBench: A Novel Dataset constructed via Transformations over Programs for Benchmarking LLMs for Code-Equivalence Checking 2025-06-04T14:47:14Z

LLMs have been extensively used for the task of automated code generation. In this work, we examine the applicability of LLMs for the related but relatively unexplored task of code-equivalence checking, i.e., given two programs, whether they are functionally equivalent or not. This is an important problem since benchmarking code equivalence can play a critical role in evaluating LLM capabilities for tasks such as code re-writing and code translation. Towards this end, we present CETBench - Code Equivalence with Transformations Benchmark, constructed via a repository of programs, where two programs in the repository may be solving the same or different tasks. Each instance in our dataset is obtained by taking a pair of programs in the repository and applying a random series of pre-defined code transformations, resulting in (non-)equivalent pairs. Our analysis on this dataset reveals a surprising finding that very simple code transformations in the underlying pair of programs can result in a significant drop in performance of SOTA LLMs for the task of code-equivalence checking. To remedy this, we present a simple fine-tuning-based approach to boost LLM performance on the transformed pairs of programs. Our approach for dataset generation is generic, and can be used with repositories with varying program difficulty levels and allows for applying varying numbers as well as kinds of transformations. In our experiments, we perform ablations over the difficulty level of original programs, as well as the kind of transformations used in generating pairs for equivalence checking. Our analysis presents deep insights into the working of LLMs for the task of code-equivalence, and points to the fact that they may still be far from what could be termed as a semantic understanding of the underlying code.
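
The construction of equivalent and non-equivalent program pairs through code transformations can be sketched with Python's standard `ast` module (Python 3.9 or later for `ast.unparse`). This is an illustration under assumptions: the abstract names neither the benchmark's programming language nor its transformation set, so the variable rename and the operator swap below are hypothetical examples, as are all function and class names.

```python
import ast


class RenameVariable(ast.NodeTransformer):
    """Semantics-preserving transformation: rename one variable consistently."""

    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        if node.arg == self.old:
            node.arg = self.new
        return node


class AddToSub(ast.NodeTransformer):
    """Semantics-changing transformation: replace every `+` with `-`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def rename_variable(source, old, new):
    return ast.unparse(RenameVariable(old, new).visit(ast.parse(source)))


def add_to_sub(source):
    return ast.unparse(AddToSub().visit(ast.parse(source)))


def load(source, name):
    """Execute the source text and return the named function from it."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


SOURCE = """
def total(xs):
    acc = 0
    for x in xs:
        acc = acc + x
    return acc
"""
```

Pairing `SOURCE` with `rename_variable(SOURCE, "acc", "running_sum")` yields an equivalent pair, while pairing it with `add_to_sub(SOURCE)` yields a non-equivalent one: on the input `[1, 2, 3]` the original returns 6 and the mutated version returns -6.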

2025-06-04T14:47:14Z Neeva Oza Ishaan Govil Parul Gupta Dinesh Khandelwal Dinesh Garg Parag Singla http://arxiv.org/abs/2501.04250v2 Publish on Ping: A Better Way to Publish Reservations in Memory Reclamation for Concurrent Data Structures 2025-06-04T11:59:33Z

Safe memory reclamation techniques that utilize per-read reservations, such as hazard pointers, often cause significant overhead in traversals of linked concurrent data structures. This is primarily due to the need to announce a reservation, and fence to enforce appropriate ordering, before each read. In read-intensive workloads, this overhead is amplified because, even if relatively little memory reclamation actually occurs, the full overhead of reserving records is still incurred while traversing data structures. In this paper, we propose a novel memory reclamation technique by combining POSIX signals and delayed reclamation, introducing a publish-on-ping approach. This method eliminates the need to make reservations globally visible before use. Instead, threads privately track which records they are accessing, and share this information on demand with threads that intend to reclaim memory. The approach can serve as a drop-in replacement for hazard pointers and hazard eras. Furthermore, the capability to retain reservations during traversals in data structure operations and publish them on demand facilitates the construction of a variant of hazard pointers (EpochPOP). This variant uses epochs to approach the performance of epoch-based reclamation in the common case where threads are not frequently delayed (while retaining the robustness of hazard pointers). Our publish-on-ping implementations based on hazard pointers (HP) and hazard eras, when applied to various data structures, exhibit significant performance improvements. The improvements across various workloads and data structures range from 1.2X to 4X over the original HP, up to 20% compared to a heavily optimized HP implementation similar to the one in the Folly open-source library, and up to 3X faster than hazard eras. EpochPOP delivers performance similar to epoch-based reclamation while providing stronger guarantees.

2025-01-08T03:18:41Z Extended version of full paper accepted at PPoPP '25: The 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming Proceedings Ajay Singh Trevor Brown http://arxiv.org/abs/2506.03382v1 Towards a Characterization of Two-way Bijections in a Reversible Computational Model 2025-06-03T20:40:45Z

We introduce an imperative, stack-based, and reversible computational model that characterizes Two-way Bijections both implicitly, concerning their computational complexity, and with zero-garbage.

2025-06-03T20:40:45Z 8 pages, 3 figures, 5 listings. Author's copy of the version which will appear in the Proceedings of the 17th International Conference, RC 2025, Odense, Denmark, July 3-4, 2025 Matteo Palazzo Luca Roversi