https://arxiv.org/api/FhXTmOxRd7yfyEIDfHkI2MJZEmw 2026-06-30T07:34:53Z 9958 1815 15 http://arxiv.org/abs/2502.19388v1 Foundations for Deductive Verification of Continuous Probabilistic Programs: From Lebesgue to Riemann and Back 2025-02-26T18:33:32Z

We lay out novel foundations for the computer-aided verification of guaranteed bounds on expected outcomes of imperative probabilistic programs featuring (i) general loops, (ii) continuous distributions, and (iii) conditioning. To handle loops we rely on user-provided quantitative invariants, as is well established. However, in the realm of continuous distributions, invariant verification becomes extremely challenging due to the presence of integrals in expectation-based program semantics. Our key idea is to soundly under- or over-approximate these integrals via Riemann sums. We show that this approach enables the SMT-based invariant verification for programs with a fairly general control flow structure. On the theoretical side, we prove convergence of our Riemann approximations, and establish coRE-completeness of the central verification problems. On the practical side, we show that our approach enables to use existing automated verifiers targeting discrete probabilistic programs for the verification of programs involving continuous sampling. Towards this end, we implement our approach in the recent quantitative verification infrastructure Caesar by encoding Riemann sums in its intermediate verification language. We present several promising case studies.

2025-02-26T18:33:32Z Full version of OOPSLA 25 paper, 55 pages Kevin Batz Joost-Pieter Katoen Francesca Randone Tobias Winkler http://arxiv.org/abs/2502.19368v1 Qmod: Expressive High-Level Quantum Modeling 2025-02-26T18:04:01Z

Quantum computing hardware is advancing at a rapid pace, yet the lack of high-level programming abstractions remains a serious bottleneck in the development of new applications. Widely used frameworks still rely on gate-level circuit descriptions, causing the algorithm's functional intent to become lost in low-level implementation details, and hindering flexibility and reuse. While various high-level quantum programming languages have emerged in recent years - offering a significant step toward higher abstraction - many still lack support for classical-like expression syntax, and native constructs for useful quantum algorithmic idioms. This paper presents Qmod, a high-level quantum programming language designed to capture algorithmic intent in natural terms while delegating implementation decisions to automation. Qmod introduces quantum numeric variables and expressions, including digital fixed-point arithmetic tuned for compact representations and optimal resource usage. Beyond digital encoding, Qmod also supports non-digital expression modes - phase and amplitude encoding - frequently exploited by quantum algorithms to achieve computational advantages. We describe the language's constructs, demonstrate practical usage examples, and outline future work on evaluating Qmod across a broader set of use cases.

2025-02-26T18:04:01Z 9 pages Matan Vax Peleg Emanuel Eyal Cornfeld Israel Reichental Ori Opher Ori Roth Tal Michaeli Lior Preminger Lior Gazit Amir Naveh Yehuda Naveh http://arxiv.org/abs/2502.16519v2 Guarding the Privacy of Label-Only Access to Neural Network Classifiers via iDP Verification 2025-02-26T16:31:42Z

Neural networks are susceptible to privacy attacks that can extract private information of the training set. To cope, several training algorithms guarantee differential privacy (DP) by adding noise to their computation. However, DP requires to add noise considering every possible training set. This leads to a significant decrease in the network's accuracy. Individual DP (iDP) restricts DP to a given training set. We observe that some inputs deterministically satisfy iDP without any noise. By identifying them, we can provide iDP label-only access to the network with a minor decrease to its accuracy. However, identifying the inputs that satisfy iDP without any noise is highly challenging. Our key idea is to compute the iDP deterministic bound (iDP-DB), which overapproximates the set of inputs that do not satisfy iDP, and add noise only to their predicted labels. To compute the tightest iDP-DB, which enables to guard the label-only access with minimal accuracy decrease, we propose LUCID, which leverages several formal verification techniques. First, it encodes the problem as a mixed-integer linear program, defined over a network and over every network trained identically but without a unique data point. Second, it abstracts a set of networks using a hyper-network. Third, it eliminates the overapproximation error via a novel branch-and-bound technique. Fourth, it bounds the differences of matching neurons in the network and the hyper-network and employs linear relaxation if they are small. We show that LUCID can provide classifiers with a perfect individuals' privacy guarantee (0-iDP) -- which is infeasible for DP training algorithms -- with an accuracy decrease of 1.4%. For more relaxed $\varepsilon$-iDP guarantees, LUCID has an accuracy decrease of 1.2%. In contrast, existing DP training algorithms reduce the accuracy by 12.7%.

2025-02-23T09:50:26Z Anan Kabaha Dana Drachsler-Cohen http://arxiv.org/abs/2502.19143v1 Language-Parametric Reference Synthesis (Extended) 2025-02-26T13:58:22Z

Modern Integrated Development Environments (IDEs) offer automated refactorings to aid programmers in developing and maintaining software. However, implementing sound automated refactorings is challenging, as refactorings may inadvertently introduce name-binding errors or cause references to resolve to incorrect declarations. To address these issues, previous work by Schäfer et al. proposed replacing concrete references with locked references to separate binding preservation from transformation. Locked references vacuously resolve to a specific declaration, and after transformation must be replaced with concrete references that also resolve to that declaration. Synthesizing these references requires a faithful inverse of the name lookup functions of the underlying language. Manually implementing such inverse lookup functions is challenging due to the complex name-binding features in modern programming languages. Instead, we propose to automatically derive this function from type system specifications written in the Statix meta-DSL. To guide the synthesis of qualified references we use scope graphs, which represent the binding structure of a program, to infer their names and discover their syntactic structure. We evaluate our approach by synthesizing concrete references for locked references in 2528 Java, 196 ChocoPy, and 49 Featherweight Generic Java test programs. Our approach yields a principled language-parametric method for synthesizing references.

2025-02-26T13:58:22Z 31 pages, 13 figures, This is an extended version of the paper published at PACMPL OOPSLA'25: https://doi.org/10.1145/3720481 Daniel A. A. Pelsmaeker Aron Zwaan Casper Bach Arjan J. Mooij 10.1145/3720481 http://arxiv.org/abs/2502.18879v1 Adaptive Shielding via Parametric Safety Proofs 2025-02-26T06:50:48Z

A major challenge to deploying cyber-physical systems with learning-enabled controllers is to ensure their safety, especially in the face of changing environments that necessitate runtime knowledge acquisition. Model-checking and automated reasoning have been successfully used for shielding, i.e., to monitor untrusted controllers and override potentially unsafe decisions, but only at the cost of hard tradeoffs in terms of expressivity, safety, adaptivity, precision and runtime efficiency. We propose a programming-language framework that allows experts to statically specify adaptive shields for learning-enabled agents, which enforce a safe control envelope that gets more permissive as knowledge is gathered at runtime. A shield specification provides a safety model that is parametric in the current agent's knowledge. In addition, a nondeterministic inference strategy can be specified using a dedicated domain-specific language, enforcing that such knowledge parameters are inferred at runtime in a statistically-sound way. By leveraging language design and theorem proving, our proposed framework empowers experts to design adaptive shields with an unprecedented level of modeling flexibility, while providing rigorous, end-to-end probabilistic safety guarantees.

2025-02-26T06:50:48Z Proceedings of the ACM on Programming Languages, Volume 9, Issue OOPSLA1, 2025 Yao Feng Jun Zhu André Platzer Jonathan Laurent 10.1145/3720450 http://arxiv.org/abs/2502.18297v1 DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis 2025-02-25T15:34:00Z

This paper introduces DeepCircuitX, a comprehensive repository-level dataset designed to advance RTL (Register Transfer Level) code understanding, generation, and power-performance-area (PPA) analysis. Unlike existing datasets that are limited to either file-level RTL code or physical layout data, DeepCircuitX provides a holistic, multilevel resource that spans repository, file, module, and block-level RTL code. This structure enables more nuanced training and evaluation of large language models (LLMs) for RTL-specific tasks. DeepCircuitX is enriched with Chain of Thought (CoT) annotations, offering detailed descriptions of functionality and structure at multiple levels. These annotations enhance its utility for a wide range of tasks, including RTL code understanding, generation, and completion. Additionally, the dataset includes synthesized netlists and PPA metrics, facilitating early-stage design exploration and enabling accurate PPA prediction directly from RTL code. We demonstrate the dataset's effectiveness on various LLMs finetuned with our dataset and confirm the quality with human evaluations. Our results highlight DeepCircuitX as a critical resource for advancing RTL-focused machine learning applications in hardware design automation.Our data is available at https://zeju.gitbook.io/lcm-team.

2025-02-25T15:34:00Z 8 pages, 3 figures Zeju Li Changran Xu Zhengyuan Shi Zedong Peng Yi Liu Yunhao Zhou Lingfeng Zhou Chengyu Ma Jianyuan Zhong Xi Wang Jieru Zhao Zhufei Chu Xiaoyan Yang Qiang Xu http://arxiv.org/abs/2502.18266v1 Parsing TTree Formula in Python 2025-02-25T14:53:40Z

Uproot can read ROOT files directly in pure Python but cannot (yet) compute expressions in ROOT's TTreeFormula expression language. Despite its popularity, this language has only one implementation and no formal specification. In a package called "formulate," we defined the language's syntax in standard BNF and parse it with Lark, a fast and modern parsing toolkit in Python. With formulate, users can now convert ROOT TTreeFormula expressions into NumExpr and Awkward Array manipulations. In this contribution, we describe BNF notation and the Look Ahead Left to Right (LALR) parsing algorithm, which scales linearly with expression length. We also present the challenges with interpreting TTreeFormula expressions as a functional language; some function-like forms can't be expressed as true functions. We also describe the design of the abstract syntax tree that facilitates conversion between the three languages. The formulate package has zero package dependencies, so we are adding it as one of Uproot's dependencies so that Uproot will be able to use TTreeFormula expressions, whether they are hand-written or embedded in a ROOT file as TTree aliases.

2025-02-25T14:53:40Z 6 pages, 3 figures, CHEP 2024 proceedings Aryan Roy Jim Pivarski http://arxiv.org/abs/2501.11613v7 Conversation Routines: A Prompt Engineering Framework for Task-Oriented Dialog Systems 2025-02-24T17:40:38Z

This study introduces Conversation Routines (CR), a structured prompt engineering framework for developing task-oriented dialog systems using Large Language Models (LLMs). While LLMs demonstrate remarkable natural language understanding capabilities, engineering them to reliably execute complex business workflows remains challenging. The proposed CR framework enables the development of Conversation Agentic Systems (CAS) through natural language specifications, embedding task-oriented logic within LLM prompts. This approach provides a systematic methodology for designing and implementing complex conversational workflows while maintaining behavioral consistency. We demonstrate the framework's effectiveness through two proof-of-concept implementations: a Train Ticket Booking System and an Interactive Troubleshooting Copilot. These case studies validate CR's capability to encode sophisticated behavioral patterns and decision logic while preserving natural conversational flexibility. Results show that CR enables domain experts to design conversational workflows in natural language while leveraging custom functions (tools) developed by software engineers, creating an efficient division of responsibilities where developers focus on core API implementation and domain experts handle conversation design. While the framework shows promise in accessibility and adaptability, we identify key challenges including computational overhead, non-deterministic behavior, and domain-specific logic optimization. Future research directions include CR evaluation methods based on prompt engineering frameworks driven by goal-oriented grading criteria, improving scalability for complex multi-agent interactions, and enhancing system robustness to address the identified limitations across diverse business applications.

2025-01-20T17:19:02Z Minor typos revision Giorgio Robino http://arxiv.org/abs/2502.15536v1 NPB-Rust: NAS Parallel Benchmarks in Rust 2025-02-21T15:39:29Z

Parallel programming often requires developers to handle complex computational tasks that can yield many errors in its development cycle. Rust is a performant low-level language that promises memory safety guarantees with its compiler, making it an attractive option for HPC application developers. We identified that the Rust ecosystem could benefit from more comprehensive scientific benchmark suites for standardizing comparisons and research. The NAS Parallel Benchmarks (NPB) is a standardized suite for evaluating various hardware aspects and is often used to compare different frameworks for parallelism. Therefore, our contributions are a Rust version of NPB, an analysis of the expressiveness and performance of the language features, and parallelization strategies. We compare our implementation with consolidated sequential and parallel versions of NPB. Experimental results show that Rust's sequential version is 1.23\% slower than Fortran and 5.59\% faster than C++, while Rust with Rayon was slower than both Fortran and C++ with OpenMP.

2025-02-21T15:39:29Z Eduardo M. Martins Leonardo G. Faé Renato B. Hoffmann Lucas S. Bianchessi Dalvan Griebler http://arxiv.org/abs/2502.15441v1 On the Effectiveness of Large Language Models in Writing Alloy Formulas 2025-02-21T13:09:58Z

Declarative specifications have a vital role to play in developing safe and dependable software systems. Writing specifications correctly, however, remains particularly challenging. This paper presents a controlled experiment on using large language models (LLMs) to write declarative formulas in the well-known language Alloy. Our use of LLMs is three-fold. One, we employ LLMs to write complete Alloy formulas from given natural language descriptions (in English). Two, we employ LLMs to create alternative but equivalent formulas in Alloy with respect to given Alloy formulas. Three, we employ LLMs to complete sketches of Alloy formulas and populate the holes in the sketches by synthesizing Alloy expressions and operators so that the completed formulas accurately represent the desired properties (that are given in natural language). We conduct the experimental evaluation using 11 well-studied subject specifications and employ two popular LLMs, namely ChatGPT and DeepSeek. The experimental results show that the LLMs generally perform well in synthesizing complete Alloy formulas from input properties given in natural language or in Alloy, and are able to enumerate multiple unique solutions. Moreover, the LLMs are also successful at completing given sketches of Alloy formulas with respect to natural language descriptions of desired properties (without requiring test cases). We believe LLMs offer a very exciting advance in our ability to write specifications, and can help make specifications take a pivotal role in software development and enhance our ability to build robust software.

2025-02-21T13:09:58Z Yang Hong Shan Jiang Yulei Fu Sarfraz Khurshid http://arxiv.org/abs/2502.15217v1 FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs 2025-02-21T05:20:04Z

FormalSpecCpp is a dataset designed to fill the gap in standardized benchmarks for verifying formal specifications in C++ programs. To the best of our knowledge, this is the first comprehensive collection of C++ programs with well-defined preconditions and postconditions. It provides a structured benchmark for evaluating specification inference tools and testing theaccuracy of generated specifications. Researchers and developers can use this dataset to benchmark specification inference tools,fine-tune Large Language Models (LLMs) for automated specification generation, and analyze the role of formal specifications in improving program verification and automated testing. By making this dataset publicly available, we aim to advance research in program verification, specification inference, and AI-assisted software development. The dataset and the code are available at https://github.com/MadhuNimmo/FormalSpecCpp.

2025-02-21T05:20:04Z Accepted at the 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR) Madhurima Chakraborty Peter Pirkelbauer Qing Yi http://arxiv.org/abs/2502.14626v1 Partial Incorrectness Logic 2025-02-20T15:05:50Z

Reasoning about program correctness has been a central topic in static analysis for many years, with Hoare logic (HL) playing an important role. The key notions in HL are partial and total correctness. Both require that program executions starting in a specified set of initial states (the precondition) reach a designated set of final states (the postcondition). Partial correctness is more lenient in that it does not require termination, effectively deeming divergence acceptable. We explore partial incorrectness logic, which stands in relation to O'Hearn's "total" incorrectness logic as partial correctness does to total correctness: Partial correctness allows divergence, partial incorrectness allows unreachability. While the duality between divergence and unreachability may not be immediately apparent, we explore this relationship further. Our chosen formalism is predicate transformers à la Dijkstra. We focus here on deterministic and reversible programs, though the discussion extends to nondeterministic and irreversible computations, both of which introduce additional nondeterminism that must be addressed.

2025-02-20T15:05:50Z This abstract was accepted for presentation at TPSA (Theory and Practice of Static Analysis) workshop of POPL (Principles of Programming Languages) 2025 Lena Verscht Ānrán Wáng Benjamin Lucien Kaminski http://arxiv.org/abs/2502.14478v1 Inductive Synthesis of Inductive Heap Predicates -- Extended Version 2025-02-20T11:52:39Z

We present an approach to automatically synthesise recursive predicates in Separation Logic (SL) from concrete data structure instances using Inductive Logic Programming (ILP) techniques. The main challenges to make such synthesis effective are (1) making it work without negative examples that are required in ILP but are difficult to construct for heap-based structures in an automated fashion, and (2) to be capable of summarising not just the shape of a heap (e.g., it is a linked list), but also the properties of the data it stores (e.g., it is a sorted linked list). We tackle these challenges with a new predicate learning algorithm. The key contributions of our work are (a) the formulation of ILP-based learning only using positive examples and (b) an algorithm that synthesises property-rich SL predicates from concrete memory graphs based on the positive-only learning. We show that our framework can efficiently and correctly synthesise SL predicates for structures that were beyond the reach of the state-of-the-art tools, including those featuring non-trivial payload constraints (e.g., binary search trees) and nested recursion (e.g., n-ary trees). We further extend the usability of our approach by a memory graph generator that produces positive heap examples from programs. Finally, we show how our approach facilitates deductive verification and synthesis of correct-by-construction code.

2025-02-20T11:52:39Z Ziyi Yang Ilya Sergey http://arxiv.org/abs/2411.05722v2 Characterizing Implementability of Global Protocols with Infinite States and Data 2025-02-19T18:57:36Z

We study the implementability problem for an expressive class of symbolic communication protocols involving multiple participants. Our symbolic protocols describe infinite states and data values using dependent refinement predicates. Implementability asks whether a global protocol specification admits a distributed, asynchronous implementation, namely one for each participant, that is deadlock-free and exhibits the same behavior as the specification. We provide a unified explanation of seemingly disparate sources of non-implementability through a precise semantic characterization of implementability for infinite protocols. Our characterization reduces the problem of implementability to (co)reachability in the global protocol restricted to each participant. This compositional reduction yields the first sound and relatively complete algorithm for checking implementability of symbolic protocols. We use our characterization to show that for finite protocols, implementability is co-NP-complete for explicit representations and PSPACE-complete for symbolic representations. The finite, explicit fragment subsumes a previously studied fragment of multiparty session types for which our characterization yields a co-NP decision procedure, tightening a prior PSPACE upper bound.

2024-11-08T17:24:11Z Elaine Li Felix Stutz Thomas Wies Damien Zufferey http://arxiv.org/abs/2407.06375v3 Macaw: A Machine Code Toolbox for the Busy Binary Analyst 2025-02-19T01:35:57Z

When attempting to understand the behavior of an executable, a binary analyst can make use of many different techniques. These include program slicing, dynamic instrumentation, binary-level rewriting, symbolic execution, and formal verification, all of which can uncover insights into how a piece of machine code behaves. As a result, there is no one-size-fits-all binary analysis tool, so a binary analysis researcher will often combine several different tools. Sometimes, a researcher will even need to design new tools to study problems that existing frameworks are not well equipped to handle. Designing such tools from complete scratch is rarely time- or cost-effective, however, given the scale and complexity of modern ISAs. We present Macaw, a modular framework that makes it possible to rapidly build reliable binary analysis tools across a range of use cases. Statically typed functional programming techniques are used pervasively throughout Macaw -- these range from using functional optimization passes to encoding tricky architectural invariants at the type level to statically check correctness properties. The level of assurance that functional programming ideas afford us allow us to iterate rapidly on Macaw while still having confidence that the underlying semantics are correct. Over a decade of development, we have used Macaw to support an industrial research team in building tools for machine code-related tasks. As such, the name 'Macaw' refers not just to the framework, but also a suite of tools that are built on top of it. We describe Macaw in depth and describe the different static and dynamic analyses that it performs, many powered by an SMT-based symbolic execution engine. We put a particular focus on interoperability between machine code and higher-level languages, including binary lifting from x86 to LLVM, as well verifying the correctness of mixed C and assembly code.

2024-07-08T20:36:34Z Ryan G. Scott Brett Boston Benjamin Davis Iavor Diatchki Mike Dodds Joe Hendrix Daniel Matichuk Kevin Quick Tristan Ravitch Valentin Robert Benjamin Selfridge Andrei Stefănescu Daniel Wagner Simon Winwood