https://arxiv.org/api/FhXTmOxRd7yfyEIDfHkI2MJZEmw2026-06-30T07:34:53Z9958181515http://arxiv.org/abs/2502.19388v1Foundations for Deductive Verification of Continuous Probabilistic Programs: From Lebesgue to Riemann and Back2025-02-26T18:33:32ZWe lay out novel foundations for the computer-aided verification of guaranteed bounds on expected outcomes of imperative probabilistic programs featuring (i) general loops, (ii) continuous distributions, and (iii) conditioning. To handle loops we rely on user-provided quantitative invariants, as is well established. However, in the realm of continuous distributions, invariant verification becomes extremely challenging due to the presence of integrals in expectation-based program semantics. Our key idea is to soundly under- or over-approximate these integrals via Riemann sums. We show that this approach enables the SMT-based invariant verification for programs with a fairly general control flow structure. On the theoretical side, we prove convergence of our Riemann approximations, and establish coRE-completeness of the central verification problems. On the practical side, we show that our approach enables to use existing automated verifiers targeting discrete probabilistic programs for the verification of programs involving continuous sampling. Towards this end, we implement our approach in the recent quantitative verification infrastructure Caesar by encoding Riemann sums in its intermediate verification language. We present several promising case studies.2025-02-26T18:33:32ZFull version of OOPSLA 25 paper, 55 pagesKevin BatzJoost-Pieter KatoenFrancesca RandoneTobias Winklerhttp://arxiv.org/abs/2502.19368v1Qmod: Expressive High-Level Quantum Modeling2025-02-26T18:04:01ZQuantum computing hardware is advancing at a rapid pace, yet the lack of high-level programming abstractions remains a serious bottleneck in the development of new applications. Widely used frameworks still rely on gate-level circuit descriptions, causing the algorithm's functional intent to become lost in low-level implementation details, and hindering flexibility and reuse. While various high-level quantum programming languages have emerged in recent years - offering a significant step toward higher abstraction - many still lack support for classical-like expression syntax, and native constructs for useful quantum algorithmic idioms. This paper presents Qmod, a high-level quantum programming language designed to capture algorithmic intent in natural terms while delegating implementation decisions to automation. Qmod introduces quantum numeric variables and expressions, including digital fixed-point arithmetic tuned for compact representations and optimal resource usage. Beyond digital encoding, Qmod also supports non-digital expression modes - phase and amplitude encoding - frequently exploited by quantum algorithms to achieve computational advantages. We describe the language's constructs, demonstrate practical usage examples, and outline future work on evaluating Qmod across a broader set of use cases.2025-02-26T18:04:01Z9 pagesMatan VaxPeleg EmanuelEyal CornfeldIsrael ReichentalOri OpherOri RothTal MichaeliLior PremingerLior GazitAmir NavehYehuda Navehhttp://arxiv.org/abs/2502.16519v2Guarding the Privacy of Label-Only Access to Neural Network Classifiers via iDP Verification2025-02-26T16:31:42ZNeural networks are susceptible to privacy attacks that can extract private information of the training set. To cope, several training algorithms guarantee differential privacy (DP) by adding noise to their computation. However, DP requires to add noise considering every possible training set. This leads to a significant decrease in the network's accuracy. Individual DP (iDP) restricts DP to a given training set. We observe that some inputs deterministically satisfy iDP without any noise. By identifying them, we can provide iDP label-only access to the network with a minor decrease to its accuracy. However, identifying the inputs that satisfy iDP without any noise is highly challenging. Our key idea is to compute the iDP deterministic bound (iDP-DB), which overapproximates the set of inputs that do not satisfy iDP, and add noise only to their predicted labels. To compute the tightest iDP-DB, which enables to guard the label-only access with minimal accuracy decrease, we propose LUCID, which leverages several formal verification techniques. First, it encodes the problem as a mixed-integer linear program, defined over a network and over every network trained identically but without a unique data point. Second, it abstracts a set of networks using a hyper-network. Third, it eliminates the overapproximation error via a novel branch-and-bound technique. Fourth, it bounds the differences of matching neurons in the network and the hyper-network and employs linear relaxation if they are small. We show that LUCID can provide classifiers with a perfect individuals' privacy guarantee (0-iDP) -- which is infeasible for DP training algorithms -- with an accuracy decrease of 1.4%. For more relaxed $\varepsilon$-iDP guarantees, LUCID has an accuracy decrease of 1.2%. In contrast, existing DP training algorithms reduce the accuracy by 12.7%.2025-02-23T09:50:26ZAnan KabahaDana Drachsler-Cohenhttp://arxiv.org/abs/2502.19143v1Language-Parametric Reference Synthesis (Extended)2025-02-26T13:58:22ZModern Integrated Development Environments (IDEs) offer automated refactorings to aid programmers in developing and maintaining software. However, implementing sound automated refactorings is challenging, as refactorings may inadvertently introduce name-binding errors or cause references to resolve to incorrect declarations. To address these issues, previous work by Schäfer et al. proposed replacing concrete references with locked references to separate binding preservation from transformation. Locked references vacuously resolve to a specific declaration, and after transformation must be replaced with concrete references that also resolve to that declaration. Synthesizing these references requires a faithful inverse of the name lookup functions of the underlying language.
Manually implementing such inverse lookup functions is challenging due to the complex name-binding features in modern programming languages. Instead, we propose to automatically derive this function from type system specifications written in the Statix meta-DSL. To guide the synthesis of qualified references we use scope graphs, which represent the binding structure of a program, to infer their names and discover their syntactic structure.
We evaluate our approach by synthesizing concrete references for locked references in 2528 Java, 196 ChocoPy, and 49 Featherweight Generic Java test programs. Our approach yields a principled language-parametric method for synthesizing references.2025-02-26T13:58:22Z31 pages, 13 figures, This is an extended version of the paper published at PACMPL OOPSLA'25: https://doi.org/10.1145/3720481Daniel A. A. PelsmaekerAron ZwaanCasper BachArjan J. Mooij10.1145/3720481http://arxiv.org/abs/2502.18879v1Adaptive Shielding via Parametric Safety Proofs2025-02-26T06:50:48ZA major challenge to deploying cyber-physical systems with learning-enabled controllers is to ensure their safety, especially in the face of changing environments that necessitate runtime knowledge acquisition. Model-checking and automated reasoning have been successfully used for shielding, i.e., to monitor untrusted controllers and override potentially unsafe decisions, but only at the cost of hard tradeoffs in terms of expressivity, safety, adaptivity, precision and runtime efficiency. We propose a programming-language framework that allows experts to statically specify adaptive shields for learning-enabled agents, which enforce a safe control envelope that gets more permissive as knowledge is gathered at runtime. A shield specification provides a safety model that is parametric in the current agent's knowledge. In addition, a nondeterministic inference strategy can be specified using a dedicated domain-specific language, enforcing that such knowledge parameters are inferred at runtime in a statistically-sound way. By leveraging language design and theorem proving, our proposed framework empowers experts to design adaptive shields with an unprecedented level of modeling flexibility, while providing rigorous, end-to-end probabilistic safety guarantees.2025-02-26T06:50:48ZProceedings of the ACM on Programming Languages, Volume 9, Issue OOPSLA1, 2025Yao FengJun ZhuAndré PlatzerJonathan Laurent10.1145/3720450http://arxiv.org/abs/2502.18297v1DeepCircuitX: A Comprehensive Repository-Level Dataset for RTL Code Understanding, Generation, and PPA Analysis2025-02-25T15:34:00ZThis paper introduces DeepCircuitX, a comprehensive repository-level dataset designed to advance RTL (Register Transfer Level) code understanding, generation, and power-performance-area (PPA) analysis. Unlike existing datasets that are limited to either file-level RTL code or physical layout data, DeepCircuitX provides a holistic, multilevel resource that spans repository, file, module, and block-level RTL code. This structure enables more nuanced training and evaluation of large language models (LLMs) for RTL-specific tasks. DeepCircuitX is enriched with Chain of Thought (CoT) annotations, offering detailed descriptions of functionality and structure at multiple levels. These annotations enhance its utility for a wide range of tasks, including RTL code understanding, generation, and completion. Additionally, the dataset includes synthesized netlists and PPA metrics, facilitating early-stage design exploration and enabling accurate PPA prediction directly from RTL code. We demonstrate the dataset's effectiveness on various LLMs finetuned with our dataset and confirm the quality with human evaluations. Our results highlight DeepCircuitX as a critical resource for advancing RTL-focused machine learning applications in hardware design automation.Our data is available at https://zeju.gitbook.io/lcm-team.2025-02-25T15:34:00Z8 pages, 3 figuresZeju LiChangran XuZhengyuan ShiZedong PengYi LiuYunhao ZhouLingfeng ZhouChengyu MaJianyuan ZhongXi WangJieru ZhaoZhufei ChuXiaoyan YangQiang Xuhttp://arxiv.org/abs/2502.18266v1Parsing TTree Formula in Python2025-02-25T14:53:40ZUproot can read ROOT files directly in pure Python but cannot (yet) compute expressions in ROOT's TTreeFormula expression language. Despite its popularity, this language has only one implementation and no formal specification. In a package called "formulate," we defined the language's syntax in standard BNF and parse it with Lark, a fast and modern parsing toolkit in Python. With formulate, users can now convert ROOT TTreeFormula expressions into NumExpr and Awkward Array manipulations.
In this contribution, we describe BNF notation and the Look Ahead Left to Right (LALR) parsing algorithm, which scales linearly with expression length. We also present the challenges with interpreting TTreeFormula expressions as a functional language; some function-like forms can't be expressed as true functions. We also describe the design of the abstract syntax tree that facilitates conversion between the three languages. The formulate package has zero package dependencies, so we are adding it as one of Uproot's dependencies so that Uproot will be able to use TTreeFormula expressions, whether they are hand-written or embedded in a ROOT file as TTree aliases.2025-02-25T14:53:40Z6 pages, 3 figures, CHEP 2024 proceedingsAryan RoyJim Pivarskihttp://arxiv.org/abs/2501.11613v7Conversation Routines: A Prompt Engineering Framework for Task-Oriented Dialog Systems2025-02-24T17:40:38ZThis study introduces Conversation Routines (CR), a structured prompt engineering framework for developing task-oriented dialog systems using Large Language Models (LLMs). While LLMs demonstrate remarkable natural language understanding capabilities, engineering them to reliably execute complex business workflows remains challenging. The proposed CR framework enables the development of Conversation Agentic Systems (CAS) through natural language specifications, embedding task-oriented logic within LLM prompts. This approach provides a systematic methodology for designing and implementing complex conversational workflows while maintaining behavioral consistency. We demonstrate the framework's effectiveness through two proof-of-concept implementations: a Train Ticket Booking System and an Interactive Troubleshooting Copilot. These case studies validate CR's capability to encode sophisticated behavioral patterns and decision logic while preserving natural conversational flexibility. Results show that CR enables domain experts to design conversational workflows in natural language while leveraging custom functions (tools) developed by software engineers, creating an efficient division of responsibilities where developers focus on core API implementation and domain experts handle conversation design. While the framework shows promise in accessibility and adaptability, we identify key challenges including computational overhead, non-deterministic behavior, and domain-specific logic optimization. Future research directions include CR evaluation methods based on prompt engineering frameworks driven by goal-oriented grading criteria, improving scalability for complex multi-agent interactions, and enhancing system robustness to address the identified limitations across diverse business applications.2025-01-20T17:19:02ZMinor typos revisionGiorgio Robinohttp://arxiv.org/abs/2502.15536v1NPB-Rust: NAS Parallel Benchmarks in Rust2025-02-21T15:39:29ZParallel programming often requires developers to handle complex computational tasks that can yield many errors in its development cycle. Rust is a performant low-level language that promises memory safety guarantees with its compiler, making it an attractive option for HPC application developers. We identified that the Rust ecosystem could benefit from more comprehensive scientific benchmark suites for standardizing comparisons and research. The NAS Parallel Benchmarks (NPB) is a standardized suite for evaluating various hardware aspects and is often used to compare different frameworks for parallelism. Therefore, our contributions are a Rust version of NPB, an analysis of the expressiveness and performance of the language features, and parallelization strategies. We compare our implementation with consolidated sequential and parallel versions of NPB. Experimental results show that Rust's sequential version is 1.23\% slower than Fortran and 5.59\% faster than C++, while Rust with Rayon was slower than both Fortran and C++ with OpenMP.2025-02-21T15:39:29ZEduardo M. MartinsLeonardo G. FaéRenato B. HoffmannLucas S. BianchessiDalvan Grieblerhttp://arxiv.org/abs/2502.15441v1On the Effectiveness of Large Language Models in Writing Alloy Formulas2025-02-21T13:09:58ZDeclarative specifications have a vital role to play in developing safe and dependable software systems. Writing specifications correctly, however, remains particularly challenging. This paper presents a controlled experiment on using large language models (LLMs) to write declarative formulas in the well-known language Alloy. Our use of LLMs is three-fold. One, we employ LLMs to write complete Alloy formulas from given natural language descriptions (in English). Two, we employ LLMs to create alternative but equivalent formulas in Alloy with respect to given Alloy formulas. Three, we employ LLMs to complete sketches of Alloy formulas and populate the holes in the sketches by synthesizing Alloy expressions and operators so that the completed formulas accurately represent the desired properties (that are given in natural language). We conduct the experimental evaluation using 11 well-studied subject specifications and employ two popular LLMs, namely ChatGPT and DeepSeek. The experimental results show that the LLMs generally perform well in synthesizing complete Alloy formulas from input properties given in natural language or in Alloy, and are able to enumerate multiple unique solutions. Moreover, the LLMs are also successful at completing given sketches of Alloy formulas with respect to natural language descriptions of desired properties (without requiring test cases). We believe LLMs offer a very exciting advance in our ability to write specifications, and can help make specifications take a pivotal role in software development and enhance our ability to build robust software.2025-02-21T13:09:58ZYang HongShan JiangYulei FuSarfraz Khurshidhttp://arxiv.org/abs/2502.15217v1FormalSpecCpp: A Dataset of C++ Formal Specifications created using LLMs2025-02-21T05:20:04ZFormalSpecCpp is a dataset designed to fill the gap in standardized benchmarks for verifying formal specifications in C++ programs. To the best of our knowledge, this is the first comprehensive collection of C++ programs with well-defined preconditions and postconditions. It provides a structured benchmark for evaluating specification inference tools and testing theaccuracy of generated specifications. Researchers and developers can use this dataset to benchmark specification inference tools,fine-tune Large Language Models (LLMs) for automated specification generation, and analyze the role of formal specifications in improving program verification and automated testing. By making this dataset publicly available, we aim to advance research in program verification, specification inference, and AI-assisted software development. The dataset and the code are available at https://github.com/MadhuNimmo/FormalSpecCpp.2025-02-21T05:20:04ZAccepted at the 2025 IEEE/ACM 22nd International Conference on Mining Software Repositories (MSR)Madhurima ChakrabortyPeter PirkelbauerQing Yihttp://arxiv.org/abs/2502.14626v1Partial Incorrectness Logic2025-02-20T15:05:50ZReasoning about program correctness has been a central topic in static analysis for many years, with Hoare logic (HL) playing an important role. The key notions in HL are partial and total correctness. Both require that program executions starting in a specified set of initial states (the precondition) reach a designated set of final states (the postcondition). Partial correctness is more lenient in that it does not require termination, effectively deeming divergence acceptable. We explore partial incorrectness logic, which stands in relation to O'Hearn's "total" incorrectness logic as partial correctness does to total correctness: Partial correctness allows divergence, partial incorrectness allows unreachability. While the duality between divergence and unreachability may not be immediately apparent, we explore this relationship further. Our chosen formalism is predicate transformers à la Dijkstra. We focus here on deterministic and reversible programs, though the discussion extends to nondeterministic and irreversible computations, both of which introduce additional nondeterminism that must be addressed.2025-02-20T15:05:50ZThis abstract was accepted for presentation at TPSA (Theory and Practice of Static Analysis) workshop of POPL (Principles of Programming Languages) 2025Lena VerschtĀnrán WángBenjamin Lucien Kaminskihttp://arxiv.org/abs/2502.14478v1Inductive Synthesis of Inductive Heap Predicates -- Extended Version2025-02-20T11:52:39ZWe present an approach to automatically synthesise recursive predicates in Separation Logic (SL) from concrete data structure instances using Inductive Logic Programming (ILP) techniques. The main challenges to make such synthesis effective are (1) making it work without negative examples that are required in ILP but are difficult to construct for heap-based structures in an automated fashion, and (2) to be capable of summarising not just the shape of a heap (e.g., it is a linked list), but also the properties of the data it stores (e.g., it is a sorted linked list). We tackle these challenges with a new predicate learning algorithm. The key contributions of our work are (a) the formulation of ILP-based learning only using positive examples and (b) an algorithm that synthesises property-rich SL predicates from concrete memory graphs based on the positive-only learning.
We show that our framework can efficiently and correctly synthesise SL predicates for structures that were beyond the reach of the state-of-the-art tools, including those featuring non-trivial payload constraints (e.g., binary search trees) and nested recursion (e.g., n-ary trees). We further extend the usability of our approach by a memory graph generator that produces positive heap examples from programs. Finally, we show how our approach facilitates deductive verification and synthesis of correct-by-construction code.2025-02-20T11:52:39ZZiyi YangIlya Sergeyhttp://arxiv.org/abs/2411.05722v2Characterizing Implementability of Global Protocols with Infinite States and Data2025-02-19T18:57:36ZWe study the implementability problem for an expressive class of symbolic communication protocols involving multiple participants. Our symbolic protocols describe infinite states and data values using dependent refinement predicates. Implementability asks whether a global protocol specification admits a distributed, asynchronous implementation, namely one for each participant, that is deadlock-free and exhibits the same behavior as the specification. We provide a unified explanation of seemingly disparate sources of non-implementability through a precise semantic characterization of implementability for infinite protocols. Our characterization reduces the problem of implementability to (co)reachability in the global protocol restricted to each participant. This compositional reduction yields the first sound and relatively complete algorithm for checking implementability of symbolic protocols. We use our characterization to show that for finite protocols, implementability is co-NP-complete for explicit representations and PSPACE-complete for symbolic representations. The finite, explicit fragment subsumes a previously studied fragment of multiparty session types for which our characterization yields a co-NP decision procedure, tightening a prior PSPACE upper bound.2024-11-08T17:24:11ZElaine LiFelix StutzThomas WiesDamien Zuffereyhttp://arxiv.org/abs/2407.06375v3Macaw: A Machine Code Toolbox for the Busy Binary Analyst2025-02-19T01:35:57ZWhen attempting to understand the behavior of an executable, a binary analyst can make use of many different techniques. These include program slicing, dynamic instrumentation, binary-level rewriting, symbolic execution, and formal verification, all of which can uncover insights into how a piece of machine code behaves. As a result, there is no one-size-fits-all binary analysis tool, so a binary analysis researcher will often combine several different tools. Sometimes, a researcher will even need to design new tools to study problems that existing frameworks are not well equipped to handle. Designing such tools from complete scratch is rarely time- or cost-effective, however, given the scale and complexity of modern ISAs.
We present Macaw, a modular framework that makes it possible to rapidly build reliable binary analysis tools across a range of use cases. Statically typed functional programming techniques are used pervasively throughout Macaw -- these range from using functional optimization passes to encoding tricky architectural invariants at the type level to statically check correctness properties. The level of assurance that functional programming ideas afford us allow us to iterate rapidly on Macaw while still having confidence that the underlying semantics are correct. Over a decade of development, we have used Macaw to support an industrial research team in building tools for machine code-related tasks. As such, the name 'Macaw' refers not just to the framework, but also a suite of tools that are built on top of it. We describe Macaw in depth and describe the different static and dynamic analyses that it performs, many powered by an SMT-based symbolic execution engine. We put a particular focus on interoperability between machine code and higher-level languages, including binary lifting from x86 to LLVM, as well verifying the correctness of mixed C and assembly code.2024-07-08T20:36:34ZRyan G. ScottBrett BostonBenjamin DavisIavor DiatchkiMike DoddsJoe HendrixDaniel MatichukKevin QuickTristan RavitchValentin RobertBenjamin SelfridgeAndrei StefănescuDaniel WagnerSimon Winwood