https://arxiv.org/api/03ddLOZ7efENanNOJTLZDVwLMdo2026-06-28T21:05:38Z9951168015http://arxiv.org/abs/2504.06688v1Efficient Timestamping for Sampling-based Race Detection2025-04-09T08:49:24ZDynamic race detection based on the happens before (HB) partial order has now become the de facto approach to quickly identify data races in multi-threaded software. Most practical implementations for detecting these races use timestamps to infer causality between events and detect races based on these timestamps. Such an algorithm updates timestamps (stored in vector clocks) at every event in the execution, and is known to induce excessive overhead. Random sampling has emerged as a promising algorithmic paradigm to offset this overhead. It offers the promise of making sound race detection scalable. In this work we consider the task of designing an efficient sampling based race detector with low overhead for timestamping when the number of sampled events is much smaller than the total events in an execution. To solve this problem, we propose (1) a new notion of freshness timestamp, (2) a new data structure to store timestamps, and (3) an algorithm that uses a combination of them to reduce the cost of timestamping in sampling based race detection. Further, we prove that our algorithm is close to optimal -- the number of vector clock traversals is bounded by the number of sampled events and number of threads, and further, on any given dynamic execution, the cost of timestamping due to our algorithm is close to the amount of work any timestamping-based algorithm must perform on that execution, that is it is instance optimal. Our evaluation on real world benchmarks demonstrates the effectiveness of our proposed algorithm over prior timestamping algorithms that are agnostic to sampling.2025-04-09T08:49:24ZTo appear at PLDI 2025Minjian ZhangDaniel Wee Soong LimMosaad Al ThokairUmang MathurMahesh Viswanathanhttp://arxiv.org/abs/2504.06542v1Polygon: Symbolic Reasoning for SQL using Conflict-Driven Under-Approximation Search2025-04-09T02:46:52ZWe present a novel symbolic reasoning engine for SQL which can efficiently generate an input $I$ for $n$ queries $P_1, \cdots, P_n$, such that their outputs on $I$ satisfy a given property (expressed in SMT). This is useful in different contexts, such as disproving equivalence of two SQL queries and disambiguating a set of queries. Our first idea is to reason about an under-approximation of each $P_i$ -- that is, a subset of $P_i$'s input-output behaviors. While it makes our approach both semantics-aware and lightweight, this idea alone is incomplete (as a fixed under-approximation might miss some behaviors of interest). Therefore, our second idea is to perform search over an expressive family of under-approximations (which collectively cover all program behaviors of interest), thereby making our approach complete. We have implemented these ideas in a tool, Polygon, and evaluated it on over 30,000 benchmarks across two tasks (namely, SQL equivalence refutation and query disambiguation). Our evaluation results show that Polygon significantly outperforms all prior techniques.2025-04-09T02:46:52ZPLDI 2025Pinhan ZhaoYuepeng WangXinyu Wanghttp://arxiv.org/abs/2405.04612v2Numerical Fuzz: A Type System for Rounding Error Analysis2025-04-08T18:31:17ZAlgorithms operating on real numbers are implemented as floating-point computations in practice, but floating-point operations introduce roundoff errors that can degrade the accuracy of the result. We propose $Λ_{num}$, a functional programming language with a type system that can express quantitative bounds on roundoff error. Our type system combines a sensitivity analysis, enforced through a linear typing discipline, with a novel graded monad to track the accumulation of roundoff errors. We prove that our type system is sound by relating the denotational semantics of our language to the exact and floating-point operational semantics. To demonstrate our system, we instantiate $Λ_{num}$ with error metrics proposed in the numerical analysis literature and we show how to incorporate rounding operations that faithfully model aspects of the IEEE 754 floating-point standard. To show that $Λ_{num}$ can be a useful tool for automated error analysis, we develop a prototype implementation for $Λ_{num}$ that infers error bounds that are competitive with existing tools, while often running significantly faster. Finally, we consider semantic extensions of our graded monad to bound error under more complex rounding behaviors, such as non-deterministic and randomized rounding.2024-05-07T18:50:57ZAriel E. KellisonJustin Hsu10.1145/3656456http://arxiv.org/abs/2504.06026v1Taking out the Toxic Trash: Recovering Precision in Mixed Flow-Sensitive Static Analyses2025-04-08T13:30:15ZStatic analysis of real-world programs combines flow- and context-sensitive analyses of local program states with computation of flow- and context-insensitive invariants at globals, that, e.g., abstract data shared by multiple threads. The values of locals and globals may mutually depend on each other, with the analysis of local program states both making contributions to globals and querying their values. Usually, all contributions to globals are accumulated during fixpoint iteration, with widening applied to enforce termination. Such flow-insensitive information often becomes unnecessarily imprecise and can include superfluous contributions -- trash -- which, in turn, may be toxic to the precision of the overall analysis. To recover precision of globals, we propose techniques complementing each other: Narrowing on globals differentiates contributions by origin; reluctant widening limits the amount of widening applied at globals; and finally, abstract garbage collection undoes contributions to globals and propagates their withdrawal. The experimental evaluation shows that these techniques increase the precision of mixed flow-sensitive analyses at a reasonable cost.2025-04-08T13:30:15ZPreprint of our PLDI '25 paper, including supplementary materialFabian StemmlerMichael SchwarzJulian ErhardSarah TilscherHelmut Seidlhttp://arxiv.org/abs/2412.06947v3PyraNet: A Multi-Layered Hierarchical Dataset for Verilog2025-04-07T21:58:26ZRecently, there has been a growing interest in leveraging Large Language Models for Verilog code generation. However, the current quality of the generated Verilog code remains suboptimal. This is largely due to the absence of well-defined, well-organized datasets with high-quality samples, as well as a lack of innovative fine-tuning methods and models specifically trained on Verilog. In this paper, we introduce a novel open-source dataset and a corresponding fine-tuning technique, which utilizes a multi-layered structure that we refer to as PyraNet. Our experiments demonstrate that employing the proposed dataset and fine-tuning approach leads to a more accurate fine-tuned model, producing syntactically and functionally correct Verilog code. The evaluation results show improvements by up-to $32.6\%$ in comparison to the CodeLlama-7B baseline model and up-to $16.7\%$ in comparison to the state-of-the-art models using VerilogEval evaluation platform.2024-12-09T19:45:54ZBardia NadimiGhali Omar BoutaibHao Zheng10.1109/DAC63849.2025.11133406http://arxiv.org/abs/2505.13454v1pyeb: A Python Implementation of Event-B Refinement Calculus2025-04-07T13:09:21ZThis paper presents the PyEB tool, a Python implementation of the Event-B refinement calculus. The PyEB tool takes a Python program and generates several proof obligations that are then passed into the Z3 solver for verification purposes. The Python program represents an Event-B model. Examples of these proof obligations are machine invariant preservation, feasibility of non-deterministic event actions, event guard strengthening, event simulation, and correctness of machine variants. The Python program follows a particular object-oriented syntax; for example, actions, events, contexts, and machines are encoded as Python classes. We implemented PyEB as a PyPI (Python Package Index) library, which is freely available online. We carried out a case study on the use of PyEB. We modelled and verified several sequential algorithms in Python, e.g., the binary search algorithm and the square-root algorithm, among others. Our experimental results show that PyEB verified the refinement calculus models written in Python.2025-04-07T13:09:21ZNéstor Catañohttp://arxiv.org/abs/2504.04874v1Futureproof Static Memory Planning2025-04-07T09:28:54ZThe NP-complete combinatorial optimization task of assigning offsets to a set of buffers with known sizes and lifetimes so as to minimize total memory usage is called dynamic storage allocation (DSA). Existing DSA implementations bypass the theoretical state-of-the-art algorithms in favor of either fast but wasteful heuristics, or memory-efficient approaches that do not scale beyond one thousand buffers. The "AI memory wall", combined with deep neural networks' static architecture, has reignited interest in DSA. We present idealloc, a low-fragmentation, high-performance DSA implementation designed for million-buffer instances. Evaluated on a novel suite of particularly hard benchmarks from several domains, idealloc ranks first against four production implementations in terms of a joint effectiveness/robustness criterion.2025-04-07T09:28:54ZSubmitted to ACM TOPLASChristos LamprakosPanagiotis XanthopoulosManolis KatsaragakisSotirios XydisDimitrios SoudrisFrancky Catthoorhttp://arxiv.org/abs/2405.17238v3IRIS: LLM-Assisted Static Analysis for Detecting Security Vulnerabilities2025-04-06T23:46:59ZSoftware is prone to security vulnerabilities. Program analysis tools to detect them have limited effectiveness in practice due to their reliance on human labeled specifications. Large language models (or LLMs) have shown impressive code generation capabilities but they cannot do complex reasoning over code to detect such vulnerabilities especially since this task requires whole-repository analysis. We propose IRIS, a neuro-symbolic approach that systematically combines LLMs with static analysis to perform whole-repository reasoning for security vulnerability detection. Specifically, IRIS leverages LLMs to infer taint specifications and perform contextual analysis, alleviating needs for human specifications and inspection. For evaluation, we curate a new dataset, CWE-Bench-Java, comprising 120 manually validated security vulnerabilities in real-world Java projects. A state-of-the-art static analysis tool CodeQL detects only 27 of these vulnerabilities whereas IRIS with GPT-4 detects 55 (+28) and improves upon CodeQL's average false discovery rate by 5% points. Furthermore, IRIS identifies 4 previously unknown vulnerabilities which cannot be found by existing tools. IRIS is available publicly at https://github.com/iris-sast/iris.2024-05-27T14:53:35ZZiyang LiSaikat DuttaMayur Naikhttp://arxiv.org/abs/2502.19348v2The Simulation Semantics of Synthesisable Verilog2025-04-06T23:11:45ZDespite numerous previous formalisation projects targeting Verilog, the semantics of Verilog defined by the Verilog standard -- Verilog's simulation semantics -- has thus far eluded definitive mathematical formalisation. Previous projects on formalising the semantics have made good progress but no previous project provides a formalisation that can be used to execute or formally reason about real-world hardware designs. In this paper, we show that the reason for this is that the Verilog standard is inconsistent both with Verilog practice and itself. We pinpoint a series of problems in the Verilog standard that we have identified in how the standard defines the semantics of the subset of Verilog used to describe hardware designs, that is, the synthesisable subset of Verilog. We show how the most complete Verilog formalisation to date inherits these problems and how, after we repair these problems in an executable implementation of the formalisation, the repaired implementation can be used to execute real-world hardware designs. The existing formalisation together with the repairs hence constitute the first formalisation of Verilog's simulation semantics compatible with real-world hardware designs. Additionally, to make the results of this paper accessible to a wider (nonmathematical) audience, we provide a visual formalisation of Verilog's simulation semantics.2025-02-26T17:39:04ZOOPSLA 2025Andreas Lööw10.1145/3720484http://arxiv.org/abs/2408.15475v2Verifying Solutions to Semantics-Guided Synthesis Problems2025-04-06T21:48:45ZSemantics-Guided Synthesis (SemGuS) provides a framework to specify synthesis problems in a solver-agnostic and domain-agnostic way, by allowing a user to provide both the syntax and semantics of the language in which the desired program should be synthesized. Because synthesis and verification are closely intertwined, the SemGuS framework raises the problem of how to verify programs in a solver and domain-agnostic way.
We prove that the problem of verifying whether a program is a valid solution to a SemGuS problem can be reduced to proving validity of a query in the `CLP calculus, a fixed-point logic that generalizes Constrained Horn Clauses and co-Constrained Horn Clauses. Our encoding into `CLP allows us to further classify the SemGuS verification problems into ones that are reducible to validity of (i) first-order-logic formulas, (ii) Constrained Horn Clauses, (iii) co-Constrained Horn Clauses, and (iv) `CLP queries. Furthermore, our encoding shines light on some limitations of the SemGuS framework, such as its inability to model nondeterminism and reactive synthesis. We thus propose a modification to SemGuS that makes it more expressive, and for which verifying solutions is exactly equivalent to proving validity of a query in the `CLP calculus. Our implementation of SemGuS verifiers based on the above encoding can verify instances that were not even encodable in previous work. Furthermore, we use our SemGuS verifiers within an enumeration-based SemGuS solver to correctly synthesize solutions to SemGuS problems that no previous SemGuS synthesizer could solve.2024-08-28T01:46:15ZCharlie MurphyKeith JohnsonThomas RepsLoris D'Antonihttp://arxiv.org/abs/2504.04542v1Automated Verification of Soundness of DNN Certifiers2025-04-06T16:42:56ZThe uninterpretability of Deep Neural Networks (DNNs) hinders their use in safety-critical applications. Abstract Interpretation-based DNN certifiers provide promising avenues for building trust in DNNs. Unsoundness in the mathematical logic of these certifiers can lead to incorrect results. However, current approaches to ensure their soundness rely on manual, expert-driven proofs that are tedious to develop, limiting the speed of developing new certifiers. Automating the verification process is challenging due to the complexity of verifying certifiers for arbitrary DNN architectures and handling diverse abstract analyses.
We introduce ProveSound, a novel verification procedure that automates the soundness verification of DNN certifiers for arbitrary DNN architectures. Our core contribution is the novel concept of a symbolic DNN, using which, ProveSound reduces the soundness property, a universal quantification over arbitrary DNNs, to a tractable symbolic representation, enabling verification with standard SMT solvers. By formalizing the syntax and operational semantics of ConstraintFlow, a DSL for specifying certifiers, ProveSound efficiently verifies both existing and new certifiers, handling arbitrary DNN architectures.
Our code is available at https://github.com/uiuc-focal-lab/constraintflow.git2025-04-06T16:42:56ZAvaljot SinghYasmin Chandini SaritaCharith MendisGagandeep Singh10.1145/3720509http://arxiv.org/abs/2504.06295v1Bottom-Up Generation of Verilog Designs for Testing EDA Tools2025-04-06T15:40:36ZTesting Electronic Design Automation (EDA) tools rely on benchmarks -- designs written in Hardware Description Languages (HDLs) such as Verilog, SystemVerilog, or VHDL. Although collections of benchmarks for these languages exist, they are typically limited in size. This scarcity has recently drawn more attention due to the increasing need for training large language models in this domain. To deal with such limitation, this paper presents a methodology and a corresponding tool for generating realistic Verilog designs. The tool, ChiGen, was originally developed to test the Jasper\textregistered\ Formal Verification Platform, a product by Cadence Design Systems. Now, released as open-source software, ChiGen has been able to identify zero-day bugs in a range of tools, including Verible, Verilator, and Yosys. This paper outlines the principles behind ChiGen's design, focusing on three aspects of it: (i) generation guided by probabilistic grammars, (ii) type inference via the Hindley-Milner algorithm, and (iii) code injection enabled by data-flow analysis. Once deployed on standard hardware, ChiGen outperforms existing Verilog Fuzzers such as Verismith, TransFuzz, and VlogHammer regarding structural diversity, code coverage, and bug-finding ability.2025-04-06T15:40:36ZThis is an 11-page paper, yet not submitted to conferences or journalsJoão Victor Amorim VieiraLuiza de Melo GomesRafael SumitaniRaissa MacielAugusto MafraMirlaine CrepaldeFernando Magno Quintão Pereirahttp://arxiv.org/abs/2504.04302v1AbsInf: A Lightweight Object to Represent float('inf') in Dijkstra's Algorithm2025-04-05T23:37:55ZWe introduce AbsInf, a lightweight abstract object designed as a high-performance alternative to Python's native float('inf') within pathfinding algorithms. Implemented as a C-based Python extension, AbsInf bypasses IEEE-754 float coercion and dynamic type dispatch, offering constant-time dominance comparisons and arithmetic neutrality. When integrated into Dijkstra's algorithm without altering its logic, AbsInf reduces runtime by up to 17.2%, averaging 9.74% across diverse synthetic and real-world graph datasets. This optimization highlights the performance trade-offs in high-frequency algorithmic constructs, where a symbolic use of infinity permits efficient abstraction. Our findings contribute to the broader discourse on lightweight architectural enhancements for interpreted languages, particularly in performance-critical control flows.2025-04-05T23:37:55Z13 pages, 3 figures. One bar chart was created using OPENAI's ChatGPT and included as Figure 2. One image was downloaded from Wikipedia and cited in the References section (used via local file instead of URL). Benchmarks performed using CPython 3.12.0 and Python 3.13 across Azure and local Windows machines. Code available at https://github.com/AnjanB3012/abstract-infinityAnjan BellamkondaLaksh BharaniHarivatsan Selvamhttp://arxiv.org/abs/2504.01866v2From Code Generation to Software Testing: AI Copilot with Context-Based RAG2025-04-05T09:03:33ZThe rapid pace of large-scale software development places increasing demands on traditional testing methodologies, often leading to bottlenecks in efficiency, accuracy, and coverage. We propose a novel perspective on software testing by positing bug detection and coding with fewer bugs as two interconnected problems that share a common goal, which is reducing bugs with limited resources. We extend our previous work on AI-assisted programming, which supports code auto-completion and chatbot-powered Q&A, to the realm of software testing. We introduce Copilot for Testing, an automated testing system that synchronizes bug detection with codebase updates, leveraging context-based Retrieval Augmented Generation (RAG) to enhance the capabilities of large language models (LLMs). Our evaluation demonstrates a 31.2% improvement in bug detection accuracy, a 12.6% increase in critical test coverage, and a 10.5% higher user acceptance rate, highlighting the transformative potential of AI-driven technologies in modern software development practices.2025-04-02T16:20:05ZThis work has been accepted for publication in IEEE Software (DOI: 10.1109/MS.2025.3549628)Yuchen WangShangxin GuoChee Wei Tan10.1109/MS.2025.3549628http://arxiv.org/abs/2504.03890v1Handling the Selection Monad (Full Version)2025-04-04T19:30:14ZThe selection monad on a set consists of selection functions. These select an element from the set, based on a loss (dually, reward) function giving the loss resulting from a choice of an element. Abadi and Plotkin used the monad to model a language with operations making choices of computations taking account of the loss that would arise from each choice. However, their choices were optimal, and they asked if they could instead be programmer provided.
In this work, we present a novel design enabling programmers to do so. We present a version of algebraic effect handlers enriched by computational ideas inspired by the selection monad. Specifically, as well as the usual delimited continuations, our new kind of handlers additionally have access to choice continuations, that give the possible future losses. In this way programmers can write operations implementing optimisation algorithms that are aware of the losses arising from their possible choices.
We give an operational semantics for a higher-order model language $λC$, and establish desirable properties including progress, type soundness, and termination for a subset with a mild hierarchical constraint on allowable operation types. We give this subset a selection monad denotational semantics, and prove soundness and adequacy results. We also present a Haskell implementation and give a variety of programming examples.2025-04-04T19:30:14ZHandling the Selection Monad (PLDI'25) with the appendixGordon PlotkinNingning Xie