https://arxiv.org/api/u6ebGe/ktBVrQgGZvlfEDGSkEuA 2026-06-28T14:30:12Z 9951 1590 15 http://arxiv.org/abs/2505.08633v1 D-Hammer: Efficient Equational Reasoning for Labelled Dirac Notation 2025-05-13T14:51:04Z

Labelled Dirac notation is a formalism commonly used by physicists to represent many-body quantum systems and by computer scientists to assert properties of quantum programs. It is supported by a rich equational theory for proving equality between expressions in the language. These proofs are typically carried on pen-and-paper, and can be exceedingly long and error-prone. We introduce D-Hammer, the first tool to support automated equational proof for labelled Dirac notation. The salient features of D-Hammer include: an expressive, higher-order, dependently-typed language for labelled Dirac notation; an efficient normalization algorithm; and an optimized C++ implementation. We evaluate the implementation on representative examples from both plain and labelled Dirac notation. In the case of plain Dirac notation, we show that our implementation significantly outperforms DiracDec.

2025-05-13T14:51:04Z This version of the contribution has been accepted for publication, after peer review but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. 47 pages (with appendix) Yingte Xu Li Zhou Gilles Barthe http://arxiv.org/abs/2505.08114v1 Valida ISA Spec, version 1.0: A zk-Optimized Instruction Set Architecture 2025-05-12T23:03:02Z

The Valida instruction set architecture is designed for implementation in zkVMs to optimize for fast, efficient execution proving. This specification intends to guide implementors of zkVMs and compiler toolchains for Valida. It provides an unambiguous definition of the semantics of Valida programs and may be used as a starting point for formalization efforts.

2025-05-12T23:03:02Z Morgan Thomas Mamy Ratsimbazafy Marcin Bugaj Lewis Revill Carlo Modica Sebastian Schmidt Ventali Tan Daniel Lubarov Max Gillett Wei Dai http://arxiv.org/abs/2301.05301v2 A Minimal Formulation of Session Types: The Sessions of Trios in Concert 2025-05-12T20:08:33Z

Session types are a type-based approach to the verification of message-passing programs. They specify communication structures essential to enforcing program correctness; by relying on sequencing constructs, a session type can precisely describe the intended order of communication actions through a channel. In this paper we study a fragment of session types that makes a very limited use of sequencing; we call it minimal session types. In the context of a core process calculus with sessions and higher-order concurrency, we establish two technical results. First, we prove that every process P typable with standard session types can be compiled down into a process D(P) typable with minimal session types. Second, we prove that P and D(P) are behaviorally equivalent. These results show that having sequencing in both processes and session types is convenient, but that only sequencing in processes is truly indispensable, as it can correctly codify sequencing in types. Our developments draw inspiration from work by Parrow on behavior-preserving decompositions of untyped processes using trios, i.e., processes with exactly three nested prefixes. By casting Parrow's approach in the realm of typed processes, our developments reveal a conceptually simple formulation of session types, supported by static and dynamic correctness results.

2023-01-12T21:25:28Z arXiv admin note: text overlap with arXiv:1906.03836 Alen Arslanagić Jorge A. Pérez Dan Frumin http://arxiv.org/abs/2411.15100v3 XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models 2025-05-12T08:20:08Z

The applications of LLM Agents are becoming increasingly complex and diverse, leading to a high demand for structured outputs that can be parsed into code, structured function calls, and embodied agent commands. These developments bring significant demands for structured generation in LLM inference. Context-free grammar is a flexible approach to enable structured generation via constrained decoding. However, executing context-free grammar requires going through several stack states over all tokens in vocabulary during runtime, bringing non-negligible overhead for structured generation. In this paper, we propose XGrammar, a flexible and efficient structure generation engine for large language models. XGrammar accelerates context-free grammar execution by dividing the vocabulary into context-independent tokens that can be prechecked and context-dependent tokens that need to be interpreted during runtime. We further build transformations to expand the grammar context and reduce the number of context-independent tokens. Additionally, we build an efficient persistent stack to accelerate the context-dependent token checks. Finally, we co-design the grammar engine with LLM inference engine to overlap grammar computation with GPU executions. Evaluation results show that XGrammar can achieve up to 100x speedup over existing solutions. Combined with an LLM inference engine, it can generate near-zero overhead structure generation in end-to-end low-LLM serving.

2024-11-22T18:01:37Z MLSys '25 Yixin Dong Charlie F. Ruan Yaxing Cai Ruihang Lai Ziyi Xu Yilong Zhao Tianqi Chen http://arxiv.org/abs/2505.06958v1 A Formally Verified Robustness Certifier for Neural Networks (Extended Version) 2025-05-11T12:05:14Z

Neural networks are often susceptible to minor perturbations in input that cause them to misclassify. A recent solution to this problem is the use of globally-robust neural networks, which employ a function to certify that the classification of an input cannot be altered by such a perturbation. Outputs that pass this test are called certified robust. However, to the authors' knowledge, these certification functions have not yet been verified at the implementation level. We demonstrate how previous unverified implementations are exploitably unsound in certain circumstances. Moreover, they often rely on approximation-based algorithms, such as power iteration, that (perhaps surprisingly) do not guarantee soundness. To provide assurance that a given output is robust, we implemented and formally verified a certification function for globally-robust neural networks in Dafny. We describe the program, its specifications, and the important design decisions taken for its implementation and verification, as well as our experience applying it in practice.

2025-05-11T12:05:14Z James Tobler Hira Taqdees Syeda Toby Murray http://arxiv.org/abs/2407.10424v5 CodeV: Empowering LLMs with HDL Generation through Multi-Level Summarization 2025-05-11T07:35:24Z

The design flow of processors, particularly in hardware description languages (HDL) like Verilog and Chisel, is complex and costly. While recent advances in large language models (LLMs) have significantly improved coding tasks in software languages such as Python, their application in HDL generation remains limited due to the scarcity of high-quality HDL data. Traditional methods of adapting LLMs for hardware design rely on synthetic HDL datasets, which often suffer from low quality because even advanced LLMs like GPT perform poorly in the HDL domain. Moreover, these methods focus solely on chat tasks and the Verilog language, limiting their application scenarios. In this paper, we observe that: (1) HDL code collected from the real world is of higher quality than code generated by LLMs. (2) LLMs like GPT-3.5 excel in summarizing HDL code rather than generating it. (3) An explicit language tag can help LLMs better adapt to the target language when there is insufficient data. Based on these observations, we propose an efficient LLM fine-tuning pipeline for HDL generation that integrates a multi-level summarization data synthesis process with a novel Chat-FIM-Tag supervised fine-tuning method. The pipeline enhances the generation of HDL code from natural language descriptions and enables the handling of various tasks such as chat and infilling incomplete code. Utilizing this pipeline, we introduce CodeV, a series of HDL generation LLMs. Among them, CodeV-All not only possesses a more diverse range of language abilities, i.e. Verilog and Chisel, and a broader scope of tasks, i.e. Chat and fill-in-middle (FIM), but it also achieves performance on VerilogEval that is comparable to or even surpasses that of CodeV-Verilog fine-tuned on Verilog only, making them the first series of open-source LLMs designed for multi-scenario HDL generation.

2024-07-15T03:57:20Z 13 pages, 10 figures, journal Yang Zhao Di Huang Chongxiao Li Pengwei Jin Muxin Song Yinan Xu Ziyuan Nan Mingju Gao Tianyun Ma Lei Qi Yansong Pan Zhenxing Zhang Rui Zhang Xishan Zhang Zidong Du Qi Guo Xing Hu http://arxiv.org/abs/2505.13479v1 RTL++: Graph-enhanced LLM for RTL Code Generation 2025-05-11T00:17:26Z

As hardware design complexity escalates, there is an urgent need for advanced automation in electronic design automation (EDA). Traditional register transfer level (RTL) design methods are manual, time-consuming, and prone to errors. While commercial (instruction-tuned) large language models (LLMs) shows promising performance for automation, they pose security and privacy concerns. Open-source models offer alternatives; however, they frequently fall short in quality/correctness, largely due to limited, high-quality RTL code data essential for effective training and generalization. This paper proposes RTL++, a first-of-its-kind LLM-assisted method for RTL code generation that utilizes graph representations of code structures to enhance the quality of generated code. By encoding RTL code into a textualized control flowgraphs (CFG) and data flow graphs (DFG), RTL++ captures the inherent hierarchy, dependencies, and relationships within the code. This structured graph-based approach enhances the context available to LLMs, enabling them to better understand and generate instructions. By focusing on data generation through graph representations, RTL++ addresses the limitations of previous approaches that rely solely on code and suffer from lack of diversity. Experimental results demonstrate that RTL++ outperforms state-of-the-art models fine-tuned for RTL generation, as evaluated using the VerilogEval benchmark's Pass@1/5/10 metric, as well as the RTLLM1.1 model, which highlight the effectiveness of graph-enhanced context in advancing the capabilities of LLM-assisted RTL code generation.

2025-05-11T00:17:26Z Accepted to the IEEE International Conference on LLM-Aided Design (LAD '25) Mohammad Akyash Kimia Azar Hadi Kamali http://arxiv.org/abs/2505.13478v1 An Extensive Study on Text Serialization Formats and Methods 2025-05-10T18:40:29Z

Text serialization is a fundamental concept in modern computing, enabling the conversion of complex data structures into a format that can be easily stored, transmitted, and reconstructed. This paper provides an extensive overview of text serialization, exploring its importance, prevalent formats, underlying methods, and comparative performance characteristics. We dive into the advantages and disadvantages of various text-based serialization formats, including JSON, XML, YAML, and CSV, examining their structure, readability, verbosity, and suitability for different applications. The paper also discusses the common methods involved in the serialization and deserialization processes, such as parsing techniques and the role of schemas. To illustrate the practical implications of choosing a serialization format, we present hypothetical performance results in the form of tables, comparing formats based on metrics like serialization deserialization speed and resulting data size. The discussion analyzes these results, highlighting the trade offs involved in selecting a text serialization format for specific use cases. This work aims to provide a comprehensive resource for understanding and applying text serialization in various computational domains.

2025-05-10T18:40:29Z Wang Wei Li Na Zhang Lei Liu Fang Chen Hao Yang Xiuying Huang Lei Zhao Min Wu Gang Zhou Jie Xu Jing Sun Tao Ma Li Zhu Qiang Hu Jun Guo Wei He Yong Gao Yuan Lin Dan Zheng Yi Shi Li http://arxiv.org/abs/2505.06456v1 Rod Bustall: In Memoriam 2025-05-09T22:26:49Z

This is an obituary of Rod Burstall, written in his honour. Rod was a prominent computer scientist whose contributions span over forty years. Most of his career was spent at Edinburgh University. He lead the team programming Freddy, the first hand-eye assembly robot, with much of his effort being devoted to the development of the POP-2 programming language. He became interested in a mathematical approach to software development: he recognised the central role of structural induction; his work on reasoning about mutable data structures was an influential precursor of separation logic; he was the first to point out the connection between program proof and modal logic; and he was responsible for the idea that stores are mappings from locations to their contents. As part of his quest for correctness of programs, Rod, with John Darlington, undertook the first major work on program transformation. His interest in novel programming languages continued with the experimental language HOPE, developed with Don Sannella and David MacQueen. Robin Milner's Standard ML and its relatives integrated ideas from Hope, and Rod was an active member of the Standard ML design team. Rod pioneered the use of algebraic and categorical techniques in programming. He and Joseph Goguen proposed the first algebraic specification language. Rod was an early user of categorical ideas, in particular developing computational category theory with David Rydeheard. The interplay between programming and correctness proofs is again evident in Rod's contributions to automated proof support systems. For example, he led Randy Pollack, Zhaohui Luo, and others in the Lego proof assistant, which implemented type systems supporting interactive proof development. Finally, Rod, with James McKinna, investigated notions of programs packaged with proofs of their correctness, anticipating the topic of proof-carrying code.

2025-05-09T22:26:49Z This is an obituary of a prominent computer scientist J Strother Moore Gordon Plotkin David Rydeheard Don Sannella 10.1145/3731974. http://arxiv.org/abs/2403.16218v4 CoverUp: Effective High Coverage Test Generation for Python 2025-05-09T14:33:58Z

Testing is an essential part of software development. Test generation tools attempt to automate the otherwise labor-intensive task of test creation, but generating high-coverage tests remains challenging. This paper proposes CoverUp, a novel approach to driving the generation of high-coverage Python regression tests. CoverUp combines coverage analysis, code context, and feedback in prompts that iteratively guide the LLM to generate tests that improve line and branch coverage. We evaluate our prototype CoverUp implementation across a benchmark of challenging code derived from open-source Python projects and show that CoverUp substantially improves on the state of the art. Compared to CodaMosa, a hybrid search/LLM-based test generator, CoverUp achieves a per-module median line+branch coverage of 80% (vs. 47%). Compared to MuTAP, a mutation- and LLM-based test generator, CoverUp achieves an overall line+branch coverage of 89% (vs. 77%). We also demonstrate that CoverUp's performance stems not only from the LLM used but from the combined effectiveness of its components.

2024-03-24T16:18:27Z 21 pages; to appear at FSE'25 Juan Altmayer Pizzorno Emery D. Berger 10.1145/3729398 http://arxiv.org/abs/2505.05986v1 GNU Aris: a web application for students 2025-05-09T12:17:45Z

We report on recent improvements to the free logic education software tool GNU Aris, including the latest features added during the Google Summer of Code 2023 project. We focused on making GNU Aris a web application to enable almost all users to use it as a standalone offline web application written in a combination of HTML, JavaScript, and WebAssembly. We used the Qt Quick framework with Emscripten to compile the application to WebAssembly. In the report we summarize the user feedback of university students given during a course on logic.

2025-05-09T12:17:45Z In Proceedings ThEdu24, arXiv:2505.04677 EPTCS 419, 2025, pp. 42-54 Saksham Attri Birla Institute of Technology and Science Pilani, Hyderabad Campus, India Zoltán Kovács Private University of Education, Diocese Linz, Austria Aaron Windischbauer Private University of Education, Diocese Linz, Austria 10.4204/EPTCS.419.3 http://arxiv.org/abs/2505.13473v1 A Graphical Interface for Category Theory Proofs in Coq 2025-05-09T12:17:29Z

The importance of category theory in recent developments in both mathematics and in computer science cannot be overstated. However, its abstract nature makes it difficult to understand at first. Graphical languages have been developed to help manage this abstraction, but they have not been used in proof assistants, most of which are text-based. We believe that a graphical interface for categorical proofs integrated in a generic proof assistant would allow students to familiarize themselves with diagrammatic reasoning on concrete proofs that they are already familiar with. We present an implementation of a Coq plugin that enables both visualization and interactions with Coq proofs in a graphical manner.

2025-05-09T12:17:29Z In Proceedings ThEdu24, arXiv:2505.04677 EPTCS 419, 2025, pp. 28-41 Luc Chabassier ENS Paris-Saclay 10.4204/EPTCS.419.2 http://arxiv.org/abs/2505.05715v1 JustinANN: Realistic Test Generation for Java Programs Driven by Annotations 2025-05-09T01:31:46Z

Automated test case generation is important. However, the automatically generated test input does not always make sense, and the automated assertion is difficult to validate against the program under test. In this paper, we propose JustinANN, a flexible and scalable tool to generate test cases for Java programs, providing realistic test inputs and assertions. We have observed that, in practice, Java programs contain a large number of annotations from programs, which can be considered as part of the user specification. We design a systematic annotation set with 7 kinds of annotations and 4 combination rules based on them to modify complex Java objects. Annotations that modify the fields or return variables of methods can be used to generate assertions that represent the true intent of the program, and the ones that modify the input parameters can be used to generate test inputs that match the real business requirement. We have conducted experiments to evaluate the approach on open source Java programs. The results show that the annotations and their combinations designed in this paper are compatible with existing annotations; our approach is easier to generate test data in, on and outside the boundaries of the requirement domain; and it also helps to find program defects.

2025-05-09T01:31:46Z Baoquan Cui Rong Qu Jian Zhang http://arxiv.org/abs/2504.09246v2 Type-Constrained Code Generation with Language Models 2025-05-08T09:33:40Z

Large language models (LLMs) have achieved notable success in code generation. However, they still frequently produce uncompilable output because their next-token inference procedure does not model formal aspects of code. Although constrained decoding is a promising approach to alleviate this issue, it has only been applied to handle either domain-specific languages or syntactic features of general-purpose programming languages. However, LLMs frequently generate code with typing errors, which are beyond the domain of syntax and generally hard to adequately constrain. To address this challenge, we introduce a type-constrained decoding approach that leverages type systems to guide code generation. For this purpose, we develop novel prefix automata and a search over inhabitable types, forming a sound approach to enforce well-typedness on LLM-generated code. We formalize our approach on a foundational simply-typed language and extend it to TypeScript to demonstrate practicality. Our evaluation on the HumanEval and MBPP datasets shows that our approach reduces compilation errors by more than half and significantly increases functional correctness in code synthesis, translation, and repair tasks across LLMs of various sizes and model families, including state-of-the-art open-weight models with more than 30B parameters. The results demonstrate the generality and effectiveness of our approach in constraining LLM code generation with formal rules of type systems.

2025-04-12T15:03:00Z Niels Mündler Jingxuan He Hao Wang Koushik Sen Dawn Song Martin Vechev 10.1145/3729274 http://arxiv.org/abs/2501.12093v3 Checkification: A Practical Approach for Testing Static Analysis Truths 2025-05-07T14:05:14Z

Static analysis is an essential component of many modern software development tools. Unfortunately, the ever-increasing complexity of static analyzers makes their coding error-prone. Even analysis tools based on rigorous mathematical techniques, such as abstract interpretation, are not immune to bugs. Ensuring the correctness and reliability of software analyzers is critical if they are to be inserted in production compilers and development environments. While compiler validation has seen notable success, formal validation of static analysis tools remains relatively unexplored. In this paper, we propose a method for testing abstract interpretation-based static analyzers. Broadly, it consists in checking, over a suite of benchmarks, that the properties inferred statically are satisfied dynamically. The main advantage of our approach lies in its simplicity, which stems directly from framing it within the Ciao assertion-based validation framework, and its blended static/dynamic assertion checking approach. We demonstrate that in this setting, the analysis can be tested with little effort by combining the following components already present in the framework: 1) the static analyzer, which outputs its results as the original program source with assertions interspersed; 2) the assertion run-time checking mechanism, which instruments a program to ensure that no assertion is violated at run time; 3) the random test case generator, which generates random test cases satisfying the properties present in assertion preconditions; and 4) the unit-test framework, which executes those test cases. We have applied our approach to the CiaoPP static analyzer, resulting in the identification of many bugs with reasonable overhead. Most of these bugs have been either fixed or confirmed, helping us detect a range of errors not only related to analysis soundness but also within other aspects of the framework.

2025-01-21T12:38:04Z Accepted for publication in Theory and Practice of Logic Programming (TPLP). Extended, revised version of our work published in LOPSTR 2020 Daniela Ferreiro Ignacio Casso Jose F. Morales Pedro López-García Manuel V. Hermenegildo