https://arxiv.org/api/7AwtTWOmeNlU6igBugeeoIbtoJI2026-06-30T11:46:40Z9958187515http://arxiv.org/abs/2501.15002v1A Proof-Producing Compiler for Blockchain Applications2025-01-25T00:31:47ZCairoZero is a programming language for running decentralized applications (dApps) at scale. Programs written in the CairoZero language are compiled to machine code for the Cairo CPU architecture and cryptographic protocols are used to verify the results of execution efficiently on blockchain. We explain how we have extended the CairoZero compiler with tooling that enables users to prove, in the Lean 3 proof assistant, that compiled code satisfies high-level functional specifications. We demonstrate the success of our approach by verifying primitives for computation with the secp256k1 and secp256r1 curves over a large finite field as well as the validation of cryptographic signatures using the former. We also verify a mechanism for simulating a read-write dictionary data structure in a read-only setting. Finally, we reflect on our methodology and discuss some of the benefits of our approach.2025-01-25T00:31:47ZJeremy AvigadLior GoldbergDavid LevitYoav SeginerAlon Titelmanhttp://arxiv.org/abs/2501.14702v1Multiparty Session Types with a Bang!2025-01-24T18:24:20ZReplication is an alternative construct to recursion for describing infinite behaviours in the pi-calculus. In this paper we explore the implications of including type-level replication in Multiparty Session Types (MPST), a behavioural type theory for message-passing programs. We introduce MPST!, a session-typed multiparty process calculus with replication and first-class roles. We show that replication is not an equivalent alternative to recursion in MPST, and that using both replication and recursion in one type system in fact allows us to express both context-free protocols and protocols that support mutual exclusion and races. We demonstrate the expressiveness of MPST! on examples including binary tree serialisation, dining philosophers, and a model of an auction, and explore the implications of replication on the decidability of typechecking.2025-01-24T18:24:20ZMatthew Alan Le BrunSimon FowlerOrnela Dardhahttp://arxiv.org/abs/2403.10144v3NLP Verification: Towards a General Methodology for Certifying Robustness2025-01-24T15:43:41ZMachine Learning (ML) has exhibited substantial success in the field of Natural Language Processing (NLP). For example large language models have empirically proven to be capable of producing text of high complexity and cohesion. However, they are prone to inaccuracies and hallucinations. As these systems are increasingly integrated into real-world applications, ensuring their safety and reliability becomes a primary concern. There are safety critical contexts where such models must be robust to variability or attack, and give guarantees over their output. Computer Vision had pioneered the use of formal verification of neural networks for such scenarios and developed common verification standards and pipelines, leveraging precise formal reasoning about geometric properties of data manifolds. In contrast, NLP verification methods have only recently appeared in the literature. While presenting sophisticated algorithms, these papers have not yet crystallised into a common methodology. They are often light on the pragmatical issues of NLP verification and the area remains fragmented. In this paper, we attempt to distil and evaluate general components of an NLP verification pipeline, that emerges from the progress in the field to date. Our contributions are two-fold. Firstly, we propose a general methodology to analyse the effect of the embedding gap, a problem that refers to the discrepancy between verification of geometric subspaces and the semantic meaning of sentences, which the geometric subspaces are supposed to represent. We propose a number of practical NLP methods that can help to quantify the effects of the embedding gap. Secondly, we give a general method for training and verification of neural networks that leverages a more precise geometric estimation of semantic similarity of sentences in the embedding space and helps to overcome the effects of the embedding gap in practice.2024-03-15T09:43:52ZMarco CasadioTanvi DinkarEkaterina KomendantskayaLuca ArnaboldiMatthew L. DaggittOmri IsacGuy KatzVerena RieserOliver Lemon10.1017/S0956792525000099http://arxiv.org/abs/2501.14555v1Exploring Answer Set Programming for Provenance Graph-Based Cyber Threat Detection: A Novel Approach2025-01-24T14:57:27ZProvenance graphs are useful and powerful tools for representing system-level activities in cybersecurity; however, existing approaches often struggle with complex queries and flexible reasoning. This paper presents a novel approach using Answer Set Programming (ASP) to model and analyze provenance graphs. We introduce an ASP-based representation that captures intricate relationships between system entities, including temporal and causal dependencies. Our model enables sophisticated analysis capabilities such as attack path tracing, data exfiltration detection, and anomaly identification. The declarative nature of ASP allows for concise expression of complex security patterns and policies, facilitating both real-time threat detection and forensic analysis. We demonstrate our approach's effectiveness through case studies showcasing its threat detection capabilities. Experimental results illustrate the model's ability to handle large-scale provenance graphs while providing expressive querying. The model's extensibility allows for incorporation of new system behaviors and security rules, adapting to evolving cyber threats. This work contributes a powerful, flexible, and explainable framework for reasoning about system behaviors and security incidents, advancing the development of effective threat detection and forensic investigation tools.2025-01-24T14:57:27ZThe 27th International Symposium on Practical Aspects of Declarative Languages (PADL 2025)Fang LiFei ZuoGopal Guptahttp://arxiv.org/abs/2501.14438v1Data-efficient Performance Modeling via Pre-training2025-01-24T12:14:53ZPerformance models are essential for automatic code optimization, enabling compilers to predict the effects of code transformations on performance and guide search for optimal transformations. Building state-of-the-art performance models with deep learning, however, requires vast labeled datasets of random programs -- an expensive and time-consuming process, stretching over months. This paper introduces a self-supervised pre-training scheme with autoencoders to reduce the need for labeled data. By pre-training on a large dataset of random programs, the autoencoder learns representations of code and transformations, which are then used to embed programs for the performance model. Implemented in the Tiramisu autoscheduler, our approach improves model accuracy with less data. For example, to achieve a MAPE of 20.72%, the original model requires 18 million data points, whereas our method achieves a similar MAPE of 22.44% with only 3.6 million data points, reducing data requirements by 5x.2025-01-24T12:14:53ZChunting LiuRiyadh Baghdadihttp://arxiv.org/abs/2410.18212v2CUTECat: Concolic Execution for Computational Law2025-01-23T10:10:41ZMany legal computations, including the amount of tax owed by a citizen, whether they are eligible to social benefits, or the wages due to civil state servants, are specified by computational laws. Their application, however, is performed by expert computer programs intended to faithfully transcribe the law into computer code. Bugs in these programs can lead to dramatic societal impact, e.g., paying employees incorrect amounts, or not awarding benefits to families in need.
To address this issue, we consider concolic unit testing, a combination of concrete execution with SMT-based symbolic execution, and propose CUTECat, a concolic execution tool targeting implementations of computational laws. Such laws typically follow a pattern where a base case is later refined by many exceptions in following law articles, a pattern that can be formally modeled using default logic. We show how to handle default logic inside a concolic execution tool, and implement our approach in the context of Catala, a recent domain-specific language tailored to implement computational laws. We evaluate CUTECat on several programs, including the Catala implementation of the French housing benefits and Section 132 of the US tax code. We show that CUTECat can successfully generate hundreds of thousands of testcases covering all branches of these bodies of law. Through several heuristics, we improve CUTECat's scalability and usability, making the testcases understandable by lawyers and programmers alike. We believe CUTECat thus paves the way for the use of formal methods during legislative processes.2024-10-23T18:30:23ZPierre GoutagnyAymeric FromherzRaphaël Monathttp://arxiv.org/abs/2105.12077v2A beginner guide to Iris, Coq and separation logic2025-01-23T04:59:44ZCreating safe concurrent algorithms is challenging and error-prone. For this reason, a formal verification framework is necessary especially when those concurrent algorithms are used in safety-critical systems. The goal of this guide is to provide resources for beginners to get started in their journey of formal verification using the powerful tool Iris. The difference between this guide and many others is that it provides (i) an in-depth explanation of examples and tactics, (ii) an explicit discussion of separation logic, and (iii) a thorough coverage of Iris and Coq. References to other guides and to papers are included throughout to provide readers with resources through which to continue their learning.2021-05-25T16:57:25ZElizabeth Dietrichhttp://arxiv.org/abs/2501.13262v1ASDF: A Compiler for Qwerty, a Basis-Oriented Quantum Programming Language2025-01-22T22:54:13ZQwerty is a high-level quantum programming language built on bases and functions rather than circuits. This new paradigm introduces new challenges in compilation, namely synthesizing circuits from basis translations and automatically specializing adjoint or predicated forms of functions. This paper presents ASDF, an open-source compiler for Qwerty that answers these challenges in compiling basis-oriented languages. Enabled with a novel high-level quantum IR implemented in the MLIR framework, our compiler produces OpenQASM 3 or QIR for either simulation or execution on hardware. Our compiler is evaluated by comparing the fault-tolerant resource requirements of generated circuits with other compilers, finding that ASDF produces circuits with comparable cost to prior circuit-oriented compilers.2025-01-22T22:54:13Z20 pages, 14 figures. To appear in CGO '25Austin J. AdamsSharjeel KhanArjun S. BhamraRyan R. AbusaadaAnthony M. CabreraCameron C. HoechstTravis S. HumbleJeffrey S. YoungThomas M. Contehttp://arxiv.org/abs/2501.13194v1Corecursive Coding of High Computational Derivatives and Power Series2025-01-22T19:55:43ZWe discuss the functional lazy techniques in generation and handling of arbitrarily long sequences of derivatives of numerical expressions in one ``variable''; the domain to which the paper belongs is usually nicknamed ``Automatic differentiation''. Two models thereof are considered, the chains of ``pure'' derivatives, and the infinite power series, similar, but algorithmically a bit different. We deal with their arithmetic/algebra, and with more convoluted procedures, such as composition and reversion. Some more specific applications of these structures are also presented.2025-01-22T19:55:43Z22 pages, 2 figures. Category: cs.PLJerzy Karczmarczukhttp://arxiv.org/abs/2501.13002v1Constructive characterisations of the must-preorder for asynchrony2025-01-22T16:42:03ZDe Nicola and Hennessy's must-preorder is a contextual refinement which states that a server q refines a server p if all clients satisfied by p are also satisfied by q. Owing to the universal quantification over clients, this definition does not yield a practical proof method for the must-preorder, and alternative characterisations are necessary to reason over it. Finding these characterisations for asynchronous semantics, i.e. where outputs are non-blocking, has thus far proven to be a challenge, usually tackled via ad-hoc definitions. We show that the standard characterisations of the must-preorder carry over as they stand to asynchronous communication, if servers are enhanced to act as forwarders, i.e. they can input any message as long as they store it back into the shared buffer. Our development is constructive, is completely mechanised in Coq, and is independent of any calculus: our results pertain to Selinger output-buffered agents with feedback. This is a class of Labelled Transition Systems that captures programs that communicate via a shared unordered buffer, as in asynchronous CCS or the asynchronous pi-calculus. We show that the standard coinductive characterisation lets us prove in Coq that concrete programs are related by the must-preorder. Finally, our proofs show that Brouwer's bar induction principle is a useful technique to reason on liveness preserving program transformations.2025-01-22T16:42:03ZGiovanni BernardiIlaria CastellaniPaul LaforgueLéo Stefanescohttp://arxiv.org/abs/2502.00028v1VRank: Enhancing Verilog Code Generation from Large Language Models via Self-Consistency2025-01-22T08:36:29ZLarge Language Models (LLMs) have demonstrated promising capabilities in generating Verilog code from module specifications. To improve the quality of such generated Verilog codes, previous methods require either time-consuming manual inspection or generation of multiple Verilog codes, from which the one with the highest quality is selected with manually designed testbenches. To enhance the generation efficiency while maintaining the quality of the generated codes, we propose VRank, an automatic framework that generates Verilog codes with LLMs. In our framework, multiple code candidates are generated with LLMs by leveraging their probabilistic nature. Afterwards, we group Verilog code candidates into clusters based on identical outputs when tested against the same testbench, which is also generated by LLMs. Clusters are ranked based on the consistency they show on testbench. To determine the best candidate, Chain-of-Thought is further applied to select the best candidate from the top-ranked clusters. By systematically analyzing diverse outputs of generated codes, VRank reduces errors and enhances the overall quality of the generated Verilog code. Experimental results on the VerilogEval-Human benchmark demonstrate a significant 10.5% average increase in functional correctness (passl1) across multiple LLMs, demonstrating VRank's effectiveness in improving the accuracy of automated hardware description language generation for complex design tasks.2025-01-22T08:36:29Zaccepted by ISQED2025Zhuorui ZhaoRuidi QiuIng-Chao LinGrace Li ZhangBing LiUlf Schlichtmannhttp://arxiv.org/abs/2501.12702v1Paradigm-Based Automatic HDL Code Generation Using LLMs2025-01-22T08:18:37ZWhile large language models (LLMs) have demonstrated the ability to generate hardware description language (HDL) code for digital circuits, they still face the hallucination problem, which can result in the generation of incorrect HDL code or misinterpretation of specifications. In this work, we introduce a human-expert-inspired method to mitigate the hallucination of LLMs and enhance their performance in HDL code generation. We begin by constructing specialized paradigm blocks that consist of several steps designed to divide and conquer generation tasks, mirroring the design methodology of human experts. These steps include information extraction, human-like design flows, and the integration of external tools. LLMs are then instructed to classify the type of circuit in order to match it with the appropriate paradigm block and execute the block to generate the HDL codes. Additionally, we propose a two-phase workflow for multi-round generation, aimed at effectively improving the testbench pass rate of the generated HDL codes within a limited number of generation and verification rounds. Experimental results demonstrate that our method significantly enhances the functional correctness of the generated Verilog code2025-01-22T08:18:37Zaccepted by ISQED2025. arXiv admin note: text overlap with arXiv:2407.18326Wenhao SunBing LiGrace Li ZhangXunzhao YinCheng ZhuoUlf Schlichtmannhttp://arxiv.org/abs/2501.12313v1Correctness Witnesses with Function Contracts2025-01-21T17:27:59ZSoftware verification witnesses are a common exchange format for software verification tools. They were developed to provide arguments supporting the verification result, allowing other tools to reproduce the verification results. Correctness witnesses in the current format (version 2.0) allow only for the encoding of loop and location invariants using C expressions. This limits the correctness arguments that verifiers can express in the witness format. One particular limitation is the inability to express function contracts, which consist of a pre-condition and a post-condition for a function. We propose an extension to the existing witness format 2.0 to allow for the specification of function contracts. Our extension includes support for several features inspired by ACSL (\result, \old, \at). This allows for the export of more information from tools and for the exchange of information with tools that require function contracts.2025-01-21T17:27:59Z9 pages, 3 figures, 1 tableMatthias HeizmannDominik KlumppMarian Lingsch-RosenfeldFrank Schüsselehttp://arxiv.org/abs/2501.11501v1FLAT: Formal Languages as Types2025-01-20T14:16:24ZProgrammers regularly use strings to encode many types of data, such as Unix file paths, URLs, and email addresses, that are conceptually different. However, existing mainstream programming languages use a unified string type to represent them all. As a result, their type systems will keep quiet when a function requiring an email address is instead fed an HTML text, which may cause unexceptional failures or vulnerabilities.
To let the type system distinguish such conceptually different string types, in this paper, we propose to regard \emph{formal languages as types} (FLAT), thereby restricting the set of valid strings by context-free grammars and semantic constraints if needed. To this end, email addresses and HTML text are treated as different types. We realize this idea in Python as a testing framework FLAT-PY. It contains user annotations, all directly attached to the user's code, to (1) define such \emph{language types}, (2) specify pre-/post-conditions serving as \emph{semantic oracles} or contracts for functions, and (3) fuzz functions via random string inputs generated from a \emph{language-based fuzzer}. From these annotations, FLAY-PY \emph{automatically} checks type correctness at runtime via \emph{code instrumentation}, and reports any detected type error as soon as possible, preventing bugs from flowing deeply into other parts of the code. Case studies on real Python code fragments show that FLAT-PY is enable to catch logical bugs from random inputs, requiring a reasonable amount of user annotations.2025-01-20T14:16:24ZFengmin ZhuAndreas Zellerhttp://arxiv.org/abs/2404.04132v2Accurate and Extensible Symbolic Execution of Binary Code based on Formal ISA Semantics2025-01-20T11:56:45ZSymbolic execution is an SMT-based software verification and testing technique. Symbolic execution requires tracking performed computations during software simulation to reason about branches in the software under test. The prevailing approach on symbolic execution of binary code tracks computations by transforming the code to be tested to an architecture-independent IR and then symbolically executes this IR. However, the resulting IR must be semantically equivalent to the binary code, making this process complex and error-prone. The semantics of the binary code are specified by the targeted ISA, commonly given in natural language and requiring a manual implementation of the transformation to an IR. In recent years, the use of formal languages to describe ISA semantics in a machine-readable way has gained increased popularity. We investigate the utilization of such formal semantics for symbolic execution of binary code, achieving an accurate representation of instruction semantics. We present a prototype for the RISC-V ISA and conduct a case study to demonstrate that it can be easily extended to additional instructions. Furthermore, we perform an experimental comparison with prior work which resulted in the discovery of five previously unknown bugs in the ISA implementation of the popular IR-based symbolic executor angr.2024-04-05T14:29:39ZTo be published in the proceedings of the 2025 Design, Automation and Test in Europe Conference (DATE'25)Sören TempelTobias BrandtChristoph LüthChristian DietrichRolf Drechsler10.23919/DATE64628.2025.10993257