https://arxiv.org/api/LL4xibL+HUhiOS8Mn5vaqlTBUI0 2026-06-30T13:44:13Z 9958 1905 15 http://arxiv.org/abs/2501.07920v1 Coinductive Proofs for Temporal Hyperliveness 2025-01-14T08:10:44Z

Temporal logics for hyperproperties have recently emerged as an expressive specification technique for relational properties of reactive systems. While the model checking problem for such logics has been widely studied, there is a scarcity of deductive proof systems for temporal hyperproperties. In particular, hyperproperties with an alternation of universal and existential quantification over system executions are rarely supported. In this paper, we focus on the difficult class of hyperproperties of the form $\forall^*\exists^*ψ$, where $ψ$ is a safety relation. We show that hyperproperties of this class -- which includes many hyperliveness properties of interest -- can always be approximated by coinductive relations. This enables intuitive proofs by coinduction. Based on this observation, we define HyCo (HYperproperties, COinductively), a mechanized framework to reason about temporal hyperproperties within the Coq proof assistant. We detail the construction of HyCo, provide a proof of its soundness, and exemplify its use by applying it to the verification of reactive systems modeled as imperative programs with nondeterminism and I/O.

2025-01-14T08:10:44Z Arthur Correnson Bernd Finkbeiner http://arxiv.org/abs/2501.07918v1 Finding $\forall\exists$ Hyperbugs using Symbolic Execution 2025-01-14T08:01:00Z

Many important hyperproperties, such as refinement and generalized non-interference, fall into the class of $\forall\exists$ hyperproperties and require, for each execution trace of a system, the existence of another trace relating to the first one in a certain way. The alternation of quantifiers renders $\forall\exists$ hyperproperties extremely difficult to verify, or even just to test. Indeed, contrary to trace properties, where it suffices to find a single counterexample trace, refuting a $\forall\exists$ hyperproperty requires not only to find a trace, but also a proof that no second trace satisfies the specified relation with the first trace. As a consequence, automated testing of $\forall\exists$ hyperproperties falls out of the scope of existing automated testing tools. In this paper, we present a fully automated approach to detect violations of $\forall\exists$ hyperproperties in software systems. Our approach extends bug-finding techniques based on symbolic execution with support for trace quantification. We provide a prototype implementation of our approach, and demonstrate its effectiveness on a set of challenging examples.

2025-01-14T08:01:00Z Arthur Correnson Tobias Niessen Bernd Finkbeiner Georg Weissenbacher http://arxiv.org/abs/2312.09982v4 ACPO: AI-Enabled Compiler Framework 2025-01-14T01:42:46Z

The key to performance optimization of a program is to decide correctly when a certain transformation should be applied by a compiler. This is an ideal opportunity to apply machine-learning models to speed up the tuning process; while this realization has been around since the late 90s, only recent advancements in ML enabled a practical application of ML to compilers as an end-to-end framework. This paper presents ACPO: An AI-Enabled Compiler Framework, a novel framework that provides LLVM with simple and comprehensive tools to benefit from employing ML models for different optimization passes. We first showcase the high-level view, class hierarchy, and functionalities of ACPO and subsequently, demonstrate \taco{a couple of use cases of ACPO by ML-enabling the Loop Unroll and Function Inlining passes used in LLVM's O3. and finally, describe how ACPO can be leveraged to optimize other passes. Experimental results reveal that the ACPO model for Loop Unroll can gain on average 4%, 3%, 5.4%, and 0.2% compared to LLVM's vanilla O3 optimization when deployed on Polybench, Coral-2, CoreMark, and Graph-500, respectively. Furthermore, by including both Function Inlining and Loop Unroll models, ACPO can provide a combined speedup of 4.5% on Polybench and 2.4% on Cbench when compared with LLVM's O3, respectively.

2023-12-15T17:49:24Z ACPO (12 pages) Amir H. Ashouri Muhammad Asif Manzoor Duc Minh Vu Raymond Zhang Colin Toft Ziwen Wang Angel Zhang Bryan Chan Tomasz S. Czajkowski Yaoqing Gao 10.1109/CASES60062.2024.00011 http://arxiv.org/abs/2501.07535v1 Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU 2025-01-13T18:15:44Z

Fully homomorphic encryption (FHE) and zero-knowledge proofs (ZKPs) are emerging as solutions for data security in distributed environments. However, the widespread adoption of these encryption techniques is hindered by their significant computational overhead, primarily resulting from core cryptographic operations that involve large integer arithmetic. This paper presents a formalization of multi-word modular arithmetic (MoMA), which breaks down large bit-width integer arithmetic into operations on machine words. We further develop a rewrite system that implements MoMA through recursive rewriting of data types, designed for compatibility with compiler infrastructures and code generators. We evaluate MoMA by generating cryptographic kernels, including basic linear algebra subprogram (BLAS) operations and the number theoretic transform (NTT), targeting various GPUs. Our MoMA-based BLAS operations outperform state-of-the-art multi-precision libraries by orders of magnitude, and MoMA-based NTTs achieve near-ASIC performance on commodity GPUs.

2025-01-13T18:15:44Z Accepted at the International Symposium on Code Generation and Optimization (CGO), 2025 Naifeng Zhang Franz Franchetti http://arxiv.org/abs/2501.06780v1 COMPASS: A Compiler Framework for Resource-Constrained Crossbar-Array Based In-Memory Deep Learning Accelerators 2025-01-12T11:31:25Z

Recently, crossbar array based in-memory accelerators have been gaining interest due to their high throughput and energy efficiency. While software and compiler support for the in-memory accelerators has also been introduced, they are currently limited to the case where all weights are assumed to be on-chip. This limitation becomes apparent with the significantly increasing network sizes compared to the in-memory footprint. Weight replacement schemes are essential to address this issue. We propose COMPASS, a compiler framework for resource-constrained crossbar-based processing-in-memory (PIM) deep neural network (DNN) accelerators. COMPASS is specially targeted for networks that exceed the capacity of PIM crossbar arrays, necessitating access to external memories. We propose an algorithm to determine the optimal partitioning that divides the layers so that each partition can be accelerated on chip. Our scheme takes into account the data dependence between layers, core utilization, and the number of write instructions to minimize latency, memory accesses, and improve energy efficiency. Simulation results demonstrate that COMPASS can accommodate much more networks using a minimal memory footprint, while improving throughput by 1.78X and providing 1.28X savings in energy-delay product (EDP) over baseline partitioning methods.

2025-01-12T11:31:25Z Accepted IEEE DATE 2025 Jihoon Park Jeongin Choe Dohyun Kim Jae-Joon Kim http://arxiv.org/abs/2501.06579v1 Refuting Equivalence in Probabilistic Programs with Conditioning 2025-01-11T16:25:32Z

We consider the problem of refuting equivalence of probabilistic programs, i.e., the problem of proving that two probabilistic programs induce different output distributions. We study this problem in the context of programs with conditioning (i.e., with observe and score statements), where the output distribution is conditioned by the event that all the observe statements along a run evaluate to true, and where the probability densities of different runs may be updated via the score statements. Building on a recent work on programs without conditioning, we present a new equivalence refutation method for programs with conditioning. Our method is based on weighted restarting, a novel transformation of probabilistic programs with conditioning to the output equivalent probabilistic programs without conditioning that we introduce in this work. Our method is the first to be both a) fully automated, and b) providing provably correct answers. We demonstrate the applicability of our method on a set of programs from the probabilistic inference literature.

2025-01-11T16:25:32Z Accepted at TACAS 2025 Krishnendu Chatterjee Ehsan Kafshdar Goharshady Petr Novotný Đorđe Žikelić http://arxiv.org/abs/2501.06283v1 Dafny as Verification-Aware Intermediate Language for Code Generation 2025-01-10T17:23:14Z

Using large language models (LLMs) to generate source code from natural language prompts is a popular and promising idea with a wide range of applications. One of its limitations is that the generated code can be faulty at times, often in a subtle way, despite being presented to the user as correct. In this paper, we explore ways in which formal methods can assist with increasing the quality of code generated by an LLM. Instead of emitting code in a target language directly, we propose that the user guides the LLM to first generate an opaque intermediate representation, in the verification-aware language Dafny, that can be automatically validated for correctness against agreed on specifications. The correct Dafny program is then compiled to the target language and returned to the user. All user-system interactions throughout the procedure occur via natural language; Dafny code is never exposed. We describe our current prototype and report on its performance on the HumanEval Python code generation benchmarks.

2025-01-10T17:23:14Z Yue Chen Li Stefan Zetzsche Siva Somayyajula http://arxiv.org/abs/2402.11650v2 Programmatic Reinforcement Learning: Navigating Gridworlds 2025-01-10T09:44:48Z

The field of reinforcement learning (RL) is concerned with algorithms for learning optimal policies in unknown stochastic environments. Programmatic RL studies representations of policies as programs, meaning involving higher order constructs such as control loops. Despite attracting a lot of attention at the intersection of the machine learning and formal methods communities, very little is known on the theoretical front about programmatic RL: what are good classes of programmatic policies? How large are optimal programmatic policies? How can we learn them? The goal of this paper is to give first answers to these questions, initiating a theoretical study of programmatic RL. Considering a class of gridworld environments, we define a class of programmatic policies. Our main contributions are to place upper bounds on the size of optimal programmatic policies, and to construct an algorithm for synthesizing them. These theoretical findings are complemented by a prototype implementation of the algorithm.

2024-02-18T17:02:39Z Published in the proceedings of GenPlan, AAAI 2025 Workshop on Generlization in Planning Guruprerana Shabadi Nathanaël Fijalkow Théo Matricon http://arxiv.org/abs/2312.02851v4 Checkpoint-based rollback recovery in session programming 2025-01-09T11:12:35Z

To react to unforeseen circumstances or amend abnormal situations in communication-centric systems, programmers are in charge of "undoing" the interactions which led to an undesired state. To assist this task, session-based languages can be endowed with reversibility mechanisms. In this paper we propose a language enriched with programming facilities to commit session interactions, to roll back the computation to a previous commit point, and to abort the session. Rollbacks in our language always bring the system to previous visited states and a rollback cannot bring the system back to a point prior to the last commit. Programmers are relieved from the burden of ensuring that a rollback never restores a checkpoint imposed by a session participant different from the rollback requester. Such undesired situations are prevented at design-time (statically) by relying on a decidable compliance check at the type level, implemented in MAUDE. We show that the language satisfies error-freedom and progress of a session.

2023-12-03T09:02:52Z Logical Methods in Computer Science, Volume 21, Issue 1 (January 10, 2025) lmcs:12654 Claudio Antares Mezzina Francesco Tiezzi Nobuko Yoshida 10.46298/lmcs-21(1:2)2025 http://arxiv.org/abs/2501.05111v1 Baking for Dafny: A CakeML Backend for Dafny 2025-01-09T09:59:22Z

Dafny is a verification-aware programming language that allows developers to formally specify their programs and prove them correct. Currently, a Dafny program is compiled in two steps: First, a backend translates the input program to a high-level target language like C# or Rust. Second, the translated program is compiled using the target language's toolchain. Recently, an intermediate representation (IR) has been added to Dafny that serves as input to new backends. At the time of writing, none of these steps are verified, resulting in both the backend and the target language's toolchain being part of Dafny's trusted computing base (TCB). To reduce Dafny's TCB, we started developing a new backend that translates Dafny to CakeML, a verified, bootstrapped subset of Standard ML, in the interactive theorem prover HOL4. We also started to define functional big-step semantics for the Dafny IR to prove correctness of the backend.

2025-01-09T09:59:22Z 4 pages, 7 figures. Accepted to the Dafny workshop at the 52nd ACM SIGPLAN Symposium on Principles of Programming Languages (POPL 2025). For code, see https://github.com/CakeML/cakeml/tree/1fc6f802c664dfdaa2bbfb1e5e4a2051bcf68756/compiler/dafny Daniel Nezamabadi Magnus Myreen http://arxiv.org/abs/2501.04908v1 HaVen: Hallucination-Mitigated LLM for Verilog Code Generation Aligned with HDL Engineers 2025-01-09T01:47:41Z

Recently, the use of large language models (LLMs) for Verilog code generation has attracted great research interest to enable hardware design automation. However, previous works have shown a gap between the ability of LLMs and the practical demands of hardware description language (HDL) engineering. This gap includes differences in how engineers phrase questions and hallucinations in the code generated. To address these challenges, we introduce HaVen, a novel LLM framework designed to mitigate hallucinations and align Verilog code generation with the practices of HDL engineers. HaVen tackles hallucination issues by proposing a comprehensive taxonomy and employing a chain-of-thought (CoT) mechanism to translate symbolic modalities (e.g. truth tables, state diagrams, etc.) into accurate natural language descriptions. Furthermore, HaVen bridges this gap by using a data augmentation strategy. It synthesizes high-quality instruction-code pairs that match real HDL engineering practices. Our experiments demonstrate that HaVen significantly improves the correctness of Verilog code generation, outperforming state-of-the-art LLM-based Verilog generation methods on VerilogEval and RTLLM benchmark. HaVen is publicly available at https://github.com/Intelligent-Computing-Research-Group/HaVen.

2025-01-09T01:47:41Z Yiyao Yang Fu Teng Pengju Liu Mengnan Qi Chenyang Lv Ji Li Xuhong Zhang Zhezhi He http://arxiv.org/abs/2501.04503v1 Developing a Modular Compiler for a Subset of a C-like Language 2025-01-08T13:42:54Z

The paper introduces the development of a modular compiler for a subset of a C-like language, which addresses the challenges in constructing a compiler for high-level languages. This modular approach will allow developers to modify a language by adding or removing subsets as required, resulting in a minimal and memory-efficient compiler. The development process is divided into small, incremental steps, where each step yields a fully functioning compiler for an expanding subset of the language. The paper outlines the iterative developmental phase of the compiler, emphasizing progressive enhancements in capabilities and functionality. Adherence to industry best practices of modular design, code reusability, and documentation has enabled the resulting compiler's functional efficiency, maintainability, and extensibility. The compiler proved to be effective not only in managing the language structure but also in developing optimized code, which demonstrates its practical usability. This was also further assessed using the compiler on a tiny memory-deficient single-board computer, again showing the compiler's efficiency and suitability for resource-constrained devices.

2025-01-08T13:42:54Z Debasish Dutta Neeharika Sonowal Irani Hazarika http://arxiv.org/abs/1909.12582v4 Towards a Coq-verified Chain of Esterel Semantics 2025-01-06T19:13:58Z

This paper focuses on formally specifying and verifying the chain of formal semantics of the Esterel synchronous programming language using the Coq proof assistant. In particular, in addition to the standard logical (LBS) semantics, constructive semantics (CBS) and constructive state semantics (CSS), we introduce a novel microstep semantics that gets rid of the Must/Can potential function pair of the constructive semantics and can be viewed as an abstract version of Esterel's circuit semantics used by compilers to generate software code and hardware designs. Excluding the loop construct from Esterel, the paper also provides formal proofs in Coq of the equivalence between the CBS and CSS semantics and of the refinement of the CSS by the microstep semantics.

2019-09-27T09:43:14Z Gérard Berry Lionel Rieg http://arxiv.org/abs/2501.02901v1 DeCon: Detecting Incorrect Assertions via Postconditions Generated by a Large Language Model 2025-01-06T10:25:28Z

Recently, given the docstring for the target problem and the target function signature, large language models (LLMs) have been used not only to generate source code, but also to generate test cases, consisting of test inputs and assertions (e.g., in the form of checking an actual output against the expected output). However, as shown by our empirical study on assertions generated by four LLMs for the HumanEval benchmark, over 62% of the generated assertions are incorrect (i.e., failed on the ground-truth problem solution). To detect incorrect assertions (given the docstring and the target function signature along with a sample of example inputs and outputs), in this paper, we propose a new approach named DeCon to effectively detect incorrect assertions via LLM-generated postconditions for the target problem (a postcondition is a predicate that must always be true just after the execution of the ground-truth problem solution). Our approach requires a small set of I/O examples (i.e., a sample of example inputs and outputs) for the target problem (e.g., the I/O examples included in the docstring for a target problem in HumanEval). We use the given I/O examples to filter out those LLM-generated postconditions that are violated by at least one given I/O example. We then use the remaining postconditions to detect incorrect assertions as those assertions that violate at least one remaining postcondition. Experimental results show that DeCon can detect averagely more than 64% (63% and 65.5% detected by GPT-3.5 and GPT-4, respectively) incorrect assertions generated by four state-of-the-art LLMs, and DeCon can also improve the effectiveness of these LLMs in code generation by 4% in terms of Pass@1. In addition, although DeCon might filter out correct assertions, the fault-finding ability of the remaining correct assertions decreases only slightly.

2025-01-06T10:25:28Z Hao Yu Tianyu Chen Jiaming Huang Zongyang Li Dezhi Ran Xinyu Wang Ying Li Assaf Marron David Harel Yuan Xie Tao Xie http://arxiv.org/abs/2501.02413v1 Semantic foundations of equality saturation 2025-01-05T01:46:42Z

Equality saturation is an emerging technique for program and query optimization developed in the programming language community. It performs term rewriting over an E-graph, a data structure that compactly represents a program space. Despite its popularity, the theory of equality saturation lags behind the practice. In this paper, we define a fixpoint semantics of equality saturation based on tree automata and uncover deep connections between equality saturation and the chase. We characterize the class of chase sequences that correspond to equality saturation. We study the complexities of terminations of equality saturation in three cases: single-instance, all-term-instance, and all-E-graph-instance. Finally, we define a syntactic criterion based on acyclicity that implies equality saturation termination.

2025-01-05T01:46:42Z To appear at ICDT 2025 Dan Suciu Yisu Remy Wang Yihong Zhang