https://arxiv.org/api/vhGbANHzJ6cgGhn7sa4HYqiZGPk 2026-06-23T13:50:28Z 9934 1005 15 http://arxiv.org/abs/2506.07834v2 Execution-Aware Program Reduction for WebAssembly via Record and Replay 2025-11-01T07:37:26Z

WebAssembly (Wasm) programs may trigger bugs in their engine implementations. To aid debugging, program reduction techniques try to produce a smaller variant of the input program that still triggers the bug. However, existing execution-unaware program reduction techniques struggle with large and complex Wasm programs, because they rely on static information and apply syntactic transformations, while ignoring the valuable information offered by the input program's execution behavior. We present RR-Reduce and Hybrid-Reduce, novel execution-aware program reduction techniques that leverage execution behaviors via record and replay. RR-Reduce identifies a bug-triggering function as the target function, isolates that function from the rest of the program, and generates a reduced program that replays only the interactions between the target function and the rest of the program. Hybrid-Reduce combines a complementary execution-unaware reduction technique with RR-Reduce to further reduce program size. We evaluate RR-Reduce and Hybrid-Reduce on 28 Wasm programs that trigger a diverse set of bugs in three engines. On average, RR-Reduce reduces the programs to 1.20 percent of their original size in 14.5 minutes, which outperforms the state of the art by 33.15 times in terms of reduction time. Hybrid-Reduce reduces the programs to 0.13 percent of their original size in 3.5 hours, which outperforms the state of the art by 3.42 times in terms of reduced program size and 2.26 times in terms of reduction time. We envision RR-Reduce as the go-to tool for rapid, on-demand debugging in minutes, and Hybrid-Reduce for scenarios where developers require the smallest possible programs.

2025-06-09T14:57:10Z Accepted at ASE 2025 Doehyun Baek Daniel Lehmann Ben L. Titzer Sukyoung Ryu Michael Pradel http://arxiv.org/abs/2511.00403v1 Equality Saturation Guided by Large Language Models 2025-11-01T05:02:46Z

One critical issue with large language models (LLMs) is their inability to guarantee correctness. Although this problem can be addressed by applying LLMs to formal rewrite systems, current LLMs are still far from adequate to generate sound rewrite chains. To bridge this gap, this paper proposes LLM-guided equality saturation, dubbed LGuess, by incorporating e-graphs as an intermediate layer between LLMs and rewrite systems. LGuess queries LLMs only for high-level rewrite checkpoints and uses e-graphs to supply low-level rewrite chains between these checkpoints. The key technical challenge in this procedure lies in effectively extracting a suitable checkpoint from a saturated e-graph, which LGuess addresses by learning a probabilistic model from the LLM. The model predicts probable checkpoints while remaining simple enough for effective extraction. We implement a prototype of LGuess and evaluate it on the problem of factorizing multivariable polynomials. The results demonstrate a significant advantage of LGuess compared to both straightforward equality saturation and the approach that queries the LLM directly for the rewrite chain.

2025-11-01T05:02:46Z presented at EGRAPHS 2025 Wentao Peng Ruyi Ji Yingfei Xiong http://arxiv.org/abs/2510.26101v2 QCoder Benchmark: Bridging Language Generation and Quantum Hardware through Simulator-Based Feedback 2025-11-01T03:02:22Z

Large language models (LLMs) have increasingly been applied to automatic programming code generation. This task can be viewed as a language generation task that bridges natural language, human knowledge, and programming logic. However, it remains underexplored in domains that require interaction with hardware devices, such as quantum programming, where human coders write Python code that is executed on a quantum computer. To address this gap, we introduce QCoder Benchmark, an evaluation framework that assesses LLMs on quantum programming with feedback from simulated hardware devices. Our benchmark offers two key features. First, it supports evaluation using a quantum simulator environment beyond conventional Python execution, allowing feedback of domain-specific metrics such as circuit depth, execution time, and error classification, which can be used to guide better generation. Second, it incorporates human-written code submissions collected from real programming contests, enabling both quantitative comparisons and qualitative analyses of LLM outputs against human-written codes. Our experiments reveal that even advanced models like GPT-4o achieve only around 18.97% accuracy, highlighting the difficulty of the benchmark. In contrast, reasoning-based models such as o3 reach up to 78% accuracy, outperforming averaged success rates of human-written codes (39.98%). We release the QCoder Benchmark dataset and public evaluation API to support further research. (Codes and datasets are available at https://qcoder-bench.github.io/ )

2025-10-30T03:27:35Z Accepted to INLG2025 Taku Mikuriya Tatsuya Ishigaki Masayuki Kawarada Shunya Minami Tadashi Kadowaki Yohichi Suzuki Soshun Naito Shunya Takata Takumi Kato Tamotsu Basseda Reo Yamada Hiroya Takamura http://arxiv.org/abs/2511.00125v1 Inferring multiple helper Dafny assertions with LLMs 2025-10-31T09:45:39Z

The Dafny verifier provides strong correctness guarantees but often requires numerous manual helper assertions, creating a significant barrier to adoption. We investigate the use of Large Language Models (LLMs) to automatically infer missing helper assertions in Dafny programs, with a primary focus on cases involving multiple missing assertions. To support this study, we extend the DafnyBench benchmark with curated datasets where one, two, or all assertions are removed, and we introduce a taxonomy of assertion types to analyze inference difficulty. Our approach refines fault localization through a hybrid method that combines LLM predictions with error-message heuristics. We implement this approach in a new tool called DAISY (Dafny Assertion Inference SYstem). While our focus is on multiple missing assertions, we also evaluate DAISY on single-assertion cases. DAISY verifies 63.4% of programs with one missing assertion and 31.7% with multiple missing assertions. Notably, many programs can be verified with fewer assertions than originally present, highlighting that proofs often admit multiple valid repair strategies and that recovering every original assertion is unnecessary. These results demonstrate that automated assertion inference can substantially reduce proof engineering effort and represent a step toward more scalable and accessible formal verification.

2025-10-31T09:45:39Z Álvaro Silva Alexandra Mendes Ruben Martins http://arxiv.org/abs/2510.27067v1 Dependence-Driven, Scalable Quantum Circuit Mapping with Affine Abstractions 2025-10-31T00:33:54Z

Qubit Mapping is a critical task in Quantum Compilation, as modern Quantum Processing Units (QPUs) are constrained to nearest-neighbor interactions defined by a qubit coupling graph. This compiler pass repairs the connectivity of two-qubit gates whose operands are not adjacent by inserting SWAP gates that move the state of qubits between directly connected qubits. Deciding when to introduce SWAPs while minimizing their count is critical because the error in quantum programs increases exponentially with the circuit latency, measured in number of gates along the critical path of the circuit. Prior work for this problem relied on heuristics and exact methods that partition the circuit into two or more layers, but failed to exploit valuable dependence information in any form. This paper introduces a novel qubit mapping algorithm based on the weight of transitive dependences. The introduced mapper models quantum circuits with affine abstractions thereby yielding the ability to compute transitive dependences. In turn, the newfound information is used to partition circuits by dependence distances and compute, efficiently, distinct weights for each layer. We evaluate the efficiency of our mapper on IBM and Rigetti QPUs, using the large datasets from the QUEKO and QASMBench benchmark suites, and against four baseline tools (QMAP, Sabre, Cirq and TKET), demonstrating notable improvements in circuit depth and swap count while delivering competitive scalability.

2025-10-31T00:33:54Z To appear in the Proceedings of the 2026 International Symposium on Code Generation and Optimization (CGO 2026) Marouane Benbetka Merwan Bekkar Riyadh Baghdadi Martin Kong http://arxiv.org/abs/2510.26431v1 CHCVerif: A Portfolio-Based Solver for Constrained Horn Clauses 2025-10-30T12:25:06Z

Constrained Horn Clauses (CHCs) are widely adopted as intermediate representations for a variety of verification tasks, including safety checking, invariant synthesis, and interprocedural analysis. This paper introduces CHCVERIF, a portfolio-based CHC solver that adopts a software verification approach for solving CHCs. This approach enables us to reuse mature software verification tools to tackle CHC benchmarks, particularly those involving bitvectors and low-level semantics. Our evaluation shows that while the method enjoys only moderate success with linear integer arithmetic, it achieves modest success on bitvector benchmarks. Moreover, our results demonstrate the viability and potential of using software verification tools as backends for CHC solving, particularly when supported by a carefully constructed portfolio.

2025-10-30T12:25:06Z In Proceedings HCVS 2025, arXiv:2510.25468 EPTCS 434, 2025, pp. 40-51 Mihály Dobos-Kovács Department of Artificial Intelligence and Systems Engineering, Budapest University of Technology and Economics, Hungary Levente Bajczi Department of Artificial Intelligence and Systems Engineering, Budapest University of Technology and Economics, Hungary András Vörös Department of Artificial Intelligence and Systems Engineering, Budapest University of Technology and Economics, Hungary 10.4204/EPTCS.434.6 http://arxiv.org/abs/2510.26429v1 Semantic Properties of Computations Defined by Elementary Inference Systems 2025-10-30T12:24:37Z

We consider sets/relations/computations defined by *Elementary Inference Systems* I, which are obtained from Smullyan's *elementary formal systems* using Gentzen's notation for inference rules, and proof trees for atoms P(t_1,...,t_n), where predicate P represents the considered set/relation/computation. A first-order theory Th(I), actually a set of definite Horn clauses, is given to I. Properties of objects defined by I are expressed as first-order sentences F, which are proved true or false by *satisfaction* M |= F of F in a *canonical* model M of Th(I). For this reason, we call F a *semantic property* of I. Since canonical models are, in general, incomputable, we show how to (dis)prove semantic properties by satisfiability in an *arbitrary* model A of Th(I). We apply these ideas to the analysis of properties of programming languages and systems whose computations can be described by means of an elementary inference system. In particular, rewriting-based systems.

2025-10-30T12:24:37Z In Proceedings HCVS 2025, arXiv:2510.25468 EPTCS 434, 2025, pp. 10-26 Salvador Lucas Universitat Politecnica de Valencia 10.4204/EPTCS.434.4 http://arxiv.org/abs/2510.26428v1 Finding Regular Herbrand Models for CHCs using Answer Set Programming 2025-10-30T12:24:23Z

We are interested in proving satisfiability of Constrained Horn Clauses (CHCs) over Algebraic Data Types (ADTs). We propose to prove satisfiability by building a tree automaton recognizing the Herbrand model of the CHCs. If such an automaton exists then the model is said to be regular, i.e., the Herbrand model is a regular set of atoms. Kostyukov et al. have shown how to derive an automaton when CVC4 finds a finite model of the CHCs. We propose an alternative way to build the automaton using an encoding into a SAT problem using Clingo, an Answer Set Programming (ASP) tool. We implemented a translation of CHCs with ADTs into an ASP problem. Combined with Clingo, we obtain a semi-complete satisfiability checker: it finds a tree automaton if a regular Herbrand model exists or finds a counter-example if the problem is unsatisfiable.

2025-10-30T12:24:23Z In Proceedings HCVS 2025, arXiv:2510.25468 EPTCS 434, 2025, pp. 4-9 Gregoire Maire ENS Rennes, Rennes, France Thomas Genet Univ Rennes, IRISA, Inria, Rennes, France 10.4204/EPTCS.434.3 http://arxiv.org/abs/2307.02180v5 Runtime Repeated Recursion Unfolding in CHR: A Just-In-Time Online Program Optimization Strategy That Can Achieve Super-Linear Speedup 2025-10-30T09:46:20Z

We introduce a just-in-time runtime program transformation strategy based on repeated recursion unfolding. Our online program optimization generates several versions of a recursion differentiated by the minimal number of recursive steps covered. The base case of the recursion is ignored in our technique. Our method is introduced here on the basis of single linear direct recursive rules. When a recursive call is encountered at runtime, first an unfolder creates specializations of the associated recursive rule on-the-fly and then an interpreter applies these rules to the call. Our approach reduces the number of recursive rule applications to its logarithm at the expense of introducing a logarithmic number of generic unfolded rules. We prove correctness of our online optimization technique and determine its time complexity. For recursions which have enough simplifyable unfoldings, a super-linear is possible, i.e. speedup by more than a constant factor. The necessary simplification is problem-specific and has to be provided at compile-time. In our speedup analysis, we prove a sufficient condition as well as a sufficient and necessary condition for super-linear speedup relating the complexity of the recursive steps of the original rule and the unfolded rules. We have implemented an unfolder and meta-interpreter for runtime repeated recursion unfolding with just five rules in Constraint Handling Rules (CHR) embedded in Prolog. We illustrate the feasibility of our approach with simplifications, time complexity results and benchmarks for some basic tractable algorithms. The simplifications require some insight and were derived manually. The runtime improvement quickly reaches several orders of magnitude, consistent with the super-linear speedup predicted by our theorems.

2023-07-05T10:18:51Z Final version as accepted for Journal Fundamenta Informaticae Fundamenta Informaticae, Volume 194, Issue 3 (October 31, 2025) fi:11547 Thom Fruehwirth 10.46298/fi.11547 http://arxiv.org/abs/2510.26839v1 Internalizing Extensions in Lattices of Type Theories 2025-10-29T23:51:54Z

Many proof assistants allow the use of features and axioms that increase their expressive power. However, these extensions must be used with care, as some combinations are known to lead to logical inconsistencies. Therefore, proof assistants include mechanisms that track which extensions are used in a proof development or module, ensuring that incompatible extensions are not used simultaneously. Unfortunately, existing extension tracking mechanisms are external to the type system. This means that we cannot specify precisely which extensions a definition depends on. Having the ability to write more precise specifications means we are not picking an overapproximation of the extensions needed, which prevents reusing definitions in the presence of incompatible extensions. Furthermore, we cannot refer to definitions that use incompatible extensions even if they are never used in inconsistent ways. The reasoning principles of one extension therefore cannot be used as a metatheory to reason about the properties of an incompatible extension. In this report, I explore the use of the Dependent Calculus of Indistinguishability (DCOI) by Liu et al. for extension tracking. DCOI is a dependent type system with dependency tracking, where terms and variables are assigned dependency levels alongside their types. These dependency levels form a lattice that describes which levels are permitted to access what. To instead track extensions, each set of extensions would correspond to a dependency level, and the lattice would describe how extensions are permitted to interact.

2025-10-29T23:51:54Z This report was written as part of the Research Qualifier for the doctoral degree requirements in the department of Computer and Information Science at the University of Pennsylvania Jonathan Chan http://arxiv.org/abs/2510.26016v1 Fair intersection of seekable iterators 2025-10-29T23:16:15Z

miniKanren's key semantic advance over Prolog is to implement a complete yet efficient search strategy, fairly interleaving execution between disjuncts. This fairness is accomplished by bounding how much work is done exploring one disjunct before switching to the next. We show that the same idea -- fairness via bounded work -- underlies an elegant compositional approach to implementing worst-case optimal joins using a seekable iterator interface, suitable for shallow embedding in functional languages.

2025-10-29T23:16:15Z 8 pages, 2 figures, published in miniKanren 2025 Michael Arntzenius http://arxiv.org/abs/2207.09190v3 Central Submonads and Notions of Computation: Soundness, Completeness and Internal Languages 2025-10-29T16:28:26Z

Monads in category theory are algebraic structures that can be used to model computational effects in programming languages. We show how the notion of "centre", and more generally "centrality", i.e. the property for an effect to commute with all other effects, may be formulated for strong monads acting on symmetric monoidal categories. We identify three equivalent conditions which characterise the existence of the centre of a strong monad (some of which relate it to the premonoidal centre of Power and Robinson) and we show that every strong monad on many well-known naturally occurring categories does admit a centre, thereby showing that this new notion is ubiquitous. More generally, we study central submonads, which are necessarily commutative, just like the centre of a strong monad. We provide a computational interpretation by formulating equational theories of lambda calculi equipped with central submonads, we describe categorical models for these theories and prove soundness, completeness and internal language results for our semantics.

2022-07-19T10:53:40Z Journal version of the conference paper accepted to LICS'23 TItouan Carette Louis Lemonnier Vladimir Zamdzhiev http://arxiv.org/abs/2510.25468v1 Proceedings of the 12th Workshop on Horn Clauses for Verification and Synthesis 2025-10-29T12:41:44Z

This volume contains the post-proceedings of the 12th Workshop on Horn Clauses for Verification and Synthesis (HCVS 2025), which took place in Zagreb, Croatia, on July 22, 2025, as affiliated workshop of the 37th International Conference on Computer Aided Verification (CAV 2025).

2025-10-29T12:41:44Z EPTCS 434, 2025 Emanuele De Angelis CNR-IASI, Italy Florian Frohn RWTH Aachen, Germany 10.4204/EPTCS.434 http://arxiv.org/abs/2504.07732v3 Efficient Formal Verification of Quantum Error Correcting Programs 2025-10-29T08:27:40Z

Quantum error correction (QEC) is fundamental for suppressing noise in quantum hardware and enabling fault-tolerant quantum computation. In this paper, we propose an efficient verification framework for QEC programs. We define an assertion logic and a program logic specifically crafted for QEC programs and establish a sound proof system. We then develop an efficient method for handling verification conditions (VCs) of QEC programs: for Pauli errors, the VCs are reduced to classical assertions that can be solved by SMT solvers, and for non-Pauli errors, we provide a heuristic algorithm. We formalize the proposed program logic in Coq proof assistant, making it a verified QEC verifier. Additionally, we implement an automated QEC verifier, Veri-QEC, for verifying various fault-tolerant scenarios. We demonstrate the efficiency and broad functionality of the framework by performing different verification tasks across various scenarios. Finally, we present a benchmark of 14 verified stabilizer codes.

2025-04-10T13:28:49Z 41 pages, 10 figures, 4 tables; v2: Extended version of the paper in PLDI 2025; Evaluated artifact at https://doi.org/10.5281/zenodo.15267327 v3: revise typos and inconsistencies Proceedings of the ACM on Programming Languages, Volume 9, Issue PLDI (2025) 1068-1093 Qifan Huang Li Zhou Wang Fang Mengyu Zhao Mingsheng Ying 10.1145/3729293 http://arxiv.org/abs/2510.25112v1 The Singularity Theory of Concurrent Programs: A Topological Characterization and Detection of Deadlocks and Livelocks 2025-10-29T02:24:07Z

This paper introduces a novel paradigm for the analysis and verification of concurrent programs -- the Singularity Theory. We model the execution space of a concurrent program as a branched topological space, where program states are points and state transitions are paths. Within this framework, we characterize deadlocks as attractors and livelocks as non-contractible loops in the execution space. By employing tools from algebraic topology, particularly homotopy and homology groups, we define a series of concurrent topological invariants to systematically detect and classify these concurrent "singularities" without exhaustively traversing all states. This work aims to establish a geometric and topological foundation for concurrent program verification, transcending the limitations of traditional model checking.

2025-10-29T02:24:07Z 10 pages Di Zhang