https://arxiv.org/api/L2QKUm5PKXrxcbKS80t2rISDtnI 2026-06-15T05:30:43Z 9888 600 15 http://arxiv.org/abs/2602.18307v1 VeriSoftBench: Repository-Scale Formal Verification Benchmarks for Lean 2026-02-20T16:05:06Z

Large language models have achieved striking results in interactive theorem proving, particularly in Lean. However, most benchmarks for LLM-based proof automation are drawn from mathematics in the Mathlib ecosystem, whereas proofs in software verification are developed inside definition-rich codebases with substantial project-specific libraries. We introduce VeriSoftBench, a benchmark of 500 Lean 4 proof obligations drawn from open-source formal-methods developments and packaged to preserve realistic repository context and cross-file dependencies. Our evaluation of frontier LLMs and specialized provers yields three observations. First, provers tuned for Mathlib-style mathematics transfer poorly to this repository-centric setting. Second, success is strongly correlated with transitive repository dependence: tasks whose proofs draw on large, multi-hop dependency closures are less likely to be solved. Third, providing curated context restricted to a proof's dependency closure improves performance relative to exposing the full repository, but nevertheless leaves substantial room for improvement. Our benchmark and evaluation suite are released at https://github.com/utopia-group/VeriSoftBench.

2026-02-20T16:05:06Z Yutong Xin Qiaochu Chen Greg Durrett Işil Dillig http://arxiv.org/abs/2602.18295v1 Towards a Higher-Order Bialgebraic Denotational Semantics 2026-02-20T15:46:53Z

The bialgebraic abstract GSOS framework by Turi and Plotkin provides an elegant categorical approach to modelling the operational and denotational semantics of programming and process languages. In abstract GSOS, bisimilarity is always a congruence, and it coincides with denotational equivalence. This saves the language designer from intricate, ad-hoc reasoning to establish these properties. The bialgebraic perspective on operational semantics in the style of abstract GSOS has recently been extended to higher-order languages, preserving compositionality of bisimilarity. However, a categorical understanding of bialgebraic denotational semantics according to Turi and Plotkin's original vision has so far been missing in the higher-order setting. In the present paper, we develop a theory of adequate denotational semantics in higher-order abstract GSOS. The denotational models are parametric in an appropriately chosen semantic domain in the form of a locally final coalgebra for a behaviour bifunctor, whose construction is fully decoupled from the syntax of the language. Our approach captures existing accounts of denotational semantics such as semantic domains built via general step-indexing, previously introduced on a per-language basis, and is shown to be applicable to a wide range of different higher-order languages, e.g. simply typed and untyped languages, or languages with computational effects such as probabilistic or non-deterministic branching.

2026-02-20T15:46:53Z Sergey Goncharov Marco Peressotti Stelios Tsampas Henning Urbat Stefano Volpe http://arxiv.org/abs/2602.18166v1 Grammar Repair with Examples and Tree Automata: Extended Version 2026-02-20T12:04:42Z

Context-free grammars (CFGs) are the de-facto formalism for declaratively describing concrete syntax for programming languages and generating parsers. One of the major challenges in defining a desired syntax is ruling out all possible ambiguities in the CFG productions that determine scoping rules as well as operator precedence and associativity. Practical tools for parser generation typically apply ad-hoc approaches for resolving such ambiguities, which might result in a parser's behavior that contradicts the intents of the language designer. In this work, we present a user-friendly approach to soundly repair grammars with ambiguities, which is inspired by the programming by example line of research in automated program synthesis. At the heart of our approach is the interpretation of both the initial CFG and additional examples that define the desired restrictions in precedence and associativity, as tree automata (TAs). The technical novelties of our approach are (1) a new TA learning algorithm that constructs an automaton based on the original grammar and examples that encode the user's preferred ways of resolving ambiguities all in a single TA, and (2) an efficient algorithm for TA intersection that utilises reachability analysis and optimizations that significantly reduce the size of the resulting automaton, which results in idiomatic CFGs amenable to parser generators. We have proven the soundness of the algorithms, and implemented our approach in a tool called Greta, demonstrating its effectiveness on a series of case studies.

2026-02-20T12:04:42Z Yunjeong Lee Gokul Rajiv Ilya Sergey http://arxiv.org/abs/2602.18534v1 Validated Code Translation for Projects with External Libraries 2026-02-20T10:55:14Z

Large Language Models (LLMs) have shown promise for program translation, particularly for migrating systems code to memory-safe languages such as Rust. However, existing approaches struggle when source programs depend on external libraries: LLMs frequently hallucinate non-existent target APIs and fail to generate call-enabling imports; moreover, validating semantic equivalence is challenging when the code manipulates opaque, library-defined types. We present a translation and validation framework for translating Go projects with external dependencies to Rust. Our approach combines (i) a retrieval mechanism that maps Go library APIs to Rust APIs, and (ii) a cross-language validation pipeline that establishes language interoperability in the presence of opaque library types by synthesising adapters exclusively from public library APIs, prior to validating I/O equivalence. We evaluate our system on six real-world Go repositories with non-trivial external dependencies. Our approach significantly increases both the compilation and equivalence success rate (up to 100% in the most dependency-heavy case; approx. 2x on average) by enabling validated translation that manipulate opaque, library-defined types.

2026-02-20T10:55:14Z Hanliang Zhang Arindam Sharma Cristina David Meng Wang Brandon Paulsen Daniel Kroening Wenjia Ye Taro Sekiyama http://arxiv.org/abs/2601.01944v2 The Invisible Hand of AI Libraries Shaping Open Source Projects and Communities 2026-02-20T10:14:27Z

In the early 1980s, Open Source Software emerged as a revolutionary concept amidst the dominance of proprietary software. What began as a revolutionary idea has now become the cornerstone of computer science. Amidst OSS projects, AI is increasing its presence and relevance. However, despite the growing popularity of AI, its adoption and impacts on OSS projects remain underexplored. We aim to assess the adoption of AI libraries in Python and Java OSS projects and examine how they shape development, including the technical ecosystem and community engagement. To this end, we will perform a large-scale analysis on 157.7k potential OSS repositories, employing repository metrics and software metrics to compare projects adopting AI libraries against those that do not. We expect to identify measurable differences in development activity, community engagement, and code complexity between OSS projects that adopt AI libraries and those that do not, offering evidence-based insights into how AI integration reshapes software development practices.

2026-01-05T09:50:37Z ACCEPTED REGISTERED REPORT AT SANER (CORE A*) 2026 Matteo Esposito Andrea Janes Valentina Lenarduzzi Davide Taibi http://arxiv.org/abs/2602.18090v1 Programming Backpropagation with Reverse Handlers for Arrows 2026-02-20T09:28:20Z

We introduce a new programming language and its categorical semantics in order to design and implement neural networks within the framework of algebraic effects and handlers for arrows. Our language enables us to construct neural networks symbolically, in the same manner as algebraic effects, and to assign implementations -- such as backpropagation computations -- to them via handlers. The advantage of this language design is that network descriptions become abstract and high-level, while implementations can be flexibly assigned to networks. We establish a rigorous foundation for our language by developing a type system, an operational semantics, a categorical semantics, and soundness and adequacy theorems. The technical core is the introduction of \emph{reverse handlers}, a novel handler mechanism for arrows for implementing backpropagation, together with new algebras of strong promonads on reverse differential restriction categories (RDRCs), whose string diagrams provide a formal graphical syntax and semantics for neural networks.

2026-02-20T09:28:20Z Takahiro Sanada Keisuke Hoshino Kenshin Hirai Shin-ya Katsumata http://arxiv.org/abs/2603.13242v1 Automating the Analysis and Improvement of Dynamic Programming Algorithms with Applications to Natural Language Processing 2026-02-20T02:01:27Z

This thesis develops a system for automatically analyzing and improving dynamic programs, such as those that have driven progress in natural language processing and computer science, more generally, for decades. Finding a correct program with the optimal asymptotic runtime can be unintuitive, time-consuming, and error-prone. This thesis aims to automate this laborious process. To this end, we develop an approach based on 1. a high-level, domain-specific language called Dyna for concisely specifying dynamic programs 2. a general-purpose solver to efficiently execute these programs 3. a static analysis system that provides type analysis and worst-case time/space complexity analyses 4. a rich collection of meaning-preserving transformations to programs, which systematizes the repeated insights of numerous authors when speeding up algorithms in the literature 5. a search algorithm for identifying a good sequence of transformations that reduce the runtime complexity, given an initial, correct program We show that, in practice, automated search -- like the mental search performed by human programmers -- can find substantial improvements to the initial program. Empirically, we show that many speed-ups described in the NLP literature could have been discovered automatically by our system. We provide a freely available prototype system at https://github.com/timvieira/dyna-pi.

2026-02-20T02:01:27Z 2023 PhD dissertation (Johns Hopkins University) Tim Vieira http://arxiv.org/abs/2602.17937v1 Analyzing LLM Instruction Optimization for Tabular Fact Verification 2026-02-20T01:56:27Z

Instruction optimization provides a lightweight, model-agnostic approach to enhancing the reasoning performance of large language models (LLMs). This paper presents the first systematic comparison of instruction optimization, based on the DSPy optimization framework, for tabular fact verification. We evaluate four out-of-the-box prompting techniques that cover both text-only prompting and code use: direct prediction, Chain-of-Thought (CoT), ReAct with SQL tools, and CodeAct with Python execution. We study three optimizers from the DSPy framework -- COPRO, MiPROv2, and SIMBA -- across four benchmarks and three model families. We find that instruction optimization consistently improves verification accuracy, with MiPROv2 yielding the most stable gains for CoT, and SIMBA providing the largest benefits for ReAct agents, particularly at larger model scales. Behavioral analyses reveal that SIMBA encourages more direct reasoning paths by applying heuristics, thereby improving numerical comparison abilities in CoT reasoning and helping avoid unnecessary tool calls in ReAct agents. Across different prompting techniques, CoT remains effective for tabular fact checking, especially with smaller models. Although ReAct agents built with larger models can achieve competitive performance, they require careful instruction optimization.

2026-02-20T01:56:27Z Xiaotang Du Giwon Hong Wai-Chung Kwan Rohit Saxena Ivan Titov Pasquale Minervini Emily Allaway http://arxiv.org/abs/2602.18511v1 Beyond Pass-by-Pass Optimization: Intent-Driven IR Optimization with Large Language Models 2026-02-19T13:48:51Z

Modern compilers optimize programs through a sequence of modular passes over intermediate representations (IR). While this pass-by-pass paradigm offers engineering benefits, it suffers from a pass coordination problem: locally beneficial transformations may block more profitable optimizations in later stages. This limitation stems from the lack of an explicit notion of optimization intent, defined as a holistic strategy for coordinating multiple transformations toward a global performance objective. Recent LLM-based approaches formulate IR optimization as an end-to-end generation task, thereby avoiding the traditional pass-by-pass structure. However, optimization intent remains implicit in these methods, forcing models to jointly infer optimization strategy and generate low-level transformations, which limits both correctness and performance. We propose IntOpt, the first intent-driven IR optimizer that explicitly separates high-level optimization intent from low-level analysis and transformation. IntOpt organizes IR optimization into three stages: intent formulation, intent refinement, and intent realization, enabling globally coordinated transformations. Experiments show that IntOpt achieves 90.5% verified correctness and 2.660x average speedup on 200-program test set, outperforming state-of-the-art LLM-based optimizers in both correctness and performance, and surpassing modern compiler with the -O3 option on 37 benchmarks with speedups of up to 272.60x.

2026-02-19T13:48:51Z Lei Qiu Zi Yang Fang Lyu Ming Zhong Huimin Cui Xiaobing Feng http://arxiv.org/abs/2602.16981v1 Mason: Type- and Name-Guided Program Synthesis 2026-02-19T00:52:49Z

Object-oriented programs tend to be written using many common coding idioms, such as those captured by design patterns. While design patterns are useful, implementing them is often tedious and repetitive, requiring boilerplate code that distracts the programmer from more essential details. In this paper, we introduce Mason, a tool that synthesizes object-oriented programs from partial program pieces, and we apply it to automatically insert design patterns into programs. At the core of Mason is a novel technique we call type- and name-guided synthesis, in which an enumerative solver traverses a partial program to generate typing constraints; discharges constraints via program transformations guided by the names of constrained types and members; and backtracks when a constraint is violated or a candidate program fails unit tests. We also introduce two extensions to Mason: a non-local backtracking heuristic that uses execution traces, and a language of patterns that impose syntactic restrictions on missing names. We evaluate Mason on a suite of benchmarks to which Mason must add various well-known design patterns implemented as a library of program pieces. We find that Mason performs well when very few candidate programs satisfy its typing constraints and that our extensions can improve Mason's performance significantly when this is not the case. We believe that Mason takes an important step forward in synthesizing multi-class object-oriented programs using design patterns.

2026-02-19T00:52:49Z Jasper Geer Fox Huston Jeffrey S. Foster http://arxiv.org/abs/2602.16809v1 Haskell meets Evariste 2026-02-18T19:13:29Z

Since its birth as a new scientific body of knowledge in the late 1950s, computer programming has become a fundamental skill needed in many other disciplines. However, programming is not easy, it is prone to errors and code re-use is key for productivity. This calls for high-quality documentation in software libraries, which is quite often not the case. Taking a few Haskell functions available from the Hackage repository as case-studies, and comparing their descriptions with similar functions in other languages, this paper shows how clarity and good conceptual design can be achieved by following a so-called easy-hard-split formal strategy that is quite general and productive, even if used informally. This strategy is easy to use in functional programming and can be applied to both program analysis and synthesis.

2026-02-18T19:13:29Z Paulo R. Pereira Jose N. Oliveira http://arxiv.org/abs/2602.16707v1 E-Graphs as a Persistent Compiler Abstraction 2026-02-18T18:56:52Z

Recent algorithmic advances have made equality saturation an appealing approach to program optimization because it avoids the phase-ordering problem. Existing work uses external equality saturation libraries, or custom implementations that are deeply tied to the specific application. However, these works only apply equality saturation at a single level of abstraction, or discard the discovered equalities when code is transformed by other compiler passes. We propose an alternative approach that represents an e-graph natively in the compiler's intermediate representation, facilitating the application of constructive compiler passes that maintain the e-graph state throughout the compilation flow. We build on a Python-based MLIR framework, xDSL, and introduce a new MLIR dialect, eqsat, that represents e-graphs in MLIR code. We show that this representation expands the scope of equality saturation in the compiler, allowing us to interleave pattern rewriting with other compiler transformations. The eqsat dialect provides a unified abstraction for compilers to utilize equality saturation across various levels of intermediate representations concurrently within the same MLIR flow.

2026-02-18T18:56:52Z Jules Merckx Alexandre Lopoukhine Samuel Coward Jianyi Cheng Bjorn De Sutter Tobias Grosser http://arxiv.org/abs/2512.10861v2 Towards Cumulative Abstract Semantics via Handlers 2026-02-17T22:23:32Z

We consider the problem of modularizing control flow in a generic abstract interpretation framework. A generic abstract interpretation framework is not truly flexible if it does not allow interpreting with different path- and flow-sensitivities, by going forwards or backwards, and over- or under-approximately. Most interpreters inherently intertwine syntax and semantics, making the implementation antagonistic to modularity. Current approaches to modular designs require the use of complex data structures (e.g., monad transformers), providing modularity but often proving unwieldy (e.g., lifts). We observe that leveraging scoped effects within an interpreter facilitates the accumulation of semantic fragments against a fixed syntax. In this paper, we define cumulative abstract semantics, illustrating the potential for creating multiple dynamic evaluators and static analyses from one interpreter. This modularity is achieved by grouping effects into two categories: syntax elimination and domain-semantic introduction handlers. Our contribution shows the benefits of using effects as an instrument for designing a clean, elegant, and modular abstract interpretation framework.

2025-12-11T17:55:35Z Cade Lueker Andrew Fox Bor-Yuh Evan Chang http://arxiv.org/abs/2602.15409v1 Hennessy-Milner Logic in CSLib, the Lean Computer Science Library 2026-02-17T07:47:17Z

We present a library-level formalisation of Hennessy-Milner Logic (HML) - a foundational logic for labelled transition systems (LTSs) - for the Lean Computer Science Library (CSLib). Our development includes the syntax, satisfaction relation, and denotational semantics of HML, as well as a complete metatheory including the Hennessy-Milner theorem - bisimilarity coincides with theory equivalence for image-finite LTSs. Our development emphasises generality and reusability: it is parametric over arbitrary LTSs, definitions integrate with CSLib's infrastructure (such as the formalisation of bisimilarity), and proofs leverage Lean's automation (notably the grind tactic). All code is publicly available in CSLib and can be readily applied to systems that use its LTS API.

2026-02-17T07:47:17Z Fabrizio Montesi Marco Peressotti Alexandre Rademaker http://arxiv.org/abs/2602.15164v1 Synthesizing Trajectory Queries from Examples 2026-02-16T20:07:58Z

Data scientists often need to write programs to process predictions of machine learning models, such as object detections and trajectories in video data. However, writing such queries can be challenging due to the fuzzy nature of real-world data; in particular, they often include real-valued parameters that must be tuned by hand. We propose a novel framework called Quivr that synthesizes trajectory queries matching a given set of examples. To efficiently synthesize parameters, we introduce a novel technique for pruning the parameter space and a novel quantitative semantics that makes this more efficient. We evaluate Quivr on a benchmark of 17 tasks, including several from prior work, and show both that it can synthesize accurate queries for each task and that our optimizations substantially reduce synthesis time.

2026-02-16T20:07:58Z International Conference on Computer Aided Verification, pp. 459-484 (July 2023) Stephen Mell Favyen Bastani Steve Zdancewic Osbert Bastani 10.1007/978-3-031-37706-8_23