https://arxiv.org/api/kSfRfPiKnypdVJkN2UvSKg5ztyY2026-06-21T14:42:53Z991873515http://arxiv.org/abs/2508.10904v3A2H-MAS: An Algorithm-to-HLS Multi-Agent System for Automated and Reliable FPGA Implementation2026-01-21T09:43:23ZBridging the gap between algorithm development and hardware realization remains a persistent challenge, particularly in latency- and resource-constrained domains such as wireless communication. While MATLAB provides a mature environment for algorithm prototyping, translating these models into efficient FPGA implementations via High-Level Synthesis (HLS) often requires expert tuning and lengthy iterations. Recent advances in large language models (LLMs) offer new opportunities for automating this process. However, existing approaches suffer from hallucinations, forgetting, limited domain expertise, and often overlook key performance metrics. To address these limitations, we present A2H-MAS, a modular and hierarchical multi-agent system. At the system level, A2H-MAS assigns clearly defined responsibilities to specialized agents and uses standardized interfaces and execution-based validation to ensure correctness and reproducibility. At the algorithmic level, it employs dataflow-oriented modular decomposition and algorithm-hardware co-design, recognizing that the choice of algorithm often has a larger impact on hardware efficiency than pragma-level optimization. Experiments on representative wireless communication algorithms show that A2H-MAS consistently produces functionally correct, resource-efficient, and latency-optimized HLS designs, demonstrating its effectiveness and robustness for complex hardware development workflows.2025-07-29T01:51:12Z9 pages, 6 figuresJie LeiRuofan JiaJ. Andrew ZhangHao Zhanghttp://arxiv.org/abs/2601.14114v1Partial Reductions for Kleene Algebra with Linear Hypotheses2026-01-20T16:08:26ZKleene algebra (KA) is an important tool for reasoning about general program equivalences, with a decidable and complete equational theory. However, KA cannot always prove equivalences between specific programs. For this purpose, one adds hypotheses to KA that encode program-specific knowledge. Traditionally, a map on regular expressions called a reduction then lets us lift decidability and completeness to these more expressive systems. Explicitly constructing such a reduction requires significant labour. Moreover, due to regularity constraints, a reduction may not exist for all combinations of expression and hypothesis.
We describe an automaton-based construction to mechanically derive reductions for a wide class of hypotheses. These reductions can be partial, in which case they yield partial completeness: completeness for expressions in their domain. This allows us to automatically establish the provability of more equivalences than what is covered in existing work.2026-01-20T16:08:26ZLiam ChungTobias Kappéhttp://arxiv.org/abs/2502.11091v2A Program Logic for Under-approximating Worst-case Resource Usage2026-01-20T16:08:07ZUnderstanding and predicting the worst-case resource usage is crucial for software quality; however, existing methods either over-approximate with potentially loose bounds or under-approximate without asymptotic guarantees. This paper presents a program logic to under-approximate worst-case resource usage, adapting incorrectness logic (IL) to reason quantitatively about resource consumption. We propose quantitative forward and backward under-approximate (QFUA and QBUA) triples, which generalize IL to identify execution paths leading to high resource usage. We also introduce a variant of QBUA that supports reasoning about high-water marks. Our logic is proven sound and complete with respect to a simple IMP-like language, and all meta-theoretical results are mechanized and verified in Rocq. We implement a prototype checker for all three variants of our logic and demonstrate its utility through a few examples and four case studies.2025-02-16T12:11:05ZZiyue JinDi Wanghttp://arxiv.org/abs/2601.14059v1Verifying Floating-Point Programs in Stainless2026-01-20T15:16:35ZWe extend the Stainless deductive verifier with floating-point support, providing the first automated verification support for floating-point numbers for a subset of Scala that includes polymorphism, recursion and higher-order functions. We follow the recent approach in the KeY verifier to axiomatise reasoning about mathematical functions, but go further by supporting all functions from Scala's math API, and by verifying the correctness of the axioms against the actual implementation in Stainless itself. We validate Stainless' floating-point support on a new set of benchmarks sampled from real-world code from GitHub, showing that it can verify specifications about, e.g., ranges of output or absence of special values for most supported functions, or produce counter-examples when the specifications do not hold.2026-01-20T15:16:35ZAndrea GilotAxel BergströmEva Darulovahttp://arxiv.org/abs/2601.13991v1Generating Functions Meet Occupation Measures: Invariant Synthesis for Probabilistic Loops (Extended Version)2026-01-20T14:05:06ZA fundamental computational task in probabilistic programming is to infer a program's output (posterior) distribution from a given initial (prior) distribution. This problem is challenging, especially for expressive languages that feature loops or unbounded recursion. While most of the existing literature focuses on statistical approximation, in this paper we address the problem of mathematically exact inference.
To achieve this for programs with loops, we rely on a relatively underexplored type of probabilistic loop invariant, which is linked to a loop's so-called occupation measure. The occupation measure associates program states with their expected number of visits, given the initial distribution. Based on this, we derive the notion of an occupation invariant. Such invariants are essentially dual to probabilistic martingales, the predominant technique for formal probabilistic loop analysis in the literature. A key feature of occupation invariants is that they can take the initial distribution into account and often yield a proof of positive almost sure termination as a by-product.
Finally, we present an automatic, template-based invariant synthesis approach for occupation invariants by encoding them as generating functions. The approach is implemented and evaluated on a set of benchmarks.2026-01-20T14:05:06ZFull version of ESOP2026 paper 'Generating Functions Meet Occupation Measures: Invariant Synthesis for Probabilistic Loops'Darion HaaseKevin BatzAdrian GallusBenjamin Lucien KaminskiJoost-Pieter KatoenLutz KlinkenbergTobias Winklerhttp://arxiv.org/abs/2601.15335v1ToolCaching: Towards Efficient Caching for LLM Tool-calling2026-01-20T09:25:59ZRecent advances in Large Language Models (LLMs) have revolutionized web applications, enabling intelligent search, recommendation, and assistant services with natural language interfaces. Tool-calling extends LLMs with the ability to interact with external APIs, greatly enhancing their practical utility. While prior research has improved tool-calling performance by adopting traditional computer systems techniques, such as parallel and asynchronous execution, the challenge of redundant or repeated tool-calling requests remains largely unaddressed. Caching is a classic solution to this problem, but applying it to LLM tool-calling introduces new difficulties due to heterogeneous request semantics, dynamic workloads, and varying freshness requirements, which render conventional cache policies ineffective. To address these issues, we propose ToolCaching, an efficient feature-driven and adaptive caching framework for LLM tool-calling systems. ToolCaching systematically integrates semantic and system-level features to evaluate request cacheability and estimate caching value. At its core, the VAAC algorithm integrates bandit-based admission with value-driven, multi-factor eviction, jointly accounting for request frequency, recency, and caching value. Extensive experiments on synthetic and public tool-calling workloads demonstrate that ToolCaching with VAAC achieves up to 11% higher cache hit ratios and 34% lower latency compared to standard policies, effectively accelerating LLM tool-calling in practical applications.2026-01-20T09:25:59ZYi ZhaiDian ShenJunzhou LuoBin Yanghttp://arxiv.org/abs/2601.13727v1Foundational VeriFast: Pragmatic Certification of Verification Tool Results through Hinted Mirroring2026-01-20T08:34:38ZVeriFast is a leading tool for the modular formal verification of correctness properties of single-threaded and multi-threaded C and Rust programs. It verifies a program by symbolically executing each function in isolation, exploiting user-annotated preconditions, postconditions, and loop invariants written in a form of separation logic, and using a separation logic-based symbolic representation of memory. However, the tool itself, written in roughly 30K lines of OCaml code, has not been formally verified. Therefore, bugs in the tool could cause it to falsely report the correctness of the input program. We here report on an early result extending VeriFast to emit, upon successful verification of a Rust program, a Rocq proof script that proves correctness of the program with respect to a Rocq-encoded axiomatic semantics of Rust. This significantly enhances VeriFast's applicability in safety-critical domains. We apply hinted mirroring: we record key information from VeriFast's symbolic execution run, and use it to direct a replay of the run in Rocq.2026-01-20T08:34:38Z8 pages, 2 figuresBart Jacobshttp://arxiv.org/abs/2601.13682v1CodeContests-O: Powering LLMs via Feedback-Driven Iterative Test Case Generation2026-01-20T07:32:44ZThe rise of reasoning models necessitates large-scale verifiable data, for which programming tasks serve as an ideal source. However, while competitive programming platforms provide abundant problems and solutions, high-quality test cases for verification remain scarce. Existing approaches attempt to synthesize test cases using Large Language Models (LLMs), but rely solely on the model's intrinsic generation capabilities without external feedback, frequently resulting in insufficiently diverse cases. To address this limitation, we propose a $\textbf{Feedback-Driven Iterative Framework}$ for comprehensive test case construction. Specifically, our method leverages the LLM to generate initial test cases, executes them against known correct and incorrect solutions, and utilizes the failed results as feedback to guide the LLM in refining the test cases toward high fidelity and discriminability. We then apply this method to the CodeContests dataset to construct an optimized high-quality derivative, $\textbf{CodeContests-O}$. Evaluating against the entire pool of solutions ($1.1 \times 10^7$ in total), our dataset achieves an average True Positive Rate (TPR) of $89.37\%$ and True Negative Rate (TNR) of $90.89\%$, significantly outperforming the CodeContests and CodeContests+ by margins of $4.32\%$ and $9.37\%$, respectively. Furthermore, fine-tuning the Qwen2.5-7B model on CodeContests-O results in a $9.52\%$ improvement on LiveCodeBench (Pass@1). Experiments demonstrate the effectiveness of our framework and the quality of CodeContests-O. To support reproducibility and facilitate future research, we release the $\href{https://github.com/cai-jianfeng/CodeContests-O}{code}$ and $\href{https://huggingface.co/datasets/caijanfeng/CodeContests-O}{dataset}$.2026-01-20T07:32:44ZJianfeng CaiJinhua ZhuRuopei SunKangwen ZhaoDongyun XueMingxiao FengWengang ZhouHouqiang Lihttp://arxiv.org/abs/2601.13341v1Reduction for Structured Concurrent Programs2026-01-19T19:23:52ZCommutativity reasoning based on Lipton's movers is a powerful technique for verification of concurrent programs. The idea is to define a program transformation that preserves a subset of the initial set of interleavings, which is sound modulo reorderings of commutative actions. Scaling commutativity reasoning to routinely-used features in software systems, such as procedures and parallel composition, remains a significant challenge.
In this work, we introduce a novel reduction technique for structured concurrent programs that unifies two key advances. First, we present a reduction strategy that soundly replaces parallel composition with sequential composition. Second, we generalize Lipton's reduction to support atomic sections containing (potentially recursive) procedure calls. Crucially, these two foundational strategies can be composed arbitrarily, greatly expanding the scope and flexibility of reduction-based reasoning. We implemented this technique in Civl and demonstrated its effectiveness on a number of challenging case studies, including a snapshot object, a fault-tolerant and linearizable register, the FLASH cache coherence protocol, and a non-trivial variant of Two-Phase Commit.2026-01-19T19:23:52Z35th European Symposium on Programming, ESOP 2026Namratha GangamreddypalliConstantin EneaShaz Qadeerhttp://arxiv.org/abs/2601.13325v1Verifying First-Order Temporal Properties of Infinite-State Systems via Timers and Rankings2026-01-19T19:05:02ZWe present a unified deductive verification framework for first-order temporal properties based on well-founded rankings, where verification conditions are discharged using SMT solvers. To that end, we introduce a novel reduction from verification of arbitrary temporal properties to verification of termination. Our reduction augments the system with prophecy timer variables that predict the number of steps along a trace until the next time certain temporal formulas, including the negated property, hold. In contrast to standard tableaux-based reductions, which reduce the problem to fair termination, our reduction does not introduce fairness assumptions. To verify termination of the augmented system, we follow the traditional approach of assigning each state a rank from a well-founded set and showing that the rank decreases in every transition. We leverage the recently proposed formalism of implicit rankings to express and automatically verify the decrease of rank using SMT solvers, even when the rank is not expressible in first-order logic. We extend implicit rankings from finite to infinite domains, enabling verification of more general systems and making them applicable to the augmented systems generated by our reduction, which allows us to exploit the decrease of timers in termination proofs. We evaluate our technique on a range of temporal verification tasks from previous works, giving simple, intuitive proofs for them within our framework.2026-01-19T19:05:02ZRaz LotanNeta EladOded PadonSharon Shohamhttp://arxiv.org/abs/2601.02653v2Backwards Data-Flow Analysis using Prophecy Variables in the BuildIt System2026-01-19T18:42:47ZMany program transformations and optimizations require information about the future behavior of the program. A standard way to obtain this information is to build an intermediate program representation, then use a backwards program analysis to propagate relevant information against the flow of control back to the transformation/optimization site. We instead propose to use prophecy variables, which predict information about the future execution of the program, to enable such transformations and optimizations. We implement prophecy variables in BuildIt, a lightweight domain specific language implementation system. BuildIt uses staged compilation to implement high performance domain specific languages embedded within a standard general purpose programming language (C++). The BuildIt first phase uses standard C++ program execution to generate optimized C, C++, and CUDA second phase code. This approach enables BuildIt to eliminate programming language implementation components such as parsers and intermediate representations, delivering a dramatic decrease in the engineering effort required to implement domain specific languages. The combination of prophecy variables and repeated forward program execution enables BuildIt to extend this approach to include transformations and optimizations that require information about the future execution of the program without backwards analyses and without the engineering overhead associated with implementing these analyses. We formalize the use of prophecy variables for this purpose, discuss the implementation of prophecy variables and repeated execution in BuildIt, and present experimental results for BuildIt computations that benefit from optimizations enabled by the information that prophecy variables provide.2026-01-06T02:06:47ZAjay BrahmakshatriyaSaman AmarasingheMartin Rinardhttp://arxiv.org/abs/2601.13224v1Functional Logic Program Transformations2026-01-19T16:59:12ZMany tools used to process programs, like compilers, analyzers, or verifiers, perform transformations on their intermediate program representation, like abstract syntax trees. Implementing such program transformations is a non-trivial task, since it is necessary to iterate over the complete syntax tree and apply various transformations at nodes in a tree. In this paper we show how the features of functional logic programming are useful to implement program transformations in a compact and comprehensible manner. For this purpose, we propose to write program transformations as partially defined and non-deterministic operations. Since the implementation of non-determinism usually causes some overhead compared to deterministically defined operations, we compare our approach to a deterministic transformation method. We evaluate these alternatives for the functional logic language Curry and its intermediate representation FlatCurry which is used in various analysis and verification tools and compilers.2026-01-19T16:59:12ZPresented at Conference on Declarative Programming (DECLARE 2025)Michael HanusSteven Libbyhttp://arxiv.org/abs/2601.12813v1A Formally Verified Procedure for Width Inference in FIRRTL2026-01-19T08:18:48ZFIRRTL is an intermediate representation language for Register Transfer Level (RTL) hardware designs. In FIRRTL programs, the bit widths of many components are not specified explicitly and must be inferred during compilation. In mainstream FIRRTL compilers, such as the official compiler firtool, width inference is conducted by a compilation pass referred to as InferWidths, which may fail even for simple FIRRTL programs. In this paper, we thoroughly investigate the width inference problem for FIRRTL programs. We show that, if the constraints obtained from a FIRRTL program are satisfiable, there exists a unique least solution. Based on this result, we propose a complete procedure for solving the width inference problem. We implement it in the interactive theorem prover Rocq and prove its functional correctness. From the Rocq implementation, we extract an OCaml implementation, which is the first formally verified implementation of the InferWidths pass. Extensive experiments demonstrate that our approach can solve more instances than the official InferWidths pass in firtool, normally with high efficiency.2026-01-19T08:18:48ZArxiv version for the European Symposium on Programming (ESOP 2026)(to appear) This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences, Grant No.~XDA0320101, and partially supported by NSFC-RGC Collaborative Research Grant No.~62561160151. D.N. Jansen is supported by Beijing Natural Science Foundation Project No.~IS25071Keyin WangXiaomu ShiJiaxiang LiuZhilin WuTaolve ChenFu SongDavid N. Jansenhttp://arxiv.org/abs/2601.12741v1An Introduction to Razborov's Flag Algebra as a Proof System for Extremal Graph Theory2026-01-19T05:47:01ZRazborov's flag algebra forms a powerful framework for deriving asymptotic inequalities between induced subgraph densities, underpinning many advances in extremal graph theory. This survey introduces flag algebra to computer scientists working in logic, programming languages, automated verification, and formal methods. We take a logical perspective on flag algebra and present it in terms of syntax, semantics, and proof strategies, in a style closer to formal logic. One popular proof strategy derives valid inequalities by first proving inequalities in a labelled variant of flag algebra and then transferring them to the original unlabelled setting using the so-called downward operator. We explain this strategy in detail and highlight that its transfer mechanism relies on the notion of what we call an adjoint pair, reminiscent of Galois connections and categorical adjunctions, which appear frequently in work on automated verification and programming languages. Along the way, we work through representative examples, including Mantel's theorem and Goodman's bound on Ramsey multiplicity, to illustrate how mathematical arguments can be carried out symbolically in the flag algebra framework.2026-01-19T05:47:01ZGyeongwon JeongSeonghun ParkHongseok Yanghttp://arxiv.org/abs/2601.02251v4Deciding Serializability in Network Systems2026-01-18T18:50:40ZWe present the SER modeling language for automatically verifying serializability of concurrent programs, i.e., whether every concurrent execution of the program is equivalent to some serial execution. SER programs are suitably restricted to make this problem decidable, while still allowing for an unbounded number of concurrent threads of execution, each potentially running for an unbounded number of steps. Building on prior theoretical results, we give the first automated end-to-end decision procedure that either proves serializability by producing a checkable certificate, or refutes it by producing a counterexample trace. We also present a network-system abstraction to which SER programs compile. Our decision procedure then reduces serializability in this setting to a Petri net reachability query. Furthermore, in order to scale, we curtail the search space via multiple optimizations, including Petri net slicing, semilinear-set compression, and Presburger-formula manipulation. We extensively evaluate our framework and show that, despite the theoretical hardness of the problem, it can successfully handle various models of real-world programs, including stateful firewalls, BGP routers, and more.2026-01-05T16:32:03ZTo appear in TACAS 2026Guy AmirMark BarboneNicolas AmatJules Jacobs