https://arxiv.org/api/NYmF78wEtlxUunAbVy1+zIOKpUo 2026-06-15T07:25:07Z 9888 630 15 http://arxiv.org/abs/2602.07742v1 Gillian Debugging: Swinging Through the (Compositional Symbolic Execution) Trees, Extended Version 2026-02-08T00:32:59Z

In recent years, compositional symbolic execution (CSE) tools have been growing in prominence and are becoming more and more applicable to real-world codebases. Still to this day, however, debugging the output of these tools remains difficult, even for specialist users. To address this, we introduce a debugging interface for symbolic execution tools, integrated with Visual Studio Code and the Gillian multi-language CSE platform, with strong focus on visualisation, interactivity, and intuitive representation of symbolic execution trees. We take care in making this interface tool-agnostic, easing its transfer to other symbolic analysis tools in future. We empirically evaluate our work with a user study, the results of which show the debugger's usefulness in helping early researchers understand the principles of CSE and verify fundamental data structure algorithms in Gillian.

2026-02-08T00:32:59Z 24 pages, 11 figures. To be published at TACAS 2026 Nat Karmios Sacha-Élie Ayoun Philippa Gardner http://arxiv.org/abs/2602.07627v1 Series-Parallel-Loop Decompositions of Control-flow Graphs 2026-02-07T17:16:48Z

Control-flow graphs (CFGs) of structured programs are well known to exhibit strong sparsity properties. Traditionally, this sparsity has been modeled using graph parameters such as treewidth and pathwidth, enabling the development of faster parameterized algorithms for tasks in compiler optimization, model checking, and program analysis. However, these parameters only approximate the structural constraints of CFGs: although every structured CFG has treewidth at most~7, many graphs with treewidth at most~7 cannot arise as CFGs. As a result, existing parameterized techniques are optimized for a substantially broader class of graphs than those encountered in practice. In this work, we introduce a new grammar-based decomposition framework that characterizes \emph{exactly} the class of control-flow graphs generated by structured programs. Our decomposition is intuitive, mirrors the syntactic structure of programs, and remains fully compatible with the dynamic-programming paradigm of treewidth-based methods. Using this framework, we design improved algorithms for two classical compiler optimization problems: \emph{Register Allocation} and \emph{Lifetime-Optimal Speculative Partial Redundancy Elimination (LOSPRE)}. Extensive experimental evaluation demonstrates significant performance improvements over previous state-of-the-art approaches, highlighting the benefits of using decompositions tailored specifically to CFGs.

2026-02-07T17:16:48Z Xuran Cai Amir Goharshady S Hitarth Chun Kit Lam 10.1016/j.sysarc.2026.103732 http://arxiv.org/abs/2602.07581v1 $\partial$CBDs: Differentiable Causal Block Diagrams 2026-02-07T15:17:35Z

Modern cyber-physical systems (CPS) integrate physics, computation, and learning, demanding modeling frameworks that are simultaneously composable, learnable, and verifiable. Yet existing approaches treat these goals in isolation: causal block diagrams (CBDs) support modular system interconnections but lack differentiability for learning; differentiable programming (DP) enables end-to-end gradient-based optimization but provides limited correctness guarantees; while contract-based verification frameworks remain largely disconnected from data-driven model refinement. To address these limitations, we introduce differentiable causal block diagrams ($\partial$CBDs), a unifying formalism that integrates these three perspectives. Our approach (i) retains the compositional structure and execution semantics of CBDs, (ii) incorporates assume--guarantee (A--G) contracts for modular correctness reasoning, and (iii) introduces residual-based contracts as differentiable, trajectory-level certificates compatible with automatic differentiation (AD), enabling gradient-based optimization and learning. Together, these elements enable a scalable, verifiable, and trainable modeling pipeline that preserves causality and modularity while supporting data-, physics-, and constraint-informed optimization for CPS.

2026-02-07T15:17:35Z Thomas Beckers Ján Drgoňa Truong X. Nghiem http://arxiv.org/abs/2602.07455v1 RustCompCert: A Verified and Verifying Compiler for a Sequential Subset of Rust 2026-02-07T09:15:35Z

We present our ongoing work on developing an end-to-end verified Rust compiler based on CompCert. It provides two guarantees: one is semantics preservation from Rust to assembly, i.e., the behaviors of source code includes the behaviors of target code, with which the properties verified at the source can be preserved down to the target; the other is memory safety ensured by the verifying compilation -- the borrow checking pass, which can simplify the verification of Rust programs, e.g., by allowing the verification tools focus on the functional correctness.

2026-02-07T09:15:35Z Submitted to Rust Verify 2026 Jinhua Wu Yuting Wang Liukun Yu Linglong Meng http://arxiv.org/abs/2602.07324v1 Static Analysis Under Non-Deterministic Program Assumptions 2026-02-07T02:45:03Z

Static analyses overwhelmingly trade precision for soundness and automation. For this reason, their use-cases are restricted to situations where imprecision isn't prohibitive. In this paper, we propose and specify a static analysis that accepts user-supplied program assumptions that are local to program locations. Such assumptions can be used to counteract imprecision in static analyses, enabling their use in a much wider variety of applications. These assumptions are taken by the analyzer non-deterministically, resulting in a function from sets of accepted assumptions to the resulting analysis under those assumptions. We also demonstrate the utility of such a function in two ways, both of which showcase how it can enable optimization over a search space of assumptions that is otherwise infeasible without the specified analysis.

2026-02-07T02:45:03Z Abdullah H. Rasheed http://arxiv.org/abs/2509.26553v2 Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling 2026-02-06T22:11:41Z

Existing benchmarks for tool-augmented language models (TaLMs) lack fine-grained control over task difficulty and remain vulnerable to data contamination. We present FuncBenchGen, a unified, contamination-free framework that evaluates TaLMs by generating synthetic multi-step tool-use tasks to stress-test TaLMs. The key idea is to cast tool use as traversal over a hidden function-dependency DAG where models must infer the correct sequence of calls to compute a target value. FuncBenchGen allows precise control over task difficulty (e.g., graph size, dependency depth, and distractor functions) while avoiding pretraining/test-time leakage. Our evaluation demonstrates reasoning-optimized models consistently outperform general-purpose models with GPT-5 significantly outperforming other available models. Performance declines sharply as dependency depth increases. Furthermore, connected distractors -- irrelevant functions sharing type-compatible variables with relevant functions -- prove especially difficult to handle. Also, strong models often make syntactically valid function calls but propagate incorrect or stale argument values across steps, revealing brittle state tracking by LLMs in multi-turn tool use. Motivated by this observation, we introduce a simple mitigation strategy that explicitly restates prior variable values to the agent at each step. Surprisingly, this lightweight change yields substantial gains across models. e.g., yielding an improvement in success rate from 62.5% to 81.3% for GPT-5.

2025-09-30T17:21:17Z ICLR 2026 Seiji Maekawa Jackson Hassell Pouya Pezeshkpour Tom Mitchell Estevam Hruschka http://arxiv.org/abs/2602.06715v1 Practical Refinement Session Type Inference (Extended Version) 2026-02-06T14:03:18Z

Session types express and enforce safe communication in concurrent message-passing systems by statically capturing the interaction protocols between processes in the type. Recent works extend session types with arithmetic refinements, which enable additional fine-grained description of communication, but impose additional annotation burden on the programmer. To alleviate this burden, we propose a type inference algorithm for a session type system with arithmetic refinements. We develop a theory of subtyping for session types, including an algorithm which we prove sound with respect to a semantic definition based on type simulation. We also provide a formal inference algorithm that generates type and arithmetic constraints, which are then solved using the Z3 SMT solver. The algorithm has been implemented on top of the Rast language, and includes 3 key optimizations that make inference feasible and practical. We evaluate the efficacy of our inference engine by evaluating it on 6 challenging benchmarks, ranging from unary and binary natural numbers to linear $λ$-calculus. We show the performance benefits provided by our optimizations in coercing Z3 into solving the arithmetic constraints in reasonable time.

2026-02-06T14:03:18Z Toby Ueno Ankush Das http://arxiv.org/abs/2602.06680v1 Same Engine, Multiple Gears: Parallelizing Fixpoint Iteration at Different Granularities (Extended Version) 2026-02-06T13:11:26Z

Fixpoint iteration constitutes the algorithmic core of static analyzers. Parallelizing the fixpoint engine can significantly reduce analysis times. Previous approaches typically fix the granularity of tasks upfront, e.g., at the level of program threads or procedures - yielding an engine permanently stuck in one gear. Instead, we propose to parallelize a generic fixpoint engine in a way that is parametric in the task granularity - meaning that our engine can be run in different gears. We build on the top-down solver TD, extended with support for mixed-flow sensitivity, and realize two competing philosophies for parallelization, both building on a task pool that schedules tasks to a fixed number of workers. The nature of tasks differs between the philosophies. In the immediate approach, all tasks access a single thread-safe hash table maintaining solver state, while in the independent approach, each task has its own state and exchanges data with other tasks via a publish/subscribe data structure. We have equipped the fixpoint engine of the static analysis framework Goblint with implementations following both philosophies and report on our results for large real-world programs.

2026-02-06T13:11:26Z Ali Rasim Kocal Michael Schwarz Simmo Saan Helmut Seidl http://arxiv.org/abs/2507.20674v2 LLM-Based Repair of Static Nullability Errors 2026-02-06T10:34:05Z

Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-prone. Large language models (LLMs) offer a promising path toward automating these repairs, but naively-prompted LLMs often generate incorrect, contextually-inappropriate edits. We present NullRepair, a system that integrates LLMs into a structured workflow for resolving the errors from a nullability checker. NullRepair's decision process follows a flowchart derived from manual analysis of 200 real-world errors. It leverages static analysis to identify safe and unsafe usage regions of symbols, using error-free usage examples to contextualize model prompts. Patches are generated through an iterative interaction with the LLM that incorporates project-wide context and decision logic. Our evaluation on 12 real-world Java projects shows that NullRepair resolves 63% of the 1,119 nullability errors that remain after applying a state-of-the-art annotation inference technique. Unlike two baselines (single-shot prompt and mini-SWE-agent), NullRepair also largely preserves program semantics, with all unit tests passing in 10/12 projects after applying every edit proposed by NullRepair, and 98% or more tests passing in the remaining two projects.

2025-07-28T09:55:04Z Nima Karimipour Pascal Joos Michael Pradel Martin Kellogg Manu Sridharan http://arxiv.org/abs/2602.06466v1 Auditing Rust Crates Effectively 2026-02-06T07:51:37Z

We introduce Cargo Scan, the first interactive program analysis tool designed to help developers audit third-party Rust code. Real systems written in Rust rely on thousands of transitive dependencies. These dependencies are as dangerous in Rust as they are in other languages (e.g., C or JavaScript) -- and auditing these dependencies today means manually inspecting every line of code. Unlike for most industrial languages, though, we can take advantage of Rust's type and module system to minimize the amount of code that developers need to inspect to the code that is potentially dangerous. Cargo Scan models such potentially dangerous code as effects and performs a side-effects analysis, tailored to Rust, to identify effects and track them across crate and module boundaries. In most cases (69.2%) developers can inspect flagged effects and decide whether the code is potentially dangerous locally. In some cases, however, the safety of an effect depends on the calling context -- how a function is called, potentially by a crate the developer imports later. Hence, Cargo Scan tracks context-dependent information using a call-graph, and collects audit results into composable and reusable audit files. In this paper, we describe our experience auditing Rust crates with Cargo Scan. In particular, we audit the popular client and server HTTP crate, hyper, and all of its dependencies; our experience shows that Cargo Scan can reduce the auditing burden of potentially dangerous code to a median of 0.2% of lines of code when compared to auditing whole crates. Looking at the Rust ecosystem more broadly, we find that Cargo Scan can automatically classify ~3.5K of the top 10K crates on crates.io as safe; of the crates that do require manual inspection, we find that most of the potentially dangerous side-effects are concentrated in roughly 3% of these crates.

2026-02-06T07:51:37Z Lydia Zoghbi David Thien Ranjit Jhala Deian Stefan Caleb Stanford http://arxiv.org/abs/2510.10216v2 Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis 2026-02-06T07:48:57Z

Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods, such as constrained decoding, alleviate this problem by externally rejecting untypable code, the model itself does not effectively learn type reasoning internally, which ultimately limits its overall performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation to guide the model to learn the type system. The core of our approach is a novel type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on synthesis decision sequences rather than traditional text-based token sequences. By offloading the complexity of type system learning to the representation itself, models can redirect their computational resources toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of aligning LMs with type systems internally.

2025-10-11T13:43:36Z Zhechong Huang Zhao Zhang Ruyi Ji Tingxuan Xia Qihao Zhu Qinxiang Cao Zeyu Sun Wiggin Zhou Yingfei Xiong http://arxiv.org/abs/2602.06386v1 Uniqueness is Separation 2026-02-06T04:50:17Z

Value independence is enormously beneficial for reasoning about software systems at scale. These benefits carry over into the world of formal verification. Reasoning about programs algebraically is a simple affair in a proof assistant, whereas programs with unconstrained mutation necessitate much more complex techniques, such as Separation Logic, where invariants about memory safety, aliasing, and state changes must be established by manual proof. Uniqueness type systems allow programs to be compiled to code that uses mutation for efficiency, while retaining a semantics that enjoys value independence for reasoning. The restrictions of these type systems, however, are often too onerous for realistic software. Thus, most uniqueness type systems include some "escape hatch" where the benefits of value independence for reasoning are lost, but the restrictions of uniqueness types are lifted. To formally verify a system with such mixed guarantees, the value independence guarantees from uniqueness types must be expressed in terms of imperative, mutable semantics. In other words, we ought to express value independence as an assertion in Separation Logic.

2026-02-06T04:50:17Z Liam O'Connor Pilar Selene Linares Arevalo Christine Rizkallah http://arxiv.org/abs/2602.17688v1 AnCoder: Anchored Code Generation via Discrete Diffusion Models 2026-02-05T22:46:43Z

Diffusion language models offer a compelling alternative to autoregressive code generation, enabling global planning and iterative refinement of complex program logic. However, existing approaches fail to respect the rigid structure of programming languages and, as a result, often produce broken programs that fail to execute. To address this, we introduce AnchorTree, a framework that explicitly anchors the diffusion process using structured, hierarchical priors native to code. Specifically, AnchorTree uses the abstract syntax tree to prioritize resolving syntactically and semantically salient tokens, such as keywords (e.g., if, while) and identifiers (e.g., variable names), thereby establishing a structural scaffold that guides the remaining generation. We validate this framework via AnCoder, a family of models showing that structurally anchored diffusion offers a parameter-efficient path to high-quality code generation.

2026-02-05T22:46:43Z Anton Xue Litu Rout Constantine Caramanis Sanjay Shakkottai http://arxiv.org/abs/2512.23496v2 Fancy Some Chips for Your TeaStore? Modeling the Control of an Adaptable Discrete System 2026-02-05T16:55:41Z

When designing new web applications, developers must cope with different kinds of constraints relative to the resources they rely on: software, hardware, network, online micro-services, or any combination of the mentioned entities. Together, these entities form a complex system of communicating interdependent processes, physical or logical. It is very desirable that such system ensures its robustness to provide a good quality of service. In this paper we introduce Chips, a language that aims at facilitating the design of models made of various entwined components. It allows the description of applications in the form of functional blocks. Chips mixes notions from control theory and general purpose programming languages to generate robust component-based models. This paper presents how to use Chips to systematically design, model and analyse a complex system project, using a variation of the Adaptable TeaStore application as running example.

2025-12-29T14:35:03Z In Proceedings WACA 2025, arXiv:2512.22054. This work was supported by the ANR grant ANR-23-CE25-0004 (ADAPT). O. Kouchnarenko was supported by the EIPHI Graduate School (grant number ANR-17-EURE-0002) EPTCS 438, 2025, pp. 58-78 Anna Gallone Université Marie et Louis Pasteur, CNRS UMR6174, Institut FEMTO-ST, Besançon, France Simon Bliudze Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, France Sophie Cerf Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Lille, France Olga Kouchnarenko Université Marie et Louis Pasteur, CNRS UMR6174, Institut FEMTO-ST, Besançon, France 10.4204/EPTCS.438.4 http://arxiv.org/abs/2602.05850v1 An Equational Axiomatization of Dynamic Threads via Algebraic Effects: Presheaves on Finite Relations, Labelled Posets, and Parameterized Algebraic Theories 2026-02-05T16:36:20Z

We use the theory of algebraic effects to give a complete equational axiomatization for dynamic threads. Our method is based on parameterized algebraic theories, which give a concrete syntax for strong monads on functor categories, and are a convenient framework for names and binding. Our programs are built from the key primitives `fork' and `wait'. `Fork' creates a child thread and passes its name (thread ID) to the parent thread. `Wait' allows us to wait for given child threads to finish. We provide a parameterized algebraic theory built from fork and wait, together with basic atomic actions and laws such as associativity of `fork'. Our equational axiomatization is complete in two senses. First, for closed expressions, it completely captures equality of labelled posets (pomsets), an established model of concurrency: model complete. Second, any two open expressions are provably equal if they are equal under all closing substitutions: syntactically complete. The benefit of algebraic effects is that the semantic analysis can focus on the algebraic operations of fork and wait. We then extend the analysis to a simple concurrent programming language by giving operational and denotational semantics. The denotational semantics is built using the methods of parameterized algebraic theories and we show that it is sound, adequate, and fully abstract at first order for labelled-poset observations.

2026-02-05T16:36:20Z Published at POPL 2026 Ohad Kammar Jack Liell-Cock Sam Lindley Cristina Matache Sam Staton