https://arxiv.org/api/449GrefaZ6vyhiz1NVyVBAgQ5u0 2026-06-21T08:59:28Z 9918 660 15 http://arxiv.org/abs/2602.09197v1 Global Protocols under Rendezvous Synchrony: From Realizability to Type Checking 2026-02-09T21:05:13Z

Global protocol specifications are the starting point of top-down verification methodologies, and serve as a blueprint for synthesizing local specifications that guarantee the correctness of distributed implementations. In this work, we study global protocol specifications targeting distributed processes that communicate via rendezvous synchrony. We obtain the following positive results for the synchronous realizability problem: (a) realizability is decidable for global protocols over a transitive concurrency alphabet in 2-EXPTIME in the size of the protocol, and in 3-EXPTIME in the size of the alphabet, and (b) realizability is decidable in EXPTIME for global protocols that unambiguously represent their trace language. Unambiguous global protocols encompass fragments of directed and sender-driven choice protocols studied in prior work. Further, our reductions admit the corollary that the synchronous verification problem is solvable with the same complexity, where a candidate realization is included as part of the input. We then prove a negative result stating that synchronous realizability is undecidable in general. Finally, we propose a type system to check pi-calculus processes against local specifications in the form of synchronous communicating state machines. Our type system is the first to support processes with local mixed choice in the presence of session interleaving and~delegation.

2026-02-09T21:05:13Z Elaine Li Felix Stutz http://arxiv.org/abs/2602.08846v1 Impredicativity in Linear Dependent Type Theory 2026-02-09T16:12:57Z

We construct a realizability model of linear dependent type theory from a linear combinatory algebra. Our model motivates a number of additions to the type theory. In particular, we add a universe with two decoding operations: one takes codes to cartesian types and the other takes codes to linear types. The universe is impredicative in the sense that it is closed under both large cartesian dependent products and large linear dependent products. We also add a rule for injectivity of the modality turning linear terms into cartesian terms. With all of the additions, we are able to encode (linear) inductive types. As a case study, we consider the type of lists over a linear type, and demonstrate that our encoding has the relevant uniqueness principle. The construction of the realizability model is fully formalized in the proof assistant Rocq.

2026-02-09T16:12:57Z 20 pages, 2 figures Sam Speight Niels van der Weide http://arxiv.org/abs/2601.02563v3 Compressed code: the hidden effects of quantization and distillation on programming tokens 2026-02-08T20:13:10Z

Large Language Models (LLMs) have demonstrated exceptional code generation capabilities, yet their token-level mechanisms remain underexplored, particularly in compressed models. Through systematic analysis of programming language token representations, we characterize how programming languages are encoded in LLM tokenizers by analyzing their vocabulary distribution and keyword coverage patterns. We introduce a novel cold-start probability analysis method that provides insights into model behavior without requiring explicit prompts. Additionally, we present a comprehensive evaluation of how different model optimization techniques - including quantization, distillation, model scaling, and task-specific fine-tuning - affect token-level representations and code generation quality. Our experiments, supported by comprehensive probability distribution analysis and evaluation metrics, reveal critical insights into token-level behavior and provide empirically-validated guidelines for maintaining code generation quality under various optimization constraints. These findings advance both theoretical understanding of LLM code generation and practical implementation of optimized models in production environments.

2026-01-05T21:32:47Z 18 pages, 1 figure and 6 tables Viacheslav Siniaev Iaroslav Chelombitko Aleksey Komissarov http://arxiv.org/abs/2602.07742v1 Gillian Debugging: Swinging Through the (Compositional Symbolic Execution) Trees, Extended Version 2026-02-08T00:32:59Z

In recent years, compositional symbolic execution (CSE) tools have been growing in prominence and are becoming more and more applicable to real-world codebases. Still to this day, however, debugging the output of these tools remains difficult, even for specialist users. To address this, we introduce a debugging interface for symbolic execution tools, integrated with Visual Studio Code and the Gillian multi-language CSE platform, with strong focus on visualisation, interactivity, and intuitive representation of symbolic execution trees. We take care in making this interface tool-agnostic, easing its transfer to other symbolic analysis tools in future. We empirically evaluate our work with a user study, the results of which show the debugger's usefulness in helping early researchers understand the principles of CSE and verify fundamental data structure algorithms in Gillian.

2026-02-08T00:32:59Z 24 pages, 11 figures. To be published at TACAS 2026 Nat Karmios Sacha-Élie Ayoun Philippa Gardner http://arxiv.org/abs/2602.07627v1 Series-Parallel-Loop Decompositions of Control-flow Graphs 2026-02-07T17:16:48Z

Control-flow graphs (CFGs) of structured programs are well known to exhibit strong sparsity properties. Traditionally, this sparsity has been modeled using graph parameters such as treewidth and pathwidth, enabling the development of faster parameterized algorithms for tasks in compiler optimization, model checking, and program analysis. However, these parameters only approximate the structural constraints of CFGs: although every structured CFG has treewidth at most~7, many graphs with treewidth at most~7 cannot arise as CFGs. As a result, existing parameterized techniques are optimized for a substantially broader class of graphs than those encountered in practice. In this work, we introduce a new grammar-based decomposition framework that characterizes \emph{exactly} the class of control-flow graphs generated by structured programs. Our decomposition is intuitive, mirrors the syntactic structure of programs, and remains fully compatible with the dynamic-programming paradigm of treewidth-based methods. Using this framework, we design improved algorithms for two classical compiler optimization problems: \emph{Register Allocation} and \emph{Lifetime-Optimal Speculative Partial Redundancy Elimination (LOSPRE)}. Extensive experimental evaluation demonstrates significant performance improvements over previous state-of-the-art approaches, highlighting the benefits of using decompositions tailored specifically to CFGs.

2026-02-07T17:16:48Z Xuran Cai Amir Goharshady S Hitarth Chun Kit Lam 10.1016/j.sysarc.2026.103732 http://arxiv.org/abs/2602.07581v1 $\partial$CBDs: Differentiable Causal Block Diagrams 2026-02-07T15:17:35Z

Modern cyber-physical systems (CPS) integrate physics, computation, and learning, demanding modeling frameworks that are simultaneously composable, learnable, and verifiable. Yet existing approaches treat these goals in isolation: causal block diagrams (CBDs) support modular system interconnections but lack differentiability for learning; differentiable programming (DP) enables end-to-end gradient-based optimization but provides limited correctness guarantees; while contract-based verification frameworks remain largely disconnected from data-driven model refinement. To address these limitations, we introduce differentiable causal block diagrams ($\partial$CBDs), a unifying formalism that integrates these three perspectives. Our approach (i) retains the compositional structure and execution semantics of CBDs, (ii) incorporates assume--guarantee (A--G) contracts for modular correctness reasoning, and (iii) introduces residual-based contracts as differentiable, trajectory-level certificates compatible with automatic differentiation (AD), enabling gradient-based optimization and learning. Together, these elements enable a scalable, verifiable, and trainable modeling pipeline that preserves causality and modularity while supporting data-, physics-, and constraint-informed optimization for CPS.

2026-02-07T15:17:35Z Thomas Beckers Ján Drgoňa Truong X. Nghiem http://arxiv.org/abs/2602.07455v1 RustCompCert: A Verified and Verifying Compiler for a Sequential Subset of Rust 2026-02-07T09:15:35Z

We present our ongoing work on developing an end-to-end verified Rust compiler based on CompCert. It provides two guarantees: one is semantics preservation from Rust to assembly, i.e., the behaviors of source code includes the behaviors of target code, with which the properties verified at the source can be preserved down to the target; the other is memory safety ensured by the verifying compilation -- the borrow checking pass, which can simplify the verification of Rust programs, e.g., by allowing the verification tools focus on the functional correctness.

2026-02-07T09:15:35Z Submitted to Rust Verify 2026 Jinhua Wu Yuting Wang Liukun Yu Linglong Meng http://arxiv.org/abs/2602.07324v1 Static Analysis Under Non-Deterministic Program Assumptions 2026-02-07T02:45:03Z

Static analyses overwhelmingly trade precision for soundness and automation. For this reason, their use-cases are restricted to situations where imprecision isn't prohibitive. In this paper, we propose and specify a static analysis that accepts user-supplied program assumptions that are local to program locations. Such assumptions can be used to counteract imprecision in static analyses, enabling their use in a much wider variety of applications. These assumptions are taken by the analyzer non-deterministically, resulting in a function from sets of accepted assumptions to the resulting analysis under those assumptions. We also demonstrate the utility of such a function in two ways, both of which showcase how it can enable optimization over a search space of assumptions that is otherwise infeasible without the specified analysis.

2026-02-07T02:45:03Z Abdullah H. Rasheed http://arxiv.org/abs/2509.26553v2 Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling 2026-02-06T22:11:41Z

Existing benchmarks for tool-augmented language models (TaLMs) lack fine-grained control over task difficulty and remain vulnerable to data contamination. We present FuncBenchGen, a unified, contamination-free framework that evaluates TaLMs by generating synthetic multi-step tool-use tasks to stress-test TaLMs. The key idea is to cast tool use as traversal over a hidden function-dependency DAG where models must infer the correct sequence of calls to compute a target value. FuncBenchGen allows precise control over task difficulty (e.g., graph size, dependency depth, and distractor functions) while avoiding pretraining/test-time leakage. Our evaluation demonstrates reasoning-optimized models consistently outperform general-purpose models with GPT-5 significantly outperforming other available models. Performance declines sharply as dependency depth increases. Furthermore, connected distractors -- irrelevant functions sharing type-compatible variables with relevant functions -- prove especially difficult to handle. Also, strong models often make syntactically valid function calls but propagate incorrect or stale argument values across steps, revealing brittle state tracking by LLMs in multi-turn tool use. Motivated by this observation, we introduce a simple mitigation strategy that explicitly restates prior variable values to the agent at each step. Surprisingly, this lightweight change yields substantial gains across models. e.g., yielding an improvement in success rate from 62.5% to 81.3% for GPT-5.

2025-09-30T17:21:17Z ICLR 2026 Seiji Maekawa Jackson Hassell Pouya Pezeshkpour Tom Mitchell Estevam Hruschka http://arxiv.org/abs/2602.06715v1 Practical Refinement Session Type Inference (Extended Version) 2026-02-06T14:03:18Z

Session types express and enforce safe communication in concurrent message-passing systems by statically capturing the interaction protocols between processes in the type. Recent works extend session types with arithmetic refinements, which enable additional fine-grained description of communication, but impose additional annotation burden on the programmer. To alleviate this burden, we propose a type inference algorithm for a session type system with arithmetic refinements. We develop a theory of subtyping for session types, including an algorithm which we prove sound with respect to a semantic definition based on type simulation. We also provide a formal inference algorithm that generates type and arithmetic constraints, which are then solved using the Z3 SMT solver. The algorithm has been implemented on top of the Rast language, and includes 3 key optimizations that make inference feasible and practical. We evaluate the efficacy of our inference engine by evaluating it on 6 challenging benchmarks, ranging from unary and binary natural numbers to linear $λ$-calculus. We show the performance benefits provided by our optimizations in coercing Z3 into solving the arithmetic constraints in reasonable time.

2026-02-06T14:03:18Z Toby Ueno Ankush Das http://arxiv.org/abs/2602.06680v1 Same Engine, Multiple Gears: Parallelizing Fixpoint Iteration at Different Granularities (Extended Version) 2026-02-06T13:11:26Z

Fixpoint iteration constitutes the algorithmic core of static analyzers. Parallelizing the fixpoint engine can significantly reduce analysis times. Previous approaches typically fix the granularity of tasks upfront, e.g., at the level of program threads or procedures - yielding an engine permanently stuck in one gear. Instead, we propose to parallelize a generic fixpoint engine in a way that is parametric in the task granularity - meaning that our engine can be run in different gears. We build on the top-down solver TD, extended with support for mixed-flow sensitivity, and realize two competing philosophies for parallelization, both building on a task pool that schedules tasks to a fixed number of workers. The nature of tasks differs between the philosophies. In the immediate approach, all tasks access a single thread-safe hash table maintaining solver state, while in the independent approach, each task has its own state and exchanges data with other tasks via a publish/subscribe data structure. We have equipped the fixpoint engine of the static analysis framework Goblint with implementations following both philosophies and report on our results for large real-world programs.

2026-02-06T13:11:26Z Ali Rasim Kocal Michael Schwarz Simmo Saan Helmut Seidl http://arxiv.org/abs/2507.20674v2 LLM-Based Repair of Static Nullability Errors 2026-02-06T10:34:05Z

Modern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-prone. Large language models (LLMs) offer a promising path toward automating these repairs, but naively-prompted LLMs often generate incorrect, contextually-inappropriate edits. We present NullRepair, a system that integrates LLMs into a structured workflow for resolving the errors from a nullability checker. NullRepair's decision process follows a flowchart derived from manual analysis of 200 real-world errors. It leverages static analysis to identify safe and unsafe usage regions of symbols, using error-free usage examples to contextualize model prompts. Patches are generated through an iterative interaction with the LLM that incorporates project-wide context and decision logic. Our evaluation on 12 real-world Java projects shows that NullRepair resolves 63% of the 1,119 nullability errors that remain after applying a state-of-the-art annotation inference technique. Unlike two baselines (single-shot prompt and mini-SWE-agent), NullRepair also largely preserves program semantics, with all unit tests passing in 10/12 projects after applying every edit proposed by NullRepair, and 98% or more tests passing in the remaining two projects.

2025-07-28T09:55:04Z Nima Karimipour Pascal Joos Michael Pradel Martin Kellogg Manu Sridharan http://arxiv.org/abs/2602.06466v1 Auditing Rust Crates Effectively 2026-02-06T07:51:37Z

We introduce Cargo Scan, the first interactive program analysis tool designed to help developers audit third-party Rust code. Real systems written in Rust rely on thousands of transitive dependencies. These dependencies are as dangerous in Rust as they are in other languages (e.g., C or JavaScript) -- and auditing these dependencies today means manually inspecting every line of code. Unlike for most industrial languages, though, we can take advantage of Rust's type and module system to minimize the amount of code that developers need to inspect to the code that is potentially dangerous. Cargo Scan models such potentially dangerous code as effects and performs a side-effects analysis, tailored to Rust, to identify effects and track them across crate and module boundaries. In most cases (69.2%) developers can inspect flagged effects and decide whether the code is potentially dangerous locally. In some cases, however, the safety of an effect depends on the calling context -- how a function is called, potentially by a crate the developer imports later. Hence, Cargo Scan tracks context-dependent information using a call-graph, and collects audit results into composable and reusable audit files. In this paper, we describe our experience auditing Rust crates with Cargo Scan. In particular, we audit the popular client and server HTTP crate, hyper, and all of its dependencies; our experience shows that Cargo Scan can reduce the auditing burden of potentially dangerous code to a median of 0.2% of lines of code when compared to auditing whole crates. Looking at the Rust ecosystem more broadly, we find that Cargo Scan can automatically classify ~3.5K of the top 10K crates on crates.io as safe; of the crates that do require manual inspection, we find that most of the potentially dangerous side-effects are concentrated in roughly 3% of these crates.

2026-02-06T07:51:37Z Lydia Zoghbi David Thien Ranjit Jhala Deian Stefan Caleb Stanford http://arxiv.org/abs/2510.10216v2 Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis 2026-02-06T07:48:57Z

Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods, such as constrained decoding, alleviate this problem by externally rejecting untypable code, the model itself does not effectively learn type reasoning internally, which ultimately limits its overall performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation to guide the model to learn the type system. The core of our approach is a novel type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on synthesis decision sequences rather than traditional text-based token sequences. By offloading the complexity of type system learning to the representation itself, models can redirect their computational resources toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of aligning LMs with type systems internally.

2025-10-11T13:43:36Z Zhechong Huang Zhao Zhang Ruyi Ji Tingxuan Xia Qihao Zhu Qinxiang Cao Zeyu Sun Wiggin Zhou Yingfei Xiong http://arxiv.org/abs/2602.06386v1 Uniqueness is Separation 2026-02-06T04:50:17Z

Value independence is enormously beneficial for reasoning about software systems at scale. These benefits carry over into the world of formal verification. Reasoning about programs algebraically is a simple affair in a proof assistant, whereas programs with unconstrained mutation necessitate much more complex techniques, such as Separation Logic, where invariants about memory safety, aliasing, and state changes must be established by manual proof. Uniqueness type systems allow programs to be compiled to code that uses mutation for efficiency, while retaining a semantics that enjoys value independence for reasoning. The restrictions of these type systems, however, are often too onerous for realistic software. Thus, most uniqueness type systems include some "escape hatch" where the benefits of value independence for reasoning are lost, but the restrictions of uniqueness types are lifted. To formally verify a system with such mixed guarantees, the value independence guarantees from uniqueness types must be expressed in terms of imperative, mutable semantics. In other words, we ought to express value independence as an assertion in Separation Logic.

2026-02-06T04:50:17Z Liam O'Connor Pilar Selene Linares Arevalo Christine Rizkallah