https://arxiv.org/api/449GrefaZ6vyhiz1NVyVBAgQ5u02026-06-21T08:59:28Z991866015http://arxiv.org/abs/2602.09197v1Global Protocols under Rendezvous Synchrony: From Realizability to Type Checking2026-02-09T21:05:13ZGlobal protocol specifications are the starting point of top-down verification methodologies, and serve as a blueprint for synthesizing local specifications that guarantee the correctness of distributed implementations. In this work, we study global protocol specifications targeting distributed processes that communicate via rendezvous synchrony. We obtain the following positive results for the synchronous realizability problem: (a) realizability is decidable for global protocols over a transitive concurrency alphabet in 2-EXPTIME in the size of the protocol, and in 3-EXPTIME in the size of the alphabet, and (b) realizability is decidable in EXPTIME for global protocols that unambiguously represent their trace language. Unambiguous global protocols encompass fragments of directed and sender-driven choice protocols studied in prior work. Further, our reductions admit the corollary that the synchronous verification problem is solvable with the same complexity, where a candidate realization is included as part of the input. We then prove a negative result stating that synchronous realizability is undecidable in general. Finally, we propose a type system to check pi-calculus processes against local specifications in the form of synchronous communicating state machines. Our type system is the first to support processes with local mixed choice in the presence of session interleaving and~delegation.2026-02-09T21:05:13ZElaine LiFelix Stutzhttp://arxiv.org/abs/2602.08846v1Impredicativity in Linear Dependent Type Theory2026-02-09T16:12:57ZWe construct a realizability model of linear dependent type theory from a linear combinatory algebra. Our model motivates a number of additions to the type theory. In particular, we add a universe with two decoding operations: one takes codes to cartesian types and the other takes codes to linear types. The universe is impredicative in the sense that it is closed under both large cartesian dependent products and large linear dependent products. We also add a rule for injectivity of the modality turning linear terms into cartesian terms. With all of the additions, we are able to encode (linear) inductive types. As a case study, we consider the type of lists over a linear type, and demonstrate that our encoding has the relevant uniqueness principle. The construction of the realizability model is fully formalized in the proof assistant Rocq.2026-02-09T16:12:57Z20 pages, 2 figuresSam SpeightNiels van der Weidehttp://arxiv.org/abs/2601.02563v3Compressed code: the hidden effects of quantization and distillation on programming tokens2026-02-08T20:13:10ZLarge Language Models (LLMs) have demonstrated exceptional code generation capabilities, yet their token-level mechanisms remain underexplored, particularly in compressed models. Through systematic analysis of programming language token representations, we characterize how programming languages are encoded in LLM tokenizers by analyzing their vocabulary distribution and keyword coverage patterns. We introduce a novel cold-start probability analysis method that provides insights into model behavior without requiring explicit prompts. Additionally, we present a comprehensive evaluation of how different model optimization techniques - including quantization, distillation, model scaling, and task-specific fine-tuning - affect token-level representations and code generation quality. Our experiments, supported by comprehensive probability distribution analysis and evaluation metrics, reveal critical insights into token-level behavior and provide empirically-validated guidelines for maintaining code generation quality under various optimization constraints. These findings advance both theoretical understanding of LLM code generation and practical implementation of optimized models in production environments.2026-01-05T21:32:47Z18 pages, 1 figure and 6 tablesViacheslav SiniaevIaroslav ChelombitkoAleksey Komissarovhttp://arxiv.org/abs/2602.07742v1Gillian Debugging: Swinging Through the (Compositional Symbolic Execution) Trees, Extended Version2026-02-08T00:32:59ZIn recent years, compositional symbolic execution (CSE) tools have been growing in prominence and are becoming more and more applicable to real-world codebases. Still to this day, however, debugging the output of these tools remains difficult, even for specialist users. To address this, we introduce a debugging interface for symbolic execution tools, integrated with Visual Studio Code and the Gillian multi-language CSE platform, with strong focus on visualisation, interactivity, and intuitive representation of symbolic execution trees. We take care in making this interface tool-agnostic, easing its transfer to other symbolic analysis tools in future. We empirically evaluate our work with a user study, the results of which show the debugger's usefulness in helping early researchers understand the principles of CSE and verify fundamental data structure algorithms in Gillian.2026-02-08T00:32:59Z24 pages, 11 figures. To be published at TACAS 2026Nat KarmiosSacha-Élie AyounPhilippa Gardnerhttp://arxiv.org/abs/2602.07627v1Series-Parallel-Loop Decompositions of Control-flow Graphs2026-02-07T17:16:48ZControl-flow graphs (CFGs) of structured programs are well known to exhibit strong sparsity properties. Traditionally, this sparsity has been modeled using graph parameters such as treewidth and pathwidth, enabling the development of faster parameterized algorithms for tasks in compiler optimization, model checking, and program analysis. However, these parameters only approximate the structural constraints of CFGs: although every structured CFG has treewidth at most~7, many graphs with treewidth at most~7 cannot arise as CFGs. As a result, existing parameterized techniques are optimized for a substantially broader class of graphs than those encountered in practice.
In this work, we introduce a new grammar-based decomposition framework that characterizes \emph{exactly} the class of control-flow graphs generated by structured programs. Our decomposition is intuitive, mirrors the syntactic structure of programs, and remains fully compatible with the dynamic-programming paradigm of treewidth-based methods. Using this framework, we design improved algorithms for two classical compiler optimization problems: \emph{Register Allocation} and \emph{Lifetime-Optimal Speculative Partial Redundancy Elimination (LOSPRE)}. Extensive experimental evaluation demonstrates significant performance improvements over previous state-of-the-art approaches, highlighting the benefits of using decompositions tailored specifically to CFGs.2026-02-07T17:16:48ZXuran CaiAmir GoharshadyS HitarthChun Kit Lam10.1016/j.sysarc.2026.103732http://arxiv.org/abs/2602.07581v1$\partial$CBDs: Differentiable Causal Block Diagrams2026-02-07T15:17:35ZModern cyber-physical systems (CPS) integrate physics, computation, and learning, demanding modeling frameworks that are simultaneously composable, learnable, and verifiable. Yet existing approaches treat these goals in isolation: causal block diagrams (CBDs) support modular system interconnections but lack differentiability for learning; differentiable programming (DP) enables end-to-end gradient-based optimization but provides limited correctness guarantees; while contract-based verification frameworks remain largely disconnected from data-driven model refinement. To address these limitations, we introduce differentiable causal block diagrams ($\partial$CBDs), a unifying formalism that integrates these three perspectives. Our approach (i) retains the compositional structure and execution semantics of CBDs, (ii) incorporates assume--guarantee (A--G) contracts for modular correctness reasoning, and (iii) introduces residual-based contracts as differentiable, trajectory-level certificates compatible with automatic differentiation (AD), enabling gradient-based optimization and learning. Together, these elements enable a scalable, verifiable, and trainable modeling pipeline that preserves causality and modularity while supporting data-, physics-, and constraint-informed optimization for CPS.2026-02-07T15:17:35ZThomas BeckersJán DrgoňaTruong X. Nghiemhttp://arxiv.org/abs/2602.07455v1RustCompCert: A Verified and Verifying Compiler for a Sequential Subset of Rust2026-02-07T09:15:35ZWe present our ongoing work on developing an end-to-end verified Rust compiler based on CompCert. It provides two guarantees: one is semantics preservation from Rust to assembly, i.e., the behaviors of source code includes the behaviors of target code, with which the properties verified at the source can be preserved down to the target; the other is memory safety ensured by the verifying compilation -- the borrow checking pass, which can simplify the verification of Rust programs, e.g., by allowing the verification tools focus on the functional correctness.2026-02-07T09:15:35ZSubmitted to Rust Verify 2026Jinhua WuYuting WangLiukun YuLinglong Menghttp://arxiv.org/abs/2602.07324v1Static Analysis Under Non-Deterministic Program Assumptions2026-02-07T02:45:03ZStatic analyses overwhelmingly trade precision for soundness and automation. For this reason, their use-cases are restricted to situations where imprecision isn't prohibitive. In this paper, we propose and specify a static analysis that accepts user-supplied program assumptions that are local to program locations. Such assumptions can be used to counteract imprecision in static analyses, enabling their use in a much wider variety of applications. These assumptions are taken by the analyzer non-deterministically, resulting in a function from sets of accepted assumptions to the resulting analysis under those assumptions. We also demonstrate the utility of such a function in two ways, both of which showcase how it can enable optimization over a search space of assumptions that is otherwise infeasible without the specified analysis.2026-02-07T02:45:03ZAbdullah H. Rasheedhttp://arxiv.org/abs/2509.26553v2Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling2026-02-06T22:11:41ZExisting benchmarks for tool-augmented language models (TaLMs) lack fine-grained control over task difficulty and remain vulnerable to data contamination. We present FuncBenchGen, a unified, contamination-free framework that evaluates TaLMs by generating synthetic multi-step tool-use tasks to stress-test TaLMs. The key idea is to cast tool use as traversal over a hidden function-dependency DAG where models must infer the correct sequence of calls to compute a target value. FuncBenchGen allows precise control over task difficulty (e.g., graph size, dependency depth, and distractor functions) while avoiding pretraining/test-time leakage. Our evaluation demonstrates reasoning-optimized models consistently outperform general-purpose models with GPT-5 significantly outperforming other available models. Performance declines sharply as dependency depth increases. Furthermore, connected distractors -- irrelevant functions sharing type-compatible variables with relevant functions -- prove especially difficult to handle. Also, strong models often make syntactically valid function calls but propagate incorrect or stale argument values across steps, revealing brittle state tracking by LLMs in multi-turn tool use. Motivated by this observation, we introduce a simple mitigation strategy that explicitly restates prior variable values to the agent at each step. Surprisingly, this lightweight change yields substantial gains across models. e.g., yielding an improvement in success rate from 62.5% to 81.3% for GPT-5.2025-09-30T17:21:17ZICLR 2026Seiji MaekawaJackson HassellPouya PezeshkpourTom MitchellEstevam Hruschkahttp://arxiv.org/abs/2602.06715v1Practical Refinement Session Type Inference (Extended Version)2026-02-06T14:03:18ZSession types express and enforce safe communication in concurrent message-passing systems by statically capturing the interaction protocols between processes in the type. Recent works extend session types with arithmetic refinements, which enable additional fine-grained description of communication, but impose additional annotation burden on the programmer. To alleviate this burden, we propose a type inference algorithm for a session type system with arithmetic refinements. We develop a theory of subtyping for session types, including an algorithm which we prove sound with respect to a semantic definition based on type simulation. We also provide a formal inference algorithm that generates type and arithmetic constraints, which are then solved using the Z3 SMT solver. The algorithm has been implemented on top of the Rast language, and includes 3 key optimizations that make inference feasible and practical. We evaluate the efficacy of our inference engine by evaluating it on 6 challenging benchmarks, ranging from unary and binary natural numbers to linear $λ$-calculus. We show the performance benefits provided by our optimizations in coercing Z3 into solving the arithmetic constraints in reasonable time.2026-02-06T14:03:18ZToby UenoAnkush Dashttp://arxiv.org/abs/2602.06680v1Same Engine, Multiple Gears: Parallelizing Fixpoint Iteration at Different Granularities (Extended Version)2026-02-06T13:11:26ZFixpoint iteration constitutes the algorithmic core of static analyzers. Parallelizing the fixpoint engine can significantly reduce analysis times. Previous approaches typically fix the granularity of tasks upfront, e.g., at the level of program threads or procedures - yielding an engine permanently stuck in one gear. Instead, we propose to parallelize a generic fixpoint engine in a way that is parametric in the task granularity - meaning that our engine can be run in different gears. We build on the top-down solver TD, extended with support for mixed-flow sensitivity, and realize two competing philosophies for parallelization, both building on a task pool that schedules tasks to a fixed number of workers. The nature of tasks differs between the philosophies. In the immediate approach, all tasks access a single thread-safe hash table maintaining solver state, while in the independent approach, each task has its own state and exchanges data with other tasks via a publish/subscribe data structure. We have equipped the fixpoint engine of the static analysis framework Goblint with implementations following both philosophies and report on our results for large real-world programs.2026-02-06T13:11:26ZAli Rasim KocalMichael SchwarzSimmo SaanHelmut Seidlhttp://arxiv.org/abs/2507.20674v2LLM-Based Repair of Static Nullability Errors2026-02-06T10:34:05ZModern Java projects increasingly adopt static analysis tools that prevent null-pointer exceptions by treating nullness as a type property. However, integrating such tools into large, existing codebases remains a significant challenge. While annotation inference can eliminate many errors automatically, a subset of residual errors -- typically a mix of real bugs and false positives -- often persist and can only be resolved via code changes. Manually addressing these errors is tedious and error-prone. Large language models (LLMs) offer a promising path toward automating these repairs, but naively-prompted LLMs often generate incorrect, contextually-inappropriate edits. We present NullRepair, a system that integrates LLMs into a structured workflow for resolving the errors from a nullability checker. NullRepair's decision process follows a flowchart derived from manual analysis of 200 real-world errors. It leverages static analysis to identify safe and unsafe usage regions of symbols, using error-free usage examples to contextualize model prompts. Patches are generated through an iterative interaction with the LLM that incorporates project-wide context and decision logic. Our evaluation on 12 real-world Java projects shows that NullRepair resolves 63% of the 1,119 nullability errors that remain after applying a state-of-the-art annotation inference technique. Unlike two baselines (single-shot prompt and mini-SWE-agent), NullRepair also largely preserves program semantics, with all unit tests passing in 10/12 projects after applying every edit proposed by NullRepair, and 98% or more tests passing in the remaining two projects.2025-07-28T09:55:04ZNima KarimipourPascal JoosMichael PradelMartin KelloggManu Sridharanhttp://arxiv.org/abs/2602.06466v1Auditing Rust Crates Effectively2026-02-06T07:51:37ZWe introduce Cargo Scan, the first interactive program analysis tool designed to help developers audit third-party Rust code. Real systems written in Rust rely on thousands of transitive dependencies. These dependencies are as dangerous in Rust as they are in other languages (e.g., C or JavaScript) -- and auditing these dependencies today means manually inspecting every line of code. Unlike for most industrial languages, though, we can take advantage of Rust's type and module system to minimize the amount of code that developers need to inspect to the code that is potentially dangerous. Cargo Scan models such potentially dangerous code as effects and performs a side-effects analysis, tailored to Rust, to identify effects and track them across crate and module boundaries. In most cases (69.2%) developers can inspect flagged effects and decide whether the code is potentially dangerous locally. In some cases, however, the safety of an effect depends on the calling context -- how a function is called, potentially by a crate the developer imports later. Hence, Cargo Scan tracks context-dependent information using a call-graph, and collects audit results into composable and reusable audit files. In this paper, we describe our experience auditing Rust crates with Cargo Scan. In particular, we audit the popular client and server HTTP crate, hyper, and all of its dependencies; our experience shows that Cargo Scan can reduce the auditing burden of potentially dangerous code to a median of 0.2% of lines of code when compared to auditing whole crates. Looking at the Rust ecosystem more broadly, we find that Cargo Scan can automatically classify ~3.5K of the top 10K crates on crates.io as safe; of the crates that do require manual inspection, we find that most of the potentially dangerous side-effects are concentrated in roughly 3% of these crates.2026-02-06T07:51:37ZLydia ZoghbiDavid ThienRanjit JhalaDeian StefanCaleb Stanfordhttp://arxiv.org/abs/2510.10216v2Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis2026-02-06T07:48:57ZLanguage models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods, such as constrained decoding, alleviate this problem by externally rejecting untypable code, the model itself does not effectively learn type reasoning internally, which ultimately limits its overall performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation to guide the model to learn the type system. The core of our approach is a novel type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on synthesis decision sequences rather than traditional text-based token sequences. By offloading the complexity of type system learning to the representation itself, models can redirect their computational resources toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of aligning LMs with type systems internally.2025-10-11T13:43:36ZZhechong HuangZhao ZhangRuyi JiTingxuan XiaQihao ZhuQinxiang CaoZeyu SunWiggin ZhouYingfei Xionghttp://arxiv.org/abs/2602.06386v1Uniqueness is Separation2026-02-06T04:50:17ZValue independence is enormously beneficial for reasoning about software systems at scale. These benefits carry over into the world of formal verification. Reasoning about programs algebraically is a simple affair in a proof assistant, whereas programs with unconstrained mutation necessitate much more complex techniques, such as Separation Logic, where invariants about memory safety, aliasing, and state changes must be established by manual proof. Uniqueness type systems allow programs to be compiled to code that uses mutation for efficiency, while retaining a semantics that enjoys value independence for reasoning. The restrictions of these type systems, however, are often too onerous for realistic software. Thus, most uniqueness type systems include some "escape hatch" where the benefits of value independence for reasoning are lost, but the restrictions of uniqueness types are lifted. To formally verify a system with such mixed guarantees, the value independence guarantees from uniqueness types must be expressed in terms of imperative, mutable semantics. In other words, we ought to express value independence as an assertion in Separation Logic.2026-02-06T04:50:17ZLiam O'ConnorPilar Selene Linares ArevaloChristine Rizkallah