https://arxiv.org/api/O4NqeYWIpq1qxvNeNcXwQKEa+Y42026-06-14T12:37:20Z988536015http://arxiv.org/abs/2604.12902v1Towards a Linear-Algebraic Hypervisor2026-04-14T15:49:31ZMany techniques in program synthesis, superoptimization, and array programming require parallel rollouts of general-purpose programs. GPUs, while capable targets for domain-specific parallelism, are traditionally underutilized by such workloads. Motivated by this opportunity, we introduce a pleasingly parallel virtual machine and benchmark its performance by evaluating millions of concurrent array programs, observing speedups up to $147\times$ relative to serial evaluation.2026-04-14T15:49:31ZBreandan Considinehttp://arxiv.org/abs/2604.12713v1Modular Verification of Differential Privacy in Probabilistic Higher-Order Separation Logic (Extended Version)2026-04-14T13:27:24ZDifferential privacy is the standard method for privacy-preserving data analysis. The importance of having strong guarantees on the reliability of implementations of differentially private algorithms is widely recognized and has sparked fruitful research on formal methods. However, the design patterns and language features used in modern DP libraries as well as the classes of guarantees that the library designers wish to establish often fall outside of the scope of previous verification approaches.
We introduce a program logic suitable for verifying differentially private implementations written in complex, general-purpose programming languages. Our logic has first-class support for reasoning about privacy budgets as a separation logic resource. The expressiveness of the logic and the target language allow our approach to handle common programming patterns used in the implementation of libraries for differential privacy, such as privacy filters and caching. While previous work has focused on developing guarantees for programs written in domain-specific languages or for privacy mechanisms in isolation, our logic can reason modularly about primitives, higher-order combinators, and interactive algorithms.
We demonstrate the applicability of our approach by implementing a verified library of differential privacy mechanisms, including an online version of the Sparse Vector Technique, as well as a privacy filter inspired by the popular Python library OpenDP, which crucially relies on our ability to handle the combination of randomization, local state, and higher-order functions. We demonstrate that our specifications are general and reusable by instantiating them to verify clients of our library. All of our results have been foundationally verified in the Rocq Prover.2026-04-14T13:27:24ZPhilipp G. HaselwarterAlejandro AguirreSimon Oddershede GregersenKwing Hei LiJoseph TassarottiLars Birkedal10.1145/3808311http://arxiv.org/abs/2301.06136v7Quantitative Verification with Neural Networks2026-04-14T12:03:17ZWe present a data-driven approach to the quantitative verification of probabilistic programs and stochastic dynamical models. Our approach leverages neural networks to compute tight and sound bounds for the probability that a stochastic process hits a target condition within finite time. This problem subsumes a variety of quantitative verification questions, from the reachability and safety analysis of discrete-time stochastic dynamical models, to the study of assertion-violation and termination analysis of probabilistic programs. We rely on neural networks to represent supermartingale certificates that yield such probability bounds, which we compute using a counterexample-guided inductive synthesis loop: we train the neural certificate while tightening the probability bound over samples of the state space using stochastic optimisation, and then we formally check the certificate's validity over every possible state using satisfiability modulo theories; if we receive a counterexample, we add it to our set of samples and repeat the loop until validity is confirmed. We demonstrate on a diverse set of benchmarks that, thanks to the expressive power of neural networks, our method yields smaller or comparable probability bounds than existing symbolic methods in all cases, and that our approach succeeds on models that are entirely beyond the reach of such alternative techniques.2023-01-15T16:35:36ZLogical Methods in Computer Science, Volume 22, Issue 2 (April 15, 2026) lmcs:13263Alessandro AbateAlec EdwardsMirco GiacobbeHashan PunchihewaDiptarko Roy10.46298/lmcs-22(2:4)2026http://arxiv.org/abs/2511.12253v2The Search for Constrained Random Generators2026-04-14T11:39:00ZAmong the biggest challenges in property-based testing (PBT) is the constrained random generation problem: given a predicate on program values, randomly sample from the set of all values satisfying that predicate, and only those values. Efficient solutions to this problem are critical, since the executable specifications used by PBT often have preconditions that input values must satisfy in order to be valid test cases, and satisfying values are often sparsely distributed.
We propose a novel approach to this problem using ideas from deductive program synthesis. We present a set of synthesis rules, based on a denotational semantics of generators, that give rise to an automatic procedure for synthesizing correct generators. Our system handles recursive predicates by rewriting them as catamorphisms and then matching with appropriate anamorphisms; this is theoretically simpler than other approaches to synthesis for recursive functions, yet still extremely expressive.
Our implementation, Palamedes, is an extensible library for the Lean theorem prover. The synthesis algorithm itself is built on standard proof-search tactics, reducing implementation burden and allowing the algorithm to benefit from further advances in Lean proof automation.2025-11-15T15:18:51ZPublished at PLDI 2026Harrison GoldsteinHila PelegCassia TorczonDaniel SainatiLeonidas LampropoulosBenjamin C. Pierce10.1145/3808329http://arxiv.org/abs/2309.00192v2Logical Relations for Session-Typed Concurrency2026-04-14T03:04:46ZProgram equivalence is the fulcrum for reasoning about and proving properties of programs. For noninterference, for example, program equivalence up to the secrecy level of an observer is shown. A powerful enabler for such proofs are logical relations. Logical relations only recently were adopted for session types -- but exclusively for terminating languages. This paper scales logical relations to general recursive session types. It develops a logical relation for progress-sensitive noninterference (PSNI) for intuitionistic linear logic session types (ILLST), tackling the challenges non-termination and concurrency pose, and shows that logical equivalence is sound and complete with regard to closure of weak bisimilarity under parallel composition, using a biorthogonality argument. A distinguishing feature of the logical relation is its stratification with an observation index (as opposed to a step or unfolding index), a crucial shift to make the logical relation closed under parallel composition in a concurrent setting. To demonstrate practicality of the logical relation, the paper develops an information flow control (IFC) refinement type system for ILLST, with support of secrecy-polymorphic processes, and shows that well-typed programs are self-related by the logical relation and thus enjoy PSNI. The refinement type system has been implemented in a type checker, featuring local security theories to support secrecy-polymorphic processes.2023-09-01T01:13:42ZarXiv admin note: text overlap with arXiv:2208.13741Stephanie BalzerFarzaneh DerakhshanRobert HarperYue Yaohttp://arxiv.org/abs/2604.11935v1Polyregular equivalence is undecidable in higher-order types2026-04-13T18:23:10ZIt is open whether equivalence ( f = g ) is decidable for string-to-string polyregular functions. We consider their higher-order extension based on the λ-calculus definition of polyregular functions from Bojańczyk (2018). In this setting, equivalence is undecidable by reduction from the tiling problem.2026-04-13T18:23:10ZMikołaj BojańczykGrzegorz FabiańskiRafał Stefańskihttp://arxiv.org/abs/2604.11369v1Fast Atomicity Monitoring2026-04-13T12:10:50ZAtomicity is a fundamental abstraction in concurrency, specifying that program behavior can be understood by considering specific code blocks executing atomically. However, atomicity invariants are tricky to maintain while also optimizing for code efficiency, and atomicity violations are a common root cause of many concurrency bugs. To address this problem, several dynamic techniques have been developed for testing whether a program execution adheres to an atomicity specification, most often instantiated as \emph{conflict serializability}. The efficiency of the analysis has been targeted in various papers, with the state-of-the-art algorithms \textsc{RegionTrack} and \textsc{Aerodrome} achieving a time complexity $O(nk^3)$ and $O(nk(k + v + \ell))$, respectively, for a trace $σ$ of $n$ events, $k$ threads, $v$ locations, and $\ell$ locks.
In this paper we introduce \textsc{AtomSanitizer}, a new algorithm for testing conflict serializability, with time complexity $O(nk^2)$. \textsc{AtomSanitizer} operates in an efficient streaming style, is theoretically faster than all existing algorithms, and also has a smaller memory footprint. Moreover, \textsc{AtomSanitizer} is the first algorithm designed to incur minimal locking when deployed in a concurrent monitoring setting. Experiments on standard benchmarks indicate that \textsc{AtomSanitizer} is always faster in practice than all existing conflict-serializability testers. Finally, we also implement \textsc{AtomSanitizer} inside the TSAN framework, for monitoring atomicity in real time. Our experiments reveal that \textsc{AtomSanitizer} incurs minimal time and space overhead compared to the data-race detection engine of TSAN, and thus is the first algorithm for conflict serializability demonstrated to be suitable for a runtime monitoring setting.2026-04-13T12:10:50ZHünkar Can TunYifan DongAndreas Pavlogiannishttp://arxiv.org/abs/2602.12973v2Meta-Monomorphizing Specializations2026-04-13T11:40:48ZAchieving zero-cost specialization remains a fundamental challenge in programming language and compiler design. It often necessitates trade-offs between expressive power and type system soundness, as the interaction between conditional compilation and static dispatch can easily lead to unforeseen coherence violations and increased complexity in the formal model. This paper introduces meta-monomorphizing specializations, a novel framework that achieves specialization by repurposing monomorphization through compile-time metaprogramming. Instead of modifying the host compiler, our approach generates meta-monomorphized traits and implementations that encode specialization constraints directly into the type structure, enabling deterministic, coherent dispatch without overlapping instances. We formalize this method for first-order, predicate-based, and higher-ranked polymorphic specialization, also in presence of lifetime parameters. Our evaluation, based on a Rust implementation using only existing macro facilities, demonstrates that meta-monomorphization enables expressive specialization patterns while maintaining full compatibility with standard optimization pipelines. We show that specialization can be realized as a disciplined metaprogramming layer, offering a practical, language-agnostic path to high-performance abstraction. A comprehensive study of public Rust codebases further validates our approach, revealing numerous workarounds that meta-monomorphization can eliminate, leading to more idiomatic and efficient code. An empirical evaluation on 16 micro-benchmarks confirms that compile-time specialization matches or outperforms runtime TypeId-based dispatch, and demonstrates expressiveness gains on patterns -- such as lifetime-based dispatch, higher-ranked types, compound predicates, and wildcard matching -- that runtime dispatch structurally cannot express.2026-02-13T14:47:44Z36 pagesFederico BruzzoneWalter Cazzolahttp://arxiv.org/abs/2604.11029v1A Categorical Basis for Robust Program Analysis2026-04-13T05:52:39ZUsers of program analyses expect that results change predictably in response to changes in their programs, but many analyses fail to provide such robustness. This paper introduces a theoretical framework that provides a unified language to articulate robustness properties. By modeling programs and their properties as objects in a category, diverse notions of robustness-from variable renaming to semantic refinement and structural transformation-can be characterized as structure-preserving functors.
Beyond formulating the meaning of robustness, this paper provides methods for achieving it. The first is a general recipe for designing robust analyses, by lifting a sound and robust analysis from a restricted (sub-Turing) model of computation to a sound and robust analysis for general programs. This recipe demystifies the design of several existing loop summarization and termination analyses by showing they are instantiations of this general recipe, and furthermore elucidates their robustness properties. The second is a characterization of a sense in which an algebraic program analysis is robust, provided that it is comprised of robust operators. In particular, we show that such analyses behave predictably under common refactoring patterns, such as variable renaming and loop unrolling.2026-04-13T05:52:39ZAccepted to PLDI 2026Zachary KincaidShaowei Zhuhttp://arxiv.org/abs/2412.14399v2NESA: Relational Neuro-Symbolic Static Program Analysis2026-04-13T03:46:05ZStatic program analysis plays an essential role in program optimization, bug detection, and debugging. However, reliance on compilation and limited customization hinder its adoption in the real world. This paper presents a compositional neuro-symbolic approach named NESA that facilitates compilation-free and customizable static program analysis using large language models (LLMs) with mitigated hallucinations. Specifically, we propose an analysis policy language, a restricted form of Datalog, to support users decomposing a static program analysis problem into several sub-problems that target simpler syntactic or semantic properties upon smaller code snippets. The problem decomposition enables the LLMs to target more manageable semantic-related sub-problems with reduced hallucinations, while the syntactic ones are resolved by parsing-based analysis without hallucinations. An analysis policy then is evaluated with lazy and incremental prompting, which significantly mitigates the hallucinations and improves the performance. We evaluate NESA for program slicing and bug detection upon benchmark and real-world programs. Evaluation results show that while NESA supports compilation-free and customizable analysis, it can still achieve comparable and even better performance than existing techniques. In a customized taint vulnerability detection upon TaintBench, for example, NESA achieves a precision of 66.27%, a recall of 78.57%, and an F1 score of 0.72, surpassing an industrial approach by 0.20 in F1 score. NESA also detects 13 real-world memory leak bugs, which have been fixed by developers.2024-12-18T23:14:59Z24 pages, 8 figures, 10 tablesChengpeng WangYifei GaoWuqi ZhangXuwei LiuJinyao GuoMingwei ZhengQingkai ShiXiangyu Zhang10.1145/3808161http://arxiv.org/abs/2604.10800v1Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis2026-04-12T20:22:23ZLearned classifiers deployed in agentic pipelines face a fundamental reliability problem: predictions are probabilistic inferences, not verified conclusions, and acting on them without grounding in observable evidence leads to compounding failures across downstream stages. Software vulnerability analysis makes this cost concrete and measurable. We address this through a unified cross-language vulnerability lifecycle framework built around three LLM-driven reasoning stages-hybrid structural-semantic detection, execution-grounded agentic validation, and validation-aware iterative repair-governed by a strict invariant: no repair action is taken without execution-based confirmation of exploitability. Cross-language generalization is achieved via a Universal Abstract Syntax Tree (uAST) normalizing Java, Python, and C++ into a shared structural schema, combined with a hybrid fusion of GraphSAGE and Qwen2.5-Coder-1.5B embeddings through learned two-way gating, whose per-sample weights provide intrinsic explainability at no additional cost. The framework achieves 89.84-92.02% intra-language detection accuracy and 74.43-80.12% zero-shot cross-language F1, resolving 69.74% of vulnerabilities end-to-end at a 12.27% total failure rate. Ablations establish necessity: removing uAST degrades cross-language F1 by 23.42%, while disabling validation increases unnecessary repairs by 131.7%. These results demonstrate that execution-grounded closed-loop reasoning is a principled and practically deployable mechanism for trustworthy LLM-driven agentic AI.2026-04-12T20:22:23Z20 pages (13 main + 7 appendices), 9 figures, 10 tables. Submitted to NeurIPS 2026Jugal Gajjarhttp://arxiv.org/abs/2604.10646v1Denotational reasoning for asynchronous multiparty session types2026-04-12T13:47:13ZWe provide the first denotational semantics for asynchronous multiparty session types with precise asynchronous subtyping. Our semantics enables us to reason about asynchronous message-passing, in which message-sending is non-blocking. It enables us to prove the correctness of communication optimisations, in particular, those involving reordering of messages. Our development crucially relies on modelling message-passing as a computational effect. We apply grading, a paradigm for tracking computational effects, to asynchronous message-passing, demonstrating that multiparty session typing can be viewed as an instance of grading. We demonstrate the utility of our model by showing that it forms an adequate denotational semantics for a call-by-value asynchronous message-passing calculus, that ensures communication safety, deadlock-freedom and liveness in the presence of communication optimisations.2026-04-12T13:47:13ZTo appear at ESOP 2026; this version adds an additional appendix of proofsDylan McDermottNobuko Yoshida10.1007/978-3-032-22723-2_3http://arxiv.org/abs/2604.10520v1ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization2026-04-12T08:20:45ZAs Large Language Models (LLMs) have become capable of generating long and descriptive code summaries, accurate and reliable evaluation of factual consistency has become a critical challenge. However, previous evaluation methods are primarily designed for short summaries of isolated code snippets. Consequently, they struggle to provide fine-grained evaluation of multi-sentence functionalities and fail to accurately assess dependency context commonly found in real-world code summaries. To address this, we propose ReFEree, a reference-free and fine-grained method for evaluating factual consistency in real-world code summaries. We define factual inconsistency criteria specific to code summaries and evaluate them at the segment level using these criteria along with dependency information. These segment-level results are then aggregated into a fine-grained score. We construct a code summarization benchmark with human-annotated factual consistency labels. The evaluation results demonstrate that ReFEree achieves the highest correlation with human judgment among 13 baselines, improving 15-18% over the previous state-of-the-art. Our code and data are available at https://github.com/bsy99615/ReFEree.git.2026-04-12T08:20:45ZAccepted to ACL 2026 main. 25 pagesSuyoung BaeCheolWon NaJaehoon LeeYumin LeeYunSeok ChoiJee-Hyong Leehttp://arxiv.org/abs/2604.10445v1Points-to Analysis Using MDE: A Multi-level Deduplication Engine for Repetitive Data and Operations2026-04-12T03:55:50ZPrecise pointer analysis is a foundational component of many client analyses and optimizations. Scaling flow- and context-sensitive pointer analysis has been a long-standing challenge, suffering from combinatorial growth in both memory usage and runtime. Existing approaches address this primarily by reducing the amount of information tracked often, at the cost of precision and soundness. In our experience a significant proportion of this cost comes from propagation of duplicate data and low-level data structure operations being repeated a large number of times. Our measurements on SPEC benchmarks show that more than 90% of all set-union operations performed can be redundant.
We present Multi-level Deduplication Engine (MDE), a mechanism that recursively augments the representation of data through de-duplication and the assignment of unique identifiers to values to eliminate redundancy. This allows MDE to trivialize many operations, and memoize operations enabling their future reuse. MDE's recursive structure allows it to represent de-duplicated values that themselves are constructed from other de-deuplicated values, capturing structural redundancy not easily possible with non-recursive techniques.
We provide a full C++ implementation of MDE as a library and integrate it into an existing implementation of a flow- and context-sensitive pointer analysis. Evaluation on selected SPEC benchmarks shows a reduction up to 18.10x in peak memory usage and 8.15x in runtime. More notably, MDE exhibits an upward trend of effectiveness with the increase in benchmark size.
Besides performance improvements, this work highlights the importance of representation design and suggests new opportunities for bringing efficiency to future analyses.2026-04-12T03:55:50ZSubmitted to the Journal of Software: Practice and Experience. 38 PagesAnamitra GhoruiAditi RasteUday P. Khedkerhttp://arxiv.org/abs/2511.15403v2MutDafny: A Mutation-Based Approach to Assess Dafny Specifications2026-04-12T02:35:31ZIn verification-aware languages, such as Dafny, despite their critical role, specifications are as prone to error as implementations. Flaws in specifications can result in formally verified programs that deviate from the intended behavior. In this paper, we explore the use of mutation testing to reveal weaknesses in formal specifications written in Dafny.
We present MutDafny, a tool that increases the reliability of Dafny specifications by automatically signaling potential weaknesses. Using a mutation testing approach, we introduce faults (mutations) into the code and rely on formal specifications for detecting them. If a program with a mutant verifies, this may indicate a weakness in the specification. We extensively analyze mutation operators from popular tools, identifying the ones applicable to Dafny. In addition, we synthesize new operators tailored for the language from bugfix commits in publicly available Dafny projects on GitHub. Drawing from both, we equipped our tool with a total of 40 mutation operators. We evaluate MutDafny's effectiveness and efficiency on a dataset of 794 real-world Dafny programs, and manually analyze a subset of the resulting undetected mutants, identifying five weak real-world specifications (on average, one at every 241 lines of code) that would benefit from strengthening.2025-11-19T12:58:06ZAccepted by the 48th IEEE/ACM International Conference on Software Engineering (ICSE) 2026, Rio de Janeiro, BrazilIsabel AmaralAlexandra MendesJosé Campos