https://arxiv.org/api/TXZD+diV3ZTeA81cNvg5kMxRjZc 2026-06-15T03:32:00Z 9888 570 15 http://arxiv.org/abs/2511.03946v5 Modular abstract syntax trees (MAST): substitution tensors with second-class sorts 2026-02-27T19:38:17Z

We adapt Fiore, Plotkin, and Turi's treatment of abstract syntax with binding, substitution, and holes to account for languages with second-class sorts. These situations include programming calculi such as the Call-by-Value lambda-calculus (CBV) and Levy's Call-by-Push-Value (CBPV). Prohibiting second-class sorts from appearing in variable contexts changes the characterisation of the abstract syntax from monoids in monoidal categories to actions in actegories. We reproduce much of the development through bicategorical arguments. We apply the resulting theory by proving substitution lemmata for varieties of CBV.

2025-11-06T00:44:35Z Electronic Notes in Theoretical Informatics and Computer Science, Volume 5 - Proceedings of MFPS XLI (December 20, 2025) entics:16879 Marcelo P. Fiore Ohad Kammar Georg Moser Sam Staton 10.46298/entics.16879 http://arxiv.org/abs/2602.23927v1 Mixed Choice in Asynchronous Multiparty Session Types 2026-02-27T11:21:58Z

We present a multiparty session type (MST) framework with asynchronous mixed choice (MC). We propose a core construct for MC that allows transient inconsistencies in protocol state between distributed participants, but ensures all participants can always eventually reach a mutually consistent state. We prove the correctness of our system by establishing a progress property and an operational correspondence between global types and distributed local type projections. Based on our theory, we implement a practical toolchain for specifying and validating asynchronous MST protocols featuring MC, and programming compliant gen_statem processes in Erlang/OTP. We test our framework by using our toolchain to specify and reimplement part of the amqp_client of the RabbitMQ broker for Erlang.

2026-02-27T11:21:58Z Laura Bocchi Raymond Hu Adriana Laura Voinea Simon Thompson http://arxiv.org/abs/2510.11573v2 (Dis)Proving Spectre Security with Speculation-Passing Style 2026-02-27T08:22:09Z

Constant-time (CT) verification tools are commonly used for detecting potential side-channel vulnerabilities in cryptographic libraries. Recently, a new class of tools, called speculative constant-time (SCT) tools, has also been used for detecting potential Spectre vulnerabilities. In many cases, these SCT tools have emerged as liftings of CT tools. However, these liftings are seldom defined precisely and are almost never analyzed formally. The goal of this paper is to address this gap, by developing formal foundations for these liftings, and to demonstrate that these foundations can yield practical benefits. Concretely, we introduce a program transformation, coined Speculation-Passing Style (SPS), for reducing SCT verification to CT verification. Essentially, the transformation instruments the program with a new input that corresponds to attacker-controlled predictions and modifies the program to follow them. This approach is sound and complete, in the sense that a program is SCT if and only if its SPS transform is CT. Thus, we can leverage existing CT verification tools to prove SCT; we illustrate this by combining SPS with three standard methodologies for CT verification, namely reducing it to non-interference, assertion safety and dynamic taint analysis. We realize these combinations with three existing tools, EasyCrypt, BINSEC, and ctgrind, and we evaluate them on Kocher's benchmarks for Spectre-v1. Our results focus on Spectre-v1 in the standard CT leakage model; however, we also discuss applications of our method to other variants of Spectre and other leakage models.

2025-10-13T16:19:54Z Santiago Arranz-Olmos Gilles Barthe Lionel Blatter Xingyu Xie Zhiyuan Zhang http://arxiv.org/abs/2602.23216v2 Array-Carrying Symbolic Execution for Function Contract Generation 2026-02-27T07:55:17Z

Function contract generation is a classical problem in program analysis that targets the automated analysis of functions in a program with multiple procedures. The problem is fundamental in inter-procedural analysis where properties of functions are first obtained via the generation of function contracts and then the generated contracts are used as building blocks to analyze the whole program. Typical objectives in function contract generation include pre-/post-conditions and assigns information (that specifies the modification information over program variables and memory segments during function execution). In programs with array manipulations, a crucial point in function contract generation is the treatment of array segments that imposes challenges in inferring invariants and assigns information over such segments. To address this challenge, we propose a novel symbolic execution framework that carries invariants and assigns information over contiguous segments of arrays. We implement our framework as a prototype within LLVM, and further integrate our prototype with the ACSL assertion format and the Frama-C software verification platform. Experimental evaluation over a variety of benchmarks from the literature and functions from realistic libraries shows that our framework is capable of handling array manipulating functions that indeed involve the carry of array information and are beyond existing approaches.

2026-02-26T17:00:10Z 30 pages, 2 figures. To appear in the 27th International Symposium on Formal Methods (FM 2026) Weijie Lu Jingyu Ke Hongfei Fu Zhouyue Sun Yi Zhou Guoqiang Li Haokun Li http://arxiv.org/abs/2511.20369v4 The Ghosts of Empires: Extracting Modularity from Interleaving-Based Proofs (Extended Version) 2026-02-26T14:50:58Z

Implementation bugs threaten the soundness of algorithmic software verifiers. Generating correctness certificates for correct programs allows for efficient independent validation of verification results, and thus helps to reveal such bugs. Automatic generation of small, compact correctness proofs for concurrent programs is challenging, as the correctness arguments may depend on the particular interleaving, which can lead to exponential explosion. We present an approach that converts an interleaving-based correctness proof, as generated by many algorithmic verifiers, into a thread-modular correctness proof in the style of Owicki and Gries. We automatically synthesize ghost variables that capture the relevant interleaving information, and abstract away irrelevant details. Our evaluation shows that the approach is efficient in practice and generates compact proofs, compared to a baseline.

2025-11-25T14:51:05Z 39 pages, 10 figures, 1 table. Extended version with proofs of the paper published at POPL'2026 (https://doi.org/10.1145/3776684) [corrections in Fig. 6] Frank Schüssele Matthias Zumkeller Miriam Lagunes-Rochin Dominik Klumpp 10.1145/3776684 http://arxiv.org/abs/2602.16913v2 A Reversible Semantics for Janus 2026-02-26T09:39:41Z

Janus is a paradigmatic example of a reversible programming language. Indeed, Janus programs can be executed backwards as well as forwards. However, its current small-step semantics (useful, e.g., for debugging or as a basis for extensions with concurrency primitives) is not reversible, since it loses information while computing forwards. E.g., it does not satisfy the Loop Lemma, stating that any reduction has an inverse, a main property of reversibility in process calculi, where a small-step semantics is commonly used. We present here a novel small-step semantics which is actually reversible, while remaining equivalent to the previous one. It involves the non-trivial challenge of defining a semantics based on a "program counter" for a high-level programming language.

2026-02-18T21:59:04Z Submitted for publication Ivan Lanese Germán Vidal http://arxiv.org/abs/2510.10410v3 A Trace-based Approach for Code Safety Analysis 2026-02-26T03:53:57Z

Rust is a memory-safe programming language that disallows undefined behavior. Its safety guarantees have been extensively examined by the community through empirical studies, which has led to its remarkable success. However, unsafe code remains a critical concern in Rust. By reviewing the safety design of Rust and analyzing real-world Rust projects, this paper establishes a systematic framework for understanding unsafe code and undefined behavior, and summarizes the soundness criteria for Rust code. It further derives actionable guidance for achieving sound encapsulation.

2025-10-12T02:11:38Z Hui Xu http://arxiv.org/abs/2412.15042v2 Scylla: Translating an Applicative Subset of C to Safe Rust 2026-02-25T16:25:13Z

The popularity of the Rust language continues to explode; yet, many critical codebases remain authored in C. Automatically translating C to Rust is thus an appealing course of action. Several works have gone down this path, handling an ever-increasing subset of C through a variety of Rust features, such as unsafe. While the prospect of automation is appealing, producing code that relies on unsafe negates the memory safety guarantees offered by Rust, and therefore the main advantages of porting existing codebases to memory-safe languages. We instead advocate for a different approach, where the programmer iterates on the original C, gradually making the code more structured until it becomes eligible for compilation to safe Rust. This means that redesigns and rewrites can be evaluated incrementally for performance and correctness against existing test suites and production environments. Compiling structured C to safe Rust relies on the following contributions: a type-directed translation from (a subset of) C to safe Rust; a novel static analysis based on "split trees" which allows expressing C's pointer arithmetic using Rust's slices and splitting operations; an analysis that infers which borrows need to be mutable; and a compilation strategy for C pointer types that is compatible with Rust's distinction between non-owned and owned allocations. We evaluate our approach on real-world cryptographic libraries, binary parsers and serializers, and a file compression library. We show that these can be rewritten to Rust with small refactors of the original C code, and that the resulting Rust code exhibits similar performance characteristics as the original C code. As part of our translation process, we also identify and report undefined behaviors in the bzip2 compression library and in Microsoft's implementation of the FrodoKEM cryptographic primitive.

2024-12-19T16:51:29Z OOPSLA 2026 camera-ready version Aymeric Fromherz Jonathan Protzenko http://arxiv.org/abs/2602.22075v1 RustyDL: A Program Logic for Rust 2026-02-25T16:25:04Z

Rust is a modern programming language that guarantees memory safety and the absence of data races with a strong type system. We present RustyDL, a program logic for Rust, as a foundation for an auto-interactive, deductive verification tool for Rust. RustyDL reasons about Rust programs directly on the source code level, in contrast to other tools that are all based on translation to an intermediate language. A source-level program logic for Rust is crucial for a human-in-the-loop (HIL) style of verification that permits proving highly complex functional properties. We discuss specific Rust challenges in designing a program logic and calculus for HIL-style verification and propose a solution in each case. We provide a proof-of-concept of our ideas in the form of a prototype of a Rust instance of the deductive verification tool KeY.

2026-02-25T16:25:04Z Long version of paper published at 27th International Symposium on Formal Methods (FM 2026) Daniel Drodt Reiner Hähnle http://arxiv.org/abs/2602.21630v1 Type-Based Enforcement of Non-Interference for Choreographic Programming 2026-02-25T06:50:51Z

Choreographies describe distributed protocols from a global viewpoint, enabling correct-by-construction synthesis of local behaviours. We develop a policy-parametric type system that prevents information leaks from high-security data to low-security observers, handling both explicit and implicit flows through a program-counter discipline. The system supports recursive procedures via a procedure context that we reconstruct through constraint generation. We prove termination-insensitive non-interference with respect to a standard small-step semantics.

2026-02-25T06:50:51Z Marco Bertoni Saverio Giallorenzo Marco Peressotti http://arxiv.org/abs/2507.17691v2 CASCADE: LLM-Powered JavaScript Deobfuscator at Google 2026-02-25T04:00:38Z

Software obfuscation, particularly prevalent in JavaScript, hinders code comprehension and analysis, posing significant challenges to software testing, static analysis, and malware detection. This paper introduces CASCADE, a novel hybrid approach that integrates the advanced coding capabilities of Gemini with the deterministic transformation capabilities of a compiler Intermediate Representation (IR), specifically JavaScript IR (JSIR). By employing Gemini to identify critical prelude functions, the foundational components underlying the most prevalent obfuscation techniques, and leveraging JSIR for subsequent code transformations, CASCADE effectively recovers semantic elements like original strings and API names, and reveals original program behaviors. This method overcomes limitations of existing static and dynamic deobfuscation techniques, eliminating hundreds to thousands of hardcoded rules while achieving reliability and flexibility. CASCADE is already deployed in Google's production environment, demonstrating substantial improvements in JavaScript deobfuscation efficiency and reducing reverse engineering efforts.

2025-07-23T16:57:32Z To appear in 2026 IEEE/ACM 48th International Conference on Software Engineering (ICSE-SEIP '26) Shan Jiang Pranoy Kovuri David Tao Zhixun Tan 10.1145/3786583.3786873 http://arxiv.org/abs/2406.11935v4 A Problem-Oriented Perspective and Anchor Verification for Code Optimization 2026-02-25T02:17:57Z

Large Language Models (LLMs) have shown remarkable capabilities in solving various programming tasks, such as code generation. However, their potential for code optimization, particularly in performance enhancement, remains largely unexplored. This paper investigates the capabilities of LLMs in optimizing code for minimal execution time, addressing a critical gap in current research. The recently proposed code optimization methods construct program optimization pairs based on iterative submissions from the same programmer for the same problem. However, this approach confines LLMs to local performance improvements, neglecting global algorithmic innovation. To overcome this limitation, we adopt a completely different perspective by reconstructing the optimization pairs into a problem-oriented approach. This allows for the integration of various ideas from multiple programmers tackling the same problem. Furthermore, we observe that code optimization presents greater challenges compared to code generation, often accompanied by "optimization tax". Recognizing the inherent trade-offs in correctness and efficiency, we introduce a novel anchor verification framework to mitigate this "optimization tax". Ultimately, the problem oriented perspective combined with the anchor verification framework significantly enhances both the correct optimization ratio and speedup to new levels.

2024-06-17T16:10:10Z ICLR 2026 Tong Ye Tengfei Ma Xuhong Zhang Hang Yu Jianwei Yin Wenhai Wang http://arxiv.org/abs/2506.10056v2 Pareto Optimal Code Generation 2026-02-24T21:46:21Z

Generate-then-rank is the dominant test-time scaling (TTS) paradigm for code generation, but scaling accuracy by sampling and executing more candidates makes comprehensive verification a major computational bottleneck. This creates an inherent trade-off between accuracy and compute that, despite its importance to TTS, is often ignored. Specifically, faster but noisier signals, such as outcome reward models (ORMs), are dismissed as suboptimal. We frame verifier selection as a Pareto optimization problem and empirically map the accuracy-throughput frontier across signals, including the full test suite, heuristics for selective execution, and ORMs, across four Python benchmarks. We show that ORMs are most effective at optimizing the Pareto curve when pruning is used in the generate-then-rank pipeline--known as staged verification--where lightweight filters remove obviously incorrect solutions, including candidates with small syntactic or character-level bugs, before expensive verification. Our pruning analysis shows that eliminating incorrect yet highly ranked candidates (often character-level bugs) prevents wasted compute on incorrect tokens. We find that ORMs with staged verification shift the Pareto frontier outward, achieving 11.64x higher throughput at a cost of 8.26% accuracy relative to full test-suite verification.

2025-06-11T17:58:21Z 29 pages, 6 figures, code released here: https://github.com/SprocketLab/orm-code-verifier Gabriel Orlanski Nicholas Roberts Aws Albarghouthi Frederic Sala http://arxiv.org/abs/2602.20979v1 Toward an Agentic Infused Software Ecosystem 2026-02-24T15:01:29Z

Fully leveraging the capabilities of AI agents in software development requires a rethinking of the software ecosystem itself. To this end, this paper outlines the creation of an Agentic Infused Software Ecosystem (AISE), that rests on three pillars. The first, of course, is the AI agents themselves, which in the past 5 years have moved from simple code completion and toward sophisticated independent development tasks, a trend which will only continue. The second pillar is the programming language and APIs (or tools) that these agents use to accomplish tasks, and increasingly, serve as the communication substrate that humans and AI agents interact and collaborate through. The final pillar is the runtime environment and ecosystem that agents operate within, and which provide the capabilities that programmatic agents use to interface with (and effect actions in) the external world. To realize the vision of AISE, all three pillars must be advanced in a holistic manner, and critically, in a manner that is synergistic for AI agents as they exist today, those that will exist in the future, and for the human developers that work alongside them.

2026-02-24T15:01:29Z Mark Marron http://arxiv.org/abs/2602.20866v1 DeCo: A Core Calculus for Incremental Functional Programming with Generic Data Types 2026-02-24T13:10:51Z

Incrementalization speeds up computations by avoiding unnecessary recomputations and by efficiently reusing previous results. While domain-specific techniques achieve impressive speedups, e.g., in the context of database queries, they are difficult to generalize. Meanwhile, general approaches offer little support for incrementalizing domain-specific operations. In this work, we present DeCo, a novel core calculus for incremental functional programming with support for a wide range of user-defined data types. Despite its generic nature, our approach statically incrementalizes domain-specific operations on user-defined data types. It is, hence, more fine-grained than other generic techniques which resort to treating domain-specific operations as black boxes. We mechanized our work in Lean and proved it sound, meaning incrementalized execution computes the same result as full reevaluation. We also provide an executable implementation with case studies featuring examples from linear algebra, relational algebra, dictionaries, trees, and conflict-free replicated data types, plus a brief performance evaluation on linear and relational algebra and on trees.

2026-02-24T13:10:51Z Accepted at OOPSLA'26 Proc. ACM Program. Lang. 10, OOPSLA1, Article 156 (April 2026), 27 pages Timon Böhler Tobias Reinhard David Richter Mira Mezini 10.1145/3798264