https://arxiv.org/api/ARjemTsuYLiV//c35LkcjZ1T984 2026-06-14T10:29:53Z 9885 330 15 http://arxiv.org/abs/2507.11282v2 The downgrading semantics of memory safety (Extended version) 2026-04-17T15:24:10Z

Memory safety is traditionally characterized in terms of bad things that cannot happen. This approach is currently embraced in the literature on formal methods for memory safety. However, a general semantic principle for memory safety, that implies the negative items, remains elusive. This paper focuses on the allocator-specific aspects of memory safety, such as null-pointer dereference, use after free, double free, and heap overflow. To that extent, we propose a notion of gradual allocator independence that accurately captures the allocator-dependent aspects of memory safety. Our approach is inspired by the previously suggested connection between memory safety and noninterference, but extends that connection in a fundamentally important direction towards downgrading. We consider a low-level language with access to an allocator that provides malloc and free primitives in a flat memory model. Pointers are just integers, and as such it is trivial to write memory-unsafe programs. The basic intuition of gradual allocator independence is that of noninterference, namely that allocators must not influence program execution. This intuition is refined in two important ways that account for the allocators running out-of-memory and for programs to have pointer-to-integer casts. The key insight of the definition is to treat these extensions as forms of downgrading and give them satisfactory technical treatment using the state-of-the-art information flow machinery.

2025-07-15T12:59:11Z 65 pages, 26 figures, Extended version of PLDI 2026 paper with the same title René Rydhof Hansen Andreas Stenbæk Larsen Aslan Askarov 10.1145/3808260 http://arxiv.org/abs/2604.16584v1 Certified Program Synthesis with a Multi-Modal Verifier 2026-04-17T14:56:45Z

Certified program synthesis (aka vericoding) is the process of automatically generating a program, its formal specification, and a machine-checkable proof of their alignment from a natural-language description. Two challenges make vericoding difficult. First, specifications synthesised from natural language are often either too weak to be meaningful or too strong to be implementable, yet existing approaches lack systematic means to detect such defects. Second, the landscape of program verifiers is fragmented: each tool supports a particular reasoning mode -- auto-active (e.g., Dafny, Verus) or interactive (e.g., Coq, Lean) -- with its own trade-off between automation and expressivity. This forces every synthesis methodology to be tailored to a single verification paradigm, limiting the class of tasks it can handle effectively. We overcome both challenges by structuring the certified synthesis workflow around a multi-modal verifier -- a single tool combining dynamic validation, automated proofs, and interactive proof scripting in one foundational framework. We realise this idea in LeetProof, an agentic pipeline built on Velvet, a multi-modal verifier embedded in Lean. Multi-modality enables LeetProof to validate generated specifications via randomised property-based testing before any code is synthesised, decompose the synthesis task into sub-problems guided by verification conditions, and delegate residual proof obligations to frontier AI provers specialised for Lean. We evaluate LeetProof on benchmarks derived from prior work on certified synthesis. Our specification validation uncovers defects in existing reference benchmarks, and LeetProof's staged pipeline achieves a significantly higher rate of fully certified solutions than a single-mode baseline at the same budget -- consistently across two frontier LLM backends.

2026-04-17T14:56:45Z Yueyang Feng Dipesh Kafle Vladimir Gladshtein Vitaly Kurin George Pîrlea Qiyuan Zhao Peter Müller Ilya Sergey http://arxiv.org/abs/2604.16043v1 Evaluating SYCL as a Unified Programming Model for Heterogeneous Systems 2026-04-17T13:16:03Z

High-performance computing (HPC) applications are increasingly executed in heterogeneous environments, introducing new challenges for programming and software portability. SYCL has emerged as a leading model designed to simplify heterogeneous programming and make it more accessible to developers. Intended as a single-source, cross-platform parallel programming framework, SYCL promises portability, productivity, and performance across a variety of architectures. However, these goals have not been consistently defined or realized, leaving developers with varying expectations. This paper addresses this gap by evaluating SYCL from the perspective of application developers. We analyze whether SYCL meets essential criteria for cross-platform development, including code portability, development productivity, and runtime efficiency. Our evaluation draws on benchmarks and illustrative examples and focuses on SYCL's memory management and parallelism abstractions. We provide detailed comparisons between Unified Shared Memory (USM) and buffer-accessor approaches, as well as between NDRange and hierarchical kernel models. In addition to presenting our own benchmark results on Intel platforms, we synthesize findings from recent studies across multiple SYCL implementations and compilers. Our results expose key limitations and inconsistencies in current SYCL implementations and offer insights into the steps needed to improve the framework's reliability and cross-platform usability.

2026-04-17T13:16:03Z Ami Marowka http://arxiv.org/abs/2512.10748v3 Intrinsically Correct Algorithms and Recursive Coalgebras 2026-04-17T12:42:02Z

Recursive coalgebras provide an elegant categorical tool for modelling recursive algorithms and analysing their termination and correctness. By considering coalgebras over categories of suitably indexed families, the correctness of the corresponding algorithms follows intrinsically just from the type of the computed maps. However, proving recursivity of the underlying coalgebras is non-trivial, and proofs are typically ad hoc. This layer of complexity impedes the formalization of coalgebraically defined recursive algorithms in proof assistants. We introduce a framework for constructing coalgebras which are intrinsically recursive in the sense that the type of the coalgebra guarantees recursivity from the outset. Our approach is based on the novel concept of a well-founded functor on a category of families indexed by a well-founded relation. We show as our main result that every coalgebra for a well-founded functor is recursive, and demonstrate that well-known techniques for proving recursivity and termination such as ranking functions are subsumed by this abstract setup. We present a number of case studies, including Quicksort, the Euclidian algorithm, and CYK parsing. Both the main theoretical result and selected case studies have been formalized in Cubical Agda.

2025-12-11T15:36:47Z Agda source repository is archived under the Software Heritage ID: swh:1:dir:dd4da03ce67637632d48f28e941771c11a2d64ee Cass Alexandru Henning Urbat Thorsten Wißmann http://arxiv.org/abs/2604.15978v1 jMT: Testing Correctness of Java Memory Models (Extended Version) 2026-04-17T11:41:43Z

Folklore is often saying "The Java memory model is broken." Therefore, several approaches have proposed repairs, only to find new programs exhibiting unexpected, unintuitive behavior or the model forbidding standard compiler optimizations. The complexity of defining a memory model for concurrent Java lies in the fact that it requires a multi-execution model. Multi-execution models need to inspect the many potential executions of a program in order to find the valid ones. Tools automatically validating novel proposals of Java memory models are, however, largely lacking. To alleviate this problem, we introduce jMT, a novel tool for constructing multi-execution semantics for concurrent Java programs. jMT relies on single-execution models defining well-formed execution graphs, based on which it builds valid multi-execution semantics via causality checking. Thereby, jMT supports evaluating new proposals of Java memory models (JMMs) on a per-program basis. jMT can furthermore be employed for testing the conformance of JMMs to existing compilation schemes and compilers. Our evaluation of jMT on 169 litmus tests reveals a number of interesting insights into existing JMMs.

2026-04-17T11:41:43Z This is the extended version referenced in the TACAS proceedings TACAS 2026 proceedings in LNCS 16506, Springer, Cham Lukas Panneke Heike Wehrheim 10.1007/978-3-032-22749-2_13 http://arxiv.org/abs/2604.15633v1 Synthesizing Backward Error Bounds, Backward 2026-04-17T02:19:08Z

Backward stability is a desirable property for a well-designed numerical algorithm: given an input, a backward stable floating-point program produces the exact output for a nearby input. While automated tools for bounding the forward error of a numerical program are well-established, few existing tools target backward error analysis. We present a formal framework that enables sound, automated backward error analysis for a broad class of numerical programs. First, we propose a novel generalization of the definition of backward stability that is both compositional and flexible, satisfied by a wide range of floating-point operations. Second, based on this generalization, we develop the category Shel where morphisms model stable numerical programs, and show that structures in Shel support a rich variety of backward error analyses. Third, we implement a tool, eggshel, that automatically searches within a syntactic subcategory of Shel to prove backward stability for a given program. Our algorithm handles many programs with variable reuse, a known challenge in backward error analysis. We prove soundness of our algorithm and use our tool to synthesize backward error bounds for a suite of programs that were previously beyond the reach of automated analysis.

2026-04-17T02:19:08Z To appear at PLDI 2026. Extended version (31 pages) Proceedings of the ACM on Programming Languages 10, PLDI, Article 225 (2026) Laura Zielinski Justin Hsu 10.1145/3808333 http://arxiv.org/abs/2512.03053v2 Mitigating hallucinations and omissions in LLMs for invertible problems: An application to hardware logic design automation 2026-04-16T21:39:03Z

We show for invertible problems that transform data from a source domain (for example, Logic Condition Tables (LCTs)) to a destination domain (for example, Hardware Description Language (HDL) code), an approach of using Large Language Models (LLMs) as a lossless encoder from source to destination followed by as a lossless decoder back to the source, comparable to lossless compression in information theory, can mitigate most of the LLM drawbacks of hallucinations and omissions. Specifically, using LCTs as inputs, we generate the full HDL for a two-dimensional network-on-chip router (13 units, 1500-2000 lines of code) using seven different LLMs, reconstruct the LCTs from the auto-generated HDL, and compare the original and reconstructed LCTs. This approach yields significant productivity improvements, not only confirming correctly generated LLM logic and detecting incorrectly generated LLM logic but also assisting developers in finding design specification errors.

2025-11-25T00:47:02Z 7 pages, 2 figures, 7 tables Andrew S. Cassidy Guillaume Garreau Jay Sivagnaname Mike Grassi Bernard Brezzo John V. Arthur Dharmendra S. Modha http://arxiv.org/abs/2604.16538v1 Understanding Tool-Augmented Agents for Lean Formalization: A Factorial Analysis 2026-04-16T21:15:46Z

Automatic translation of natural language mathematics into faithful Lean 4 code is hindered by the fundamental dissonance between informal set-theoretic intuition and strict formal type theory. This gap often causes LLMs to hallucinate non-existent library definitions, resulting in code that fails to compile or lacks semantic fidelity. In this work, we investigate the effectiveness of tool-augmented agents for this task through a systematic factorial analysis of three distinct tool categories: Fine-tuned Model Querying (accessing expert drafts), Knowledge Search (retrieving symbol definitions), and Compiler Feedback (verifying code via a Lean REPL). We first benchmark the agent against one-shot baselines, demonstrating large gains in both compilation success and semantic equivalence. We then use the factorial decomposition to quantify the impact of each category, isolating the marginal contribution of each tool type to overall performance.

2026-04-16T21:15:46Z 15 pages,8 figures Ke Zhang Patricio Gallardo Maziar Raissi Sudhir Murthy http://arxiv.org/abs/2604.15272v1 Prism: Symbolic Superoptimization of Tensor Programs 2026-04-16T17:43:31Z

This paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constructs symbolic graphs that represent families of programs, and then instantiates them into concrete implementations. This formulation enables structured pruning of provably suboptimal regions of the search space using symbolic reasoning over operator semantics, algebraic identities, and hardware constraints. We develop techniques for efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning. Together, these components allow Prism to bridge the rigor of exhaustive search with the scalability required for modern ML workloads. Evaluation on five commonly used LLM workloads shows that Prism achieves up to $2.2\times$ speedup over best superoptimizers and $4.9\times$ over best compiler-based approaches, while reducing end-to-end optimization time by up to $3.4\times$.

2026-04-16T17:43:31Z Mengdi Wu Xiaoyu Jiang Oded Padon Zhihao Jia http://arxiv.org/abs/2604.15266v1 Simplifying Safety Proofs with Forward-Backward Reasoning and Prophecy 2026-04-16T17:40:27Z

We propose an incremental approach for safety proofs that decomposes a proof with a complex inductive invariant into a sequence of simpler proof steps. Our proof system combines rules for (i) forward reasoning using inductive invariants, (ii) backward reasoning using inductive invariants of a time-reversed system, and (iii) prophecy steps that add witnesses for existentially quantified properties. We prove each rule sound and give a construction that recovers a single safe inductive invariant from an incremental proof. The construction of the invariant demonstrates the increased complexity of a single inductive invariant compared to the invariant formulas used in an incremental proof, which may have simpler Boolean structures and fewer quantifiers and quantifier alternations. Under natural restrictions on the available invariant formulas, each proof rule strictly increases proof power. That is, each rule allows to prove more safety problems with the same set of formulas. Thus, the incremental approach is able to reduce the search space of invariant formulas needed to prove safety of a given system. A case study on Paxos, several of its variants, and Raft demonstrates that forward-backward steps can remove complex Boolean structure while prophecy eliminates quantifiers and quantifier alternations.

2026-04-16T17:40:27Z Eden Frenkel Kenneth L. McMillan Oded Padon Sharon Shoham 10.1145/3808341 http://arxiv.org/abs/2604.18616v1 ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants 2026-04-16T15:49:31Z

LLM-based coding agents can generate functionally correct GPU kernels, yet their performance remains far below hand-optimized libraries on critical computations such as matrix multiplication, attention, and Mixture-of-Experts (MoE). Peak GPU performance requires coordinated reasoning over tightly coupled optimizations, including tiling, shared-memory staging, software pipelining, and instruction scheduling, while existing agents rely on sparse pass/fail feedback, leaving them unable to diagnose global constraint violations. We present Argus, an agentic framework that addresses this through data-flow invariants: compile-time specifications encoding how data must be choreographed throughout kernel execution. Argus introduces a tile-based, Pythonic DSL exposing hardware instructions and compiler policies while hiding low-level representations. The DSL provides tag functions to propagate symbolic annotations through data and control flow, and tag assertions to enforce relational constraints at use sites. When violations occur, the compiler returns concrete counterexamples identifying the thread, data element, and program point, enabling dense, structured feedback for targeted fixes. Invariants are verified at compile time via abstract interpretation over a layout algebra and SMT solving, with zero runtime overhead. An in-context reinforcement learning planner learns to select optimizations and synthesize effective invariants, supported by a curated knowledge base of GPU optimization techniques. We evaluate Argus on the AMD MI300X GPU across GEMM, flash attention, and MoE kernels accounting for over 90% of GPU time in LLM inference. Generated kernels achieve 99-104% of state-of-the-art hand-optimized assembly throughput and are 2-1543x faster than existing agentic systems. Argus further generalizes to 200 KernelBench tasks, solving 100% of Level 1 and 90% of Level 2 problems.

2026-04-16T15:49:31Z Haohui Mai Xiaoyan Guo Xiangyun Ding Daifeng Li Qiuchu Yu Chenzhun Guo Cong Wang Jiacheng Zhao Christos Kozyrakis Binhang Yuan http://arxiv.org/abs/2604.11027v2 Parameterized Algorithms and Complexity for Function Merging with Branch Reordering 2026-04-16T13:18:49Z

Binary size reduction is an increasingly important optimization objective for compilers. One emerging technique is function merging, where multiple similar functions are merged into one, thereby eliminating redundancy. The SOTA approach to perform the merging is based on sequence alignment, where functions are viewed as linear sequences of instructions that are then matched in a way maximizing their alignment. In this paper, we consider a significantly generalized formulation of the problem by allowing reordering of branches within each function, subsequently allowing for more flexible matching and better merging. We show that this makes the problem NP-hard, and thus we study it through the lens of parameterized algorithms and complexity, where we identify certain parameters of the input that govern its complexity. We look at two natural parameters: the branching factor and nesting depth of input functions. Concretely, our input consists of two functions $F_1, F_2,$ where each $F_i$ has size $n_i,$ branching factor $b_i,$ and nesting depth $d_i.$ Our task is to reorder the branches of $F_1$ and $F_2$ in a way that yields linearizations achieving the maximum sequence alignment. Let $n=\max(n_1, n_2),$ and define $b, d$ similarly. Our results are as follows: - A simple algorithm running in time $2^{O(bd)} n^2,$ establishing that the problem is fixed-parameter tractable (FPT) with respect to all four parameters $b_1,d_1, b_2, d_2.$ - An algorithm running in time $2^{O(bd_2)} n^7,$ showing that even when one of the functions has an unbounded nesting depth, the problem remains in FPT. - A hardness result showing that the problem is NP-hard even when constrained to constant $d_1, b_2, d_2.$ To the best of our knowledge, this is the first systematic study of function merging with branch reordering from an algorithmic or complexity-theoretic perspective.

2026-04-13T05:50:24Z Amir K. Goharshady Kerim Kochekov Tian Shu Ahmed Khaled Zaher http://arxiv.org/abs/2604.14825v1 Nautilus: An Auto-Scheduling Tensor Compiler for Efficient Tiled GPU Kernels 2026-04-16T09:55:23Z

We present Nautilus, a novel tensor compiler that moves toward fully automated math-to-kernel optimization. Nautilus compiles a high-level algebraic specification of tensor operators into efficient tiled GPU kernels. Nautilus's successive lowering design allows high-level optimizations, expression rewrites, and tile optimizations to be jointly applied in a single end-to-end system. Nautilus presents a novel auto-scheduler that discovers sequences of high-level optimizations, while preserving the regular program structure needed by tile optimizers. Nautilus's auto-scheduler captures complex interactions and trade-offs in the high-level optimizations, including aggressive global transformations like advanced reduction fusion. Nautilus is the first end-to-end tensor compiler capable of starting from a math-like description of attention and automatically discovering FlashAttention-3-like kernels, offloading the entire burden of optimization from the programmer to the compiler. Across five transformer-based models and 150 evaluation configurations on NVIDIA GH200 and RTX 5090 GPUs, Nautilus achieves up to 23% higher throughput than state-of-the-art compilers on GH200 and up to 42% on RTX 5090, while matching or exceeding manually written cuDNN kernels on many long-sequence configurations.

2026-04-16T09:55:23Z Yifan Zhao Yuchen Yang Matei Budiu Sasa Misailovic http://arxiv.org/abs/2604.13638v2 Cerisier: A Program Logic for Attestation in a Capability Machine 2026-04-16T08:01:17Z

A key feature in trusted computing is attestation, which allows encapsulated components (enclaves) to prove their identity to (local or remote) distrusting components. Reasoning about software that uses the technique requires tracking how trust evolves after successful attestation. This process is security-critical and non-trivial, but no existing formal verification technique supports modular reasoning about attestation of enclaves and their clients, or proving end-to-end properties for systems combining trusted, untrusted and attested code. We contribute Cerisier, the first program logic for modular reasoning about trusted, untrusted and attested code, fully mechanized in the Iris separation logic and the Rocq Prover. We formalize a recent proposal, CHERI-TrEE, to extend capability machines with enclave primitives, as an extension to the Cerise capability machine and program logic. Our program logic comes with a universal contract for untrusted code, which captures both capability safety and local enclave attestation. Like Cerise, this universal contract is phrased in terms of a logical relation defining capabilities' authority. We demonstrate Cerisier by proving end-to-end properties for three representative applications of trusted computing: secure outsourced computation, mutual attestation and a modeled trusted sensor component.

2026-04-15T09:04:09Z To be published in: PLDI 2026 June Rousseau Denis Carnier Thomas Van Strydonck Steven Keuchel Dominique Devriese Lars Birkedal http://arxiv.org/abs/2511.20782v4 Optimism in Equality Saturation 2026-04-16T05:47:05Z

Equality saturation is a program optimization technique based on non-destructive rewriting and a form of abstract interpretation called e-class analysis. Existing e-class analyses are pessimistic and therefore typically imprecise when analyzing cyclic programs, such as those in SSA form. We show that a straightforward optimistic variant of e-class analysis can result in unsoundness, due to a subtlety in how e-graphs represent programs. We propose an abstract interpretation algorithm that circumvents this issue and can optimistically analyze e-graphs during equality saturation. This results in a unified algorithm for optimistic analysis and non-destructive rewriting. We implement a prototype abstract interpreter and equality saturation tool for SSA programs. Our tool exhibits precision improvements over pure abstract interpretation (without rewriting) and pessimistic e-class analysis on example programs. Additionally, its performance is comparable to existing abstract interpretation and e-class analysis techniques.

2025-11-25T19:19:31Z Russel Arbore Alvin Cheung Max Willsey 10.1145/3808302