https://arxiv.org/api/ARjemTsuYLiV//c35LkcjZ1T9842026-06-14T10:29:53Z988533015http://arxiv.org/abs/2507.11282v2The downgrading semantics of memory safety (Extended version)2026-04-17T15:24:10ZMemory safety is traditionally characterized in terms of bad things that cannot happen. This approach is currently embraced in the literature on formal methods for memory safety. However, a general semantic principle for memory safety, that implies the negative items, remains elusive.
This paper focuses on the allocator-specific aspects of memory safety, such as null-pointer dereference, use after free, double free, and heap overflow. To that extent, we propose a notion of gradual allocator independence that accurately captures the allocator-dependent aspects of memory safety. Our approach is inspired by the previously suggested connection between memory safety and noninterference, but extends that connection in a fundamentally important direction towards downgrading.
We consider a low-level language with access to an allocator that provides malloc and free primitives in a flat memory model. Pointers are just integers, and as such it is trivial to write memory-unsafe programs. The basic intuition of gradual allocator independence is that of noninterference, namely that allocators must not influence program execution. This intuition is refined in two important ways that account for the allocators running out-of-memory and for programs to have pointer-to-integer casts. The key insight of the definition is to treat these extensions as forms of downgrading and give them satisfactory technical treatment using the state-of-the-art information flow machinery.2025-07-15T12:59:11Z65 pages, 26 figures, Extended version of PLDI 2026 paper with the same titleRené Rydhof HansenAndreas Stenbæk LarsenAslan Askarov10.1145/3808260http://arxiv.org/abs/2604.16584v1Certified Program Synthesis with a Multi-Modal Verifier2026-04-17T14:56:45ZCertified program synthesis (aka vericoding) is the process of automatically generating a program, its formal specification, and a machine-checkable proof of their alignment from a natural-language description. Two challenges make vericoding difficult. First, specifications synthesised from natural language are often either too weak to be meaningful or too strong to be implementable, yet existing approaches lack systematic means to detect such defects. Second, the landscape of program verifiers is fragmented: each tool supports a particular reasoning mode -- auto-active (e.g., Dafny, Verus) or interactive (e.g., Coq, Lean) -- with its own trade-off between automation and expressivity. This forces every synthesis methodology to be tailored to a single verification paradigm, limiting the class of tasks it can handle effectively.
We overcome both challenges by structuring the certified synthesis workflow around a multi-modal verifier -- a single tool combining dynamic validation, automated proofs, and interactive proof scripting in one foundational framework. We realise this idea in LeetProof, an agentic pipeline built on Velvet, a multi-modal verifier embedded in Lean. Multi-modality enables LeetProof to validate generated specifications via randomised property-based testing before any code is synthesised, decompose the synthesis task into sub-problems guided by verification conditions, and delegate residual proof obligations to frontier AI provers specialised for Lean. We evaluate LeetProof on benchmarks derived from prior work on certified synthesis. Our specification validation uncovers defects in existing reference benchmarks, and LeetProof's staged pipeline achieves a significantly higher rate of fully certified solutions than a single-mode baseline at the same budget -- consistently across two frontier LLM backends.2026-04-17T14:56:45ZYueyang FengDipesh KafleVladimir GladshteinVitaly KurinGeorge PîrleaQiyuan ZhaoPeter MüllerIlya Sergeyhttp://arxiv.org/abs/2604.16043v1Evaluating SYCL as a Unified Programming Model for Heterogeneous Systems2026-04-17T13:16:03ZHigh-performance computing (HPC) applications are increasingly executed in heterogeneous environments, introducing new challenges for programming and software portability. SYCL has emerged as a leading model designed to simplify heterogeneous programming and make it more accessible to developers. Intended as a single-source, cross-platform parallel programming framework, SYCL promises portability, productivity, and performance across a variety of architectures. However, these goals have not been consistently defined or realized, leaving developers with varying expectations. This paper addresses this gap by evaluating SYCL from the perspective of application developers. We analyze whether SYCL meets essential criteria for cross-platform development, including code portability, development productivity, and runtime efficiency. Our evaluation draws on benchmarks and illustrative examples and focuses on SYCL's memory management and parallelism abstractions. We provide detailed comparisons between Unified Shared Memory (USM) and buffer-accessor approaches, as well as between NDRange and hierarchical kernel models. In addition to presenting our own benchmark results on Intel platforms, we synthesize findings from recent studies across multiple SYCL implementations and compilers. Our results expose key limitations and inconsistencies in current SYCL implementations and offer insights into the steps needed to improve the framework's reliability and cross-platform usability.2026-04-17T13:16:03ZAmi Marowkahttp://arxiv.org/abs/2512.10748v3Intrinsically Correct Algorithms and Recursive Coalgebras2026-04-17T12:42:02ZRecursive coalgebras provide an elegant categorical tool for modelling recursive algorithms and analysing their termination and correctness. By considering coalgebras over categories of suitably indexed families, the correctness of the corresponding algorithms follows intrinsically just from the type of the computed maps. However, proving recursivity of the underlying coalgebras is non-trivial, and proofs are typically ad hoc. This layer of complexity impedes the formalization of coalgebraically defined recursive algorithms in proof assistants. We introduce a framework for constructing coalgebras which are intrinsically recursive in the sense that the type of the coalgebra guarantees recursivity from the outset. Our approach is based on the novel concept of a well-founded functor on a category of families indexed by a well-founded relation. We show as our main result that every coalgebra for a well-founded functor is recursive, and demonstrate that well-known techniques for proving recursivity and termination such as ranking functions are subsumed by this abstract setup. We present a number of case studies, including Quicksort, the Euclidian algorithm, and CYK parsing. Both the main theoretical result and selected case studies have been formalized in Cubical Agda.2025-12-11T15:36:47ZAgda source repository is archived under the Software Heritage ID: swh:1:dir:dd4da03ce67637632d48f28e941771c11a2d64eeCass AlexandruHenning UrbatThorsten Wißmannhttp://arxiv.org/abs/2604.15978v1jMT: Testing Correctness of Java Memory Models (Extended Version)2026-04-17T11:41:43ZFolklore is often saying "The Java memory model is broken." Therefore, several approaches have proposed repairs, only to find new programs exhibiting unexpected, unintuitive behavior or the model forbidding standard compiler optimizations. The complexity of defining a memory model for concurrent Java lies in the fact that it requires a multi-execution model. Multi-execution models need to inspect the many potential executions of a program in order to find the valid ones. Tools automatically validating novel proposals of Java memory models are, however, largely lacking. To alleviate this problem, we introduce jMT, a novel tool for constructing multi-execution semantics for concurrent Java programs. jMT relies on single-execution models defining well-formed execution graphs, based on which it builds valid multi-execution semantics via causality checking. Thereby, jMT supports evaluating new proposals of Java memory models (JMMs) on a per-program basis. jMT can furthermore be employed for testing the conformance of JMMs to existing compilation schemes and compilers. Our evaluation of jMT on 169 litmus tests reveals a number of interesting insights into existing JMMs.2026-04-17T11:41:43ZThis is the extended version referenced in the TACAS proceedingsTACAS 2026 proceedings in LNCS 16506, Springer, ChamLukas PannekeHeike Wehrheim10.1007/978-3-032-22749-2_13http://arxiv.org/abs/2604.15633v1Synthesizing Backward Error Bounds, Backward2026-04-17T02:19:08ZBackward stability is a desirable property for a well-designed numerical algorithm: given an input, a backward stable floating-point program produces the exact output for a nearby input. While automated tools for bounding the forward error of a numerical program are well-established, few existing tools target backward error analysis. We present a formal framework that enables sound, automated backward error analysis for a broad class of numerical programs. First, we propose a novel generalization of the definition of backward stability that is both compositional and flexible, satisfied by a wide range of floating-point operations. Second, based on this generalization, we develop the category Shel where morphisms model stable numerical programs, and show that structures in Shel support a rich variety of backward error analyses. Third, we implement a tool, eggshel, that automatically searches within a syntactic subcategory of Shel to prove backward stability for a given program. Our algorithm handles many programs with variable reuse, a known challenge in backward error analysis. We prove soundness of our algorithm and use our tool to synthesize backward error bounds for a suite of programs that were previously beyond the reach of automated analysis.2026-04-17T02:19:08ZTo appear at PLDI 2026. Extended version (31 pages)Proceedings of the ACM on Programming Languages 10, PLDI, Article 225 (2026)Laura ZielinskiJustin Hsu10.1145/3808333http://arxiv.org/abs/2512.03053v2Mitigating hallucinations and omissions in LLMs for invertible problems: An application to hardware logic design automation2026-04-16T21:39:03ZWe show for invertible problems that transform data from a source domain (for example, Logic Condition Tables (LCTs)) to a destination domain (for example, Hardware Description Language (HDL) code), an approach of using Large Language Models (LLMs) as a lossless encoder from source to destination followed by as a lossless decoder back to the source, comparable to lossless compression in information theory, can mitigate most of the LLM drawbacks of hallucinations and omissions. Specifically, using LCTs as inputs, we generate the full HDL for a two-dimensional network-on-chip router (13 units, 1500-2000 lines of code) using seven different LLMs, reconstruct the LCTs from the auto-generated HDL, and compare the original and reconstructed LCTs. This approach yields significant productivity improvements, not only confirming correctly generated LLM logic and detecting incorrectly generated LLM logic but also assisting developers in finding design specification errors.2025-11-25T00:47:02Z7 pages, 2 figures, 7 tablesAndrew S. CassidyGuillaume GarreauJay SivagnanameMike GrassiBernard BrezzoJohn V. ArthurDharmendra S. Modhahttp://arxiv.org/abs/2604.16538v1Understanding Tool-Augmented Agents for Lean Formalization: A Factorial Analysis2026-04-16T21:15:46ZAutomatic translation of natural language mathematics into faithful Lean 4 code is hindered by the fundamental dissonance between informal set-theoretic intuition and strict formal type theory. This gap often causes LLMs to hallucinate non-existent library definitions, resulting in code that fails to compile or lacks semantic fidelity. In this work, we investigate the effectiveness of tool-augmented agents for this task through a systematic factorial analysis of three distinct tool categories: Fine-tuned Model Querying (accessing expert drafts), Knowledge Search (retrieving symbol definitions), and Compiler Feedback (verifying code via a Lean REPL). We first benchmark the agent against one-shot baselines, demonstrating large gains in both compilation success and semantic equivalence. We then use the factorial decomposition to quantify the impact of each category, isolating the marginal contribution of each tool type to overall performance.2026-04-16T21:15:46Z15 pages,8 figuresKe ZhangPatricio GallardoMaziar RaissiSudhir Murthyhttp://arxiv.org/abs/2604.15272v1Prism: Symbolic Superoptimization of Tensor Programs2026-04-16T17:43:31ZThis paper presents Prism, the first symbolic superoptimizer for tensor programs. The key idea is sGraph, a symbolic, hierarchical representation that compactly encodes large classes of tensor programs by symbolically representing some execution parameters. Prism organizes optimization as a two-level search: it constructs symbolic graphs that represent families of programs, and then instantiates them into concrete implementations. This formulation enables structured pruning of provably suboptimal regions of the search space using symbolic reasoning over operator semantics, algebraic identities, and hardware constraints.
We develop techniques for efficient symbolic graph generation, equivalence verification via e-graph rewriting, and parameter instantiation through auto-tuning. Together, these components allow Prism to bridge the rigor of exhaustive search with the scalability required for modern ML workloads. Evaluation on five commonly used LLM workloads shows that Prism achieves up to $2.2\times$ speedup over best superoptimizers and $4.9\times$ over best compiler-based approaches, while reducing end-to-end optimization time by up to $3.4\times$.2026-04-16T17:43:31ZMengdi WuXiaoyu JiangOded PadonZhihao Jiahttp://arxiv.org/abs/2604.15266v1Simplifying Safety Proofs with Forward-Backward Reasoning and Prophecy2026-04-16T17:40:27ZWe propose an incremental approach for safety proofs that decomposes a proof with a complex inductive invariant into a sequence of simpler proof steps. Our proof system combines rules for (i) forward reasoning using inductive invariants, (ii) backward reasoning using inductive invariants of a time-reversed system, and (iii) prophecy steps that add witnesses for existentially quantified properties. We prove each rule sound and give a construction that recovers a single safe inductive invariant from an incremental proof. The construction of the invariant demonstrates the increased complexity of a single inductive invariant compared to the invariant formulas used in an incremental proof, which may have simpler Boolean structures and fewer quantifiers and quantifier alternations. Under natural restrictions on the available invariant formulas, each proof rule strictly increases proof power. That is, each rule allows to prove more safety problems with the same set of formulas. Thus, the incremental approach is able to reduce the search space of invariant formulas needed to prove safety of a given system. A case study on Paxos, several of its variants, and Raft demonstrates that forward-backward steps can remove complex Boolean structure while prophecy eliminates quantifiers and quantifier alternations.2026-04-16T17:40:27ZEden FrenkelKenneth L. McMillanOded PadonSharon Shoham10.1145/3808341http://arxiv.org/abs/2604.18616v1ARGUS: Agentic GPU Optimization Guided by Data-Flow Invariants2026-04-16T15:49:31ZLLM-based coding agents can generate functionally correct GPU kernels, yet their performance remains far below hand-optimized libraries on critical computations such as matrix multiplication, attention, and Mixture-of-Experts (MoE). Peak GPU performance requires coordinated reasoning over tightly coupled optimizations, including tiling, shared-memory staging, software pipelining, and instruction scheduling, while existing agents rely on sparse pass/fail feedback, leaving them unable to diagnose global constraint violations.
We present Argus, an agentic framework that addresses this through data-flow invariants: compile-time specifications encoding how data must be choreographed throughout kernel execution. Argus introduces a tile-based, Pythonic DSL exposing hardware instructions and compiler policies while hiding low-level representations. The DSL provides tag functions to propagate symbolic annotations through data and control flow, and tag assertions to enforce relational constraints at use sites. When violations occur, the compiler returns concrete counterexamples identifying the thread, data element, and program point, enabling dense, structured feedback for targeted fixes. Invariants are verified at compile time via abstract interpretation over a layout algebra and SMT solving, with zero runtime overhead. An in-context reinforcement learning planner learns to select optimizations and synthesize effective invariants, supported by a curated knowledge base of GPU optimization techniques.
We evaluate Argus on the AMD MI300X GPU across GEMM, flash attention, and MoE kernels accounting for over 90% of GPU time in LLM inference. Generated kernels achieve 99-104% of state-of-the-art hand-optimized assembly throughput and are 2-1543x faster than existing agentic systems. Argus further generalizes to 200 KernelBench tasks, solving 100% of Level 1 and 90% of Level 2 problems.2026-04-16T15:49:31ZHaohui MaiXiaoyan GuoXiangyun DingDaifeng LiQiuchu YuChenzhun GuoCong WangJiacheng ZhaoChristos KozyrakisBinhang Yuanhttp://arxiv.org/abs/2604.11027v2Parameterized Algorithms and Complexity for Function Merging with Branch Reordering2026-04-16T13:18:49ZBinary size reduction is an increasingly important optimization objective for compilers. One emerging technique is function merging, where multiple similar functions are merged into one, thereby eliminating redundancy. The SOTA approach to perform the merging is based on sequence alignment, where functions are viewed as linear sequences of instructions that are then matched in a way maximizing their alignment.
In this paper, we consider a significantly generalized formulation of the problem by allowing reordering of branches within each function, subsequently allowing for more flexible matching and better merging. We show that this makes the problem NP-hard, and thus we study it through the lens of parameterized algorithms and complexity, where we identify certain parameters of the input that govern its complexity. We look at two natural parameters: the branching factor and nesting depth of input functions.
Concretely, our input consists of two functions $F_1, F_2,$ where each $F_i$ has size $n_i,$ branching factor $b_i,$ and nesting depth $d_i.$ Our task is to reorder the branches of $F_1$ and $F_2$ in a way that yields linearizations achieving the maximum sequence alignment. Let $n=\max(n_1, n_2),$ and define $b, d$ similarly. Our results are as follows:
- A simple algorithm running in time $2^{O(bd)} n^2,$ establishing that the problem is fixed-parameter tractable (FPT) with respect to all four parameters $b_1,d_1, b_2, d_2.$
- An algorithm running in time $2^{O(bd_2)} n^7,$ showing that even when one of the functions has an unbounded nesting depth, the problem remains in FPT.
- A hardness result showing that the problem is NP-hard even when constrained to constant $d_1, b_2, d_2.$
To the best of our knowledge, this is the first systematic study of function merging with branch reordering from an algorithmic or complexity-theoretic perspective.2026-04-13T05:50:24ZAmir K. GoharshadyKerim KochekovTian ShuAhmed Khaled Zaherhttp://arxiv.org/abs/2604.14825v1Nautilus: An Auto-Scheduling Tensor Compiler for Efficient Tiled GPU Kernels2026-04-16T09:55:23ZWe present Nautilus, a novel tensor compiler that moves toward fully automated math-to-kernel optimization. Nautilus compiles a high-level algebraic specification of tensor operators into efficient tiled GPU kernels. Nautilus's successive lowering design allows high-level optimizations, expression rewrites, and tile optimizations to be jointly applied in a single end-to-end system. Nautilus presents a novel auto-scheduler that discovers sequences of high-level optimizations, while preserving the regular program structure needed by tile optimizers. Nautilus's auto-scheduler captures complex interactions and trade-offs in the high-level optimizations, including aggressive global transformations like advanced reduction fusion. Nautilus is the first end-to-end tensor compiler capable of starting from a math-like description of attention and automatically discovering FlashAttention-3-like kernels, offloading the entire burden of optimization from the programmer to the compiler. Across five transformer-based models and 150 evaluation configurations on NVIDIA GH200 and RTX 5090 GPUs, Nautilus achieves up to 23% higher throughput than state-of-the-art compilers on GH200 and up to 42% on RTX 5090, while matching or exceeding manually written cuDNN kernels on many long-sequence configurations.2026-04-16T09:55:23ZYifan ZhaoYuchen YangMatei BudiuSasa Misailovichttp://arxiv.org/abs/2604.13638v2Cerisier: A Program Logic for Attestation in a Capability Machine2026-04-16T08:01:17ZA key feature in trusted computing is attestation, which allows encapsulated components (enclaves) to prove their identity to (local or remote) distrusting components. Reasoning about software that uses the technique requires tracking how trust evolves after successful attestation. This process is security-critical and non-trivial, but no existing formal verification technique supports modular reasoning about attestation of enclaves and their clients, or proving end-to-end properties for systems combining trusted, untrusted and attested code.
We contribute Cerisier, the first program logic for modular reasoning about trusted, untrusted and attested code, fully mechanized in the Iris separation logic and the Rocq Prover. We formalize a recent proposal, CHERI-TrEE, to extend capability machines with enclave primitives, as an extension to the Cerise capability machine and program logic. Our program logic comes with a universal contract for untrusted code, which captures both capability safety and local enclave attestation. Like Cerise, this universal contract is phrased in terms of a logical relation defining capabilities' authority. We demonstrate Cerisier by proving end-to-end properties for three representative applications of trusted computing: secure outsourced computation, mutual attestation and a modeled trusted sensor component.2026-04-15T09:04:09ZTo be published in: PLDI 2026June RousseauDenis CarnierThomas Van StrydonckSteven KeuchelDominique DevrieseLars Birkedalhttp://arxiv.org/abs/2511.20782v4Optimism in Equality Saturation2026-04-16T05:47:05ZEquality saturation is a program optimization technique based on non-destructive rewriting and a form of abstract interpretation called e-class analysis. Existing e-class analyses are pessimistic and therefore typically imprecise when analyzing cyclic programs, such as those in SSA form. We show that a straightforward optimistic variant of e-class analysis can result in unsoundness, due to a subtlety in how e-graphs represent programs. We propose an abstract interpretation algorithm that circumvents this issue and can optimistically analyze e-graphs during equality saturation. This results in a unified algorithm for optimistic analysis and non-destructive rewriting. We implement a prototype abstract interpreter and equality saturation tool for SSA programs. Our tool exhibits precision improvements over pure abstract interpretation (without rewriting) and pessimistic e-class analysis on example programs. Additionally, its performance is comparable to existing abstract interpretation and e-class analysis techniques.2025-11-25T19:19:31ZRussel ArboreAlvin CheungMax Willsey10.1145/3808302