https://arxiv.org/api/8eIQ1t6JSjhown9UJ5PVTleCGAU 2026-06-22T01:31:04Z 9918 870 15 http://arxiv.org/abs/2512.05159v1 Stellis: A Strategy Language for Purifying Separation Logic Entailments 2025-12-04T09:58:03Z

Automatically proving separation logic entailments is a fundamental challenge in verification. While rule-based methods rely on separation logic rules (lemmas) for automation, these rule statements are insufficient for describing automation strategies, which usually involve the alignment and elimination of corresponding memory layouts in specific scenarios. To overcome this limitation, we propose Stellis, a strategy language for purifying separation logic entailments, i.e., removing all spatial formulas to reduce the entailment to a simpler pure entailment. Stellis features a powerful matching mechanism and a flexible action description, enabling the straightforward encoding of a wide range of strategies. To ensure strategy soundness, we introduce an algorithm that generates a soundness condition for each strategy, thereby reducing the soundness of each strategy to the correctness of its soundness condition. Furthermore, based on a mechanized reduction soundness theorem, our prototype implementation generates correctness proofs for the overall automation. We evaluate our system on a benchmark of 229 entailments collected from verification of standard linked data structures and the memory module of a microkernel, and the evaluation results demonstrate that, with such flexibility and convenience provided, our system is also highly effective, which automatically purifies 95.6% (219 out of 229) of the entailments using 5 libraries with 98 strategies.

2025-12-04T09:58:03Z Zhiyi Wang Xiwei Wu Yi Fang Chengtao Li Hongyi Zhong Lihan Xie Qinxiang Cao Zhenjiang Hu http://arxiv.org/abs/2511.20193v2 Separating the Wheat from the Chaff: Understanding (In-)Completeness of Proof Mechanisms for Separation Logic with Inductive Definitions 2025-12-04T08:06:40Z

For over two decades Separation Logic has been arguably the most popular framework for reasoning about heap-manipulating programs, as well as reasoning about shared resources and permissions. Separation Logic is often extended to include inductively-defined predicates, interpreted as least fixpoints, forming Separation Logic with Inductive Definitions (SLID). Many theoretical and practical advances have been made in developing automated proof mechanisms for SLID, but these mechanisms are imperfect, and a deeper understanding of their failures is desired. As expressive as Separation Logic is, it is not surprising that it is incomplete, and in fact, it contains several sources of incompleteness that defy automated reasoning. In this paper we study these sources of incompleteness and how they relate to failures of proof mechanisms. We place SLID within a larger logic, that we call Weak Separation Logic (WSL). We prove that unlike SLID, WSL is complete for a non-trivial fragment of quantified entailments with background theories and inductive definitions, via a reduction to first-order logic (FOL). Moreover, we show that the ubiquitous fold/unfold proof mechanism is sound and complete for theory-free, quantifier-free WSL entailments with inductive definitions. Through this, we understand proof failures as stemming from nonstandard models present in WSL, but not allowed in SLID. These rogue models are typically infinite, and we use the formalism of symbolic structures to represent and automatically find them. We present a prototype tool that implements the FOL encoding of WSL and test it on an existing benchmark, which contains over 700 quantified entailment problems with inductive definitions. Our tool is able to find counter-models to many of the examples, and we provide a partial taxonomy of the rogue models, shedding some light on real-world proof failures.

2025-11-25T11:19:28Z Neta Elad Adithya Murali Sharon Shoham 10.1145/3776671 http://arxiv.org/abs/2509.11418v2 Mechanizing Synthetic Tait Computability in Istari 2025-12-04T05:55:02Z

Categorical gluing is a powerful technique for proving meta-theorems of type theories such as canonicity and normalization. Synthetic Tait Computability (STC) provides an abstract treatment of the complex gluing models by internalizing the gluing category into a modal dependent type theory with a phase distinction. This work presents a mechanization of STC in the Istari proof assistant. Istari is a Martin-Löf-style extensional type theory with equality reflection, which avoids much of the explicit transport reasoning typically found in intensional proof assistants. This work develops a reusable library for synthetic phase distinction, including modalities, extension types, and strict glue types, and applies it to two case studies: (1) a canonicity model for dependent type theory with dependent products and booleans with large elimination, and (2) a Kripke canonicity model for the cost-aware logical framework. Our results demonstrate that the core STC constructions can be formalized essentially verbatim in Istari, preserving the elegance of the on-paper arguments while ensuring machine-checked correctness.

2025-09-14T20:11:58Z Runming Li Yue Yao Robert Harper http://arxiv.org/abs/2512.03972v1 OOPredictor: Predicting Object-Oriented Accesses using Static Analysis 2025-12-03T17:05:58Z

Object-oriented Programming has become one of the most dominant design paradigms as the separation of concerns and adaptability of design reduce development and maintenance costs. However, the convenience is not without cost. The added indirection inherent in such designs causes excessive pointer chasing, negatively affecting locality, which in turn degrades the performance of cache structures. Furthermore, modern hardware prefetchers are mostly stride prefetchers that are ill-equipped to handle the unpredictability of access patterns generated by pointer chasing. Most software approaches that seek to address this problem resort to profiling the program as it runs, which comes with a significant run-time overhead or requires data from previous runs. In this paper, we propose the use of compile-time static analysis to predict the most common access patterns displayed by a program during run time. Since Java is one of the most popular object-oriented languages, we implement our prototype within the OpenJ9 JVM, inside the OMR optimizer infrastructure. The outputs of our proposed predictor are Markov chains that model the expected behavior of the program. The effectiveness of the proposed predictor is evaluated by comparing the model with the actual run-time behavior of the program measured using an instrumented interpreter. Our experiments show that the proposed predictor exhibits good accuracy and can be used to inform minimally intrusive load stall mitigation strategies, e.g. informing copying GCs on more locality-friendly copying orders

2025-12-03T17:05:58Z Hassan Arafat David Bremner Kenneth B. Kent Julian Wang http://arxiv.org/abs/2512.03926v1 Tunable Automation in Automated Program Verification 2025-12-03T16:27:01Z

Automated verification tools based on SMT solvers have made significant progress in verifying complex software systems. However, these tools face a fundamental tension between automation and performance when dealing with quantifier instantiation -- the primary source of incompleteness and verification slowdown in SMT-based verifiers. Tools choose between aggressive quantifier instantiation that provides more automation but longer verification times, or conservative instantiation that responds quickly but may require more manual proof hints. We present a mechanism that enables fine-grained control over the availability of quantified facts in verification contexts, allowing developers to selectively tune the level of automation. Our approach lets library authors provide different pre-defined automation levels while giving end-users the ability to further customize quantifier availability at the module, function, or proof context level. We implement our techniques in Verus, a Rust-based verification tool, and evaluate them on multiple openly available codebases. Our empirical analysis demonstrates the automation-performance tradeoff and that selective quantifier management enables developers to select the appropriate level of automation in different contexts.

2025-12-03T16:27:01Z Alexander Y. Bai Chris Hawblitzel Andrea Lattuada http://arxiv.org/abs/2512.03571v1 EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths 2025-12-03T08:50:16Z

We introduce a new approach to agent programming, the development of LLM-based agents. Current approaches to agent programming often entangle two aspects of agent design: the core workflow logic and the inference-time strategy (e.g., tree search). We introduce "probabilistic angelic nondeterminism" ("PAN"), a programming model that disentangles these two concerns, allowing the programmer to describe the agent workflow and independently experiment with different inference-time strategies by simply changing a few inputs. We provide an implementation of PAN in Python as the EnCompass framework, which uses a Python decorator to compile agent workflow programs into a search space. We present three case studies that demonstrate how the framework lets the programmer quickly improve the reliability of an agent and easily switch between different inference-time strategies, all with little additional coding.

2025-12-03T08:50:16Z 65 pages, 2 figures, published in NeurIPS 2025 Zhening Li Armando Solar-Lezama Yisong Yue Stephan Zheng http://arxiv.org/abs/2512.03492v1 Functional Python Programming in Introductory Computer Science Courses 2025-12-03T06:39:08Z

The functional programming paradigm has a long and storied history, with its beginnings in the Lambda Calculus. In recent decades, pure functional languages such as Haskell have been shown to be highly effective in producing robust software due to immutable data structures, among other functional features. The advantages of programming with immutable data structures can also be had in non-functional languages such as Python. Over the years, non-functional languages have introduced immutable data structures as well as comprehension and lambda expressions, and it is possible to program in a purely functional style in them. In this paper, we present a ``best practice'' idea in introductory programming classes that forces students to learn and complete programming assignments in a purely functional subset of Python. By doing so, the student can learn functional ideas such as immutability, pure functions with no side effects, and stateless programming. We define a functional subset of Python and illustrate the best practice using small examples. We strongly feel that students in computing need familiarity with pure functional programming and argue that this can be taught in introductory programming courses that use Python.

2025-12-03T06:39:08Z Presented in Best Practices Track of COMPUTE 2025 (arXiv:2512.02349) Rajshekhar Sunderraman http://arxiv.org/abs/2512.10977v1 Agentic Operator Generation for ML ASICs 2025-12-03T04:03:13Z

We present TritorX, an agentic AI system designed to generate functionally correct Triton PyTorch ATen kernels at scale for emerging accelerator platforms. TritorX integrates open-source large language models with a custom linter, JIT compilation, and a PyTorch OpInfo-based test harness. This pipeline is compatible with both real Meta Training and Inference Accelerator (MTIA) silicon and in hardware simulation environments for next-generation devices. In contrast to previous kernel-generation approaches that prioritize performance for a limited set of high-usage kernels, TritorX prioritizes coverage. Our system emphasizes correctness and generality across the entire operator set, including diverse data types, shapes, and argument patterns. In our experiments, TritorX successfully generated kernels and wrappers for 481 unique ATen operators that pass all corresponding PyTorch OpInfo tests (over 20,000 in total). TritorX paves the way for overnight generation of complete PyTorch ATen backends for new accelerator platforms.

2025-12-03T04:03:13Z Alec M. Hammond Aram Markosyan Aman Dontula Simon Mahns Zacharias Fisches Dmitrii Pedchenko Keyur Muzumdar Natacha Supper Mark Saroufim Joe Isaacson Laura Wang Warren Hunt Kaustubh Gondkar Roman Levenstein Gabriel Synnaeve Richard Li Jacob Kahn Ajit Mathews http://arxiv.org/abs/2511.23358v2 TypeDis: A Type System for Disentanglement 2025-12-02T18:17:07Z

Disentanglement is a runtime property of parallel programs guaranteeing that parallel tasks remain oblivious to each other's allocations. As demonstrated in the MaPLe compiler and run-time system, disentanglement can be exploited for fast automatic memory management, especially task-local garbage collection with no synchronization between parallel tasks. However, as a low-level property, disentanglement can be difficult to reason about for programmers. The only means of statically verifying disentanglement so far has been DisLog, an Iris-fueled variant of separation logic, mechanized in the Rocq proof assistant. DisLog is a fully-featured program logic, allowing for proof of functional correctness as well as verification of disentanglement. Yet its employment requires significant expertise and per-program proof effort. This paper explores the route of automatic verification via a type system, ensuring that any well-typed program is disentangled and lifting the burden of carrying out manual proofs from the programmer. It contributes TypeDis, a type system inspired by region types, where each type is annotated with a timestamp, identifying the task that allocated it. TypeDis supports iso-recursive types as well as polymorphism over both types and timestamps. Crucially, timestamps are allowed to change during type-checking, at join points as well as via a form of subtyping, dubbed subtiming. The paper illustrates TypeDis and its features on a range of examples. The soundness of TypeDis and the examples are mechanized in the Rocq proof assistant, using an improved version of DisLog, dubbed DisLog2.

2025-11-28T17:05:52Z 34 pages, 24 figures, extended version of the same paper accepted at POPL 2026 Alexandre Moine Stephanie Balzer Alex Xu Sam Westrick 10.1145/3776655 http://arxiv.org/abs/2502.13033v4 Classical notions of computation and the Hasegawa-Thielecke theorem (extended version) 2025-12-02T14:19:52Z

In the spirit of the Curry-Howard correspondence between proofs and programs, we define and study a syntax and semantics for classical logic equipped with a computationally involutive negation, using a polarised effect calculus, the linear classical L-calculus. A main challenge in designing a denotational semantics for the calculus is to accommodate both call-by-value and call-by-name evaluation strategies, which leads to a failure of associativity of composition. In order to tackle this issue, we define a notion of adjunction between graph morphisms on non-associative categories, which we use to formulate polarized and non-associative notions of symmetric monoidal closed duploid and of dialogue duploid. We show that they provide a direct style counterpart to adjunction models: linear effect adjunctions for the (linear) call-by-push-value calculus and dialogue chiralities for linear continuations, respectively. In particular, we show that the syntax of the linear classical L-calculus can be interpreted in any dialogue duploid, and that it defines in fact a syntactic dialogue duploid. As an application, we establish, by semantic as well as syntactic means, the Hasegawa-Thielecke theorem, which states that the notions of central map and of thunkable map coincide in any dialogue duploid (in particular, for any double negation monad on a symmetric monoidal category).

2025-02-18T16:48:06Z 51 pages incl. appendixes. Version of the paper with same title published in PACMPL (POPL 2026) extended with additional illustrations and proofs Éléonore Mangel Paul-André Melliès Guillaume Munch-Maccagnoni 10.1145/3776715 http://arxiv.org/abs/2512.02738v1 Probabilistic energy profiler for statically typed JVM-based programming languages 2025-12-02T13:21:35Z

Energy consumption is a growing concern in several fields, from mobile devices to large data centers. Developers need detailed data on the energy consumption of their software to mitigate consumption issues. Previous approaches have a broader focus, such as on specific functions or programs, rather than source code statements. They primarily focus on estimating the CPU's energy consumption using point estimates, thereby disregarding other hardware effects and limiting their use for statistical reasoning and explainability. We developed a novel methodology to address the limitations of measuring only the CPU's consumption and using point estimates, focusing on predicting the energy usage of statically typed JVM-based programming languages, such as Java and Scala. We measure the energy consumption of Bytecode patterns, the translation from the programming language's source code statement to their Java Bytecode representation. With the energy measurements, we construct a statistical model using Bayesian statistics, which allows us to predict the energy consumption through statistical distributions and analyze individual factors. The model includes three factors we obtain statically from the code: data size, data type, operation, and one factor about the hardware platform the code executes on: device. To validate our methodology, we implemented it for Java and evaluated its energy predictions on unseen programs. We observe that all four factors are influential, notably that two devices of the same model may differ in energy consumption and that the operations and data types cause consumption differences. The experiments also show that the energy prediction of programs closely follows the program's real energy consumption, validating our approach. Our work presents a methodology for constructing an energy model that future work, such as verification tools, can use for their energy estimates.

2025-12-02T13:21:35Z Joel Nyholm Wojciech Mostowski Christoph Reichenbach http://arxiv.org/abs/2503.20332v7 Bounded Exhaustive Random Program Generation for Testing Solidity Compilers 2025-12-02T10:06:47Z

By July 2025, smart contracts collectively manage roughly $120 billion in assets. With Solidity remaining the dominant language for smart contract development, the correctness of Solidity compilers has become critically important. However, Solidity compilers are bug-prone, with a recent study revealing that combinations of qualifiers in Solidity programs are the primary cause of compiler crashes, accounting for 40.5% of all historical crashes. While random program generators are widely used for compiler testing, they may be less effective at finding Solidity compiler bugs because they explore the unbounded space of possible programs rather than concentrating on the specific subspace related to bug-prone qualifiers. A promising idea for finding qualifier-related bugs is to bound the search space based on empirical evidence of where such bugs are likely to occur, specifically focusing test generation to target subspaces with rich combinations of qualifiers. To address this, we propose bounded exhaustive random program generation, a novel approach that dynamically bounds the search space, enhancing the likelihood of uncovering Solidity compiler bugs. Specifically, our method bounds the search space by generating valid program templates that abstract programs that use bug-prone qualifiers, and then uses these templates as a basis for compiler testing through exhaustive enumeration of suitable qualifiers. Mechanisms are devised to address technical challenges regarding validity and efficiency. We have implemented our novel generation approach in a new tool, Erwin. We have used Erwin to find and report 26 bugs across two Solidity compilers, solc and solang, and one Solidity static analyzer, slither. Among these, 23 were previously unknown, 18 have been confirmed, and 10 have been fixed. Evaluation results demonstrate that Erwin outperforms state-of-the-art Solidity fuzzers in bug detection.

2025-03-26T08:58:00Z Haoyang Ma Alastair F. Donaldson Qingchao Shen Yongqiang Tian Junjie Chen Shing-Chi Cheung 10.1145/3744916.3773237 http://arxiv.org/abs/2502.20496v4 Abstraction Functions as Types 2025-12-02T04:35:17Z

Software development depends on the use of libraries whose public specifications inform client code and impose obligations on private implementations; it follows that verification at scale must also be modular, preserving such abstraction. Hoare's influential methodology uses abstraction functions to demonstrate the coherence between such concrete implementations and their abstract specifications. However, the Hoare methodology relies on a conventional separation between implementation and specification, providing no linguistic support for ensuring that this convention is obeyed. This paper proposes a synthetic account of Hoare's methodology within univalent dependent type theory by encoding the data of abstraction functions within types themselves. This is achieved via a phase distinction, which gives rise to a gluing construction that renders an abstraction function as a type and a pair of modalities that fracture a type into its concrete and abstract parts. A noninterference theorem governing the phase distinction characterizes the modularity guarantees provided by the theory. This approach scales to verification of cost, allowing the analysis of client cost relative to a cost-aware specification. A monadic sealing effect facilitates modularity of cost, permitting an implementation to be upper-bounded by its specification in cases where private details influence observable cost. The resulting theory supports modular development of programs and proofs in a manner that hides private details of no concern to clients while permitting precise specifications of both the cost and behavior of programs.

2025-02-27T20:02:17Z Harrison Grodin Carnegie Mellon University Runming Li Carnegie Mellon University Robert Harper Carnegie Mellon University 10.1145/3776673 http://arxiv.org/abs/2509.00360v2 ChopChop: a Programmable Framework for Semantically Constraining the Output of Language Models 2025-12-02T01:59:32Z

Language models (LMs) can generate code but cannot guarantee its correctness$\unicode{x2014}$often producing outputs that violate type safety, program invariants, or other semantic properties. Constrained decoding offers a solution by restricting generation to only produce programs that satisfy user-defined properties. However, existing methods are either limited to syntactic constraints or rely on brittle, ad hoc encodings of semantic properties over token sequences rather than program structure. We present ChopChop, the first programmable framework for constraining the output of LMs with respect to semantic properties. ChopChop introduces a principled way to construct constrained decoders based on analyzing the space of programs a prefix represents. It formulates this analysis as a realizability problem which is solved via coinduction, connecting token-level generation with structural reasoning over programs. We demonstrate ChopChop's generality by using it to enforce (1) equivalence to a reference program and (2) type safety. Across a range of models and tasks, ChopChop improves success rates while maintaining practical decoding latency.

2025-08-30T04:51:03Z Shaan Nagy Timothy Zhou Nadia Polikarpova Loris D'Antoni http://arxiv.org/abs/2407.16801v3 Qudit Quantum Programming with Projective Cliffords 2025-12-02T01:09:18Z

This paper introduces a novel abstraction for programming quantum operations, specifically projective Cliffords, as functions over the qudit Pauli group. Generalizing the idea behind Pauli tableaux, we introduce a type system and lambda calculus for projective Cliffords called LambdaPC, which captures well-formed Clifford operations via a Curry-Howard correspondence with a particular encoding of the Clifford and Pauli groups. Importantly, the language captures not just qubit operations, but qudit operations for any dimension $d$. Throughout the paper we explore what it means to program with projective Cliffords through a number of examples and a case study focusing on stabilizer error correcting codes.

2024-07-23T19:02:55Z 44 pages Jennifer Paykin Sam Winnick 10.1145/3776646