https://arxiv.org/api/Y52bq/YMCu5AsjohU0AaL6oOL2c 2026-06-14T07:22:22Z 9885 285 15 http://arxiv.org/abs/2604.22210v1 From Monolithic to Compositional: A Compositional Operational Semantics for Crystality 2026-04-24T04:27:34Z

Parallel execution has become a key approach to improving blockchain scalability, but the lack of formal semantics for smart contract languages in such settings makes rigorous reasoning difficult. Crystality is a smart contract language designed for parallel EVMs, supporting scoped state and asynchronous relay across execution engines. This paper introduces a compositional operational semantics for Crystality. Unlike the original monolithic semantics, the new semantics decomposes the system into engine components and a global component, making the structure of parallel execution explicit. The compositional formulation enables simple proofs of key structural properties, including locality, global isolation, and strong commutativity of independent local steps. Furthermore, we prove that the compositional semantics is semantically equivalent to the original one via a transaction-level bisimulation theorem based on encoding and decoding functions between configurations, and two code-level bisimulation theorems for local and global execution.

2026-04-24T04:27:34Z Ziyun Xu Hao Wang Meng Sun http://arxiv.org/abs/2503.15825v2 Efficient Symbolic Execution of Software under Fault Attacks 2026-04-24T00:02:42Z

We propose a symbolic execution method for analyzing the safety of software under fault attacks both accurately and efficiently. Fault attacks leverage physically injected hardware faults in an embedded system to break the safety of a software program. While there are existing methods for analyzing the impact of maliciously injected hardware faults on the embedded software, they suffer from inaccurate fault modeling and inefficient fault analysis. To overcome these limitations, we propose two novel techniques. First, we propose a new fault modeling technique that leverages automated program transformation to add symbolic variables to the original program, to accurately model the new program behavior induced by the injected faults. This new fault modeling approach has two advantages over existing techniques: (a) the fault-induced program behavior is closely related to what attackers exploit in practice and (b) the automatically transformed program may be analyzed by any downstream fault analysis algorithm. Second, we propose an efficient symbolic execution algorithm that is designed specifically for conducting fault analysis on the transformed program. It leverages two pruning techniques to mitigate path explosion. We have implemented the proposed method and evaluated it on a variety of benchmark programs. The experimental results show that our method significantly outperforms the state-of-the-art techniques. Compared to the current state-of-the-art, it is able to detect previously-missed safety violations and at the same time avoid bogus violations. Furthermore, compared to the baseline algorithm, our optimized symbolic execution algorithm can be orders-of-magnitude faster.

2025-03-20T03:19:48Z Accepted to ECOOP 2026. Camera-ready version 10.4230/LIPIcs.ECOOP.2026.16 Yuzhou Fang Chenyu Zhou Jingbo Wang Chao Wang http://arxiv.org/abs/2604.22041v1 Causality and Semantic Separation 2026-04-23T19:54:23Z

The design of scientific experiments deserves its own variation of formal verification to catch cases where scientists made important mistakes, such as forgetting to take confounding variables into account. One of the most fundamental underpinnings of science is causality, or what it means for interventions in the world to cause other outcomes, as formalized by computer scientists like Judea Pearl. However, these ideas had not previously been made rigorous to the standards of the programming-languages community, where one expects a (syntactic) program analysis to be proved sound with respect to a natural semantics. In the domain of causality, as the relevant "program analysis," we focus on $d$-separation, a classic condition on graphs that can be used to decide when the design of an experiment controls for sufficiently many confounding variables, even though the reason that this condition works is often unintuitive. Our central result (mechanized in Rocq) is that $d$-separation exactly coincides with a novel semantic definition inspired by noninterference from the theory of security. This characterization provides a structural semantic foundation for $d$-separation and helps explain why the graph-theoretic condition is correct, independently of probabilistic assumptions. For each given automated test on the quality of an experiment design, our theorem justifies an associated method for falsifying the world-modeling hypothesis behind the experiment.

2026-04-23T19:54:23Z Anna Zhang Qinglan Luo London Bielicke Eunice Jun Adam Chlipala http://arxiv.org/abs/2604.22032v1 Kernel Contracts: A Specification Language for ML Kernel Correctness Across Heterogeneous Silicon 2026-04-23T19:46:52Z

Every ML kernel ships with an implicit contract about what it computes. People rarely write the contract down. When two kernels disagree -- when a matmul on AMD produces a different gradient than the same matmul on NVIDIA, when a fused attention kernel silently downcasts an accumulator, when an out-of-bounds access returns zero on one stack and garbage on another -- there is no formal artifact to arbitrate the dispute. Recent empirical work has measured the gap across silicon platforms, but none of it specifies the contract being violated. We present a specification language for kernel contracts. A contract has eight parts: identifier, scope, precondition, postcondition, tolerance, reference oracle, measurement protocol, and violation signature. We use it to state twelve contract classes covering precision, ordering, compiler-induced, and exceptional-value failure modes, each grounded in published empirical evidence. We require a three-state calibration: every contract must admit at least one reference-conforming implementation and at least one contract-violating implementation that passes basic functional tests. We apply the framework to three documented incidents -- Huawei Ascend silent precision coercion, Sakana AI CUDA Engineer reward hacking, AMD out-of-bounds silent acceptance -- and show that each informal diagnosis maps to a specific contract violation with a measurable signature. A kernel contract suite is a normative reference against which conformance can be graded, in the way that ISASecure grades industrial control systems against IEC 62443.

2026-04-23T19:46:52Z 28 pages, 1 figure Cooper Veit http://arxiv.org/abs/2604.21795v1 NEST: Network Enforced Session Types (Technical Report) 2026-04-23T15:55:52Z

This paper introduces NEST (Network-Enforced Session Types), a runtime verification framework that moves application-level protocol monitoring into the network fabric. Unlike prior work that instruments or wraps application code, we synthesize packet-level monitors that enforce protocols directly in the data plane. We develop algorithms to generate network-level monitors from session types and extend them to handle packet loss and reordering. We implement NEST in P4 and evaluate it on applications including microservice and network-function models, showing that network-level monitors can enforce realistic non-trivial protocols.

2026-04-23T15:55:52Z Jens Kanstrup Larsen Alceste Scalas Guy Amir Jules Jacobs Jana Wagemaker Nate Foster 10.4230/LIPIcs.ECOOP.2026.31 http://arxiv.org/abs/2604.21467v1 Linear Constraints 2026-04-23T09:28:15Z

Linear constraints are the linear counterpart of Haskell's class constraints. Linearly typed parameters allow the programmer to control resources such as file handles and manually managed memory as linear arguments. Indeed, a linear type system can verify that these resources are used safely. However, writing code with explicit linear arguments requires bureaucracy. Linear constraints address this shortcoming: a linear constraint acts as an implicit linear argument that can be filled in automatically by the compiler. We present this new feature as a qualified type system, together with an inference algorithm which extends GHC's existing constraint solver algorithm. Soundness of linear constraints is ensured by the fact that they desugar into Linear Haskell. This paper is a revised and extended version of a previous paper by the same authors (arXiv:2103.06127). The formal system and the constraint solver have been significantly simplified and numerous additional applications are described.

2026-04-23T09:28:15Z Arnaud Spiwack Csongor Kiss Jean-Philippe Bernardy Nicolas Wu Richard A. Eisenberg http://arxiv.org/abs/2604.21263v1 Trustworthy Clinical Decision Support Using Meta-Predicates and Domain-Specific Languages 2026-04-23T04:11:44Z

\textbf{Background:} Regulatory frameworks for AI in healthcare, including the EU AI Act and FDA guidance on AI/ML-based medical devices, require clinical decision support to demonstrate not only accuracy but auditability. Existing formal languages for clinical logic validate syntactic and structural correctness but not whether decision rules use epistemologically appropriate evidence. \textbf{Methods:} Drawing on design-by-contract principles, we introduce meta-predicates -- predicates about predicates -- for asserting epistemological constraints on clinical decision rules expressed in a DSL. An epistemological type system classifies annotations along four dimensions: purpose, knowledge domain, scale, and method of acquisition. Meta-predicates assert which evidence types are permissible in any given rule. The framework is instantiated in AnFiSA, an open-source platform for genetic variant curation, and demonstrated using the Brigham Genomics Medicine protocol on 5.6 million variants from the Genome in a Bottle benchmark. \textbf{Results:} Decision trees used in variant interpretation can be reformulated as unate cascades, enabling per-variant audit trails that identify which rule classified each variant and why. Meta-predicate validation catches epistemological errors before deployment, whether rules are human-written or AI-generated. The approach complements post-hoc methods such as LIME and SHAP: where explanation reveals what evidence was used after the fact, meta-predicates constrain what evidence may be used before deployment, while preserving human readability. \textbf{Conclusions:} Meta-predicate validation is a step toward demonstrating not only that decisions are accurate but that they rest on appropriate evidence in ways that can be independently audited. While demonstrated in genomics, the approach generalises to any domain requiring auditable decision logic.

2026-04-23T04:11:44Z Michael Bouzinier Sergey Trifonov Michael Chumack Eugenia Lvova Dmitry Etin http://arxiv.org/abs/2604.20073v2 Scaling Worst-Case Optimal Datalog to GPUs 2026-04-23T02:13:52Z

Datalog is a declarative logic-programming language used for complex analytic reasoning workloads such as program analysis and graph analytics. Datalog's popularity is due to its unique price-point, marrying logic-defined specification with the potential for massive data parallelism. While traditional engines are CPU-based, the memory-bound nature of Datalog has led to increasing interest in leveraging GPUs. These engines beat CPU-based engines by operationalizing iterated relational joins via SIMT-friendly join algorithms. Unfortunately, all existing GPU Datalog engines are built on binary joins, which are inadequate for the complex multi-way queries arising in production systems such as DOOP and ddisasm. For these queries, binary decomposition can incur the AGM bound asymptotic blowup in time and space, leading to OOM failures regardless of join order. Worst-Case Optimal Joins (WCOJ) avoid this blowup, but their attribute-at-a-time intersections map poorly to SIMT hardware under key skew, causing severe load imbalance across Streaming Multiprocessors (SMs). We present SRDatalog, the first GPU Datalog engine based on WCOJ. SRDatalog uses flat columnar storage and two-phase deterministic memory allocation to avoid the OOM failures of binary joins and the index-rebuild overheads of static WCOJ systems. To mitigate skew and hide hardware stalls, SRDatalog further employs root-level histogram-guided load balancing, structural helper-relation splitting, and stream-aligned rule multiplexing. On real-world program-analysis workloads, SRDatalog achieves geometric-mean speedups of 21x to 47x.

2026-04-22T00:37:50Z Yihao Sun Kunting Qi Thomas Gilray Sidharth Kumar Kristopher Micinski http://arxiv.org/abs/2605.23931v1 BODHI: Precise OS Kernel Specification Inference 2026-04-22T19:29:16Z

The formal verification of operating system kernels requires precise specifications that capture the intended behavior of system calls. Writing these specifications manually demands deep domain expertise, motivating the use of large language models (LLMs) to automate the process. However, in OSV-Bench, a benchmark of 245 specification generation tasks derived from the Hyperkernel OS kernel, the best reported Pass@1 is 55.10%. We propose a domain knowledge prompting method (BODHI), which augments the standard few-shot prompt with a structured C-to-Python translation guide covering 15 categories of domain-specific translation patterns. Inspired by Structured Chain-of-Thought (SCoT) prompting, the guide organizes translation by separation of concerns, addressing pre-condition extraction and post-condition generation as distinct categories. Evaluated on nine models from six providers (Anthropic, Mistral, Amazon, DeepSeek, Meta, Alibaba), covering dense, mixture-of-experts and reasoning architectures, BODHI improves every model tested, with gains ranging from +11% to +32%. The best configuration (Claude Opus 4.6 + BODHI) reaches 96.73% Pass@1. BODHI reduces both syntax and semantic errors, with the strongest effect on models that have sufficient instruction-following capability to utilize structured reference material. These results demonstrate that domain knowledge injection is a model-agnostic technique that substantially bridges the gap between general-purpose code generation and formal specification synthesis.

2026-04-22T19:29:16Z Zhiming Chang Ziyang Li http://arxiv.org/abs/2604.08570v2 QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation 2026-04-22T16:54:46Z

Large Language Models (LLMs) are increasingly used for code generation, yet quantum code generation is still evaluated mostly within single frameworks, making it difficult to separate quantum reasoning from framework familiarity. We introduce QuanBench+, a unified benchmark spanning Qiskit, PennyLane, and Cirq, with 42 aligned tasks covering quantum algorithms, gate decomposition, and state preparation. We evaluate models with executable functional tests, report Pass@1 and Pass@5, and use KL-divergence-based acceptance for probabilistic outputs. We additionally study Pass@1 after feedback-based repair, where a model may revise code after a runtime error or wrong answer. Across frameworks, the strongest one-shot scores reach 59.5% in Qiskit, 54.8% in Cirq, and 42.9% in PennyLane; with feedback-based repair, the best scores rise to 83.3%, 76.2%, and 66.7%, respectively. These results show clear progress, but also that reliable multi-framework quantum code generation remains unsolved and still depends strongly on framework-specific knowledge.

2026-03-25T20:51:21Z 24 pages total, 25 figures, 5 tables, including supplementary material. Accepted to the ICLR 2026 Workshop on I Can't Believe It's Not Better Ali Slim Haydar Hamieh Jawad Kotaich Yehya Ghosn Mahdi Chehimi Ammar Mohanna Hasan Abed Al Kader Hammoud Bernard Ghanem http://arxiv.org/abs/2604.17198v2 Partitioning Unstructured Sparse Tensor Algebra for Load-Balanced Parallel Execution 2026-04-22T15:41:10Z

Sparse tensor algebra is challenging to efficiently parallelize due to the irregular, data-dependent, and potentially skewed structure of sparse computation. We propose the first partitioning algorithm that provably load balances the computation of any sparse tensor algebra expression across parallel execution units. Our algorithm generalizes parallel merging algorithms to any number of operands, and to multi-dimensional, hierarchical sparse data structures. We implement our algorithm within an existing sparse tensor algebra compilation framework to automatically generate parallel sparse tensor algebra kernels that target multi-core CPUs and GPUs. We show that our generated code is competitive with hand-implemented parallelization strategies used by vendor libraries like Intel MKL and NVIDIA cuSPARSE (geo-means of $0.73$--$3.4\times$) and \textsc{Taco} (geo-means of $1.0$--$2.4\times$), and significantly outperforms general-purpose strategies for sparse tensor expressions where specialized algorithms have not been developed (geo-means of $2.0$--$6.4\times$).

2026-04-19T01:58:55Z Atharva Chougule Alexander J Root Rubens Lacouture Bobby Yan Rohan Yadav Fredrik Kjolstad http://arxiv.org/abs/2604.20507v1 Automatic Code and Test Generation of Smart Contracts from Coordination Models 2026-04-22T12:51:56Z

We propose a formal approach for specifying and implementing decentralised coordination in distributed systems, with a focus on smart contracts. Our model captures dynamic roles, data-driven transitions, and external coordination interfaces, enabling high-level reasoning about decentralised workflows. We implement a toolchain that supports formal model validation, code generation for Solidity (our framework is extendable to other smart contract languages), and automated test synthesis. Although our implementation targets blockchain platforms, the methodology is platform-agnostic and may generalise to other service-oriented and distributed architectures. We demonstrate the expressiveness and practicality of the approach by modelling and realising some coordination patterns in smart contracts.

2026-04-22T12:51:56Z Elvis Konjoh Selabi Maurizio Murgia António Ravara Emilio Tuosto http://arxiv.org/abs/2604.20410v1 Extending Contract Verification for Parallel Programming Models to Fortran 2026-04-22T10:27:28Z

High-performance computing often relies on parallel programming models such as MPI for distributed-memory systems. While powerful, these models are prone to subtle programming errors, leading to development of multiple correctness checking tools. However, these are often limited to C/C++ codes, tied to specific library implementations, or restricted to certain error classes. Building on our prior work with CoVer, a generic, contract-based verification framework for parallel programming models, we extend CoVer's applicability to Fortran, enabling static and dynamic analysis across multiple programming languages. We adapted language-specific contract definitions and modified the analyses to support both C/C++ and Fortran programs. Our evaluation demonstrates that the enhanced version preserves CoVer's analysis accuracy and even revealed a bug in the MPI-BugBench testing framework, underscoring the effectiveness of the approach. The Fortran port of CoVer turns out to be substantially more efficient than the state-of-the-art tool MUST, while maintaining generality across languages.

2026-04-22T10:27:28Z A peer-reviewed version is to be published by Springer as part of the ISC C3PO workshop proceedings. This is the originally submitted article Yussur Mustafa Oraji Christian Bischof http://arxiv.org/abs/2604.20917v1 The Path Not Taken: Duality in Reasoning about Program Execution 2026-04-22T03:20:53Z

Large language models (LLMs) have shown remarkable capabilities across diverse coding tasks. However, their adoption requires a true understanding of program execution rather than relying on surface-level patterns. Existing benchmarks primarily focus on predicting program properties tied to specific inputs (e.g., code coverage, program outputs). As a result, they provide a narrow view of dynamic code reasoning and are prone to data contamination. We argue that understanding program execution requires evaluating its inherent duality through two complementary reasoning tasks: (i) predicting a program's observed behavior for a given input, and (ii) inferring how the input must be mutated toward a specific behavioral objective. Both tasks jointly probe a model's causal understanding of execution flow. We instantiate this duality in DexBench, a benchmark comprising 445 paired instances, and evaluate 13 LLMs. Our results demonstrate that dual-path reasoning provides a robust and discriminative proxy for dynamic code understanding.

2026-04-22T03:20:53Z Accepted to ACL 2026 Main Conference Eshgin Hasanov Md Mahadi Hassan Sibat Santu Karmaker Aashish Yadavally http://arxiv.org/abs/2601.20370v2 A Program Logic for Abstract (Hyper)Properties 2026-04-21T18:46:45Z

We introduce APPL (Abstract Program Property Logic), a unifying Hoare-style logic that subsumes standard Hoare logic, incorrectness logic, and several variants of Hyper Hoare logic. APPL provides a principled foundation for abstract program logics parameterised by an abstract domain, encompassing both existing and novel abstractions of properties and hyperproperties. The logic is grounded in a semantic framework where the meaning of commands is first defined on a lattice basis and then extended to the full lattice via additivity. Crucially, nondeterministic choice is interpreted by a monoidal operator that need not be idempotent nor coincide with the lattice join. This flexibility allows the framework to capture collecting semantics, various classes of abstract semantics, and hypersemantics. The APPL proof system is sound, and it is relatively complete whenever the property language is sufficiently expressive. When the property language is restricted to an abstract domain, the result is a sound abstract deduction system based on best correct approximations. Relative completeness with respect to a corresponding abstract semantics is recovered provided the abstract domain is complete, in the sense of abstract interpretation, for the monoidal operator.

2026-01-28T08:33:39Z Paolo Baldan Roberto Bruni Francesco Ranzato Diletta Rigo