https://arxiv.org/api/5/L8Ra3W+xyTsmX8dqI/ZwLbeyM 2026-06-13T13:14:10Z 9885 45 15 http://arxiv.org/abs/2606.04813v1 GraphAlg Playground: An Online Platform for Learning and Experimenting with the GraphAlg Language 2026-06-03T12:37:43Z

The GraphAlg language for graph algorithms enables native support for user-defined graph analytics workloads in databases. In this demonstration, we present a web-based playground for writing and executing GraphAlg programs in the web browser, including an interactive tutorial explaining its key concepts. The playground runs inside the user's web browser without any installation, and is freely available under a permissive license as a reusable library. We present two demonstration scenarios of the publicly available playground website, showing how new users can learn to program in GraphAlg using the tutorial, while expert users can use the playground to prototype and validate their algorithms.

2026-06-03T12:37:43Z Accepted at the VLDB 2026 Demonstration Track; to appear in PVLDB Vol. 19. 4 pages, 8 figures, 1 table. Artifacts: https://wildarch.dev/graphalg Daan de Graaf Robert Brijder Soham Chakraborty George Fletcher Bram van de Wall Nikolay Yakovets http://arxiv.org/abs/2412.18134v5 Learning Randomized Reductions 2026-06-03T03:12:59Z

Randomized self-reductions (RSRs) express $f(x)$ using $f$ evaluated at random correlated points, enabling self-correcting programs, instance-hiding protocols, and applications in complexity theory and cryptography. Yet discovering RSRs has required manual expert derivation for over 40 years, limiting their practical use. We present Bitween for automated RSR learning. First, we formalize RSR learning with sample complexity analysis under correlated sampling. Second, we develop Vanilla Bitween, which integrates multiple backends (linear regression, genetic programming, symbolic regression, and mixed-integer programming). The linear regression backend outperforms the others, discovering RSRs for 43 of 80 functions (54%) in RSR-Bench, our benchmark suite, including the first known reduction for sigmoid. Third, we introduce Agentic Bitween, a neuro-symbolic approach where LLM agents propose novel query functions beyond the fixed set ($x+r$, $x-r$, $x \cdot r$, $x$, $r$) in prior work. Agentic Bitween discovers RSRs for 64 of 80 functions (80%), outperforming pure neural baselines in both RSR discovery and verification accuracy.

2024-12-24T03:42:53Z Accepted at ICML 2026 (Spotlight). 9 pages main text + appendix Proceedings of the 43rd International Conference on Machine Learning, PMLR 306, 2026 Ferhat Erata Orr Paradise Thanos Typaldos Timos Antonopoulos ThanhVu Nguyen Shafi Goldwasser Ruzica Piskac http://arxiv.org/abs/2606.04352v1 Negative and Fractional Types in the Fidelity Framework 2026-06-03T02:09:11Z

Our Native Type Universe (NTU) has been detailed through five previous papers establishing the substrate our framework's compilation pipeline targets across multiple hardware platforms. We have found in the course of that work a deeper reach this foundation makes available: negative and fractional types as native first-class constructs. James and Sabry established these dualities in 2012; Chen and Sabry later developed their categorical interpretation in compact closed categories. These dualities have practical benefit for compute modalities in our Fidelity Framework where extant general purpose compute framings lack the substrate to host them as native constructs. We see practicality with these type forms in preserving decidability and principal types through the abelian-group algebraic pattern Kennedy's dimensional types establish. The resulting isomorphisms would admit new, concise forms of resolution within our novel lowering strategy, and we sketch a notional Clef language syntax that would admit rational dimensional exponents into our algebra. We trace the implications across several problem spaces these type forms would open to our compilation and verification disciplines: Bayesian inference where fractional types would express conditioning obligations, quantum computation (and simulations) where negative types would provide the type-level adjoint, and finally adiabatic computation where the combined discipline would express Hamiltonian deformation as a reversible constraint-propagation process. The inherent structure of our NTU together with the supporting framework appears well-suited to problem spaces that current software ecosystems do not directly address, while keeping approachable development ergonomics and mature tooling aligned with operational guarantees the framework aspires to provide.

2026-06-03T02:09:11Z 23 pages, 1 figure, sample code appendix Houston Haynes http://arxiv.org/abs/2606.04311v1 Formal verification of the S-two AIR 2026-06-03T00:36:41Z

StarkWare's S-two prover provides an efficient means for establishing, on blockchain, that a program written in the Cairo virtual machine language runs to completion. The latter claim is encoded by an algebraic intermediate representation (AIR) that captures the semantics of the Cairo language. The AIR asserts the existence of tables of values from a finite field satisfying certain algebraic constraints. A cryptographic interactive proof system, circle STARK, provides an efficiently-checked certificate that the AIR is satisfied. We describe our verification, using the Lean 4 proof assistant, that the AIR encoding is sound, which is to say, the satisfiability of the AIR implies the computational claim.

2026-06-03T00:36:41Z Jeremy Avigad Anat Ganor Lior Goldberg David Levit Ohad Nir Yoav Seginer Alon Titelman http://arxiv.org/abs/2606.05228v1 Where Do Large Language Models Fail on Competitive Programming? A Taxonomy of Failures by Algorithm Type and Difficulty Rating 2026-06-02T18:48:38Z

Large language models (LLMs) demonstrate increasing proficiency on competitive programming benchmarks, yet technical reports predominantly publish aggregate pass rates, obscuring domain-specific vulnerabilities. We present a systematic empirical study of LLM failure patterns using a balanced taxonomy of 315 Codeforces problems across seven algorithm categories and three difficulty tiers. We evaluate GPT-4o and Claude Sonnet 4.6 under strict execution-based conditions, controlling for temperature (T = 0.2). To isolate the impact of reasoning frameworks on algorithmic correctness, we conduct an ablation study comparing direct zero-shot generation against zero-shot Chain-of-Thought (CoT). Our findings reveal a severe divergence from standard NLP benchmarks: forcing CoT aggressively penalizes GPT-4o, dropping its pass rate from 46.0% to 36.8% and exacerbating a critical weakness in Greedy logic. Conversely, while Claude maintains a higher logical baseline (63.5% under CoT), the expanded text generation severely degrades its markdown instruction adherence, causing its Compile Errors to more than triple (from 9 to 31, a 244% increase). Furthermore, failure-mode analysis indicates that Wrong Answer (WA) is the dominant verdict for both models--accounting for over 90% of GPT-4o's and roughly 70% of Claude's unaccepted solutions. These findings empirically demonstrate that standard prompt engineering techniques fail to bridge the algorithmic reasoning gap in competitive programming environments.

2026-06-02T18:48:38Z 12 pages, 4 figures Ayush Kumar Jha Shalini Jha http://arxiv.org/abs/2511.13663v2 SAIL: Sound Abstract Interpreters with LLMs 2026-06-02T16:19:57Z

How to construct globally sound abstract interpreters to safely approximate program behaviors remains a bottleneck in abstract interpretation. In this paper, we show the potential of using state-of-the-art LLMs to automate this tedious process. Focusing on the neural network verification area, we synthesize non-trivial sound abstract transformers across diverse abstract domains using LLMs to search within infinite space from scratch. We formalize the synthesis task as a constrained optimization problem, for which we design a novel mathematically grounded cost function that measures the degree of unsoundness of each generated candidate transformer, while enforcing hard syntactic and semantic validity constraints. Building on this formulation, we introduce SAIL, a novel unified framework that combines model generation, syntactic and semantic validation, and cost-function-based refinement to synthesize globally sound abstract transformers. Evaluation results show that SAIL not only matches the performance of manually designed transformers, but also is able to synthesize sound and high-precision transformers that do not exist in the literature for complex non-linear operators.

2025-11-17T18:16:36Z 43 pages, 21 figures Proc. ACM Program. Lang. 10, PLDI, Article 230, 26 pages (2026) Qiuhan Gu Avaljot Singh Gagandeep Singh 10.1145/3808308 http://arxiv.org/abs/2606.01794v2 Tridirectional Discriminating-Power Formal Verification of Smart Contract Reentrancy Defense Against Production-Deployed Solidity Source 2026-06-02T14:01:42Z

We present the first machine-checked correctness proof of the OpenZeppelin reentrancy-guard pattern against a Lean 4 state-machine model of production-deployed Solidity source. All thirteen theorems are machine-checked with zero sorry, zero user-introduced axioms, and an axiom footprint bounded by [propext] (a standard mathlib4 axiom), gated under continuous integration. Smart contract reentrancy has caused over US$500M in documented losses since 2016, with the DAO 2016 attack draining ~3.6M ETH and forcing the hard fork that split Ethereum. The OpenZeppelin ReentrancyGuard pattern is the de facto defense across production DeFi, yet no prior work has established its discriminating power: that the guard blocks attacks on vulnerable instances, preserves correct execution for non-attacking transactions, and distinguishes adjacent safe and vulnerable variants. Prior efforts formalized either guard correctness on toy contracts or attack feasibility on isolated instances - not both directions plus boundary cases against production source. We verify three production instantiations - DAO 2016, Compound v2, and Aave V3 flashLoan - plus a minimal-diff mutant of Aave V3's flashLoan (flashLoanVulnerable) isolating one security-critical difference, via mutation testing. The tridirectional structure pairs (a) attack reproduction of the DAO 2016 pattern, (b) a correctness proof for Compound v2, and (c) a boundary-case proof distinguishing Aave V3's CEI-correct flashLoan from the mutant. A capstone meta-theorem composes the three under a no-retrofit discipline, demonstrated at the first cross-protocol stress test (Compound v2 to Aave V3); broader-family portability is future work. Full Lean 4 source, CI config and reproduction commands are at https://github.com/rayiskander2406/qanary-contracts, reproducible at v1.6-phase7-closure (substrate: v1.3-layer6-closure).

2026-06-01T07:13:39Z Preprint v2: added Zenodo concept DOI (10.5281/zenodo.20510920) for the archival artifact snapshot in the reproduction-commands appendix. No changes to the methodology, results, theorems, or references. Community feedback welcome; v3 with feedback planned before venue submission Ray Iskander http://arxiv.org/abs/2408.14345v4 Guard Analysis and Safe Erasure Gradual Typing: a Type System for Elixir 2026-06-02T13:24:13Z

We formalize a new type system for Elixir, a dynamically typed functional programming language of growing popularity that runs on the Erlang virtual machine. Our system combines gradual typing with semantic subtyping to enable precise, sound, and practical static type analysis, without requiring any changes to Elixir's compilation pipeline or runtime. Type soundness is ensured by leveraging runtime checks -- both implicit, from the Erlang VM, and explicit, via developer-written guards. Central to our approach are two key innovations: the notion of "strong functions", which can be assigned precise types even when applied to inputs that may fall outside their intended domain; and a fine-grained analysis of guards that enables accurate type refinement for case expressions and guarded function definitions. While type information is erased before execution and not used by the compiler, our "safe erasure" gradual typing strategy maintains soundness and expressiveness without compromising compatibility or performance. This work lays the theoretical foundation for Elixir's new type system, outlines its integration into recent versions of the language, and demonstrates its effectiveness on large-scale industrial codebases.

2024-08-26T15:20:09Z Giuseppe Castagna Guillaume Duboc http://arxiv.org/abs/2606.04056v1 Token Budgets: An Empirical Catalog of 63 LLM-Agent Budget-Overrun Incidents, with an Affine-Typed Rust Mitigation as a Case Study 2026-06-02T10:46:57Z

LLM-agent budget overruns are a documented production failure class: a single retry loop can spend thousands of dollars before an operator notices, and the in-process integrity properties that would prevent it (no aliasing, no double-spend, no use-after-delegation of a cost-bearing value) are enforced, if at all, by ad-hoc wrappers rather than by the type system. Our central contribution is empirical: a catalog of 63 confirmed production incidents from 21 orchestration frameworks (2023-2026), each backed by a quoted GitHub issue and, where reported, a dollar loss, organized into an eight-cluster failure taxonomy (inter-rater Cohen's kappa = 0.837, N = 113), plus 47 supplementary structural entries. As one mitigation evaluated against this taxonomy, we build token-budgets, an 1,180-line Rust crate (no unsafe) that operationalizes affine ownership so that cloning, double-spending, or using a budget after delegating it are compile errors rather than runtime hazards an operator must remember to avoid. The dollar cap is runtime arithmetic under an estimator assumption; the affine layer makes that arithmetic non-bypassable. On single-agent workloads a 4-line Python counter matches the crate at 0/30 overshoot, so the distinguishing value is non-bypassability under operator error in multi-agent delegation: the delegation-fanout race documented in 11 incidents is rejected by the borrow checker at compile time, while the same pattern under asyncio overshoots 30/30 and three disciplined alternatives overshoot 0/30. Across five runtimes, three providers, and a temperature-stratified live-API test (N = 160), the approach reports zero cap violations and zero false refusals, at operational parity with concurrent work. Static over-reservation is 4-6x (2.11x adaptive). Binary-level cap-soundness on the running binary is left open.

2026-06-02T10:46:57Z 26 pages. Artifact (catalog CSV, Rust crate, formal proofs): https://github.com/sajjadanwar0/token-budgets Sajjad Khan http://arxiv.org/abs/2507.15017v4 Polynomial Invariant Generation for Floating-Point Programs 2026-06-02T06:54:48Z

In numeric-intensive computations, it is well known that the execution of floating-point programs is imprecise as floating-point arithmetic incurs round-off errors. Although round-off errors are small for a single floating-point operation, the aggregation of such errors may be dramatic and cause catastrophic program failures. Therefore, to ensure the correctness of floating-point programs, round-off error needs to be carefully taken into account. In this work, we consider polynomial invariant generation for floating-point programs, aiming at generating tight invariants under the perturbation of round-off errors. Our contribution is a novel framework for applying polynomial constraint solving to address the invariant generation problem, which is also the first polynomial constraint solving based approach that handles floating-point errors to our best knowledge. In our framework, we propose a novel combination of round-off error analysis and polynomial constraint solving, aiming to circumvent the cost of handling a large number of error variables in the floating-point model. Experimental results over a variety of challenging benchmarks show that our framework outperforms SOTA approaches in both time efficiency and the precision of generated invariants.

2025-07-20T16:02:17Z Xuran Cai Liqian Chen Hongfei Fu http://arxiv.org/abs/2604.15713v3 Just Type It in Isabelle! AI Agents Drafting, Mechanizing, and Generalizing from Human Hints 2026-06-02T05:13:23Z

Type annotations are essential when printing terms in a way that preserves their meaning under reparsing and type inference. We study the problem of complete and minimal type annotations for rank-one polymorphic $λ$-calculus terms, as used in Isabelle. Building on prior work by Smolka, Blanchette et al., we give a metatheoretical account of the problem, with a full formal specification and proofs, and formalize it in Isabelle/HOL. Our development is a series of experiments featuring human-driven and AI-driven formalization workflows: a human and an LLM-powered AI agent independently produce pen-and-paper proofs, and the AI agent autoformalizes both in Isabelle, with further human-hinted AI interventions refining and generalizing the development.

2026-04-17T05:34:49Z Kevin Kappelmann Maximilian Schäffeler Lukas Stevens Mohammad Abdulaziz Andrei Popescu Dmitriy Traytel http://arxiv.org/abs/2606.04034v1 Climbing Up the Semantic Tower -- at Runtime 2026-06-01T21:32:40Z

Software exists at multiple levels of abstraction, where each more concrete level is an implementation of the more abstract level above, in a semantic tower of compilers and/or interpreters. First-class implementations are a reflection protocol to navigate this tower *at runtime*: they enable changing the underlying implementation of a computation *while it is running*. Key is a generalized notion of *safe points* that enable observing a computation at a higher-level than that at which it runs, and therefore to climb up the semantic tower, when at runtime most existing systems only ever allow but to go further down. The protocol was obtained by extracting the computational content of a formal specification for implementations and some of their properties. This approach reconciles two heretofore mutually exclusive fields: Semantics and Runtime Reflection.

2026-06-01T21:32:40Z Presented at Off the Beaten Track Workshop (OBT) at POPL 2018 in Los Angeles François-René Rideau http://arxiv.org/abs/2606.02854v1 Fixed-Point Scaffolding in the Clef Programming Language 2026-06-01T20:19:10Z

For fans of Gabriel's "Worse is Better" it may be ironic that C++, by way of MLIR, serves as the scaffold for compiling an ML-family language whose correctness properties are structural. A crucial intersection in our Composer compiler initiates its lowering with a fixed-point combinator that preserves the dimensional, grade, escape, and numeric-representation structure from the Program Semantic Graph. And the MLIR that's witnessed from the PSG is no passive host. Its use of static single assignment, attribute system and dialects carry that structure materially. We show that our compiler middle end uses categorical construction for lowering code with companion verification to that strata: a functor from the compilation poset to a target category, subject to the compositionality equation. The grounding of our approach comes from three sources, each on its own algebraic object: Ohori's machine-code proof theory grounds the compilation axis, parametricity grounds the content at the base, and adjoint mode logic grounds the traversal between our verification tiers. To extend the thesis we introduce compact-closed negative and fractional types, and show the type machinery can be carried with preserved structure and realized through tooling MLIR provides. More broadly, the same fixed-point primitive that preserves types through compilation also supplies proof terms that can continue to be exercised in MLIR to verify its integrity as lowering proceeds through the pipeline. We argue that this foundation is a unique additional point anticipated by our framework that includes dimensional types, Tarau's groupoid, and cellular sheaves. Throughout, the formalism is instrumented as an internal scaffold: the abstractions support the compiler's mechanics, where a developer is never required to reach for category theory in order to rely on the guarantees the compiler provides.

2026-06-01T20:19:10Z 16 pages, 2 figures Houston Haynes http://arxiv.org/abs/2606.02394v1 From Time to Space: The Impact of Linearity in Higher-Order Datalog 2026-06-01T15:43:37Z

We consider a fragment of Higher-Order Datalog with negation and argue that it generalizes the familiar and important fragment of Linear Datalog. We investigate the expressive power of this fragment, establishing a tight connection with the hierarchy of space complexity classes. In particular, we demonstrate that for all $k \ge 1$, the $(k+1)$-order fragment of Stratified Linear Higher-Order Datalog$^\neg$ captures $(k-1)$-EXPSPACE. This result suggests that restricting programs to linear recursion shifts the expressive power of the corresponding fragments from time to space, generalizing the classical result that (Stratified) Linear Datalog captures NL. Unlike the first-order setting where an ordering assumption is required to capture NL, our results hold without any such assumption on the input database. The proof relies on simulating space-bounded Turing machines using Stratified Linear Higher-Order Datalog$^\neg$ programs and providing a space-efficient evaluation of the query program. We argue that identifying such computationally well-behaved fragments is a crucial step towards paving the way for practical implementations of Higher-Order Datalog.

2026-06-01T15:43:37Z Angelos Charalambidis Babis Kostopoulos Panos Rondogiannis http://arxiv.org/abs/2606.01974v1 Toka: A Systems Programming Language with Explicit Resource Semantics 2026-06-01T09:35:41Z

Systems programming languages traditionally struggle with the tension between physical transparency and compile-time memory safety. C++ provides direct, zero-cost hardware access but lacks strict safety boundaries, whereas Rust guarantees safety at the cost of complex lifetime annotations and implicit dereferencing chains. In this paper, we present Toka, a native systems programming language that establishes physical transparency in resource management via Explicit Resource Semantics. At the core of Toka's design is the Handle-Soul Duality (informally referred to as the Hat-Soul model), which cleanly dissociates pointer identities (Handles) from their underlying values (Souls) at the syntactic level. By enforcing that bare identifiers always represent values (Souls) and explicit sigils represent pointer handles, Toka eliminates the semantic ambiguity between rebind operations and value mutations. We detail Toka's resource morphology (supporting unique, shared, borrowed, and raw semantics), its lifetime checking mechanism, and its implementation of a prototype compiler. Our evaluation demonstrates that Toka achieves competitive runtime performance and minimal binary size while drastically reducing the cognitive overhead of lifetime annotations.

2026-06-01T09:35:41Z 12 pages. A Design and Implementation Experience Report on Explicit Resource Semantics Zhonghua Yi