http://arxiv.org/abs/2603.17457v1 Synthetic Differential Geometry in Lean 2026-03-18T07:57:05Z

This article is about the formalization of synthetic differential geometry with the Lean proof assistant and the mathematical library mathlib. The main result we prove and formalize is a Taylor theorem for functions of several variables, where the series expansion is around an infinitesimal neighborhood. Most of our proofs are in fact new. Our investigations highlight the possibility of using mathlib to do constructive mathematics.
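For orientation, the setting in which such an infinitesimal Taylor theorem is usually stated can be sketched as follows. This is the standard Kock-Lawvere formulation from the synthetic differential geometry literature, given here as background under that assumption; the abstract does not say which exact axioms or statement the formalization uses.

```latex
% Background sketch (assumed standard setting, not quoted from the paper).
% First-order infinitesimals of a commutative ring R:
D = \{\, d \in R \mid d^2 = 0 \,\}

% Kock-Lawvere axiom: every map f : D \to R is affine in a unique way.
\forall f : D \to R,\ \exists!\,(a, b) \in R \times R,\ \forall d \in D,\quad f(d) = a + b\,d

% Taking f'(x) to be the unique b for d \mapsto f(x + d) gives a
% first-order Taylor formula that holds exactly on D:
f(x + d) = f(x) + f'(x)\,d \qquad (d \in D)

% Several variables, first order, with
% D(n) = \{\, d \in R^n \mid d_i d_j = 0 \text{ for all } i, j \,\}:
f(x + d) = f(x) + \sum_{i=1}^{n} d_i\,\frac{\partial f}{\partial x_i}(x) \qquad (d \in D(n))
```

No remainder term appears because the infinitesimals are nilpotent: the expansion is an equality, not an approximation.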

2026-03-18T07:57:05Z Riccardo Brasca IMJ-PRG, UPCité Gabriella Clemente IRIF, UPCité http://arxiv.org/abs/2508.11972v2 Filling in the semantics for intuitionistic conditional logic 2026-03-18T05:36:18Z

We prove completeness results for a wide variety of intuitionistic conditional logics. We do so by first using a canonical model construction to obtain completeness with respect to descriptive conditional frames, and then introducing the fill-in method to transfer this to classes of conditional frames without extra structure. The fill-in method closes the gap between descriptive conditional frames, which do not have a canonical underlying frame, and conditional frames.

2025-08-16T08:21:48Z Brendan Dufty Jim de Groot http://arxiv.org/abs/2603.17244v1 Graph-Native Cognitive Memory for AI Agents: Formal Belief Revision Semantics for Versioned Memory Architectures 2026-03-18T00:59:49Z

While individual components for AI agent memory exist in prior systems, their architectural synthesis and formal grounding remain underexplored. We present Kumiho, a graph-native cognitive memory architecture grounded in formal belief revision semantics. The structural primitives required for cognitive memory -- immutable revisions, mutable tag pointers, typed dependency edges, URI-based addressing -- are identical to those required for managing agent-produced work as versionable assets, enabling a unified graph-native architecture that serves both purposes. The central formal contribution is a correspondence between the AGM belief revision framework and the operational semantics of a property graph memory system, proving satisfaction of the basic AGM postulates (K*2--K*6) and Hansson's belief base postulates (Relevance, Core-Retainment). The architecture implements a dual-store model (Redis working memory, Neo4j long-term graph) with hybrid fulltext and vector retrieval. On LoCoMo (token-level F1), Kumiho achieves 0.565 overall F1 (n=1,986) including 97.5% adversarial refusal accuracy. On LoCoMo-Plus, a Level-2 cognitive memory benchmark testing implicit constraint recall, Kumiho achieves 93.3% judge accuracy (n=401); independent reproduction by the benchmark authors yielded results in the mid-80% range, still substantially outperforming all published baselines (best: Gemini 2.5 Pro, 45.7%). Three architectural innovations drive the results: prospective indexing (LLM-generated future-scenario implications indexed at write time), event extraction (structured causal events preserved in summaries), and client-side LLM reranking. The architecture is model-decoupled: switching the answer model from GPT-4o-mini (~88%) to GPT-4o (93.3%) improves end-to-end accuracy without pipeline changes, at a total evaluation cost of ~$14 for 401 entries.
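The LoCoMo figure above is reported as token-level F1. A minimal sketch of that kind of metric is shown below, assuming plain lowercasing and whitespace tokenization; the benchmark's own scorer may normalize answers differently (punctuation, stemming, articles), and the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer.

    Assumption: tokens are lowercased whitespace-separated words.
    """
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For example, scoring "the cat sat" against "the cat" gives precision 2/3 and recall 1, hence F1 = 0.8.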

2026-03-18T00:59:49Z 56 pages, 1 figure Young Bin Park http://arxiv.org/abs/2602.11202v2 interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors 2026-03-17T18:20:33Z

Reasoning models produce long traces of intermediate decisions and tool calls, making test-time verification increasingly important for ensuring correctness. Existing approaches either verify only the final answer, which misses early errors, or rely on branch-and-verify strategies that explore multiple trajectories at substantially higher compute cost. We introduce interwhen, a single-trajectory verification framework that steers model behavior by providing feedback on intermediate verifiable properties. Our method addresses two key challenges. First, extracting intermediate solutions from a reasoning trace typically requires prompt engineering or external task decomposition into fixed steps, which can constrain the model's reasoning strategy. Instead, we periodically poll the reasoning trace and fork inference to recover intermediate solutions without imposing any predefined structure. Second, frequent verifier calls can increase latency; we address this by running verifiers asynchronously and interrupting the main trajectory only when an error is detected, leaving generation unaffected otherwise. This design improves both reliability and efficiency, and naturally supports early stopping based on consistency over recent intermediate solutions. Across benchmarks in code generation and arithmetic, logical and spatial reasoning, interwhen improves accuracy by up to 15 percentage points over standard chain-of-thought execution while staying within 1.5x of token compute cost. Moreover, on every dataset, interwhen achieves a Pareto-optimal operating point between accuracy and efficiency compared to existing test-time verification methods. Code is available at https://github.com/microsoft/interwhen.
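The monitoring loop described above can be pictured with a small synchronous sketch. It only illustrates two of the ideas, interrupting on a failed check and early stopping once recent intermediate solutions agree; it does not reproduce the asynchronous verifiers or the forked inference of interwhen, and every name in it (`monitor_trace`, `window`, the tuple results) is hypothetical.

```python
from collections import deque
from typing import Callable, Iterable, Optional, Tuple


def monitor_trace(
    polled: Iterable[str],
    verify: Callable[[str], bool],
    window: int = 3,
) -> Tuple[str, Optional[int], Optional[str]]:
    """Scan intermediate solutions polled from a single trajectory.

    Returns ("interrupt", step, solution) at the first solution that
    fails verification, ("stop", step, solution) once the last `window`
    solutions are identical, and ("exhausted", None, None) otherwise.
    """
    recent = deque(maxlen=window)
    for step, solution in enumerate(polled):
        if not verify(solution):
            # An error was detected: this is where feedback would be
            # injected into the main trajectory.
            return ("interrupt", step, solution)
        recent.append(solution)
        if len(recent) == window and len(set(recent)) == 1:
            # Consistent answers over the recent window: stop early.
            return ("stop", step, solution)
    return ("exhausted", None, None)
```

With the polled answers `["4", "5", "5", "5", "6"]` and a verifier that accepts everything, the sketch stops at step 3 with answer "5" and never looks at the last element.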

2026-02-05T08:35:01Z 50 pages, 7 figures Vishak K Bhat Prateek Chanda Vijval Ekbote Ashmit Khandelwal Maitreyi Swaroop Vineeth N. Balasubramanian Subbarao Kambhampati Nagarajan Natarajan Amit Sharma http://arxiv.org/abs/2603.16802v1 Computability of the Hahn-Banach Theorem Revisited 2026-03-17T17:06:10Z

Computational properties of the Hahn-Banach theorem have been studied in computable, constructive and reverse mathematics and in all these approaches the theorem is equivalent to weak Kőnig's lemma. Gherardi and Marcone proved that this is also true in the uniform sense of Weihrauch complexity. However, their result requires the underlying space to be variable. We prove that the Hahn-Banach theorem attains its full complexity already for the Banach space $\ell^1$. We also prove that the one-step Hahn-Banach theorem for this space is Weihrauch equivalent to the intermediate value theorem. This also yields a new and very simple proof of the reduction of the Hahn-Banach theorem to weak Kőnig's lemma using infinite products. Finally, we show that the Hahn-Banach theorem for $\ell^1$ in the two-dimensional case is Weihrauch equivalent to the lesser limited principle of omniscience.

2026-03-17T17:06:10Z Vasco Brattka Christopher Sorg http://arxiv.org/abs/2509.22493v2 Ontological foundations for contrastive explanatory narration of robot plans 2026-03-17T16:37:39Z

Mutual understanding of artificial agents' decisions is key to ensuring a trustworthy and successful human-robot interaction. Hence, robots are expected to make reasonable decisions and communicate them to humans when needed. In this article, the focus is on an approach to modeling and reasoning about the comparison of two competing plans, so that robots can later explain the divergent result. First, a novel ontological model is proposed to formalize and reason about the differences between competing plans, enabling the classification of the most appropriate one (e.g., the shortest, the safest, the closest to human preferences, etc.). This work also investigates the limitations of a baseline algorithm for ontology-based explanatory narration. To address these limitations, a novel algorithm is presented, leveraging divergent knowledge between plans and facilitating the construction of contrastive narratives. Through empirical evaluation, it is observed that the resulting explanations outperform those of the baseline method.

2025-09-26T15:37:47Z Information Sciences, 123280 (2026) Alberto Olivares-Alarcos Sergi Foix Júlia Borràs Gerard Canal Guillem Alenyà 10.1016/j.ins.2026.123280 http://arxiv.org/abs/2603.16983v1 Formal verification of tree-based machine learning models for lateral spreading 2026-03-17T16:27:51Z

Machine learning models for geotechnical hazard prediction can achieve high accuracy while learning physically inconsistent relationships from sparse or biased training data. Current remedies (post-hoc explainability, such as SHAP and LIME, and training-time constraints) either diagnose individual predictions approximately or restrict model capacity without providing exhaustive guarantees. This paper encodes trained tree ensembles as logical formulas in a Satisfiability Modulo Theories (SMT) solver and checks physical specifications across the entire input domain, not just sampled points. Four geotechnical specifications (water table depth, PGA monotonicity, distance safety, and flat-ground safety) are formalized as decidable logical formulas and verified via SMT against both XGBoost ensembles and Explainable Boosting Machines (EBMs) trained on the 2011 Christchurch earthquake lateral spreading dataset (7,291 sites, four features). The SMT solver either produces a concrete counterexample where a specification fails or proves that no violation exists. The unconstrained EBM (80.1% accuracy) violates all four specifications. A fully constrained EBM (67.2%) satisfies three of four specifications, demonstrating that iterative constraint application guided by verification can progressively improve physical consistency. A Pareto analysis of 33 model variants reveals a persistent trade-off, as none of the variants studied achieve both greater than 80% accuracy and full compliance with the specified set. SHAP analysis of specification counterexamples shows that the offending feature can rank last, demonstrating that post-hoc explanations do not substitute for formal verification. These results establish a verify-fix-verify engineering loop and a formal certification for deploying physically consistent ML models in safety-critical geotechnical applications.
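To make "checking a specification across the entire input domain" concrete, here is a small sketch for a toy tree ensemble. It deliberately swaps the SMT encoding for plain enumeration: a sum of decision trees is piecewise constant, so testing one representative point per threshold-induced cell is already exhaustive. The tree encoding, the function names, and the choice of "non-decreasing in one feature" as the specification are all assumptions made for illustration, not the paper's formulation.

```python
from itertools import product

# A tree is either a float leaf or a tuple
# (feature_index, threshold, left_subtree, right_subtree),
# where the left branch is taken when x[feature_index] < threshold.


def predict(trees, x):
    """Sum of the leaf values reached by x in every tree."""
    total = 0.0
    for node in trees:
        while isinstance(node, tuple):
            feature, threshold, left, right = node
            node = left if x[feature] < threshold else right
        total += node
    return total


def thresholds(trees, n_features):
    """Collect the split thresholds used for each feature."""
    found = [set() for _ in range(n_features)]
    stack = list(trees)
    while stack:
        node = stack.pop()
        if isinstance(node, tuple):
            feature, threshold, left, right = node
            found[feature].add(threshold)
            stack += [left, right]
    return found


def representatives(cuts):
    """One point per cell: below the first cut, then each cut itself."""
    cuts = sorted(cuts)
    return [cuts[0] - 1.0] + cuts if cuts else [0.0]


def find_monotonicity_violation(trees, n_features, feature):
    """Return (low, high) where raising `feature` lowers the output,
    or None if the ensemble is non-decreasing in `feature` everywhere."""
    reps = [representatives(c) for c in thresholds(trees, n_features)]
    grid = [r if i != feature else [None] for i, r in enumerate(reps)]
    for point in product(*grid):
        low = list(point)
        # Adjacent cells suffice: non-decreasing is transitive.
        for a, b in zip(reps[feature], reps[feature][1:]):
            low[feature] = a
            high = list(low)
            high[feature] = b
            if predict(trees, high) < predict(trees, low):
                return tuple(low), tuple(high)
    return None
```

Like the solver-based approach, this either returns a concrete counterexample or establishes that none exists. The enumeration grows exponentially with the number of features, which is one reason a solver is the more practical tool for realistic ensembles.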

2026-03-17T16:27:51Z Krishna Kumar http://arxiv.org/abs/2308.09481v5 Types, equations, dimensions and the Pi theorem 2026-03-17T11:58:04Z

The languages of mathematical physics and modelling are endowed with a rich ``grammar of dimensions'' that common abstractions of programming languages fail to represent. We propose a dependently typed domain-specific language (embedded in Idris) that captures this grammar. We apply it to formalize basic notions of dimensional analysis: those of dimension function, physical quantity, homomorphic measurement, the covariance principle and Buckingham's Pi theorem. We hope that the language makes mathematical physics more accessible to computer scientists and functional programming more palatable to modellers and physicists.
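As a concrete instance of Buckingham's Pi theorem, the number of independent dimensionless groups equals the number of quantities minus the rank of their dimension matrix. The sketch below computes this in Python rather than in the paper's Idris DSL, using exact rational arithmetic; the pendulum example and all names are illustrative assumptions.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix via exact Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    found = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next(
            (r for r in range(found, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        # Eliminate the entries below the pivot.
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def dimensionless_groups(dimension_matrix):
    """Quantities minus rank: the count given by the Pi theorem."""
    return len(dimension_matrix[0]) - rank(dimension_matrix)


# Rows are base dimensions, columns are the quantities
# period, length, mass, gravitational acceleration.
PENDULUM = [
    [0, 0, 1, 0],   # exponent of mass
    [0, 1, 0, 1],   # exponent of length
    [1, 0, 0, -2],  # exponent of time
]
```

For the pendulum the matrix has rank 3 and there are 4 quantities, so exactly one dimensionless group exists (period squared times gravity over length).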

2023-08-16T14:33:18Z Nicola Botta Patrik Jansson http://arxiv.org/abs/2603.16375v1 Monoidal categories graded by partial commutative monoids 2026-03-17T11:05:47Z

Effectful categories have two classes of morphisms: pure morphisms, which form a monoidal category; and effectful morphisms, which can only be combined monoidally with central morphisms (such as the pure ones), forming a premonoidal category. This suggests seeing morphisms of an effectful category as carrying a grade that combines under the monoidal product in a partially defined manner. We axiomatize this idea with the notion of monoidal category graded by a partial commutative monoid (PCM). Monoidal categories arise as the special case of grading by the singleton PCM, and effectful categories arise from grading by a two-element PCM. Further examples include grading by powerset PCMs, modelling non-interfering parallelism for programs accessing shared resources, and grading by intervals, modelling bounded resource usage. We show that effectful categories form a coreflective subcategory of PCM-graded monoidal categories; introduce cartesian structure, recovering Freyd categories; and describe PCM-graded monoidal categories as monoids by viewing a PCM as a thin promonoidal category.
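The grading structures mentioned above are easy to picture as partial operations. The sketch below models a partial commutative monoid as a function returning `None` where the combination is undefined. The powerset case follows the abstract; the concrete two-element PCM (unit `PURE`, with `EFFECTFUL` combined with `EFFECTFUL` undefined) is one plausible reading rather than a definition taken from the paper, and all names are hypothetical.

```python
from typing import FrozenSet, Optional

# Two-element PCM: PURE is the unit; two effectful grades cannot be
# combined, mirroring that effectful morphisms do not tensor freely.
PURE, EFFECTFUL = 0, 1


def effect_combine(a: int, b: int) -> Optional[int]:
    """Partial sum on {0, 1}: undefined when the total would exceed 1."""
    total = a + b
    return total if total <= 1 else None


def powerset_combine(
    a: FrozenSet[str], b: FrozenSet[str]
) -> Optional[FrozenSet[str]]:
    """Powerset PCM: union, defined only for disjoint resource sets.

    The empty set is the unit; overlapping sets model programs that
    would interfere on a shared resource.
    """
    return a | b if a.isdisjoint(b) else None
```

Two programs graded by `{"x"}` and `{"y"}` may run in parallel with combined grade `{"x", "y"}`, while two programs that both touch `"x"` have no combined grade.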

2026-03-17T11:05:47Z Matthew Earnshaw Chad Nester Mario Román http://arxiv.org/abs/2603.16308v1 Three-Dimensional Affine Spatial Logics 2026-03-17T09:43:30Z

We focus on a branch of region-based spatial logics dealing with affine geometry. The research on this topic is scarce: only a handful of papers investigate such systems, mostly in the case of the real plane. Our long-term goal is to analyse a certain family of affine logics with inclusion and convexity as primitives interpreted over real spaces of increasing dimensionality. In this article we show that logics of different dimensionalities must have different theories, thus justifying further work on different dimensions. We then focus on the three-dimensional case, exploring the expressiveness of this logic and consequently showing that it is possible to construct formulas describing a three-dimensional coordinate frame. The final result, making use of the high expressive power of this logic, is that every region satisfies an affine complete formula, meaning that all regions satisfying it are affine equivalent.

2026-03-17T09:43:30Z Adam Trybus http://arxiv.org/abs/2504.08923v2 A convergence law for continuous logic and continuous structures with finite domains 2026-03-17T09:38:20Z

We consider continuous relational structures with finite domain $[n] := \{1, \ldots, n\}$ and a many-valued logic, $CLA$, with values in the unit interval, which uses continuous connectives and continuous aggregation functions. $CLA$ subsumes first-order logic on ``conventional'' finite structures. To each relation symbol $R$ and identity constraint $ic$ on a tuple the length of which matches the arity of $R$ we associate a continuous probability density function $\mu_R^{ic} : [0, 1] \to [0, \infty)$. We also consider a probability distribution on the set $\mathbf{W}_n$ of continuous structures with domain $[n]$ which is such that for every relation symbol $R$, identity constraint $ic$, and tuple $\bar{a}$ satisfying $ic$, the distribution of the value of $R(\bar{a})$ is given by $\mu_R^{ic}$, independently of the values for other relation symbols or other tuples. In this setting we prove that every formula in $CLA$ is asymptotically equivalent to a formula without any aggregation function. This is used to prove a convergence law for $CLA$ which reads as follows for formulas without free variables: If $\varphi \in CLA$ has no free variable and $I \subseteq [0, 1]$ is an interval, then there is $\alpha \in [0, 1]$ such that, as $n$ tends to infinity, the probability that the value of $\varphi$ is in $I$ tends to $\alpha$.

2025-04-11T19:08:38Z Vera Koponen http://arxiv.org/abs/2505.23135v3 VERINA: Benchmarking Verifiable Code Generation 2026-03-16T23:40:56Z

Large language models (LLMs) are increasingly integrated in software development, but ensuring correctness in LLM-generated code remains challenging and often requires costly manual review. Verifiable code generation -- jointly generating code, specifications, and proofs of code-specification alignment -- offers a promising path to address this limitation and further unleash LLMs' benefits in coding. Yet, there exists a significant gap in evaluation: current benchmarks often focus on only individual components rather than providing a holistic evaluation framework of all tasks. In this paper, we introduce VERINA (Verifiable Code Generation Arena), a high-quality benchmark enabling a comprehensive and modular evaluation of code, specification, and proof generation as well as their compositions. VERINA consists of 189 manually curated coding tasks in Lean, with detailed problem descriptions, reference implementations, formal specifications, and extensive test suites. Our extensive evaluation of state-of-the-art LLMs reveals significant challenges in verifiable code generation, especially in proof generation, underscoring the need for improving LLM-based theorem provers in verification domains. The best model, OpenAI o3, achieves a 72.6\% code correctness rate, 52.3\% for specification soundness and completeness, and a mere 4.9\% proof success rate (based on one trial per task). We hope VERINA will catalyze progress in verifiable code generation by providing a rigorous and comprehensive benchmark. We release our dataset on https://huggingface.co/datasets/sunblaze-ucb/verina and our evaluation code on https://github.com/sunblaze-ucb/verina.

2025-05-29T06:12:52Z Zhe Ye Zhengxu Yan Jingxuan He Timothe Kasriel Kaiyu Yang Dawn Song http://arxiv.org/abs/2603.15876v1 A Non-Binary Method for Finding Interpolants: Theory and Practice 2026-03-16T20:14:14Z

We describe a new method of finding interpolants for classical logic using a certain refutation system as a starting point. Refutation can be thought of as an alternative approach to the analysis of formal systems: instead of focusing on which formulas provably belong to a given logic, it shows which formulas are to be rejected. Thus, it provides a mirror proof system. As it turns out, the benefits of such an approach go well beyond the area of refutation calculi themselves. We provide one such example in the shape of an interpolant-searching method. To be sure, a number of such methods are already in use. The novelty of our proposal lies in the fact that it can be considered as based on a non-binary version of resolution.
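For readers unfamiliar with the object being computed, the standard definition of a (Craig) interpolant for classical propositional logic is recalled below with a small worked instance. This is textbook background only; it says nothing about how the refutation-based method of the paper finds interpolants.

```latex
% Standard definition (background, not the paper's method).
% If A \to B is valid, an interpolant is a formula I such that
\models A \to I, \qquad \models I \to B, \qquad
\mathrm{Var}(I) \subseteq \mathrm{Var}(A) \cap \mathrm{Var}(B).

% Worked instance: A = p \land q and B = q \lor r share only q, and
\models (p \land q) \to q, \qquad \models q \to (q \lor r),
% so I = q is an interpolant for A \to B.
```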

2026-03-16T20:14:14Z Adam Trybus Karolina Rożko Tomasz Skura http://arxiv.org/abs/2603.15770v1 Formalization of QFT 2026-03-16T18:01:52Z

A foundational result in constructive quantum field theory is the construction of the free bosonic quantum field theory in four-dimensional Euclidean spacetime and the proof that it satisfies the Glimm-Jaffe axioms, a variant of the Osterwalder-Schrader axioms. We present a formalization of this result in the Lean 4 interactive theorem prover. The project is intended as a proof of concept that extended arguments in mathematical physics can be translated into machine-checked proofs using existing AI tools. We begin by introducing interactive theorem proving and constructive quantum field theory, then describe our formalization and the design decisions that shaped it. We also explain the methods we used, including coding assistants, and conclude by considering how AI-assisted formalization may influence the future of theoretical physics. Our original release assumed three results: Minlos' theorem, the nuclear property of Schwartz space, and Goursat's theorem. In subsequent releases from our group and from contributors from the Lean community, these assumptions have been proven (or avoided), so that the OS/GJ axioms are now proven using only Lean and its library Mathlib.

2026-03-16T18:01:52Z 35 pages, 1 figure Michael R. Douglas Sarah Hoback Anna Mei Ron Nissim http://arxiv.org/abs/2603.15559v1 Probabilistic Model Checking Taken by Storm 2026-03-16T17:22:21Z

This tutorial paper presents a hands-on perspective on probabilistic model checking with the Storm model checker. Storm is a decade-old model checker that excels in performance and offers a rich Python-based ecosystem, which makes it easy to integrate into various workflows. This tutorial focuses on Markov decision processes (MDP), which are popular in a variety of fields. It demonstrates the basic workflow, from Python-based modeling, model checking with a variety of properties, to the extraction of policies. Further, it showcases the support for recent topics that focus on different types of uncertainty, such as interval MDP and POMDP, and the ability to quickly implement simple algorithms on top of existing data structures.
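The workflow of modeling an MDP, checking a reachability property, and extracting a policy can be illustrated without any model checker at all. The sketch below is plain-Python value iteration for maximal reachability probability on a dictionary-encoded MDP. It does not use or imitate Storm's API; the encoding, the function name `max_reachability`, and the example model are assumptions made for illustration.

```python
def max_reachability(mdp, targets, iterations=1000, tol=1e-12):
    """Maximal probability of eventually reaching `targets`.

    `mdp` maps each state to {action: {successor: probability}}.
    States with no actions are treated as absorbing.
    Returns (values, policy).
    """
    values = {s: (1.0 if s in targets else 0.0) for s in mdp}
    for _ in range(iterations):
        delta = 0.0
        for s, actions in mdp.items():
            if s in targets or not actions:
                continue
            # Bellman update: best expected value over all actions.
            best = max(
                sum(p * values[t] for t, p in dist.items())
                for dist in actions.values()
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    # Policy extraction: pick an action attaining the maximum.
    policy = {
        s: max(
            actions,
            key=lambda a: sum(p * values[t] for t, p in actions[a].items()),
        )
        for s, actions in mdp.items()
        if actions and s not in targets
    }
    return values, policy


# Hypothetical example: retrying safely eventually succeeds, while the
# risky action succeeds at once with probability 0.9.
EXAMPLE = {
    "s0": {
        "safe": {"goal": 0.5, "s0": 0.5},
        "risky": {"goal": 0.9, "fail": 0.1},
    },
    "goal": {},
    "fail": {},
}
```

On `EXAMPLE` with target `{"goal"}`, the value of `s0` converges to 1 and the extracted policy chooses `safe`. A dedicated model checker adds sound termination criteria, richer property languages, and far better scalability than this loop.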

2026-03-16T17:22:21Z Matthias Volk Linus Heck Sebastian Junges Joost-Pieter Katoen Tim Quatmann