https://arxiv.org/api/Tr7bx4nvd+EyCHzoDynib9+dAPQ2026-03-24T11:08:49Z189209015http://arxiv.org/abs/2603.12744v1TaoBench: Do Automated Theorem Prover LLMs Generalize Beyond MathLib?2026-03-13T07:39:47ZAutomated theorem proving (ATP) benchmarks largely consist of problems formalized in MathLib, so current ATP training and evaluation are heavily biased toward MathLib's definitional framework. However, frontier mathematics is often exploratory and prototype-heavy, relying on bespoke constructions that deviate from standard libraries. In this work, we evaluate the robustness of current ATP systems when applied to a novel definitional framework, specifically examining the performance gap between standard library problems and bespoke mathematical constructions. We introduce TaoBench, an undergraduate-level benchmark derived from Terence Tao's Analysis I, which formalizes analysis by constructing core mathematical concepts from scratch, without relying on standard Mathlib definitions, as well as by mixing from-scratch and MathLib constructions. For fair evaluation, we build an agentic pipeline that automatically extracts a compilable, self-contained local environment for each problem. To isolate the effect of definitional frameworks, we additionally translate every problem into a mathematically equivalent Mathlib formulation, yielding paired TaoBench-Mathlib statements for direct comparison. While state-of-the-art ATP models perform capably within the MathLib framework, performance drops by an average of roughly 26% on the definitionally equivalent Tao formulation. This indicates that the main bottleneck is limited generalization across definitional frameworks rather than task difficulty. TaoBench thus highlights a gap between benchmark performance and applicability, and provides a concrete foundation for developing and testing provers better aligned with research mathematics.2026-03-13T07:39:47ZAlexander K TaylorJunyi ZhangEthan JiVigyan SahaiHaikang DengYuanzhou ChenYifan YuanDi WuJia-Chen GuKai-Wei ChangNanyun PengAmit SahaiWei Wanghttp://arxiv.org/abs/2603.12232v1Incremental Neural Network Verification via Learned Conflicts2026-03-12T17:52:12ZNeural network verification is often used as a core component within larger analysis procedures, which generate sequences of closely related verification queries over the same network. In existing neural network verifiers, each query is typically solved independently, and information learned during previous runs is discarded, leading to repeated exploration of the same infeasible regions of the search space. In this work, we aim to expedite verification by reducing this redundancy. We propose an incremental verification technique that reuses learned conflicts across related verification queries. The technique can be added on top of any branch-and-bound-based neural network verifier. During verification, the verifier records conflicts corresponding to learned infeasible combinations of activation phases, and retains them across runs. We formalize a refinement relation between verification queries and show that conflicts learned for a query remain valid under refinement, enabling sound conflict inheritance. Inherited conflicts are handled using a SAT solver to perform consistency checks and propagation, allowing infeasible subproblems to be detected and pruned early during search. We implement the proposed technique in the Marabou verifier and evaluate it on three verification tasks: local robustness radius determination, verification with input splitting, and minimal sufficient feature set extraction. Our experiments show that incremental conflict reuse reduces verification effort and yields speedups of up to $1.9\times$ over a non-incremental baseline.2026-03-12T17:52:12ZRaya ElsalehLiam DavisHaoze WuGuy Katzhttp://arxiv.org/abs/2603.12171v1When do modal definability and preservation theorems transfer to the finite?2026-03-12T17:04:26ZWe study which classic modal definability and preservation results survive when attention is restricted to finite structures, where many first-order transfer theorems are known to break down. Several semantic characterizations for modal formula classes survive the passage to the finite, while a number of first-order preservation theorems for basic frame operations fail. Our main positive result is that the Bisimulation Safety Theorem does transfer to finite structures. We also discuss computability aspects, and analogues in the finite for the Goldblatt-Thomason theorem and for modal correspondence theory.2026-03-12T17:04:26ZJohan van BenthemBalder ten CateXi Yanghttp://arxiv.org/abs/2405.21025v4On Reduction and Synthesis of Petri's Cycloids2026-03-12T16:00:19ZCycloids are particular Petri nets for modelling processes of actions and events, belonging to the fundaments of Petri's general systems theory. Defined by four parameters they provide an algebraic formalism to describe strongly synchronized sequential processes. To further investigate their structure, reduction systems of cycloids are defined in the style of rewriting systems and properties of irreducible cycloids are proved. In particular the synthesis of cycloid parameters from their Petri net structure is derived, leading to an efficient method for a decision procedure for cycloid isomorphism.2024-05-31T17:13:44ZRüdiger ValkDaniel Moldthttp://arxiv.org/abs/2503.08530v2A Probabilistic Choreography Language for PRISM2026-03-12T15:36:51ZWe present a choreographic framework for modelling and
analysing concurrent probabilistic systems based on the PRISM
model-checker. This is achieved through the development of a
choreography language, which is a specification language that allows
to describe the desired interactions within a concurrent system from
a global viewpoint. Using choreographies gives a clear and complete
view of system interactions, making it easier to understand the
process flow and identify potential errors, which helps ensure
correct execution and improves system reliability. We equip our
language with a probabilistic semantics and then define a formal
encoding into the PRISM language and discuss its
correctness. Properties of programs written in our choreographic
language can be model-checked by the PRISM model-checker via their
translation into the PRISM language. Finally, we implement a
compiler for our language and demonstrate its practical
applicability via examples drawn from the use cases featured in the
PRISM website.2025-03-11T15:20:18ZMarco CarboneAdele Veschettihttp://arxiv.org/abs/2510.19444v3A Foundational Theory of Quantitative Abstraction: Adjunctions, Duality, and Logic for Probabilistic Systems2026-03-12T14:52:17ZThe analysis and control of stochastic dynamical systems rely on probabilistic models such as (continuous-space) Markov decision processes, but large or continuous state spaces make exact analysis intractable and call for principled quantitative abstraction. This work develops a unified theory of such abstraction by integrating category theory, coalgebra, quantitative logic, and optimal transport, centred on a canonical $\varepsilon$-quotient of the behavioral pseudo-metric with a universal property: among all abstractions that collapse behavioral differences below $\varepsilon$, it is the most detailed, and every other abstraction achieving the same discounted value-loss guarantee factors uniquely through it. Categorically, a quotient functor $Q_\varepsilon$ from a category of probabilistic systems to a category of metric specifications admits, via the Special Adjoint Functor Theorem, a right adjoint $R_\varepsilon$, yielding an adjunction $Q_\varepsilon \dashv R_\varepsilon$ that formalizes a duality between abstraction and realization; logically, a quantitative modal $μ$-calculus with separate reward and transition modalities is shown, for a broad class of systems, to be expressively complete for the behavioral pseudo-metric, with a countable fully abstract fragment suitable for computation. The theory is developed coalgebraically over Polish spaces and the Giry monad and validated on finite-state models using optimal-transport solvers, with experiments corroborating the predicted contraction properties and structural stability and aligning with the theoretical value-loss bounds, thereby providing a rigorous foundation for quantitative state abstraction and representation learning in probabilistic domains.2025-10-22T10:16:24ZSome major mathematical errors that we need to rectify. We cannot specify exact error areas as they are spread throughout. The theorems need further developmentNivar AnwerGeorgia Institute of Technology, USAEzequiel López-RubioUniversity of Málaga, Spain and IBIMA Plataforma BIONAND, SpainDavid ElizondoDe Montfort University, United KingdomRafael M. Luque-BaenaUniversity of Málaga, Spain and IBIMA Plataforma BIONAND, Spainhttp://arxiv.org/abs/2506.08585v3k-Planar and Fan-Crossing Drawings and Transductions of Embeddable Graphs2026-03-12T13:39:59ZWe introduce, for every surface Σ, a two-way connection between FO transductions (first-order logical transformations) of the graphs embeddable in Σ and a certain variant of fan-crossing drawings of graphs in Σ. If the target graphs drawn in Σ are additionally of bounded maximum degree, then the restriction on drawings is simply to have a bounded number of crossings per edge (such as being k-planar for fixed k if Σ is the plane). For graph classes, this connection allows us to derive non-transducibility results from nonexistence of the said drawings and, conversely, from nonexistence of a transduction to derive nonexistence of the said drawings. For example, the class of 3D-grids is not k-planar for any fixed k. We hope that this connection will help to draw a path to a possible proof that not all toroidal graphs are transducible from planar graphs.
The result is based on a very recent characterization of weakly sparse FO transductions of classes of bounded expansion by [Gajarský, Gładkowski, Jedelský, Pilipczuk and Toruńczyk, arXiv:2505.15655].2025-06-10T08:54:08ZAlso correcting mistakenly omitted condition of the k-fold k-clustered fan-crossing drawings to be "monotone" (Definition 2 in both versions)Petr HliněnýJan Jedelskýhttp://arxiv.org/abs/2603.11908v1Witnesses for Fixpoint Games on Lattices2026-03-12T13:24:37ZWe construct witnesses that can be used to derive strategies in fixpoint games and provide proof that the least fixpoint of a function is either above or not below some given bound. We rely on a lattice-theoretical approach, including a Galois connection that connects a lattice representing the "logic universe", where the witness lives, with another lattice representing the "behaviour universe", over which the function is defined. In fact we consider two types of games -- primal and dual games -- and in both cases show how to derive winning strategies in the game from witnesses and construct witnesses from strategies. The two games differ wrt. their rules and the choice of basis of the lattice.
The theory can be instantiated to well-known examples: in particular we compare with the construction of distinguishing formulas in standard bisimilarity and behavioural metrics for probabilistic systems. As a new case study we consider witnesses for certifying lower bounds for the termination probability for Markov chains.2026-03-12T13:24:37Z28 pagesBarbara KönigKarla Messinghttp://arxiv.org/abs/2505.17350v2{log}: From a Constraint Logic Programming Language to a Formal Verification Tool2026-03-12T12:32:00Z{log} (read 'setlog') was born as a Constraint Logic Programming (CLP) language where sets and binary relations are first-class citizens, thus fostering set programming. Internally, {log} is a constraint satisfiability solver implementing decision procedures for several fragments of set theory. Hence, {log} can be used as a declarative, set, logic programming language and as an automated theorem prover for set theory. Over time {log} has been extended with some components integrated to the satisfiability solver thus providing a formal verification environment. In this paper we make a comprehensive presentation of this environment which includes a language for the description of state machines based on set theory, an interactive environment for the execution of functional scenarios over state machines, a generator of verification conditions for state machines, automated verification of state machines, and test case generation. State machines are both, programs and specifications; exactly the same code works as a program and as its specification. In this way, with a few additions, a CLP language turned into a seamlessly integrated programming and automated proof system.2025-05-23T00:01:32ZUnder consideration in Theory and Practice of Logic Programming (TPLP)Maximiliano CristiáAlfredo CapozuccaGianfranco Rossihttp://arxiv.org/abs/2309.17022v4Positionality in Σ_0^2 and a completeness result2026-03-12T10:15:02ZWe study the existence of positional strategies for the protagonist in infinite duration games over arbitrary game graphs. We prove that prefix-independent objectives in Σ_0^2 which are positional and admit a (strongly) neutral letter are exactly those that are recognised by history-deterministic monotone co-Büchi automata over countable ordinals. This generalises a criterion proposed by [Kopczyński, ICALP 2006] and gives an alternative proof of closure under union for these objectives, which was known from [Ohlmann, TheoretiCS 2023].
We then give two applications of our result. First, we prove that the mean-payoff objective is positional over arbitrary game graphs. Second, we establish the following completeness result: for any objective W which is prefix-independent, admits a (weakly) neutral letter, and is positional over finite game graphs, there is an objective W' which is equivalent to W over finite game graphs and positional over arbitrary game graphs.2023-09-29T07:15:01ZPierre OhlmannMichał Skrzypczakhttp://arxiv.org/abs/2603.09975v2d-DNNF Modulo Theories: A General Framework for Polytime SMT Queries2026-03-12T09:34:09ZIn Knowledge Compilation (KC) a propositional knowledge base is compiled off-line into some target form, typically into deterministic decomposable negation normal form (d-DNNF) or one of its subcases, which is then used on-line to answer a large number of queries in polytime, such as clausal entailment, model counting, and others. The general idea is to push as much of the computational effort into the off-line compilation phase, which is amortized over all on-line polytime queries.
In this paper, we present for the first time a novel and general technique to leverage d-DNNF compilation and querying to SMT level. Intuitively, before d-DNNF compilation, the input SMT formula is combined with a list of pre-computed ad-hoc theory lemmas, so that the queries at SMT level reduce to those at propositional level. This approach has several features: (i) it works for every theory, or theory combination thereof; (ii) it works for all forms of d-DNNF; (iii) it is easy to implement on top of any d-DNNF compiler and any theory-lemma enumerator, which are used as black boxes; (iv) most importantly, these compiled SMT d-DNNFs can be queried in polytime by means of a standard propositional d-DNNF reasoner. We have implemented a tool on top of state-of-the-art d-DNNF packages and of the MathSAT SMT solver. Some preliminary empirical evaluation supports the effectiveness of the approach.2026-03-10T17:59:31ZGabriele MasinaEmanuale CiviniMassimo MicheluttiGiuseppe SpallittaRoberto Sebastianihttp://arxiv.org/abs/2603.03668v2Can LLM Aid in Solving Constraints with Inductive Definitions?2026-03-12T08:30:51ZSolving constraints involving inductive (aka recursive) definitions is challenging. State-of-the-art SMT/CHC solvers and first-order logic provers provide only limited support for solving such constraints, especially when they involve, e.g., abstract data types. In this work, we leverage structured prompts to elicit Large Language Models (LLMs) to generate auxiliary lemmas that are necessary for reasoning about these inductive definitions. We further propose a neuro-symbolic approach, which synergistically integrates LLMs with constraint solvers: the LLM iteratively generates conjectures, while the solver checks their validity and usefulness for proving the goal. We evaluate our approach on a diverse benchmark suite comprising constraints originating from algebrai data types and recurrence relations. The experimental results show that our approach can improve the state-of-the-art SMT and CHC solvers, solving considerably more (around 25%) proof tasks involving inductive definitions, demonstrating its efficacy.2026-03-04T02:48:27ZAccepted by the 27th Symposium on Formal Methods (FM 2026)Weizhi FengShidong ShenJiaxiang LiuTaolue ChenFu SongZhilin Wuhttp://arxiv.org/abs/2402.05854v4Slightly Non-Linear Higher-Order Tree Transducers2026-03-11T23:24:20ZWe investigate the tree-to-tree functions computed by "affine $λ$-transducers": tree automata whose memory consists of an affine $λ$-term instead of a finite state. They can be seen as variations on Gallot, Lemay and Salvati's Linear High-Order Deterministic Tree Transducers.
When the memory is almost purely affine ($\textit{à la}$ Kanazawa), we show that these machines can be translated to tree-walking transducers (and with a purely affine memory, we get a reversible tree-walking transducer). This leads to a proof of an inexpressivity conjecture of Nguyên and Pradic on "implicit automata" in an affine $λ$-calculus. We also prove that a more powerful variant, extended with preprocessing by an MSO relabeling and allowing a limited amount of non-linearity, is equivalent in expressive power to Engelfriet, Hoogeboom and Samwel's invisible pebble tree transducers.
The key technical tool in our proofs is the Interaction Abstract Machine (IAM), an operational avatar of Girard's geometry of interaction, a semantics of linear logic. We work with ad-hoc specializations to $λ$-terms of low exponential depth of a tree-generating version of the IAM.2024-02-08T17:38:29ZLê Thành Dũng NguyênGabriele Vanonihttp://arxiv.org/abs/2603.10936v1A Formalization of Abstract Rewriting in Agda2026-03-11T16:21:54ZWe present a constructive formalization of Abstract Rewriting Systems (ARS) in the Agda proof assistant, focusing on standard results in term rewriting. We define a taxonomy of concepts related to termination and confluence and investigate the relationships between them and their classical counterparts. We identify, and eliminate where possible, the use of classical logic in the proofs of standard ARS results. Our analysis leads to refinements and mild generalizations of classical termination and confluence criteria. We investigate logical relationships between several notions of termination, arising from different formulations of the concept of a well-founded relation. We illustrate general applicability of our ARS development with an example formalization of the lambda calculus.2026-03-11T16:21:54ZSam ArkleAndrew Polonskyhttp://arxiv.org/abs/2404.10616v9One is all you need: Second-order Unification without First-order Variables2026-03-11T13:36:19ZWe introduce a fragment of second-order unification, referred to as \emph{Second-Order Ground Unification (SOGU)}, with the following properties: (i) only one second-order variable is allowed, and (ii) first-order variables do not occur. We study an equational variant of SOGU where the signature contains \textit{associative} binary function symbols (ASOGU) and show that Hilbert's 10$^{th}$ problem is reducible to ASOGU unifiability, thus proving undecidability. Our reduction provides a new lower bound for the undecidability of second-order unification, as previous results required first-order variable occurrences, multiple second-order variables, and/or equational theories involving \textit{length-reducing} rewrite systems. Furthermore, our reduction holds even in the case when associativity of the binary function symbol is restricted to \emph{power associative}, i.e. f(f(x,x),x)= f(x,f(x,x)), as our construction requires a single constant.2024-04-16T14:41:35ZAccepted at LMCSDavid M. CernaJulian Parsert