https://arxiv.org/api/HVB9mbfYUtFxsjUHIx4as8ilWCA 2026-03-26T10:02:10Z 18924 135 15 http://arxiv.org/abs/2603.08139v1 Formalizing the stability of the two Higgs doublet model potential into Lean: identifying an error in the literature 2026-03-09T09:16:22Z

In 2006, using the best methods and techniques available at the time, Maniatis, von Manteuffel, Nachtmann and Nagel published a now widely cited paper on the stability of the two Higgs doublet model (2HDM) potential. Twenty years on, it is now easier to apply the process of formalization into an interactive theorem prover to this work thanks to projects like Mathlib and PhysLib (formerly PhysLean and Lean-QuantumInfo), and to ask for a higher standard of mathematical correctness. Doing so has revealed an error in the arguments of this 2006 paper, invalidating their main theorem on the stability of the 2HDM potential. This case is noteworthy because to the best of our knowledge it is the first non-trivial error in a physics paper found through formalization. It was one of the first papers where formalization was attempted, which raises the uncomfortable question of how many physics papers would not pass this higher level of scrutiny.

2026-03-09T09:16:22Z 11 pages Joseph Tooby-Smith http://arxiv.org/abs/2506.09487v3 BemaGANv2: Discriminator Combination Strategies for GAN-based Vocoders in Long-Term Audio Generation 2026-03-09T08:32:07Z

This paper presents BemaGANv2, an advanced GAN-based vocoder designed for high-fidelity and long-term audio generation, with a focus on systematic evaluation of discriminator combination strategies. Long-term audio generation is critical for applications in Text-to-Music (TTM) and Text-to-Audio (TTA) systems, where maintaining temporal co- herence, prosodic consistency, and harmonic structure over extended durations remains a significant challenge. Built upon the original BemaGAN architecture, BemaGANv2 incorporates major architectural innovations by replacing traditional ResBlocks in the generator with the Anti-aliased Multi-Periodicity composition (AMP) module, which internally applies the Snake activation function to better model periodic structures. In the discriminator framework, we integrate the Multi-Envelope Discriminator (MED), a novel architecture we proposed, to extract rich temporal en- velope features crucial for periodicity detection. Coupled with the Multi-Resolution Discriminator (MRD), this com- bination enables more accurate modeling of long-range dependencies in audio. We systematically evaluate various discriminator configurations, including Multi-Scale Discriminator (MSD) + MED, MSD + MRD, and Multi-Period Discriminator (MPD) + MED + MRD, using objective metrics (Fréchet Audio Distance (FAD), Structural Similar- ity Index (SSIM), Pearson Correlation Coefficient (PCC), Mel-Cepstral Distortion (MCD), Multi-Resolution STFT (M-STFT), Periodicity error (Periodicity)) and subjective evaluations (MOS, SMOS). To support reproducibility, we provide detailed architectural descriptions, training configurations, and complete implementation details. The code, pre-trained models, and audio demo samples are available at: https://github.com/dinhoitt/BemaGANv2.

2025-06-11T07:57:05Z Currently under review at ICT Express as an extended version of our ICAIIC 2025 paper Taesoo Park Mungwi Jeong Mingyu Park Narae Kim Junyoung Kim Mujung Kim Jisang Yoo Hoyun Lee Sanghoon Kim Soonchul Kwon http://arxiv.org/abs/2603.08043v1 Step Automata 2026-03-09T07:33:07Z

For computation, there existed Turing machine and later-matured automata theory. For low-level parallel computation, there existed variants of Turing machine, such as two-tapes Turing machine and multi-tapes Turing machine. In the literature, the combination of computation and concurrency is still active, such the combination of automata and processes, and the introduction of concurrency into automata: the so-called pomset automata and branch automata. But the linkage of Turing machine and concurrent automaton is still absent. In this paper, we propose the concepts of step automaton and step Turing machine (STM), which a natural extension to traditional automaton and classical Turing machine just allowing an automaton or Turing machine to execute a step of atomic actions (without partial orders pairwise).

2026-03-09T07:33:07Z Yong Wang http://arxiv.org/abs/2603.07771v1 Apply2Isar: Automatically Converting Isabelle/HOL Apply-Style Proofs to Structured Isar 2026-03-08T19:24:34Z

In Isabelle/HOL, declarative proofs written in the Isar language are widely appreciated for their readability and robustness. However, some users may prefer writing procedural "apply-style" proof scripts since they enable rapid exploration of the search space. To get the best of both worlds, we introduce Apply2Isar, a tool for Isabelle/HOL that automatically converts apply-style scripts to declarative Isar. This allows users to write complex, possibly fragile apply-style scripts, and then automatically convert them to more readable and robust declarative Isar proofs. To demonstrate the efficacy of Apply2Isar in practice, we evaluate it on a large benchmark set consisting of apply-style proofs from the Isabelle Archive of Formal Proofs.

2026-03-08T19:24:34Z 18 pages, 1 figure Sage Binder Hanna Lachnitt Katherine Kosaian http://arxiv.org/abs/2509.26197v2 Demystifying Codensity Monads via Duality 2026-03-08T17:11:00Z

Codensity monads provide a universal method to generate complex monads from simple functors. Recently, a wide range of important monads in logic, denotational semantics, and probabilistic computation, such as several incarnations of the ultrafilter monad, the Vietoris monad, and the Giry monad, have been presented as codensity monads, using complex arguments. We propose a unifying categorical approach to codensity presentations of monads, based on the idea of relating the presenting functor to a dense functor via a suitable duality between categories. We prove a general presentation result applying to every such situation and demonstrate that most codensity presentations known in the literature emerge from this strikingly simple duality-based setup, drastically alleviating the complexity of their proofs and in many cases completely reducing them to standard duality results. Additionally, we derive a number of novel codensity presentations using our framework, including the first non-trivial codensity presentations for the filter monads on sets and topological spaces, the lower Vietoris monad on topological spaces, and the expectation monad on sets.

2025-09-30T12:54:05Z 43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026), pp. 65:1-65:20 Fabian Lenke Nico Wittrock Stefan Milius Henning Urbat 10.4230/LIPIcs.STACS.2026.65 http://arxiv.org/abs/2511.02872v4 FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels 2026-03-08T09:01:49Z

Recent advances in large language models (LLMs) have demonstrated impressive capabilities in formal theorem proving, particularly on contest-based mathematical benchmarks like the IMO. However, these contests do not reflect the depth, breadth, and abstraction of modern mathematical research. To bridge this gap, we introduce FATE (Formal Algebra Theorem Evaluation), a new benchmark series in formal algebra designed to chart a course toward advanced mathematical reasoning. We present two new components, FATE-H and FATE-X, each with 100 problems in abstract and commutative algebra. The FATE series spans a difficulty spectrum from undergraduate exercises to problems exceeding PhD qualifying exams. Notably, FATE-X is the first formal benchmark to surpass both PhD-level exam difficulty and the coverage of the Mathlib library. Our evaluations of state-of-the-art LLM provers on this new benchmark reveal a stark performance gap compared to contest math: the best model achieves only 3% (pass@64) accuracy on FATE-H and 0% on FATE-X. Our two-stage evaluation reveals that models' natural-language reasoning is notably more accurate than their ability to formalize this reasoning. We systematically classify the common errors that arise during this formalization process. Furthermore, a comparative study shows that a specialized prover can exhibit less effective reflection than general-purpose models, reducing its accuracy at the natural-language stage. We believe FATE provides a robust and challenging benchmark that establishes essential checkpoints on the path toward research-level formal mathematical reasoning.

2025-11-04T03:25:17Z Jiedong Jiang Wanyi He Yuefeng Wang Guoxiong Gao Yongle Hu Jingting Wang Nailin Guan Peihao Wu Chunbo Dai Liang Xiao Bin Dong http://arxiv.org/abs/2603.07322v1 A method for the automated generation of proof exercises with comparable levels of proving complexity 2026-03-07T19:48:45Z

The automated generation of exercises may substantially reduce the time educators devote to manual exercise design. A major obstacle to the integration of such automation into teaching practice, however, lies in the ability to control the difficulty of mechanically generated exercises. This paper presents a method for the automated generation of proof exercises with comparable levels of proving complexity. The method takes as input a proof exercise together with a set of rules that yield a proof of the exercise, and produces as output a set of proof exercises whose proving complexity is comparable to that of the input exercise. The approach focuses on mathematical proof exercises formulated in first-order languages, covering topics typically addressed in undergraduate Discrete Mathematics courses. We assess the proving complexity of these exercises by considering the effort required to solve them through informal proofs, and argue that this effort can be formally captured through cut-based tableau proofs that are free of logical symbols. The rules governing such proofs are obtained through a mechanizable extraction procedure introduced in this paper. By exploiting the analytic nature of these rules and the structure inherent in proofs constructed via tableau rules, we derive a computational procedure implementing the proposed method.

2026-03-07T19:48:45Z João Mendes João Marcos Patrick Terrematte http://arxiv.org/abs/2603.07268v1 Sketch-Oriented Databases 2026-03-07T15:52:28Z

This paper introduces sketch-oriented databases, a categorical framework that encodes database paradigms as finite-limit sketches and individual databases and schemas as set-valued models. It illustrates the formalism through graph-oriented paradigms such as quivers, RDF triplestores and property graphs. It also shows how common graph features such as labels, attributes, typing, and paths, are uniformly captured by sketch constructions. Because paths play an important role in queries, we propose inference rules formalized via localizers to compute useful paths lazily; such localizers are also useful for tasks like database type conformance. Finally, the paper introduces stuttering sketches, whose aim is to facilitate modular composition and scalable model growth: stuttering sketches are finite-limit sketches in which relations are specified by a single limit instead of two nested limits, and the paper proves that finite unions of models of a stuttering sketch are pointwise colimits.

2026-03-07T15:52:28Z 20 pages, 1 appendix, Dominique Duval Rachid Echahed http://arxiv.org/abs/2603.07176v1 Learning to Rank the Initial Branching Order of SAT Solvers 2026-03-07T12:35:36Z

Finding good branching orders is key to solving SAT problems efficiently, but finding such branching orders is a difficult problem. Using a learning based approach to predict a good branching order before solving, therefore, has potential. In this paper, we investigate predicting branching orders using graph neural networks as a preprocessing step to conflict-driven clause learning (CDCL) SAT solvers. We show that there are significant gains to be made in existing CDCL SAT solvers by providing a good initial branching. Further, we provide three labeling methods to find such initial branching orders in a tractable way. Finally, we train a graph neural network to predict these branching orders and show through our evaluations that a GNN-initialized ordering yields significant speedups on random 3-CNF and pseudo-industrial benchmarks, with generalization capabilities to instances much larger than the training set. However, we also find that the predictions fail at speeding up more difficult and industrial instances. We attribute this to the solver's dynamic heuristics, which rapidly overwrite the provided initialization, and to the complexity of these instances, making GNN prediction hard.

2026-03-07T12:35:36Z Published at VerifAI-2: The Second Workshop on AI Verification in the Wild Arvid Eriksson KTH Royal Institute of Technology Gabriel Poesia Kempner Institute at Harvard University Roman Bresson Mohamed Bin Zayed University of Artificial Intelligence Karl Henrik Johansson KTH Royal Institute of Technology David Broman KTH Royal Institute of Technology http://arxiv.org/abs/2512.10064v2 Classifying covering types in homotopy type theory 2026-03-07T11:05:16Z

Covering spaces are a fundamental tool in algebraic topology because of the close relationship they bear with the fundamental groups of spaces. Indeed, they are in correspondence with the subgroups of the fundamental group: this is known as the Galois correspondence. In particular, the covering space corresponding to the trivial group is the universal covering, which is a "1-connected" variant of the original space, in the sense that it has the same homotopy groups, except for the first one which is trivial. In this article, we formalize this correspondence in homotopy type theory, a variant of Martin-Löf type theory in which types can be interpreted as spaces (up to homotopy). Along the way, we develop an n-dimensional generalization of covering spaces. Moreover, in order to demonstrate the applicability of our approach, we formally classify the covering of lens spaces and explain how to construct the Poincaré homology sphere.

2025-12-10T20:34:41Z Samuel Mimram Émile Oleon http://arxiv.org/abs/2603.06974v1 Elenchus: Generating Knowledge Bases from Prover-Skeptic Dialogues 2026-03-07T01:26:36Z

We present Elenchus, a dialogue system for knowledge base construction grounded in inferentialist semantics, where knowledge engineering is re-conceived as explicitation rather than extraction from expert testimony or textual content. A human expert develops a bilateral position (commitments and denials) about a topic through prover-skeptic dialogue with a large language model (LLM) opponent. The LLM proposes tensions (claims that parts of the position are jointly incoherent) which the expert resolves by retraction, refinement, or contestation. The LLM thus serves as a defeasible derivability oracle whose unreliability is structurally contained by the expert's authority. Our main technical contribution is a mapping from Elenchus dialectical states to material bases in Hlobil and Brandom's NonMonotonic MultiSuccedent (NMMS) logic, satisfying Containment and enabling the elaboration of logical vocabulary that makes explicit the inferential relationships negotiated in the dialectic. We demonstrate the approach on the W3C PROV-O provenance ontology, where a single dialogue session elicits and structures design tensions that a domain expert can articulate, corresponding to decisions documented in a retrospective analysis of the ontology's design. Using pyNMMS, an automated NMMS reasoner, we verify that the structural properties of the resulting material base (nontransitivity, nonmonotonicity, and independence) correspond to specific PROV design rationales, demonstrating end-to-end integration from dialogue through formal reasoning.

2026-03-07T01:26:36Z 12 pages, 4 figures, 4 tables Bradley P. Allen http://arxiv.org/abs/2501.08947v5 Taint Analysis for Graph APIs Focusing on Broken Access Control 2026-03-06T23:35:06Z

We present the first systematic approach to static and dynamic taint analysis for Graph APIs focusing on broken access control. The approach comprises the following. We taint nodes of the Graph API if they represent data requiring specific privileges in order to be retrieved or manipulated, and identify API calls which are related to sources and sinks. Then, we statically analyze whether a tainted information flow between API source and sink calls occurs. To this end, we model the API calls using graph transformation rules. We subsequently use Critical Pair Analysis to automatically analyze potential dependencies between rules representing source calls and rules representing sink calls. We distinguish direct from indirect tainted information flow and argue under which conditions the Critical Pair Analysis is able to detect not only direct, but also indirect tainted flow. The static taint analysis (i) identifies flows that need to be further reviewed, since tainted nodes may be created by an API call and used or manipulated by another API call later without having the necessary privileges, and (ii) can be used to systematically design dynamic security tests for broken access control. The dynamic taint analysis checks if potential broken access control risks detected during the static taint analysis really occur. We apply the approach to a part of the GitHub GraphQL API. The application illustrates that our analysis supports the detection of two types of broken access control systematically: the case where users of the API may not be able to access or manipulate information, although they should be able to do so; and the case where users (or attackers) of the API may be able to access/manipulate information that they should not.

2025-01-15T16:49:32Z Logical Methods in Computer Science, Volume 22, Issue 1 (March 10, 2026) lmcs:15080 Leen Lambers Lucas Sakizloglou Taisiya Khakharova Fernando Orejas 10.46298/lmcs-22(1:18)2026 http://arxiv.org/abs/2603.06849v1 Twitch: Learning Abstractions for Equational Theorem Proving 2026-03-06T20:15:09Z

Several successful strategies in automated reasoning rely on human-supplied guidance about which term or clause shapes are interesting. In this paper we aim to discover interesting term shapes automatically. Specifically, we discover abstractions : term patterns that occur over and over again in relevant proofs. We present our tool Twitch which discovers abstractions with the help of Stitch, a tool originally developed for discovering reusable library functions in program synthesis tasks. Twitch can produce abstractions in two ways: (1) from a partial, failed proof of a conjecture; (2) from successful proofs of other theorems in the same domain. We have also extended Twee, an equational theorem prover, to use these abstractions. We evaluate Twitch on a set of unit equality (UEQ) problems from TPTP, and show that it can prove 12 rating-1 problems as well as yielding significant speed-ups on many other problems.

2026-03-06T20:15:09Z 20 pages, submitted to IJCAR 2026 Guy Axelrod Moa Johansson Nicholas Smallbone http://arxiv.org/abs/2603.06345v1 Finding Connections via Satisfiability Solving 2026-03-06T14:49:22Z

Commonly used proof strategies by automated reasoners organise proof search either by ordering-based saturation or by reducing goals to subgoals. In this paper, we combine these two approaches and advocate a SAT-based method with symmetry breaking for connection calculi in first-order logic, with the purpose of further pushing the automation in first-order classical logic proofs. In contrast to classical ways of reducing first-order logic to propositional logic, our method encodes the structure of the proof search itself. We present three distinct SAT encodings for connection calculi, analyse their theoretical properties, and discuss the effect of using SAT/SMT solvers on these encodings. We implemented our work in the new solver upCoP and showcase its practical feasibility.

2026-03-06T14:49:22Z arXiv admin note: substantial text overlap with arXiv:2402.10610 Lecture Notes in Computer Science 15980 (2025) 82-102 Clemens Eisenhofer Michael Rawson Laura Kovács 10.1007/978-3-032-06085-3_5 http://arxiv.org/abs/2603.04512v2 Fusions of One-Variable First-Order Modal Logics 2026-03-06T14:44:02Z

We investigate preservation results for the independent fusion of one-variable first-order modal logics. We show that, without equality, Kripke completeness and decidability of the global and local consequence relation are preserved, under both expanding and constant domain semantics. By contrast, Kripke completeness and decidability are not preserved for fusions with equality and non-rigid constants (or, equivalently, counting up to one), again for the global and local consequence and under both expanding and constant domain semantics. This result is shown by encoding Diophantine equations. Even without equality, the finite model property is only preserved in the local case. Finally, we view fusions of one-variable modal logics as fusions of propositional modal logics sharing an S5 modality and provide a general sufficient condition for transfer of Kripke completeness and decidability (but not of finite model property).

2026-03-04T19:01:03Z Roman Kontchakov Dmitry Shkatov Frank Wolter