https://arxiv.org/api/HVB9mbfYUtFxsjUHIx4as8ilWCA2026-03-26T10:02:10Z1892413515http://arxiv.org/abs/2603.08139v1Formalizing the stability of the two Higgs doublet model potential into Lean: identifying an error in the literature2026-03-09T09:16:22ZIn 2006, using the best methods and techniques available at the time, Maniatis, von Manteuffel, Nachtmann and Nagel published a now widely cited paper on the stability of the two Higgs doublet model (2HDM) potential. Twenty years on, it is now easier to apply the process of formalization into an interactive theorem prover to this work thanks to projects like Mathlib and PhysLib (formerly PhysLean and Lean-QuantumInfo), and to ask for a higher standard of mathematical correctness. Doing so has revealed an error in the arguments of this 2006 paper, invalidating their main theorem on the stability of the 2HDM potential. This case is noteworthy because to the best of our knowledge it is the first non-trivial error in a physics paper found through formalization. It was one of the first papers where formalization was attempted, which raises the uncomfortable question of how many physics papers would not pass this higher level of scrutiny.2026-03-09T09:16:22Z11 pagesJoseph Tooby-Smithhttp://arxiv.org/abs/2506.09487v3BemaGANv2: Discriminator Combination Strategies for GAN-based Vocoders in Long-Term Audio Generation2026-03-09T08:32:07ZThis paper presents BemaGANv2, an advanced GAN-based vocoder designed for high-fidelity and long-term audio generation, with a focus on systematic evaluation of discriminator combination strategies. Long-term audio generation is critical for applications in Text-to-Music (TTM) and Text-to-Audio (TTA) systems, where maintaining temporal co- herence, prosodic consistency, and harmonic structure over extended durations remains a significant challenge. Built upon the original BemaGAN architecture, BemaGANv2 incorporates major architectural innovations by replacing traditional ResBlocks in the generator with the Anti-aliased Multi-Periodicity composition (AMP) module, which internally applies the Snake activation function to better model periodic structures. In the discriminator framework, we integrate the Multi-Envelope Discriminator (MED), a novel architecture we proposed, to extract rich temporal en- velope features crucial for periodicity detection. Coupled with the Multi-Resolution Discriminator (MRD), this com- bination enables more accurate modeling of long-range dependencies in audio. We systematically evaluate various discriminator configurations, including Multi-Scale Discriminator (MSD) + MED, MSD + MRD, and Multi-Period Discriminator (MPD) + MED + MRD, using objective metrics (Fréchet Audio Distance (FAD), Structural Similar- ity Index (SSIM), Pearson Correlation Coefficient (PCC), Mel-Cepstral Distortion (MCD), Multi-Resolution STFT (M-STFT), Periodicity error (Periodicity)) and subjective evaluations (MOS, SMOS). To support reproducibility, we provide detailed architectural descriptions, training configurations, and complete implementation details. The code, pre-trained models, and audio demo samples are available at: https://github.com/dinhoitt/BemaGANv2.2025-06-11T07:57:05ZCurrently under review at ICT Express as an extended version of our ICAIIC 2025 paperTaesoo ParkMungwi JeongMingyu ParkNarae KimJunyoung KimMujung KimJisang YooHoyun LeeSanghoon KimSoonchul Kwonhttp://arxiv.org/abs/2603.08043v1Step Automata2026-03-09T07:33:07ZFor computation, there existed Turing machine and later-matured automata theory. For low-level parallel computation, there existed variants of Turing machine, such as two-tapes Turing machine and multi-tapes Turing machine. In the literature, the combination of computation and concurrency is still active, such the combination of automata and processes, and the introduction of concurrency into automata: the so-called pomset automata and branch automata. But the linkage of Turing machine and concurrent automaton is still absent. In this paper, we propose the concepts of step automaton and step Turing machine (STM), which a natural extension to traditional automaton and classical Turing machine just allowing an automaton or Turing machine to execute a step of atomic actions (without partial orders pairwise).2026-03-09T07:33:07ZYong Wanghttp://arxiv.org/abs/2603.07771v1Apply2Isar: Automatically Converting Isabelle/HOL Apply-Style Proofs to Structured Isar2026-03-08T19:24:34ZIn Isabelle/HOL, declarative proofs written in the Isar language are widely appreciated for their readability and robustness. However, some users may prefer writing procedural "apply-style" proof scripts since they enable rapid exploration of the search space. To get the best of both worlds, we introduce Apply2Isar, a tool for Isabelle/HOL that automatically converts apply-style scripts to declarative Isar. This allows users to write complex, possibly fragile apply-style scripts, and then automatically convert them to more readable and robust declarative Isar proofs. To demonstrate the efficacy of Apply2Isar in practice, we evaluate it on a large benchmark set consisting of apply-style proofs from the Isabelle Archive of Formal Proofs.2026-03-08T19:24:34Z18 pages, 1 figureSage BinderHanna LachnittKatherine Kosaianhttp://arxiv.org/abs/2509.26197v2Demystifying Codensity Monads via Duality2026-03-08T17:11:00ZCodensity monads provide a universal method to generate complex monads from simple functors. Recently, a wide range of important monads in logic, denotational semantics, and probabilistic computation, such as several incarnations of the ultrafilter monad, the Vietoris monad, and the Giry monad, have been presented as codensity monads, using complex arguments. We propose a unifying categorical approach to codensity presentations of monads, based on the idea of relating the presenting functor to a dense functor via a suitable duality between categories. We prove a general presentation result applying to every such situation and demonstrate that most codensity presentations known in the literature emerge from this strikingly simple duality-based setup, drastically alleviating the complexity of their proofs and in many cases completely reducing them to standard duality results. Additionally, we derive a number of novel codensity presentations using our framework, including the first non-trivial codensity presentations for the filter monads on sets and topological spaces, the lower Vietoris monad on topological spaces, and the expectation monad on sets.2025-09-30T12:54:05Z43rd International Symposium on Theoretical Aspects of Computer Science (STACS 2026), pp. 65:1-65:20Fabian LenkeNico WittrockStefan MiliusHenning Urbat10.4230/LIPIcs.STACS.2026.65http://arxiv.org/abs/2511.02872v4FATE: A Formal Benchmark Series for Frontier Algebra of Multiple Difficulty Levels2026-03-08T09:01:49ZRecent advances in large language models (LLMs) have demonstrated impressive capabilities in formal theorem proving, particularly on contest-based mathematical benchmarks like the IMO. However, these contests do not reflect the depth, breadth, and abstraction of modern mathematical research. To bridge this gap, we introduce FATE (Formal Algebra Theorem Evaluation), a new benchmark series in formal algebra designed to chart a course toward advanced mathematical reasoning. We present two new components, FATE-H and FATE-X, each with 100 problems in abstract and commutative algebra. The FATE series spans a difficulty spectrum from undergraduate exercises to problems exceeding PhD qualifying exams. Notably, FATE-X is the first formal benchmark to surpass both PhD-level exam difficulty and the coverage of the Mathlib library. Our evaluations of state-of-the-art LLM provers on this new benchmark reveal a stark performance gap compared to contest math: the best model achieves only 3% (pass@64) accuracy on FATE-H and 0% on FATE-X. Our two-stage evaluation reveals that models' natural-language reasoning is notably more accurate than their ability to formalize this reasoning. We systematically classify the common errors that arise during this formalization process. Furthermore, a comparative study shows that a specialized prover can exhibit less effective reflection than general-purpose models, reducing its accuracy at the natural-language stage. We believe FATE provides a robust and challenging benchmark that establishes essential checkpoints on the path toward research-level formal mathematical reasoning.2025-11-04T03:25:17ZJiedong JiangWanyi HeYuefeng WangGuoxiong GaoYongle HuJingting WangNailin GuanPeihao WuChunbo DaiLiang XiaoBin Donghttp://arxiv.org/abs/2603.07322v1A method for the automated generation of proof exercises with comparable levels of proving complexity2026-03-07T19:48:45ZThe automated generation of exercises may substantially reduce the time educators devote to manual exercise design. A major obstacle to the integration of such automation into teaching practice, however, lies in the ability to control the difficulty of mechanically generated exercises. This paper presents a method for the automated generation of proof exercises with comparable levels of proving complexity. The method takes as input a proof exercise together with a set of rules that yield a proof of the exercise, and produces as output a set of proof exercises whose proving complexity is comparable to that of the input exercise. The approach focuses on mathematical proof exercises formulated in first-order languages, covering topics typically addressed in undergraduate Discrete Mathematics courses. We assess the proving complexity of these exercises by considering the effort required to solve them through informal proofs, and argue that this effort can be formally captured through cut-based tableau proofs that are free of logical symbols. The rules governing such proofs are obtained through a mechanizable extraction procedure introduced in this paper. By exploiting the analytic nature of these rules and the structure inherent in proofs constructed via tableau rules, we derive a computational procedure implementing the proposed method.2026-03-07T19:48:45ZJoão MendesJoão MarcosPatrick Terremattehttp://arxiv.org/abs/2603.07268v1Sketch-Oriented Databases2026-03-07T15:52:28ZThis paper introduces sketch-oriented databases, a categorical
framework that encodes database paradigms as finite-limit sketches
and individual databases and schemas as set-valued models. It
illustrates the formalism through graph-oriented paradigms such as
quivers, RDF triplestores and property graphs. It also
shows how common graph features such as labels, attributes, typing, and
paths, are uniformly captured by sketch constructions. Because paths
play an important role in queries, we propose inference rules
formalized via localizers to compute useful paths lazily; such
localizers are also useful for tasks like database type conformance.
Finally, the paper introduces stuttering sketches, whose aim is to
facilitate modular composition and scalable model growth: stuttering
sketches are finite-limit sketches in which relations are specified
by a single limit instead of two nested limits, and the paper proves
that finite unions of models of a stuttering sketch are pointwise
colimits.2026-03-07T15:52:28Z20 pages, 1 appendix,Dominique DuvalRachid Echahedhttp://arxiv.org/abs/2603.07176v1Learning to Rank the Initial Branching Order of SAT Solvers2026-03-07T12:35:36ZFinding good branching orders is key to solving SAT problems efficiently, but finding such branching orders is a difficult problem. Using a learning based approach to predict a good branching order before solving, therefore, has potential. In this paper, we investigate predicting branching orders using graph neural networks as a preprocessing step to conflict-driven clause learning (CDCL) SAT solvers. We show that there are significant gains to be made in existing CDCL SAT solvers by providing a good initial branching. Further, we provide three labeling methods to find such initial branching orders in a tractable way. Finally, we train a graph neural network to predict these branching orders and show through our evaluations that a GNN-initialized ordering yields significant speedups on random 3-CNF and pseudo-industrial benchmarks, with generalization capabilities to instances much larger than the training set. However, we also find that the predictions fail at speeding up more difficult and industrial instances. We attribute this to the solver's dynamic heuristics, which rapidly overwrite the provided initialization, and to the complexity of these instances, making GNN prediction hard.2026-03-07T12:35:36ZPublished at VerifAI-2: The Second Workshop on AI Verification in the WildArvid ErikssonKTH Royal Institute of TechnologyGabriel PoesiaKempner Institute at Harvard UniversityRoman BressonMohamed Bin Zayed University of Artificial IntelligenceKarl Henrik JohanssonKTH Royal Institute of TechnologyDavid BromanKTH Royal Institute of Technologyhttp://arxiv.org/abs/2512.10064v2Classifying covering types in homotopy type theory2026-03-07T11:05:16ZCovering spaces are a fundamental tool in algebraic topology because of the close relationship they bear with the fundamental groups of spaces. Indeed, they are in correspondence with the subgroups of the fundamental group: this is known as the Galois correspondence. In particular, the covering space corresponding to the trivial group is the universal covering, which is a "1-connected" variant of the original space, in the sense that it has the same homotopy groups, except for the first one which is trivial. In this article, we formalize this correspondence in homotopy type theory, a variant of Martin-Löf type theory in which types can be interpreted as spaces (up to homotopy). Along the way, we develop an n-dimensional generalization of covering spaces. Moreover, in order to demonstrate the applicability of our approach, we formally classify the covering of lens spaces and explain how to construct the Poincaré homology sphere.2025-12-10T20:34:41ZSamuel MimramÉmile Oleonhttp://arxiv.org/abs/2603.06974v1Elenchus: Generating Knowledge Bases from Prover-Skeptic Dialogues2026-03-07T01:26:36ZWe present Elenchus, a dialogue system for knowledge base construction grounded in inferentialist semantics, where knowledge engineering is re-conceived as explicitation rather than extraction from expert testimony or textual content. A human expert develops a bilateral position (commitments and denials) about a topic through prover-skeptic dialogue with a large language model (LLM) opponent. The LLM proposes tensions (claims that parts of the position are jointly incoherent) which the expert resolves by retraction, refinement, or contestation. The LLM thus serves as a defeasible derivability oracle whose unreliability is structurally contained by the expert's authority. Our main technical contribution is a mapping from Elenchus dialectical states to material bases in Hlobil and Brandom's NonMonotonic MultiSuccedent (NMMS) logic, satisfying Containment and enabling the elaboration of logical vocabulary that makes explicit the inferential relationships negotiated in the dialectic. We demonstrate the approach on the W3C PROV-O provenance ontology, where a single dialogue session elicits and structures design tensions that a domain expert can articulate, corresponding to decisions documented in a retrospective analysis of the ontology's design. Using pyNMMS, an automated NMMS reasoner, we verify that the structural properties of the resulting material base (nontransitivity, nonmonotonicity, and independence) correspond to specific PROV design rationales, demonstrating end-to-end integration from dialogue through formal reasoning.2026-03-07T01:26:36Z12 pages, 4 figures, 4 tablesBradley P. Allenhttp://arxiv.org/abs/2501.08947v5Taint Analysis for Graph APIs Focusing on Broken Access Control2026-03-06T23:35:06ZWe present the first systematic approach to static and dynamic taint analysis for Graph APIs focusing on broken access control. The approach comprises the following. We taint nodes of the Graph API if they represent data requiring specific privileges in order to be retrieved or manipulated, and identify API calls which are related to sources and sinks. Then, we statically analyze whether a tainted information flow between API source and sink calls occurs. To this end, we model the API calls using graph transformation rules. We subsequently use Critical Pair Analysis to automatically analyze potential dependencies between rules representing source calls and rules representing sink calls. We distinguish direct from indirect tainted information flow and argue under which conditions the Critical Pair Analysis is able to detect not only direct, but also indirect tainted flow. The static taint analysis (i) identifies flows that need to be further reviewed, since tainted nodes may be created by an API call and used or manipulated by another API call later without having the necessary privileges, and (ii) can be used to systematically design dynamic security tests for broken access control. The dynamic taint analysis checks if potential broken access control risks detected during the static taint analysis really occur. We apply the approach to a part of the GitHub GraphQL API. The application illustrates that our analysis supports the detection of two types of broken access control systematically: the case where users of the API may not be able to access or manipulate information, although they should be able to do so; and the case where users (or attackers) of the API may be able to access/manipulate information that they should not.2025-01-15T16:49:32ZLogical Methods in Computer Science, Volume 22, Issue 1 (March 10, 2026) lmcs:15080Leen LambersLucas SakizloglouTaisiya KhakharovaFernando Orejas10.46298/lmcs-22(1:18)2026http://arxiv.org/abs/2603.06849v1Twitch: Learning Abstractions for Equational Theorem Proving2026-03-06T20:15:09ZSeveral successful strategies in automated reasoning rely on human-supplied guidance about which term or clause shapes are interesting. In this paper we aim to discover interesting term shapes automatically. Specifically, we discover abstractions : term patterns that occur over and over again in relevant proofs. We present our tool Twitch which discovers abstractions with the help of Stitch, a tool originally developed for discovering reusable library functions in program synthesis tasks. Twitch can produce abstractions in two ways: (1) from a partial, failed proof of a conjecture; (2) from successful proofs of other theorems in the same domain. We have also extended Twee, an equational theorem prover, to use these abstractions. We evaluate Twitch on a set of unit equality (UEQ) problems from TPTP, and show that it can prove 12 rating-1 problems as well as yielding significant speed-ups on many other problems.2026-03-06T20:15:09Z20 pages, submitted to IJCAR 2026Guy AxelrodMoa JohanssonNicholas Smallbonehttp://arxiv.org/abs/2603.06345v1Finding Connections via Satisfiability Solving2026-03-06T14:49:22ZCommonly used proof strategies by automated reasoners organise proof search either by ordering-based saturation or by reducing goals to subgoals. In this paper, we combine these two approaches and advocate a SAT-based method with symmetry breaking for connection calculi in first-order logic, with the purpose of further pushing the automation in first-order classical logic proofs. In contrast to classical ways of reducing first-order logic to propositional logic, our method encodes the structure of the proof search itself. We present three distinct SAT encodings for connection calculi, analyse their theoretical properties, and discuss the effect of using SAT/SMT solvers on these encodings. We implemented our work in the new solver upCoP and showcase its practical feasibility.2026-03-06T14:49:22ZarXiv admin note: substantial text overlap with arXiv:2402.10610Lecture Notes in Computer Science 15980 (2025) 82-102Clemens EisenhoferMichael RawsonLaura Kovács10.1007/978-3-032-06085-3_5http://arxiv.org/abs/2603.04512v2Fusions of One-Variable First-Order Modal Logics2026-03-06T14:44:02ZWe investigate preservation results for the independent fusion of one-variable first-order modal logics. We show that, without equality, Kripke completeness and decidability of the global and local consequence relation are preserved, under both expanding and constant domain semantics. By contrast, Kripke completeness and decidability are not preserved for fusions with equality and non-rigid constants (or, equivalently, counting up to one), again for the global and local consequence and under both expanding and constant domain semantics. This result is shown by encoding Diophantine equations. Even without equality, the finite model property is only preserved in the local case. Finally, we view fusions of one-variable modal logics as fusions of propositional modal logics sharing an S5 modality and provide a general sufficient condition for transfer of Kripke completeness and decidability (but not of finite model property).2026-03-04T19:01:03ZRoman KontchakovDmitry ShkatovFrank Wolter