https://arxiv.org/api/oLZ+uwLA0Alj5hvmw0RBBFwIdhk2026-06-14T17:35:37Z988543515http://arxiv.org/abs/2603.29819v1Multi-paradigm Logic Programming in the ${\cal E}$rgoAI System2026-03-31T14:44:03ZErgoAI is a high level, multi-paradigm logic programming language and system developed by Coherent Knowledge Systems as an enhancement of and a successor to the popular Flora-2 system. ErgoAI is oriented towards scalable knowledge representation and reasoning, and can exploit both structured knowledge as well as knowledge derived from external sources such as vector embeddings. From the start, ErgoAI (and Flora-2 before it) were designed to exploit the well-founded semantics for reasoning in a multi-paradigm environment, including object-based logic (F-logic) with non-monotonic inheritance; higher order syntax in the style of HiLog; defeasibility of rules; semantically clean transactional updates; extensive use of subgoal delay for handling unsafe queries and for better performance; and optional support for bounded rationality at a module level. Although Flora-2 programs are compiled into XSB and adopt many Prolog features, ErgoAI is altogether a different language and system.
Under consideration in Theory and Practice of Logic Programming (TPLP).2026-03-31T14:44:03Z43 pages, 9 figures. Under consideration in Theory and Practice of Logic Programming (TPLP)Michael KiferTheresa Swifthttp://arxiv.org/abs/2603.29292v1Self-Improving Code Generation via Semantic Entropy and Behavioral Consensus2026-03-31T05:55:17ZImproving the code generation capabilities of large language models (LLMs) typically relies on supervised fine-tuning or preference optimization, both of which require costly external resources such as powerful teacher models or reliable test units. However, in real-world scenarios, it is much harder to obtain reference solutions and test oracles than problem descriptions and test inputs. In this paper, we tackle a challenging yet realistic question: Can a code language model improve itself without access to a superior teacher and a test oracle? To answer this, we propose ConSelf, a self-improving approach built upon two key ideas. First, we introduce code semantic entropy, a novel metric that measures problem-level uncertainty by assessing the functional diversity of program behaviors, enabling a curriculum construction with the most learnable problems. Second, we present consensus-driven direct preference optimization (Con-DPO), a preference-based fine-tuning method that weights each preference pair by its behavioral consensus, thereby mitigating the impact of noisy self-generated supervision. Experiments on various benchmarks and backbone LLMs demonstrate that ConSelf significantly outperforms baselines, validating the effectiveness of semantic entropy-based curriculum construction and consensus-driven optimization in improving code generation without external supervision.2026-03-31T05:55:17ZAccepted in the 34th IEEE/ACM International Conference on Program Comprehension (ICPC 2026)Huan ZhangWei ChengWei Huhttp://arxiv.org/abs/2603.22519v2LLMON: An LLM-native Markup Language to Leverage Structure and Semantics at the LLM Interface2026-03-30T20:39:09ZTextual Large Language Models (LLMs) provide a simple and familiar interface: a string of text is used for both input and output. However, the information conveyed to an LLM often has a richer structure and semantics, which is not conveyed in a string. For example, most prompts contain both instructions ("Summarize this paper into a paragraph") and data (the paper to summarize), but these are usually not distinguished when passed to the model. This can lead to model confusion and security risks, such as prompt injection attacks.
This work addresses this shortcoming by introducing an LLM-native mark-up language, LLMON (LLM Object Notation, pronounced "Lemon"), that enables the structure and semantic metadata of the text to be communicated in a natural way to an LLM. This information can then be used during model training, model prompting, and inference implementation, leading to improvements in model accuracy, safety, and security. This is analogous to how programming language types can be used for many purposes, such as static checking, code generation, dynamic checking, and IDE highlighting.
We discuss the general design requirements of an LLM-native markup language, introduce the LLMON markup language and show how it meets these design requirements, describe how the information contained in a LLMON artifact can benefit model training and inference implementation, and provide some preliminary empirical evidence of its value for both of these use cases. We also discuss broader issues and research opportunities that are enabled with an LLM-native approach.2026-03-23T19:27:35Z28 pagesMichael HindBasel ShbitaBo WuFarhan AhmedChad DeLucaNathan FultonDavid CoxDan Gutfreundhttp://arxiv.org/abs/2503.10863v22-Functoriality of Initial Semantics, and Applications2026-03-30T15:44:21ZInitial semantics aims to model inductive structures and their properties, and to provide them with recursion principles respecting these properties. An ubiquitous example is the fold operator for lists. We are concerned with initial semantics that model languages with variable binding and their substitution structure, and that provide substitution-safe recursion principles.
There are different approaches to implementing languages with variable binding depending on the choice of representation for contexts and free variables, such as unscoped syntax, or well-scoped syntax with finite or infinite contexts. Abstractly, each approach corresponds to choosing a different monoidal category to model contexts and binding, each choice yielding a different notion of "model" for the same abstract specification (or "signature").
In this work, we provide tools to compare and relate the models obtained from a signature for different choices of monoidal category. We do so by showing that initial semantics naturally has a 2-categorical structure when parametrized by the monoidal category modeling contexts. We thus can relate models obtained from different choices of monoidal categories provided the monoidal categories themselves are related. In particular, we use our results to relate the models of the different implementation -- de Bruijn vs locally nameless, finite vs infinite contexts -- , and to provide a generalized recursion principle for simply-typed syntax.2025-03-13T20:21:21ZVersion identical to the one published in ICFP 2025Proceedings of the ACM on Programming Languages 2025, Volume 9, Issue ICFP Article No.: 258, Pages 643 - 674Benedikt AhrensAmbroise LafontThomas Lamiaux10.1145/3747527http://arxiv.org/abs/2603.28326v1Towards verifying unsafe Rust programs against Rust's pointer-aliasing restrictions2026-03-30T11:54:42ZThe Rust programming language is famous for its strong ownership regime: at each point, each value is either exclusively owned, exclusively borrowed through a mutable reference, or borrowed as read-only through one or more shared references. These rules, known as Rust's pointer-aliasing rules, are exploited by the Rust compiler to generate more efficient machine code, and enforced by Rust's static type system, except inside unsafe blocks. In this paper, we present our work in progress towards the first program logic for modularly verifying that Rust programs that use unsafe blocks comply with the pointer-aliasing rules.2026-03-30T11:54:42Z8 pages, 9 figuresWannes TasBart Jacobshttp://arxiv.org/abs/2603.28274v1Statistics 101, 201, and 202: Three Shiny Apps for Teaching Probability Distributions, Inferential Statistics, and Simple Linear Regression2026-03-30T10:56:59ZStatistics 101, 201, and 202 are three open-source interactive web applications built with R \citep{R} and Shiny \citep{shiny} to support the teaching of introductory statistics and probability. The apps help students carry out common statistical computations -- computing probabilities from standard probability distributions, constructing confidence intervals, conducting hypothesis tests, and fitting simple linear regression models -- without requiring prior knowledge of R or any other programming language. Each app provides numerical results, plots rendered with \texttt{ggplot2} \citep{ggplot2}, and inline mathematical derivations typeset with MathJax \citep{cervone2012mathjax}, so that computation and statistical reasoning appear side by side in a single interface. The suite is organised around a broad pedagogical progression: Statistics~101 introduces probability distributions and their properties; Statistics~201 addresses confidence intervals and hypothesis tests; and Statistics~202 covers the simple linear model. All three apps are freely accessible online and their source code is released under a CC-BY-4.0 license.2026-03-30T10:56:59Z6 pages, 0 figureAntoine Soeteweyhttp://arxiv.org/abs/2603.25810v2ExVerus: Verus Proof Repair via Counterexample Reasoning2026-03-30T05:32:19ZLarge Language Models (LLMs) have shown promising results in automating formal verification. However, existing approaches treat proof generation as a static, end-to-end prediction over source code, relying on limited verifier feedback and lacking access to concrete program behaviors. We present EXVERUS, a counterexample-guided framework that enables LLMs to reason about proofs using behavioral feedback via counterexamples. When a proof fails, EXVERUS automatically generates and validates counterexamples, and then guides the LLM to generalize them into inductive invariants to block these failures. Our evaluation shows that EXVERUS significantly improves proof accuracy, robustness, and token efficiency over the state-of-the-art prompting-based Verus proof generator.2026-03-26T18:14:34Z31 pages, 8 figuresJun YangYuechun SunYi WuRodrigo CaridadYongwei YuanJianan YaoShan LuKexin Peihttp://arxiv.org/abs/2603.28002v1Superset Decompilation2026-03-30T03:47:19ZReverse engineering tools remain monolithic and imperative compared to the advancement of modern compiler architectures: analyses are tied to a single mutable representation, making them difficult to extend or refine, and forcing premature choices between soundness and precision. We observe that decompilation is the reverse of compilation and can be structured as a sequence of modular passes, each performing a granular and clearly defined interpretation of the binary at a progressively higher level of abstraction. We formalize this as provenance-guided superset decompilation (PGSD), a framework that monotonically derives facts about the binary into a relation store. Instead of committing early to a single interpretation, the pipeline retains ambiguous interpretations as parallel candidates with provenance, deferring resolution until the final selection phase. Manifold implements PGSD as a declarative reverse engineering framework that lifts Linux ELF binaries to C99 through a granular intermediate representation in ~35K lines of Rust and Datalog. On GNU coreutils, Manifold's output quality matches Ghidra, IDA Pro, angr, and RetDec on multiple metrics while producing fewer compiler errors, and generalizes across compilers and optimization levels.2026-03-30T03:47:19ZChang LiuYihao SunThomas GilrayKristopher Micinskihttp://arxiv.org/abs/2603.27691v1The Case for Multi-Version Experimental Evaluation (MVEE)2026-03-29T13:32:42ZIn the database community, we typically evaluate new methods based on experimental results, which we produce by integrating the proposed method along with a set of baselines in a single benchmarking codebase and measuring the individual runtimes. If we are unhappy with the performance of our method, we gradually improve it while repeatedly comparing to the baselines, until we outperform them. While this seems like a reasonable approach, it makes one delicate assumption: We assume that across the optimization workflow, there exists only a single compiled version of each baseline to compare to. However, we learned the hard way that in practice, even though the source code remains untouched, general purpose compilers might still generate highly different compiled code across builds, caused by seemingly unrelated changes in other parts of the codebase, leading to flawed comparisons and evaluations. To tackle this problem, we propose the concept of Multi-Version Experimental Evaluation (MVEE). MVEE automatically and transparently analyzes subsequent builds on the assembly code level for occurring "build anomalies" and materializes them as new versions of the methods. As a consequence, all observed versions of the respective methods can be included in the experimental evaluation, highly increasing its quality and overall expressiveness.2026-03-29T13:32:42ZSimon JörzFelix Schuhknechthttp://arxiv.org/abs/2604.00043v1DriftScript: A Domain-Specific Language for Programming Non-Axiomatic Reasoning Agents2026-03-29T12:38:06ZNon-Axiomatic Reasoning Systems (NARS) provide a framework for building adaptive agents that operate under insufficient knowledge and resources. However, the standard input language, Narsese, poses a usability barrier: its dense symbolic notation, overloaded punctuation, and implicit conventions make programs difficult to read, write, and maintain. We present DriftScript, a Lisp-like domain-specific language that compiles to Narsese. DriftScript provides source-level constructs covering the major sentence and term forms used in Non-Axiomatic Logic (NAL) levels 1 through 8, including inheritance, temporal implication, variable quantification, sequential conjunction, and operation invocation, while replacing symbolic syntax with readable keyword-based S-expressions. The compiler is a zero-dependency, four-stage pipeline implemented in 1,941 lines of C99. When used with the DriftNARS engine, DriftScript programs connect to external systems through four structured callback types and an HTTP operation registry, enabling a sense-reason-act loop for autonomous agents. We describe the language design and formal grammar, detail the compiler architecture, and evaluate the compiler through a 106-case test suite, equivalence testing against hand-written Narsese, a NAL coverage analysis, structural readability metrics, and compilation benchmarks. The source code is available at https://github.com/seamus-brady/DriftNARS. This paper focuses on the design and implementation of the DriftScript language and its embedding into DriftNARS, rather than on new inference algorithms for NARS itself.2026-03-29T12:38:06ZSeamus Bradyhttp://arxiv.org/abs/2404.17297v3Denotation-based Compositional Compiler Verification2026-03-29T10:01:32ZA desired but challenging property of compiler verification is compositionality, in the sense that the compilation correctness of a program can be deduced incrementally from that of its substructures ranging from statements, functions, and modules. This article proposes a novel compiler verification framework based on denotational semantics for better compositionality, compared to previous approaches based on small-step operational semantics and simulation theories. Our denotational semantics is defined by semantic functions that map a syntactic component to a semantic domain composed of multiple behavioral \emph{sets}, with compiler correctness established through behavior refinement between the semantic domains of the source and target programs. The main contributions of this article include proposing a denotational semantics for open modules, a novel semantic linking operator, and a refinement algebra that unifies various behavior refinements, making compiler verification structured and compositional. Furthermore, our formalization captures the full meaning of a program and bridges the gap between traditional power-domain-based denotational semantics and the practical needs of compiler verification. We apply our denotation-based framework to verify the front-end of CompCert and typical optimizations on simple prototypes of imperative languages. Our results demonstrate that the compositionality from sub-statements to statements, from functions to modules, and from modules to the whole program can be effectively achieved.2024-04-26T10:06:06Z65 pages; 21 figures. Accepted manuscript of an article published in ACM Transactions on Programming Languages and Systems (TOPLAS). Artifact: https://github.com/chaptercheng/denotational-framework-for-compcertACM Trans. Program. Lang. Syst. 48, 1, Article 5 (2026), 1-65Zhang ChengJiyang WuDi WangQinxiang Cao10.1145/3797874http://arxiv.org/abs/2308.01554v2Taming the Hydra: Targeted Control-Flow Transformations for Dynamic Symbolic Execution2026-03-28T17:25:59ZDynamic Symbolic Execution (DSE) suffers from the path explosion problem when the target program has many conditional branches. The classical approach for managing the path explosion problem is dynamic state merging. Dynamic state merging combines similar symbolic program states to avoid the exponential growth in the number of states during DSE. However, state merging still requires solver invocations at each program branch, even when both paths of the branch are feasible. Moreover, the best path search strategy for DSE may not create the best state merging opportunities. Some drawbacks of state merging can be mitigated by compile-time state merging (i.e., branch elimination by converting control-flow into dataflow). In this paper, we propose a non-semantics-preserving but failure-preserving compiler transformation for removing expensive symbolic branches in a program to improve the scalability of DSE. We have developed a framework for detecting spurious bugs that our transformation can insert. Finally, we show that our transformation can significantly improve the performance of DSE on various benchmark programs and help improve the performance of coverage and bug discovery of large real-world programs.2023-08-03T06:34:34ZCharitha SaumyaMuhammad HassanRohan GangarajuMilind KulkarniKirshanthan Sundararajah10.1145/3798202http://arxiv.org/abs/2603.27302v1Folding the Heighway dragon curve2026-03-28T15:15:29ZThe Heighway dragon curve is one of the most known fractal curves. There are two ways to construct the curve: repeatedly make a copy of the current curve, rotate it by 90 degrees, and connect them; or repeatedly replace each straight segment in the curve by two segments with a right angle. A natural question is how do we prove the equivalence of the two approaches?
We generalise the construction of the curve to allow rotations to both sides. It then turns out that the two approaches are respectively a foldr and a foldl, and the key property for proving their equivalence, using the second duality theorem, is the distributivity of an "interleave" operator.2026-03-28T15:15:29ZTing-Wu ChangLiang-Ting ChenShin-Cheng Muhttp://arxiv.org/abs/2603.27277v1Codebase-Memory: Tree-Sitter-Based Knowledge Graphs for LLM Code Exploration via MCP2026-03-28T14:18:12ZLarge Language Model (LLM) coding agents typically explore codebases through repeated file-reading and grep-searching, consuming thousands of tokens per query without structural understanding. We present Codebase-Memory, an open-source system that constructs a persistent, Tree-Sitter-based knowledge graph via the Model Context Protocol (MCP), parsing 66 languages through a multi-phase pipeline with parallel worker pools, call-graph traversal, impact analysis, and community discovery. Evaluated across 31 real-world repositories, Codebase-Memory achieves 83% answer quality versus 92% for a file-exploration agent, at ten times fewer tokens and 2.1 times fewer tool calls. For graph-native queries such as hub detection and caller ranking, it matches or exceeds the explorer on 19 of 31 languages.2026-03-28T14:18:12Z10 pages, 5 authors, preprintMartin VogelFalk Meyer-EschenbachSeverin KohlerElias GrünewaldFelix Balzerhttp://arxiv.org/abs/2603.27202v1Sal: Multi-modal Verification of Replicated Data Types2026-03-28T08:57:30ZDesigning correct replicated data types (RDTs) is challenging because replicas evolve independently and must be merged while preserving application intent. A promising approach is correct-by-construction development in a proof-oriented programming language such as F*, Dafny and Lean, where desired correctness guarantees are specified and checked as the RDTs are implemented. Recent work Neem proposes the use of replication-aware linearizability (RA linearizability) as the correctness condition for state-based CRDTs and mergeable replicated data types (MRDTs), with automation in the SMT-aided, proof-oriented programming language F*. However, SMT-centric workflows can be opaque when automation fails to discharge a verification condition (VC), and they enlarge the trusted computing base (TCB).
We present Sal, a multi-modal workflow to design and verify state-based CRDTs and MRDTs in Lean. Sal combines (i) kernel-checkable automation with proof reconstruction, (ii) SMT-aided automation when needed, and (iii) AI-assisted interactive theorem proving for remaining proof obligations. When a verification condition is shown to be invalid, we leverage Lean's property-based testing to automatically generate and visualize counterexamples, helping developers debug incorrect specifications or implementations. We report on our experience verifying a suite of 13 CRDTs and MRDTs with Sal: 69% of verification conditions are discharged by kernel-verified automation without SMT, and counterexamples automatically expose subtle bugs such as the well-known enable-wins flag anomaly. The codebase for Sal is open-sourced, and is available at \href{https://github.com/fplaunchpad/sal}{https://github.com/fplaunchpad/sal}2026-03-28T08:57:30ZPranav RameshVimala SoundarapandianKC Sivaramakrishnan