https://arxiv.org/api/Y5VgXz3u4OrR7mF9f40/uK7Bfmc 2026-06-21T13:40:39Z 9918 720 15 http://arxiv.org/abs/2510.25975v2 SymCode: A Neurosymbolic Approach to Mathematical Reasoning via Verifiable Code Generation 2026-01-25T05:03:33Z

Large Language Models (LLMs) often struggle with complex mathematical reasoning, where prose-based generation leads to unverified and arithmetically unsound solutions. Current prompting strategies like Chain of Thought still operate within this unreliable medium, lacking a mechanism for deterministic verification. To address these limitations, we introduce SymCode, a neurosymbolic framework that reframes mathematical problem-solving as a task of verifiable code generation using the SymPy library. We evaluate SymCode on challenging benchmarks, including MATH-500 and OlympiadBench, demonstrating significant accuracy improvements of up to 13.6 percentage points over baselines. Our analysis shows that SymCode is not only more token-efficient but also fundamentally shifts model failures from opaque logical fallacies towards transparent, programmatic errors. By grounding LLM reasoning in a deterministic symbolic engine, SymCode represents a key step towards more accurate and trustworthy AI in formal domains.

2025-10-29T21:17:57Z camera-ready EACL 2026 Findings Sina Bagheri Nezhad Yao Li Ameeta Agrawal http://arxiv.org/abs/2510.25369v2 Have a thing? Reasoning around recursion with dynamic typing in grounded arithmetic 2026-01-24T08:37:29Z

Neither the classical nor intuitionistic logic traditions are perfectly-aligned with the purpose of reasoning about computation, in that neither tradition can permit unconstrained recursive definitions without inconsistency: recursive logical definitions must normally be proven terminating before admission and use. We introduce grounded arithmetic or GA, a formal-reasoning foundation allowing direct expression of arbitrary recursive definitions. GA adjusts traditional inference rules so that terms that express nonterminating computations harmlessly denote no semantic value (i.e., bottom) instead of yielding inconsistency. Recursive functions may be proven terminating in GA essentially by "dynamically typing" terms, or equivalently, symbolically reverse-executing the computations they denote via GA's inference rules. Once recursive functions have been proven terminating, logical reasoning about their results reduce to the familiar classical rules. A mechanically-checked formal development of basic grounded arithmetic or BGA - a minimal kernel for GA - shows that BGA simultaneously exhibits the useful metalogical properties of being (a) semantically and syntactically consistent, (b) semantically complete, and (c) sufficiently powerful to express and prove the termination of arbitrary closed Turing-complete computations. This combination is impossible in classical logic due to Göel's incompleteness theorems, but our results do not contradict Gödel's theorems because BGA is paracomplete, not classical. Leveraging BGA's power of computation and reflection, we find that grounded logical operators including quantifiers are definable as non-primitive computations in BGA, despite not being included as primitives.

2025-10-29T10:40:12Z Elliot Bobrow Bryan Ford Stefan Milenkovic http://arxiv.org/abs/2601.09217v2 Relational Hoare Logic for High-Level Synthesis of Hardware Accelerators 2026-01-23T01:25:27Z

High-level synthesis (HLS) is a powerful tool for developing efficient hardware accelerators that rely on specialized memory systems to achieve sufficient on-chip data reuse and off-chip bandwidth utilization. However, even with HLS, designing such systems still requires careful manual tuning, as automatic optimizations provided by existing tools are highly sensitive to programming style and often lack transparency. To address these issues, we present a formal translation framework based on relational Hoare logic, which enables robust and transparent transformations. Our method recognizes complex memory access patterns in naïve HLS programs and automatically transforms them by inserting on-chip buffers to enforce linear access to off-chip memory, and by replacing non-sequential processing with stream processing, while preserving program semantics. Experiments using our prototype translator, combined with an off-the-shelf HLS compiler and a real FPGA board, have demonstrated significant performance improvements.

2026-01-14T06:43:43Z An extended version of the paper to appear in Proceedings of ESOP 2026 Izumi Tanaka Ken Sakayori Shinya Takamaeda-Yamazaki Naoki Kobayashi http://arxiv.org/abs/2602.00087v1 ECCO: Evidence-Driven Causal Reasoning for Compiler Optimization 2026-01-23T01:23:20Z

Compiler auto-tuning faces a dichotomy between traditional black-box search methods, which lack semantic guidance, and recent Large Language Model (LLM) approaches, which often suffer from superficial pattern matching and causal opacity. In this paper, we introduce ECCO, a framework that bridges interpretable reasoning with combinatorial search. We first propose a reverse engineering methodology to construct a Chain-of-Thought dataset, explicitly mapping static code features to verifiable performance evidence. This enables the model to learn the causal logic governing optimization decisions rather than merely imitating sequences. Leveraging this interpretable prior, we design a collaborative inference mechanism where the LLM functions as a strategist, defining optimization intents that dynamically guide the mutation operations of a genetic algorithm. Experimental results on seven datasets demonstrate that ECCO significantly outperforms the LLVM opt -O3 baseline, achieving an average 24.44% reduction in cycles.

2026-01-23T01:23:20Z Haolin Pan Lianghong Huang Jinyuan Dong Mingjie Xing Yanjun Wu http://arxiv.org/abs/2601.09986v2 Outrunning Big KATs: Efficient Decision Procedures for Variants of GKAT 2026-01-22T19:50:23Z

This paper presents several efficient decision procedures for trace equivalence of GKAT automata, which make use of on-the-fly symbolic techniques via SAT solvers. To demonstrate applicability of our algorithms, we designed symbolic derivatives for CF-GKAT, a practical system based on GKAT designed to validate control-flow transformations. We implemented the algorithms in Rust and evaluated them on both randomly generated benchmarks and real-world control-flow transformations. Indeed, we observed order-of-magnitude performance improvements against existing implementations for both KAT and CF-GKAT. Notably, our experiments also revealed a bug in Ghidra, an industry-standard decompiler, highlighting the practical viability of these systems.

2026-01-15T01:59:57Z Conditionally Accepted at ESOP 2026 Cheng Zhang Qiancheng Fu Hang Ji Ines Santacruz Del Valle Alexandra Silva Marco Gaboardi http://arxiv.org/abs/2601.16008v1 Prioritizing Configuration Relevance via Compiler-Based Refined Feature Ranking 2026-01-22T14:34:09Z

Modern programming languages, most notably Rust, offer advanced linguistic constructs for building highly configurable software systems as aggregation of features -- identified by a configuration. However, they pose substantial challenges for program analysis, optimization, and testing, as the combinatorial explosion of configurations often makes exhaustive exploration infeasible. In this manuscript, we present the first compiler-based method for prioritizing configurations. Our approach consists of four main steps: 1. extracting a tailored intermediate representation from the Rust compiler, 2. constructing two complementary graph-based data structures, 3. using centrality measures to rank features, and 4. refining the ranking by considering the extent of code they impact. A fixed number of most relevant configurations are generated based on the achieved feature ranking. The validity of the generated configurations is guaranteed by using a SAT solver that takes a representation of this graph in conjunctive normal form. We formalized this approach and implemented it in a prototype, RustyEx, by instrumenting the Rust compiler. An empirical evaluation on higher-ranked open source Rust projects shows that RustyEx efficiently generates user-specified sets of configurations within bounded resources, while ensuring soundness by construction. The results demonstrate that centrality-guided configuration prioritization enables effective and practical exploration of large configuration spaces, paving the way for future research in configuration-aware analysis and optimization.

2026-01-22T14:34:09Z 29 pages 4 figures Federico Bruzzone Walter Cazzola Luca Favini http://arxiv.org/abs/2601.12943v2 Dependently-Typed AARA: A Non-Affine Approach for Resource Analysis of Higher-Order Programs 2026-01-22T13:44:40Z

Static resource analysis determines the resource consumption (e.g., time complexity) of a program without executing it. Among the numerous existing approaches for resource analysis, affine type systems have been one dominant approach. However, these affine type systems fall short of deriving precise resource behavior of higher-order programs, particularly in cases that involve partial applications. This article presents λ_\ms{amor}^\ms{na}}, a non-affine AARA-style dependent type system for resource reasoning about higher-order functional programs. The key observation is that the main issue in previous approaches comes from (i) the close coupling of types and resources, and (ii) the conflict between affine and higher-order typing mechanisms. To derive precise resource behavior of higher-order functions, λ_\ms{amor}^\ms{na}} decouples resources from types and follows a non-affine typing mechanism. The non-affine type system of λ_\ms{amor}^\ms{na}} achieves this by using dependent types, which allows expressing type-level potential functions separate from ordinary types. This article formalizes λ_\ms{amor}^\ms{na}}'s syntax and semantics, and proves its soundness, which guarantees the correctness of resource bounds. Several challenging classic and higher-order examples are presented to demonstrate the expressiveness and compositionality of λ_\ms{amor}^\ms{na}}'s reasoning capability.

2026-01-19T10:52:08Z Han Xu Di Wang http://arxiv.org/abs/2508.17518v2 Evaluating Compiler Optimization Impacts on zkVM Performance 2026-01-22T10:25:30Z

Zero-knowledge proofs (ZKPs) are the cornerstone of programmable cryptography. They enable (1) privacy-preserving and verifiable computation across blockchains, and (2) an expanding range of off-chain applications such as credential schemes. Zero-knowledge virtual machines (zkVMs) lower the barrier by turning ZKPs into a drop-in backend for standard compilation pipelines. This lets developers write proof-generating programs in conventional languages (e.g., Rust or C++) instead of hand-crafting arithmetic circuits. However, these VMs inherit compiler infrastructures tuned for traditional architectures rather than for proof systems. In particular, standard compiler optimizations assume features that are absent in zkVMs, including cache locality, branch prediction, or instruction-level parallelism. Therefore, their impact on proof generation is questionable. We present the first systematic study of the impact of compiler optimizations on zkVMs. We evaluate 64 LLVM passes, six standard optimization levels, and an unoptimized baseline across 58 benchmarks on two RISC-V-based zkVMs (RISC Zero and SP1). While standard LLVM optimization levels do improve zkVM performance (over 40\%), their impact is far smaller than on traditional CPUs, since their decisions rely on hardware features rather than proof constraints. Guided by a fine-grained pass-level analysis, we~\emph{slightly} refine a small set of LLVM passes to be zkVM-aware, improving zkVM execution time by up to 45\% (average +4.6\% on RISC Zero, +1\% on SP1) and achieving consistent proving-time gains. Our work highlights the potential of compiler-level optimizations for zkVM performance and opens new direction for zkVM-specific passes, backends, and superoptimizers.

2025-08-24T20:51:06Z Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, 2026 Thomas Gassmann Stefanos Chaliasos Thodoris Sotiropoulos Zhendong Su 10.1145/3779212.3790159 http://arxiv.org/abs/2604.18585v1 RECURSUM: Automated Code Generation for Recurrence Relations Exceeds Expert Optimization via LayeredCodegen 2026-01-22T04:44:48Z

Automated code generation can systematically exceed expert hand-optimization for recurrence relations-computational primitives ubiquitous in orthogonal polynomials, special functions, numerical integration, and molecular integral evaluation. We present RECURSUM, a Python-based domain-specific language generating optimized C++ for arbitrary recurrence relations via three backends: template metaprogramming for compile-time evaluation, a novel LayeredCodegen backend with architectural optimizations, and runtime loop-based evaluation. The DSL uses einsum-inspired notation to specify recurrences, validity constraints, and base cases in 10-30 lines of Python, generating 650+ lines of production C++. LayeredCodegen achieves 9.8x speedup over expert hand-written implementations and 1.9x over template metaprogramming for McMurchie-Davidson Hermite coefficients. Architecture analysis reveals three quantifiable effects: (1) zero-copy output parameters eliminate return-by-value overhead (70-80% of speedup), (2) guaranteed function inlining eliminates compiler-refused overhead (15-20%), (3) exact-sized stack buffers achieve 100% cache efficiency vs 27% for MAX-sized arrays (5-10%). We validate on 24 recurrence types spanning pure mathematics (Legendre, Chebyshev, Hermite, Laguerre polynomials), numerical analysis (Clenshaw, Golub-Welsch), and quantum chemistry (McMurchie-Davidson, Rys quadrature, Boys function). Production benchmarks show speedups propagate to complete algorithms, with generated code matching expert baselines within 3.3%. RECURSUM demonstrates that systematic code generation serves as the performance ceiling for recurrence algorithms. By eliminating the dual expertise barrier (domain knowledge + C++ metaprogramming), the framework democratizes high-performance scientific computing-establishing a paradigm where automated generation systematically exceeds manual optimization.

2026-01-22T04:44:48Z 43 pages, 9 figures Rubén Darío Guerrero http://arxiv.org/abs/2601.15455v1 Remarks on Algebraic Reconstruction of Types and Effects 2026-01-21T20:50:21Z

In their 1991 paper "Algebraic Reconstruction of Types and Effects," Pierre Jouvelot and David Gifford presented a type-and-effect reconstruction algorithm based on an algebraic structure of effects. Their work is considered a milestone in the development of type-and-effect systems, and has inspired numerous subsequent works in the area of static analysis. However, unlike the later research it spawned, the original algorithm considered a language with higher-rank polymorphism, a feature which is challenging to implement correctly. In this note, we identify subtle bugs related to variable binding in their approach to this feature. We revisit their type system and reconstruction algorithm, and describe the discovered issues.

2026-01-21T20:50:21Z Patrycja Balik Szymon Jędras Piotr Polesiuk http://arxiv.org/abs/2511.23283v2 All for One and One for All: Program Logics for Exploiting Internal Determinism in Parallel Programs 2026-01-21T20:07:41Z

Nondeterminism makes parallel programs challenging to write and reason about. To avoid these challenges, researchers have developed techniques for internally deterministic parallel programming, in which the steps of a parallel computation proceed in a deterministic way. Internal determinism is useful because it lets a programmer reason about a program as if it executed in a sequential order. However, no verification framework exists to exploit this property and simplify formal reasoning about internally deterministic programs. To capture the essence of why internally deterministic programs should be easier to reason about, this paper defines a property called schedule-independent safety. A program satisfies schedule-independent safety, if, to show that the program is safe across all orderings, it suffices to show that one terminating execution of the program is safe. We then present a separation logic called Musketeer for proving that a program satisfies schedule-independent safety. Once a parallel program has been shown to satisfy schedule-independent safety, we can verify it with a new logic called Angelic, which allows one to dynamically select and verify just one sequential ordering of the program. Using Musketeer, we prove the soundness of MiniDet, an affine type system for enforcing internal determinism. MiniDet supports several core algorithmic primitives for internally deterministic programming that have been identified in the research literature, including a deterministic version of a concurrent hash set. Because any syntactically well-typed MiniDet program satisfies schedule-independent safety, we can apply Angelic to verify such programs. All results in this paper have been verified in Rocq using the Iris separation logic framework.

2025-11-28T15:37:17Z 32 pages, 26 figures, extended version of the same paper accepted at POPL 2026 Alexandre Moine Sam Westrick Joseph Tassarotti 10.1145/3776668 http://arxiv.org/abs/2508.10781v3 Generating Compilers for Qubit Mapping and Routing 2026-01-21T18:23:58Z

To evaluate a quantum circuit on a quantum processor, one must find a mapping from circuit qubits to processor qubits and plan the instruction execution while satisfying the processor's constraints. This is known as the qubit mapping and routing (QMR) problem. High-quality QMR solutions are key to maximizing the utility of scarce quantum resources and minimizing the probability of logical errors affecting computation. The challenge is that the landscape of quantum processors is incredibly diverse and fast-evolving. Given this diversity, dozens of papers have addressed the QMR problem for different qubit hardware, connectivity constraints, and quantum error correction schemes by a developing a new algorithm for a particular context. We present an alternative approach: automatically generating qubit mapping and routing compilers for arbitrary quantum processors. Though each QMR problem is different, we identify a common core structure-device state machine-that we use to formulate an abstract QMR problem. Our formulation naturally leads to a compact domain-specific language for specifying QMR problems and a powerful parametric algorithm that can be instantiated for any QMR specification. Our thorough evaluation on case studies of important QMR problems shows that generated compilers are competitive with handwritten, specialized compilers in terms of runtime and solution quality.

2025-08-14T16:07:07Z Abtin Molavi Amanda Xu Ethan Cecchetti Swamit Tannu Aws Albarghouthi 10.1145/3776720 http://arxiv.org/abs/2601.15188v1 Benchmarking Large Language Models for ABAP Code Generation: An Empirical Study on Iterative Improvement by Compiler Feedback 2026-01-21T17:06:41Z

This work investigates the performance of Large Language Models (LLMs) in generating ABAP code. Despite successful applications of generative AI in many programming languages, there are hardly any systematic analyses of ABAP code generation to date. The aim of the study is to empirically analyze to what extent various LLMs can generate syntactically correct and functional ABAP code, how effectively they use compiler feedback for iterative improvement, and which task types pose special challenges. For this purpose, a benchmark with 180 tasks is conducted, consisting of adapted HumanEval tasks and practical SAP scenarios. The results show significant performance differences between the models: more powerful LLMs achieve success rates of around 75% after several iterations and benefit greatly from compiler feedback, while smaller models perform significantly weaker. Overall, the study highlights the high potential of powerful LLMs for ABAP development processes, especially in iterative error correction.

2026-01-21T17:06:41Z 20 pages, 10 figures, Author: Hartmut Westenberger (ORCID: 0009-0009-9063-8318) Stephan Wallraven Tim Köhne Hartmut Westenberger Andreas Moser http://arxiv.org/abs/2601.15180v1 Contextual Metaprogramming for Session Types 2026-01-21T16:59:18Z

We propose the integration of staged metaprogramming into a session-typed message passing functional language. We build on a model of contextual modal type theory with multi-level contexts, where contextual values, closing arbitrary terms over a series of variables, may be boxed and transmitted in messages. Once received, one such value may then be unboxed and locally applied before being run. To motivate this integration, we present examples of real-world use cases, for which our system would be suitable, such as servers preparing and shipping code on demand via session typed messages. We present a type system that distinguishes linear (used exactly once) from unrestricted (used an unbounded number of times) resources, and further define a type checker, suitable for a concrete implementation. We show type preservation, a progress result for sequential computations and absence of runtime errors for the concurrent runtime environment, as well as the correctness of the type checker.

2026-01-21T16:59:18Z 36 pages, 14 figures, ESOP 2026 Pedro Ângelo Atsushi Igarashi Yuito Murase Vasco T. Vasconcelos http://arxiv.org/abs/2601.15167v1 DeGAS: Gradient-Based Optimization of Probabilistic Programs without Sampling 2026-01-21T16:45:10Z

We present DeGAS, a differentiable Gaussian approximate semantics for loopless probabilistic programs that enables sample-free, gradient-based optimization in models with both continuous and discrete components. DeGAS evaluates programs under a Gaussian-mixture semantics and replaces measure-zero predicates and discrete branches with a vanishing smoothing, yielding closed-form expressions for posterior and path probabilities. We prove differentiability of these quantities with respect to program parameters, enabling end-to-end optimization via standard automatic differentiation, without Monte Carlo estimators. On thirteen benchmark programs, DeGAS achieves accuracy and runtime competitive with variational inference and MCMC. Importantly, it reliably tackles optimization problems where sampling-based baselines fail to converge due to conditioning involving continuous variables.

2026-01-21T16:45:10Z Francesca Randone Romina Doz Mirco Tribastone Luca Bortolussi