https://arxiv.org/api/GmBO4JmrA9dsErKOEWSUqnho7mQ 2026-06-14T03:23:30Z 9885 225 15 http://arxiv.org/abs/2605.02691v1 Compile-Time Java Stream Fusion via mapMulti 2026-05-04T15:04:54Z

The Java Stream API, introduced in Java 8, makes data processing more expressive and concise compared to imperative loops. However, this abstraction can come with significant performance overhead, often due to the creation of multiple intermediate objects during pipeline execution. In functional languages such as Haskell, this problem is addressed through stream fusion, a compile-time optimization that eliminates unnecessary intermediate structures. Inspired by this idea, Streamliner was the first tool to perform ahead-of-time, bytecode-to-bytecode stream optimization for Java by unrolling stream pipelines into imperative loops. In this paper, we introduce an open-source optimizer that takes a different approach. Instead of unrolling streams into loops, it merges consecutive map() and filter() operations into a single mapMulti() call, available since Java 16. Our method avoids several limitations of Streamliner, including its sensitivity to escaping objects in lambda expressions and its restrictions on assigning or passing streams as variables. We evaluated our optimizer on nine benchmarks and observed superior performance in two cases and comparable results in most others. We also applied it to the bytecode of Apache Kafka, successfully executing all 31,799 unit tests without failures.
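
To make the idea of merging consecutive map() and filter() calls concrete, here is a minimal hand-written sketch. It is not output of the optimizer described above, which rewrites bytecode; the class name, method names, and the particular lambdas are hypothetical. The only library call assumed is the standard Stream.mapMulti() from Java 16.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MapMultiFusionSketch {
    // Unfused pipeline: every map()/filter() call adds its own pipeline stage.
    static List<Integer> unfused(List<Integer> input) {
        return input.stream()
            .map(x -> x * 2)
            .filter(x -> x % 3 == 0)
            .map(x -> x + 1)
            .collect(Collectors.toList());
    }

    // Hand-fused equivalent: a single mapMulti() stage performs all three steps.
    // An element is passed downstream only if it survives the filter condition.
    static List<Integer> fused(List<Integer> input) {
        return input.stream()
            .<Integer>mapMulti((x, downstream) -> {
                int doubled = x * 2;
                if (doubled % 3 == 0) {
                    downstream.accept(doubled + 1);
                }
            })
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6);
        System.out.println(unfused(input)); // prints [7, 13]
        System.out.println(fused(input));   // prints [7, 13]
    }
}
```

Both methods return the same list; the fused form keeps the pipeline a stream, so it can still be assigned to a variable or passed to another method.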

2026-05-04T15:04:54Z Yegor Bugayenko Maxim Trunnikov Vladimir Zakharov 10.1145/3814987.3814988 http://arxiv.org/abs/2605.02569v1 Static Type Checking for Database Access Code 2026-05-04T13:19:39Z

JDBC remains a key technology for database access in Java applications. Since the database dictionary and the Java type system have distinct scopes, developers inevitably need to deal with bugs in SQL-to-Java type mappings. We propose an extension of the Java compiler, based on the established Checker Framework, which allows us to bridge this gap. Our approach verifies statically that the correct Java types are used when setting prepared statement parameters or when getting values from result sets. This allows us to lift a practically important class of runtime errors to compile time. Our approach is sound and, therefore, is guaranteed not to produce false negatives. Our prototype implementation also offers a degraded mode for type-checking legacy software, if developers are only interested in a subset of errors. Our experiments show that our approach detects a wide range of type mismatches in real-world application code and can indeed prevent errors which might otherwise surface as runtime errors. From the perspective of the developer, our approach is extremely lightweight: it processes the unmodified Java code, yet developers may add their own annotations. This allows us to perform type-checking even across method boundaries, whereas commercial developer tools are restricted to local checks. Finally, we show that we can type-check real-world JDBC software with reasonable overhead during compilation.

2026-05-04T13:19:39Z Thomas James Kirz Werner Dietl Mattias Ulbrich Stefanie Scherzinger http://arxiv.org/abs/2605.02362v1 A uniform characterisation of the (a)synchronous must-preorder 2026-05-04T09:04:03Z

In the setting of message-passing software, De Nicola and Hennessy's must-preorder defines when a program improves on another one. Since this preorder does not come equipped with a viable proof method, using it requires an alternative relation that characterises it. The literature presents at least four different definitions of such alternative preorders, depending on whether communication is synchronous or asynchronous and on whether there is value-passing or not. The existence of these different definitions complicates the overall theory, hinders the development of tools, and, on the whole, suggests a lack of understanding of the properties necessary and sufficient to reason about the must-preorder. This paper presents the first alternative characterisation that works in at least all four settings mentioned above. We achieve this result thanks to an axiomatic approach that is calculus-independent, by highlighting the role of blocking and non-blocking actions, and by introducing the novel notion of label abstraction. Label abstractions capture the essence of safe substitutivity w.r.t. interactions, and they let us obtain a unique proof of soundness and a unique proof of completeness that work in all the mentioned settings. We believe this generalises and simplifies the overall theory, while letting us present the existing results in a uniform way. Our proofs are constructive and our result is entirely mechanised in Rocq.

2026-05-04T09:04:03Z Giovanni Bernardi UPCité, IRIF Hugo Férée UPCité, IRIF Gaëtan Lopez UPCité, IRIF http://arxiv.org/abs/2605.02233v1 How to benchmark: the Measure-Explain-Test-Improve loop 2026-05-04T05:08:51Z

I would like to share recommendations on how to do performance benchmarks for the purpose of computer science research evaluation. Research in my field (programming language research) often involves performance considerations, but it is typically not the main tool used to evaluate our research (typically we evaluate via formal statements and their proofs, experience writing large or interesting examples, or systematic comparison of expressivity, feature set, etc.). My impression is that, as a result, we tend to not do our performance evaluation very well. In the present document I will try to explain a methodology to do benchmarking correctly (I hope!). People with no former benchmarking experience should be able to build solid performance evaluation as part of their research. I explain the justification for each aspect along the way.

2026-05-04T05:08:51Z sources available at https://codeberg.org/gasche-papers/how-to-benchmark Gabriel Scherer http://arxiv.org/abs/2605.02113v1 A Shallow Embedding of Datalog in Lean 2026-05-04T00:34:11Z

Datalog is a lightweight logic programming language, based on the logic of Horn clauses. Lean, on the other hand, is a proof assistant system and language based on the Calculus of Inductive Constructions (CIC). Datalog is more constrained and less expressive than Lean but has a long history of established deduction algorithms. Writing definitions and queries in the Datalog fragment of Lean would be more succinct and understandable than writing them in Lean itself. This paper outlines the design and implementation of a shallow embedding of Datalog as a Domain Specific Language (DSL) on top of Lean. Bidirectional interoperability between the Datalog DSL and Lean is a primary goal of this design. In addition to rules and facts, backward chaining queries are automatically translated into theorems with tactic-based proofs. The paper also includes three simple examples of how the DSL can be used.

2026-05-04T00:34:11Z SLE 2026 preprint Ramy Shahin http://arxiv.org/abs/2601.13398v2 Can LLMs Compress (and Decompress)? Evaluating Code Understanding and Execution via Invertibility 2026-05-03T18:40:55Z

LLMs demonstrate strong performance on code benchmarks, yet consistent reasoning across forward and backward execution remains elusive. We present RoundTripCodeEval (RTCE), a benchmark of four code execution reasoning tasks that evaluates round-trip consistency through execution-free, exact-match assessment of bijection fidelity across four lossless compression algorithms. We evaluate state-of-the-art Code-LLMs under zero-shot prompting, supervised fine-tuning on execution traces, and iterative self-reflection. All approaches yield only modest improvements and none closes the gap, revealing that current LLMs lack the internal coherence required for reliable bidirectional code reasoning. RTCE surfaces findings invisible to existing benchmarks: models frequently pass individual forward and backward tasks yet fail the combined round-trip, exposing mutually inconsistent internal representations; SFT and self-reflection saturate after one revision round, indicating they cannot repair fundamental algorithmic misunderstandings; and failures persist even on simple bijections such as RLE, suggesting that algorithmic complexity is not the sole root cause. Code and dataset are available at https://github.com/Nickil21/round-trip-code-compression.
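
The round-trip property that the abstract calls bijection fidelity can be illustrated with run-length encoding (RLE), the one algorithm it names. The sketch below is not taken from the RTCE benchmark: the class name, the count-then-character format, and the restriction to inputs without digit characters are all assumptions made for illustration.

```java
final class RleRoundTripSketch {
    // Forward direction: encode each run as <count><char>.
    // Assumes the input contains no digit characters (illustrative restriction).
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int j = i;
            while (j < s.length() && s.charAt(j) == c) {
                j++;
            }
            out.append(j - i).append(c);
            i = j;
        }
        return out.toString();
    }

    // Backward direction: read a decimal count, then repeat the next character.
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int count = 0;
        for (char c : s.toCharArray()) {
            if (c >= '0' && c <= '9') {
                count = count * 10 + (c - '0');
            } else {
                out.append(String.valueOf(c).repeat(count));
                count = 0;
            }
        }
        return out.toString();
    }

    // Round-trip consistency: decoding the encoding must give back the input exactly.
    static boolean roundTrips(String s) {
        return decode(encode(s)).equals(s);
    }

    public static void main(String[] args) {
        System.out.println(encode("aaabccdddd"));     // prints 3a1b2c4d
        System.out.println(roundTrips("aaabccdddd")); // prints true
    }
}
```

A model can produce a correct encode("aaabccdddd") and a correct decode of some other string while still failing the combined check, which is the failure mode the abstract highlights.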

2026-01-19T21:09:48Z Accepted to the Findings of ACL 2026 Nickil Maveli Antonio Vergari Shay B. Cohen http://arxiv.org/abs/2604.17010v2 Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification 2026-05-03T12:28:34Z

We introduce a self-play framework for semantic equivalence in Haskell, utilizing formal verification to guide adversarial training between a generator and an evaluator. The framework leverages Liquid Haskell proofs for validating equivalence and execution-based counterexamples for inequivalence, organized via a difficulty-aware curriculum. To facilitate this, we release OpInstruct-HSx, a synthetic dataset of approximately 28k validated Haskell programs. Empirical experiments show that our evaluator transfers effectively to downstream tasks, achieving up to 13.3pp accuracy gain on EquiBench and consistent gains on PySecDB. Ablation studies on the SEQ-SINQ regimes indicate that while inequivalence supervision provides data volume, equivalence proofs are uniquely responsible for the model's reasoning capabilities. The entire training pipeline and dataset are publicly released on GitHub and Hugging Face, respectively.

2026-04-18T14:43:00Z Antonio Valerio Miceli Barone Poon Tsz Nok http://arxiv.org/abs/2103.15776v6 CHAD: Combinatory Homomorphic Automatic Differentiation 2026-05-03T09:37:44Z

We introduce Combinatory Homomorphic Automatic Differentiation (CHAD), a principled, pure, provably correct define-then-run method for performing forward- and reverse-mode automatic differentiation (AD) on programming languages with expressive features. It implements AD as a compositional, type-respecting source-code transformation that generates purely functional code. This code transformation is principled in the sense that it is the unique homomorphic (structure-preserving) extension to expressive languages of Elliott's well-known and unambiguous definitions of AD for a first-order functional language. Correctness of the method follows by a compositional logical-relations argument that shows that the semantics of the syntactic derivative is the usual calculus derivative of the semantics of the original program. In their most elegant formulation, the transformations generate code with linear types. However, the code transformations can be implemented in a standard functional language lacking linear types: while the correctness proof requires tracking linearity, the actual transformations do not. In fact, even in a standard functional language, we can get all the type safety that linear types give us: we can implement all linear types used to type the transformations as abstract types by using a basic module system. In this paper, we detail the method when applied to a simple higher-order language for manipulating statically sized arrays. However, we explain how the methodology applies, more generally, to functional languages with other expressive features. Finally, we discuss how the scope of CHAD extends beyond applications in AD to other dynamic program analyses that accumulate data in a commutative monoid.

2021-03-29T17:10:22Z arXiv admin note: substantial text overlap with arXiv:2007.05283 Matthijs Vákár Tom Smeding http://arxiv.org/abs/2605.02963v1 Towards Definitional Interpreters for Hoare Logics 2026-05-03T06:38:25Z

Intrinsic definitional interpreters, definitional interpreters that operate on typing derivations instead of abstract syntax trees, have recently been studied as a promising methodology for defining dynamic semantics of programming languages. A key benefit is that type safety interactively guides and constrains the interpreter's construction. Analogously to typing relations, Hoare logic is widely used to guarantee program correctness. Can intrinsic definitional interpreters be realized to operate over Hoare-logic derivations? We explore this question in depth by developing definitional interpreters in Rocq for (i) a basic Hoare logic, and (ii) a realistic logic featuring heaps, dynamic-frame-based local reasoning, well-founded functions, and behavioral subtyping. Central to our approach is a novel technique we call entry-indexing, which we use to interpret total-correctness derivations and well-founded functions. Our second development yields, to our knowledge, the first formalization of a dynamic-frame-based Hoare logic with well-founded functions, behavioral subtyping, and total correctness, as well as the first fully mechanized Hoare logic with dynamic frames.

2026-05-03T06:38:25Z Ke Sun Di Wang Yuyan Bao Meng Wang Dan Hao http://arxiv.org/abs/2605.01560v1 FlowBook: Enforcing Reproducibility in Computational Notebooks 2026-05-02T18:04:01Z

Computational notebooks are notoriously prone to reproducibility failures. By permitting out-of-order cell execution, notebooks accumulate hidden state and implicit dependencies that cause interactive executions to silently diverge from clean top-to-bottom runs. Prior approaches either employ dependency analyses or enforce reactive dataflow models that face fundamental tradeoffs among expressiveness, precision, and performance. This paper exploits the insight that reproducibility can be enforced without precise dependency tracking: a notebook is reproducible if and only if executing its cells in top-to-bottom order from an empty store produces exactly the outputs currently recorded. We formalize this notion of reproducibility and present FlowBook, which implements a dynamic analysis that enforces reproducibility by tracking read and write sets at cell boundaries. FlowBook detects stale cells whose recorded outputs may no longer reflect the current notebook state and prevents operations that would violate reproducibility. FlowBook incurs near-imperceptible latency overhead (median: 70 ms).
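
The reproducibility criterion stated above (replay every cell top to bottom from an empty store and compare against the recorded outputs) can be written down directly. The sketch below is only that naive definition, not FlowBook's mechanism, which according to the abstract tracks read and write sets at cell boundaries. The Cell record, the integer-valued store, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class NotebookReplaySketch {
    // Hypothetical cell model: the body may read and write the shared store
    // and returns the text it would display; recordedOutput is what the
    // notebook currently shows for that cell.
    record Cell(Function<Map<String, Integer>, String> body, String recordedOutput) {}

    // Reproducible iff a clean top-to-bottom run from an empty store
    // yields exactly the recorded output of every cell.
    static boolean isReproducible(List<Cell> cells) {
        Map<String, Integer> store = new HashMap<>();
        for (Cell cell : cells) {
            String fresh = cell.body().apply(store);
            if (!fresh.equals(cell.recordedOutput())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Cell setX = new Cell(s -> { s.put("x", 1); return "x=1"; }, "x=1");
        Cell useX = new Cell(s -> "y=" + (s.get("x") + 1), "y=2");
        // Same cell body, but its recorded output came from an older value of x.
        Cell staleUseX = new Cell(s -> "y=" + (s.get("x") + 1), "y=6");

        System.out.println(isReproducible(List.of(setX, useX)));      // prints true
        System.out.println(isReproducible(List.of(setX, staleUseX))); // prints false
    }
}
```

Replaying the whole notebook on every edit would be expensive in practice, which is presumably why a tool would track dependencies at cell boundaries instead of re-executing.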

2026-05-02T18:04:01Z Stephen N. Freund Emery D. Berger Cormac Flanagan Eunice Jun http://arxiv.org/abs/2605.02953v1 DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs 2026-05-02T09:59:56Z

The scaling of large language models (LLMs) is currently bottlenecked by the rigidity of distributed programming. While high-performance libraries like cuBLAS and NCCL provide optimized primitives, they lack the flexibility required for rapidly evolving model architectures. Conversely, existing tensor compilers fail to address the complex memory hierarchy of distributed clusters effectively. To bridge this gap, we propose DITRON, a scalable tile-level compiler that democratizes high-performance distributed kernel development. DITRON introduces a novel hierarchical programming abstraction spanning Core, Device, and Task levels to map tensor programs efficiently onto heterogeneous distributed hardware. This abstraction allows DITRON to support diverse parallelism strategies while abstracting away the complexity of inter-node and intra-node communication. Evaluated across large-scale clusters, DITRON achieves performance parity with or exceeding expert-tuned CUDA libraries, delivering speedups of 6%-30% on isolated kernels and 5%-30% on end-to-end inference in vLLM. Furthermore, DITRON demonstrates strong portability, achieving significant speedups on both NVIDIA and AMD platforms. DITRON has been deployed at the enterprise level for both training and inference. It achieves an MFU improvement of over 10% in training tasks, saving approximately 500,000 GPU hours of training cost per month. For inference tasks, it delivers an end-to-end gain of over 20% and has been applied to cloud service inference and edge inference scenarios.

2026-05-02T09:59:56Z Size Zheng Xuegui Zheng Hanshi Sun Qi Hou Wenlei Bao Shiyu Li Haojie Duanmu Jin Fang Chenli Xue Chenhui Huang Yuanqiang Liu Renze Chen Ningxin Zheng Dongyang Wang Li-Wen Chang Liqiang Lu Yun Liang Jidong Zhai Xin Liu http://arxiv.org/abs/2510.22907v2 Reinforcement Learning from Compiler and Language Server Feedback 2026-05-01T23:47:55Z

Coding agents fail when text-level guesses outrun program facts: they hallucinate APIs, drift to the wrong symbol, and apply edits without evidence that the workspace remains valid. Compilers, type checkers, and language servers already compute the missing supervision signal, in the form of diagnostics, symbol resolution, type information, references, and refactoring preconditions, but expose it through interfaces designed for human-driven IDEs rather than learning loops. We introduce Reinforcement Learning from Compiler and Language Server Feedback (RLCSF) together with Lanser-CLI, a CLI-first orchestration layer that exposes this signal to agents and CI. RLCSF treats each tool interaction as a transition and computes a shaped process reward from deterministic changes in diagnostics, selector confidence, and edit safety. Lanser-CLI, in turn, converts ephemeral LSP sessions into replayable Analysis Bundles with pinned environment metadata and stable content hashes. Its core mechanisms are robust selectors that go beyond file:line:col, deterministic bundle normalization, preview-first guarded mutations, and a reward functional whose potential-based component is replayable under frozen snapshots. We formalize determinism for canonical bundles and prove that componentwise-improving transitions receive non-negative reward in the undiscounted setting. Together, these pieces yield a practical substrate for process supervision of coding agents.

2025-10-27T01:25:20Z Project Page: https://github.com/yifanzhang-pro/lanser-cli Yifan Zhang Lanser Contributors http://arxiv.org/abs/2605.01140v1 SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes 2026-05-01T22:32:52Z

Array-of-structures (AoS) to structure-of-arrays (SoA) is a classic compiler transformation that improves memory locality and enables data-parallel execution. Existing AoS-to-SoA transformations primarily target regular, array-based programs in imperative languages like C and C++. In contrast, many applications manipulate tree-shaped data structures, for example, ASTs in compilers, DOM trees in browsers, and k-d trees in scientific workloads. Prior work improves the performance of functional programs operating on such data by serializing algebraic datatypes (ADTs) into contiguous memory buffers. However, these representations interleave fields within a single buffer, similar to AoS layouts. We introduce factored, multi-buffer layouts that store different ADT fields in separate buffers, enabling SoA-like layouts for serialized recursive data structures. We formalize this approach in SoCal, a language for generating factored ADT representations, and implement it in a compiler called Colobus. Colobus automatically transforms functional programs to operate over a serialized, factored layout of recursive ADTs. Our evaluation shows a 1.46x geometric mean speedup on a suite of tree-processing benchmarks.
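
A small sketch may help picture the difference between an interleaved serialization and a factored one. It is not SoCal or Colobus code: the tree type, the tag characters, and the use of Java lists as stand-ins for contiguous buffers are assumptions made for illustration. The tree is written in preorder, with constructor tags going to one buffer and leaf values to another.

```java
import java.util.ArrayList;
import java.util.List;

final class FactoredLayoutSketch {
    // A minimal recursive datatype: Tree = Leaf(int) | Node(Tree, Tree).
    interface Tree {}
    record Leaf(int value) implements Tree {}
    record Node(Tree left, Tree right) implements Tree {}

    // Factored layout: constructor tags and leaf payloads live in separate
    // buffers instead of being interleaved in one (lists stand in for buffers).
    record Factored(List<Character> tags, List<Integer> values) {}

    static Factored serialize(Tree t) {
        Factored out = new Factored(new ArrayList<>(), new ArrayList<>());
        write(t, out);
        return out;
    }

    // Preorder traversal: emit the tag first, then the fields.
    private static void write(Tree t, Factored out) {
        if (t instanceof Leaf leaf) {
            out.tags().add('L');
            out.values().add(leaf.value());
        } else if (t instanceof Node node) {
            out.tags().add('N');
            write(node.left(), out);
            write(node.right(), out);
        }
    }

    // A traversal that only needs leaf values scans the value buffer
    // linearly and never touches the tag buffer.
    static int sumLeaves(Factored f) {
        int sum = 0;
        for (int v : f.values()) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        Tree t = new Node(new Node(new Leaf(1), new Leaf(2)), new Leaf(3));
        Factored f = serialize(t);
        System.out.println(f.tags());     // prints [N, N, L, L, L]
        System.out.println(f.values());   // prints [1, 2, 3]
        System.out.println(sumLeaves(f)); // prints 6
    }
}
```

In an interleaved layout the same sum would have to step over a tag before each value, which is the AoS-like access pattern the abstract contrasts with.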

2026-05-01T22:32:52Z Vidush Singhal Mikah Kainen Artem Pelenitsyn Michael H. Borkowski Mike Vollmer Milind Kulkarni http://arxiv.org/abs/2605.01124v1 Practical Formal Verification for MLIR Programs 2026-05-01T21:51:21Z

Optimizing compilers have become a cornerstone for high-performance program generation in research and industry. Optimizations, including those implemented manually by a user and those that are target-specific or non-target-specific, are used to transform programs to achieve good performance. Although these optimizations are necessary for performance, assessing their correctness has remained a major challenge; the risk of incorrect code being deployed increases with unproven optimization flows. In this work, we target the formal verification of correctness of a transformed program by computing whether a pair of programs are semantically equivalent, one being a transformed version of the other. We restrict the class of programs supported to enable a hybrid concrete-symbolic interpretation approach to equivalence, which in turn is mostly agnostic to how the programs are implemented (syntax, schedule, storage, etc.). This approach can show equivalence in linear time with respect to the operations executed by the programs. We develop a verifier for a meaningful subset of MLIR, and report on the verification of the AMD MLIR-AIR and MLIR-AIE toolchains, as well as the standard mlir-opt, on hundreds of benchmark variants.

2026-05-01T21:51:21Z Emily Tucker Louis-Noël Pouchet Erika Hunhoff Stephen Neuendorffer Erwei Wang http://arxiv.org/abs/2605.00655v1 Type Theory With Erasure 2026-05-01T13:38:10Z

Erasure enriches type theory with a distinction between runtime relevant and irrelevant data, allowing the compilation step to safely erase the latter. Versions of this feature are implemented by many systems, including Agda, Idris, and Rocq. We present a structural version of type theory with erasure, formulated as a second-order generalised algebraic theory (SOGAT). Erasure is encoded as a phase distinction between runtime and erased terms, in the form of a proposition that can appear in a context. This formulation has several advantages: it has models based on categories with families, is compatible with other structural features such as staging, and provides a better guideline for implementation. Through the model theory of SOGATs, we study the semantics of type theory with erasure in families of sets, which generalises to any Grothendieck topos equipped with a tiny proposition. We establish conservativity over Martin-Löf type theory (MLTT) in both phases. For code extraction, we construct a presheaf model that produces untyped lambda calculus programs and prove its correctness through gluing. Our results are formalised in Agda and we provide a toy elaborator implementation.

2026-05-01T13:38:10Z Accepted to FSCD 2026 Constantine Theocharis Edwin Brady