https://arxiv.org/api/GmBO4JmrA9dsErKOEWSUqnho7mQ2026-06-14T03:23:30Z988522515http://arxiv.org/abs/2605.02691v1Compile-Time Java Stream Fusion via mapMulti2026-05-04T15:04:54ZThe Java Stream API, introduced in Java 8, makes data processing more expressive and concise compared to imperative loops. However, this abstraction can come with significant performance overhead, often due to the creation of multiple intermediate objects during pipeline execution. In functional languages such as Haskell, this problem is addressed through stream fusion, a compile-time optimization that eliminates unnecessary intermediate structures. Inspired by this idea, Streamliner was the first tool to perform ahead-of-time, bytecode-to-bytecode stream optimization for Java by unrolling stream pipelines into imperative loops. In this paper, we introduce an open-source optimizer that takes a different approach. Instead of unrolling streams into loops, it merges consecutive map() and filter() operations into a single mapMulti() call, available since Java 16. Our method avoids several limitations of Streamliner, including its sensitivity to escaping objects in lambda expressions and its restrictions on assigning or passing streams as variables. We evaluated our optimizer on nine benchmarks and observed superior performance in two cases and comparable results in most others. We also applied it to the bytecode of Apache Kafka, successfully executing all 31,799 unit tests without failures.2026-05-04T15:04:54ZYegor BugayenkoMaxim TrunnikovVladimir Zakharov10.1145/3814987.3814988http://arxiv.org/abs/2605.02569v1Static Type Checking for Database Access Code2026-05-04T13:19:39ZJDBC remains a key technology for database access in Java applications. Since the database dictionary and the Java type system have distinct scopes, developers inevitably need to deal with bugs in SQL-to-Java type mappings. We propose an extension of the Java compiler, based on the established Checker Framework, which allows us to bridge this gap. Our approach verifies statically that the correct Java types are used when setting prepared statement parameters or when getting values from result sets. This allows us to lift a practically important class of runtime errors to compile time. Our approach is sound and, therefore, is guaranteed not to produce false negatives. Our prototype implementation also offers a degraded mode for type-checking legacy software, if developers are only interested in a subset of errors. Our experiments show that our approach detects a wide range of type mismatches in realworld application code and can indeed prevent errors which might otherwise surface as runtime errors. From the perspective of the developer, our approach is extremely lightweight: it processes the unmodified Java code, yet developers may add their own annotations. This allows us to perform type-checking even across method boundaries, whereas commercial developer tools are restricted to local checks. Finally, we show that we can type-check real-world JDBC software with reasonable overhead during compilation.2026-05-04T13:19:39ZThomas James KirzWerner DietlMattias UlbrichStefanie Scherzingerhttp://arxiv.org/abs/2605.02362v1A uniform characterisation of the (a)synchronous must-preorder2026-05-04T09:04:03ZIn the setting of message passing software, De Nicola and Hennessy must-preorder defines when a program improves on another one. Since this preorder does not come equipped with a viable proof method, using it requires an alternative relation that characterises it. The literature presents at least four different definitions of such alternative preorders, depending on whether communication is synchronous or asynchronous and on whether there is value-passing or not. The existence of these different definitions complicates the overall theory, hinders the development of tools, and, upon the whole, suggests a lack of understanding of the properties necessary and sufficient to reason on the must-preorder. This paper presents the first alternative characterisation that works at least in all the four settings mentioned above. We achieve this result thanks to an axiomatic approach that is calculus independent, by highlighting the role of blocking and non-blocking actions, and by introducing the novel notion of label abstraction. Label abstractions capture the essence of safe substitutivity w.r.t. interactions, and they let us obtain a unique proof of soundness and a unique proof of completeness that work in all the mentioned settings. We believe this generalises and simplifies the overall theory, while letting us present the existing results in a uniform way. Our proofs are constructive and our result is entirely mechanised in Rocq.2026-05-04T09:04:03ZGiovanni BernardiUPCité, IRIFHugo FéréeUPCité, IRIFGaëtan LopezUPCité, IRIFhttp://arxiv.org/abs/2605.02233v1How to benchmark: the Measure-Explain-Test-Improve loop2026-05-04T05:08:51ZI would like to share recommendations on how to do performance benchmarks for the purpose of computer science research evaluation. Research in my field (programming language research) often involves performance considerations, but it is typically not the main tool used to evaluate our research (typically we evaluate via formal statements and their proofs, experience writing large or interesting examples, or systematic comparison of expressivity, feature set, etc.). My impression is that, as a result, we tend to not do our performance evaluation very well.
In the present document I will try to explain a methodology to do benchmarking correctly (I hope!). People with no former benchmarking experience should be able to build solid performance evaluation as part of their research. I explain the justification for each aspect along the way.2026-05-04T05:08:51Zsources available at https://codeberg.org/gasche-papers/how-to-benchmarkGabriel Schererhttp://arxiv.org/abs/2605.02113v1A Shallow Embedding of Datalog in Lean2026-05-04T00:34:11ZDatalog is a lightweight logic programming language, based on the logic of Horn clauses. Lean, on the other hand, is a proof assistant system and language based on the Calculus of Inductive Constructions (CIC). Datalog is more constrained and less expressive than Lean but has a long history of established deduction algorithms. Writing definitions and queries in the Datalog fragment of Lean would be more succinct and understandable than writing them in Lean itself.
This paper outlines the design and implementation of a shallow embedding of Datalog as a Domain Specific Language (DSL) on top of Lean. Bidirectional interoperability between the Datalog DSL and Lean is a primary goal of this design. In addition to rules and facts, backward chaining queries are automatically translated into theorems with tactic-based proofs. The paper also includes three simple examples of how the DSL can be used.2026-05-04T00:34:11ZSLE 2026 preprintRamy Shahinhttp://arxiv.org/abs/2601.13398v2Can LLMs Compress (and Decompress)? Evaluating Code Understanding and Execution via Invertibility2026-05-03T18:40:55ZLLMs demonstrate strong performance on code benchmarks, yet consistent reasoning across forward and backward execution remains elusive. We present RoundTripCodeEval (RTCE), a benchmark of four code execution reasoning tasks that evaluates round-trip consistency through execution-free, exact-match assessment of bijection fidelity across four lossless compression algorithms. We evaluate state-of-the-art Code-LLMs under zero-shot prompting, supervised fine-tuning on execution traces, and iterative self-reflection. All approaches yield only modest improvements and none closes the gap, revealing that current LLMs lack the internal coherence required for reliable bidirectional code reasoning. RTCE surfaces findings invisible to existing benchmarks: models frequently pass individual forward and backward tasks yet fail the combined round-trip, exposing mutually inconsistent internal representations; SFT and self-reflection saturate after one revision round, indicating they cannot repair fundamental algorithmic misunderstandings; and failures persist even on simple bijections such as RLE, suggesting that algorithmic complexity is not the sole root cause.\footnote{Code and dataset are available at https://github.com/Nickil21/round-trip-code-compression.2026-01-19T21:09:48ZAccepted to the Findings of ACL 2026Nickil MaveliAntonio VergariShay B. Cohenhttp://arxiv.org/abs/2604.17010v2Improving LLM Code Reasoning via Semantic Equivalence Self-Play with Formal Verification2026-05-03T12:28:34ZWe introduce a self-play framework for semantic equivalence in Haskell, utilizing formal verification to guide adversarial training between a generator and an evaluator. The framework leverages Liquid Haskell proofs for validating equivalence and execution-based counterexamples for inequivalence, organized via a difficulty-aware curriculum. To facilitate this, we release \textbf{OpInstruct-HSx}, a synthetic dataset of $\approx$28k validated Haskell programs. Empirical experiments show that our evaluator transfers effectively to downstream tasks, achieving up to 13.3pp accuracy gain on EquiBench and consistent gains on PySecDB. Ablation studies on the SEQ-SINQ regimes indicate that while inequivalence supervision provides data volume, equivalence proofs are uniquely responsible for the model's reasoning capabilities. The entire training pipeline and dataset are publicly released on GitHub and Hugging Face respectively.2026-04-18T14:43:00ZAntonio Valerio Miceli BaronePoon Tsz Nokhttp://arxiv.org/abs/2103.15776v6CHAD: Combinatory Homomorphic Automatic Differentiation2026-05-03T09:37:44ZWe introduce Combinatory Homomorphic Automatic Differentiation (CHAD), a principled, pure, provably correct define-then-run method for performing forward- and reverse-mode automatic differentiation (AD) on programming languages with expressive features. It implements AD as a compositional, type-respecting source-code transformation that generates purely functional code. This code transformation is principled in the sense that it is the unique homomorphic (structure-preserving) extension to expressive languages of Elliott's well-known and unambiguous definitions of AD for a first-order functional language. Correctness of the method follows by a compositional logical-relations argument that shows that the semantics of the syntactic derivative is the usual calculus derivative of the semantics of the original program.
In their most elegant formulation, the transformations generate code with linear types. However, the code transformations can be implemented in a standard functional language lacking linear types: while the correctness proof requires tracking linearity, the actual transformations do not. In fact, even in a standard functional language, we can get all the type safety that linear types give us: we can implement all linear types used to type the transformations as abstract types by using a basic module system.
In this paper, we detail the method when applied to a simple higher-order language for manipulating statically sized arrays. However, we explain how the methodology applies, more generally, to functional languages with other expressive features. Finally, we discuss how the scope of CHAD extends beyond applications in AD to other dynamic program analyses that accumulate data in a commutative monoid.2021-03-29T17:10:22ZarXiv admin note: substantial text overlap with arXiv:2007.05283Matthijs VákárTom Smedinghttp://arxiv.org/abs/2605.02963v1Towards Definitional Interpreters for Hoare Logics2026-05-03T06:38:25ZIntrinsic definitional interpreters, definitional interpreters that operate on typing derivations instead of abstract syntax trees, have recently been studied as a promising methodology for defining dynamic semantics of programming languages. A key benefit is that type safety interactively guides and constrains the interpreter's construction. Analogously to typing relations, Hoare logic is widely used to guarantee program correctness. Can intrinsic definitional interpreters be realized to operate over Hoare-logic derivations?
We explore this question in depth by developing definitional interpreters in Rocq for (i) a basic Hoare logic, and (ii) a realistic logic featuring heaps, dynamic-frame-based local reasoning, well-founded functions, and behavioral subtyping. Central to our approach is a novel technique we call entry-indexing, which we use to interpret total-correctness derivations and well-founded functions. Our second development yields, to our knowledge, the first formalization of a dynamic-frame-based Hoare logic with well-founded functions, behavioral subtyping, and total correctness, as well as the first fully mechanized Hoare logic with dynamic frames.2026-05-03T06:38:25ZKe SunDi WangYuyan BaoMeng WangDan Haohttp://arxiv.org/abs/2605.01560v1FlowBook: Enforcing Reproducibility in Computational Notebooks2026-05-02T18:04:01ZComputational notebooks are notoriously prone to reproducibility failures. By permitting out-of-order cell execution, notebooks accumulate hidden state and implicit dependencies that cause interactive executions to silently diverge from clean top-to-bottom runs. Prior approaches either employ dependency analyses or enforce reactive dataflow models that face fundamental tradeoffs among expressiveness, precision, and performance. This paper exploits the insight that reproducibility can be enforced without precise dependency tracking: a notebook is reproducible if and only if executing its cells in top-to-bottom order from an empty store produces exactly the outputs currently recorded. We formalize this notion of reproducibility and present FlowBook, which implements a dynamic analysis that enforces reproducibility by tracking read and write sets at cell boundaries. FlowBook detects stale cells whose recorded outputs may no longer reflect the current notebook state and prevents operations that would violate reproducibility. FlowBook incurs near-imperceptible latency overhead (median: 70 ms).2026-05-02T18:04:01ZStephen N. FreundEmery D. BergerCormac FlanaganEunice Junhttp://arxiv.org/abs/2605.02953v1DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs2026-05-02T09:59:56ZThe scaling of large language models (LLMs) is currently bottlenecked by the rigidity of distributed programming. While high-performance libraries like CuBLAS and NCCL provide optimized primitives, they lack the flexibility required for rapidly evolving model architectures. Conversely, existing tensor compilers fail to address the complex memory hierarchy of distributed clusters effectively. To bridge this gap, we propose DITRON, a scalable tile-level compiler that democratizes high-performance distributed kernel development. DITRON introduces a novel hierarchical programming abstraction spanning Core, Device, and Task levels to map tensor programs efficiently onto heterogeneous distributed hardware. This abstraction allows DITRON to support diverse parallelism strategies while abstracting away the complexity of inter-node and intra-node communication.
Evaluated across large-scale clusters, DITRON achieves performance parity with or exceeding expert-tuned CUDA libraries, delivering speedups of $6\%-30\%$ on isolated kernels and $5\%-30\%$ on end-to-end inference in vLLM. Furthermore, DITRON demonstrates strong portability, achieving significant speedups on both NVIDIA and AMD platforms. \ours{} has been deployed at the enterprise level for both training and inference. It achieves an MFU improvement of over 10\% in training tasks, saving approximately 500,000 GPU hours of training cost per month. For inference tasks, it delivers an end-to-end gain of over 20\% and has been applied to cloud service inference and edge inference scenarios.2026-05-02T09:59:56ZSize ZhengXuegui ZhengHanshi SunQi HouWenlei BaoShiyu LiHaojie DuanmuJin FangChenli XueChenhui HuangYuanqiang LiuRenze ChenNingxin ZhengDongyang WangLi-Wen ChangLiqiang LuYun LiangJidong ZhaiXin Liuhttp://arxiv.org/abs/2510.22907v2Reinforcement Learning from Compiler and Language Server Feedback2026-05-01T23:47:55ZCoding agents fail when text-level guesses outrun program facts: they hallucinate APIs, drift to the wrong symbol, and apply edits without evidence that the workspace remains valid. Compilers, type checkers, and language servers already compute the missing supervision signal, in the form of diagnostics, symbol resolution, type information, references, and refactoring preconditions, but expose it through interfaces designed for human-driven IDEs rather than learning loops. We introduce Reinforcement Learning from Compiler and Language Server Feedback (RLCSF) together with Lanser-CLI, a CLI-first orchestration layer that exposes this signal to agents and CI. RLCSF treats each tool interaction as a transition and computes a shaped process reward from deterministic changes in diagnostics, selector confidence, and edit safety. Lanser-CLI, in turn, converts ephemeral LSP sessions into replayable Analysis Bundles with pinned environment metadata and stable content hashes. Its core mechanisms are robust selectors that go beyond file:line:col, deterministic bundle normalization, preview-first guarded mutations, and a reward functional whose potential-based component is replayable under frozen snapshots. We formalize determinism for canonical bundles and prove that componentwise-improving transitions receive non-negative reward in the undiscounted setting. Together, these pieces yield a practical substrate for process supervision of coding agents.2025-10-27T01:25:20ZProject Page: https://github.com/yifanzhang-pro/lanser-cliYifan ZhangLanser Contributorshttp://arxiv.org/abs/2605.01140v1SoCal: A Language for Memory-Layout Factorization of Recursive Datatypes2026-05-01T22:32:52ZArray-of-structures (AoS) to structure-of-arrays (SoA) is a classic compiler transformation that improves memory locality and enables data-parallel execution. Existing AoS-to-SoA transformations primarily target regular, array-based programs in imperative languages like C and C++. In contrast, many applications manipulate tree-shaped data structures, for example, ASTs in compilers, DOM trees in browsers, and k-d trees in scientific workloads. Prior work improves the performance of functional programs operating on such data by serializing algebraic datatypes (ADTs) into contiguous memory buffers. However, these representations interleave fields within a single buffer, similar to AoS layouts. We introduce factored, multi-buffer layouts that store different ADT fields in separate buffers, enabling SoA-like layouts for serialized recursive data structures. We formalize this approach in SoCal, a language for generating factored ADT representations, and implement it in a compiler called Colobus. Colobus automatically transforms functional programs to operate over a serialized, factored layout of recursive ADTs. Our evaluation shows a 1.46x geometric mean speedup on a suite of tree-processing benchmarks.2026-05-01T22:32:52ZVidush SinghalMikah KainenArtem PelenitsynMichael H. BorkowskiMike VollmerMilind Kulkarnihttp://arxiv.org/abs/2605.01124v1Practical Formal Verification for MLIR Programs2026-05-01T21:51:21ZOptimizing compilers have become a cornerstone for high-performance program generation in research and industry. Optimizations, including those implemented manually by a user and those target-specific and non-target-specific, are used to transform programs to achieve good performance. Although these optimizations are necessary for performance, assessing their correctness has remained a major challenge; the risk of incorrect code being deployed increases with unproven optimization flows.
In this work, we target the formal verification of correctness of a transformed program by computing whether a pair of programs are semantically equivalent, one being a transformed version of the other. We restrict the class of programs supported to enable a hybrid concrete-symbolic interpretation approach to equivalence, which in turn is mostly agnostic to how the programs are implemented (syntax, schedule, storage, etc.). This approach can show equivalence in linear time with respect to the operations executed by the programs. We develop a verifier for a meaningful subset of MLIR, and report on the verification of the AMD MLIR-AIR and MLIR-AIE toolchains, as well as the standard mlir-opt on hundreds of benchmarks variants.2026-05-01T21:51:21ZEmily TuckerLouis-Noël PouchetErika HunhoffStephen NeuendorfferErwei Wanghttp://arxiv.org/abs/2605.00655v1Type Theory With Erasure2026-05-01T13:38:10ZErasure enriches type theory with a distinction between runtime relevant and irrelevant data, allowing the compilation step to safely erase the latter. Versions of this feature are implemented by many systems, including Agda, Idris, and Rocq. We present a structural version of type theory with erasure, formulated as a second-order generalised algebraic theory (SOGAT). Erasure is encoded as a phase distinction between runtime and erased terms, in the form of a proposition that can appear in a context. This formulation has several advantages: it has models based on categories with families, is compatible with other structural features such as staging, and provides a better guideline for implementation. Through the model theory of SOGATs, we study the semantics of type theory with erasure in families of sets, which generalises to any Grothendieck topos equipped with a tiny proposition. We establish conservativity over Martin-Löf type theory (MLTT) in both phases. For code extraction, we construct a presheaf model that produces untyped lambda calculus programs and prove its correctness through gluing. Our results are formalised in Agda and we provide a toy elaborator implementation.2026-05-01T13:38:10ZAccepted to FSCD 2026Constantine TheocharisEdwin Brady