http://arxiv.org/abs/2606.11158v1 Defeat the Heap: Zero-Copy Data Movement in AXI4MLIR 2026-06-09T17:40:13Z

As custom hardware accelerators become increasingly central to machine learning workloads, efficient data transfer is critical for maximizing accelerator performance on linear algebra kernels. AXI4MLIR, an extension of the Multi-Level Intermediate Representation (MLIR) compiler framework for automated generation of host-accelerator driver code, incurs significant runtime overhead due to non-zero-copy CPU-accelerator data movement. During transfers from the host to the accelerator, data is copied from heap-allocated memory buffers into contiguous Direct Memory Access (DMA)-mapped buffers. This work identifies this copy as a redundant staging operation and eliminates it through zero-copy data movement. The optimization extends accel, an MLIR dialect introduced by AXI4MLIR, and implements lowering support that allocates buffers directly within DMA-mapped memory, thereby omitting the staging copy. We evaluate the proposed scheme using a configurable matrix-matrix multiplication accelerator and show that the zero-copy optimization reduces main memory data movement by up to 2x, increasing overall accelerator utilization.
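
The difference between a staged transfer and a zero-copy transfer can be sketched in a few lines. This is only an illustration under stated assumptions: `DmaRegion`, `send_staged`, and `send_zero_copy` are hypothetical names, a `bytearray` stands in for DMA-mapped memory, and nothing here reflects the actual accel dialect or its lowering.

```python
class DmaRegion:
    """Hypothetical stand-in for a contiguous DMA-mapped memory region."""

    def __init__(self, size):
        self._mem = bytearray(size)
        self._next = 0
        self.bytes_copied = 0  # bytes moved by staging copies only

    def alloc(self, nbytes):
        """Return a writable view carved directly out of the region."""
        if self._next + nbytes > len(self._mem):
            raise MemoryError("DMA region exhausted")
        view = memoryview(self._mem)[self._next:self._next + nbytes]
        self._next += nbytes
        return view

    def stage(self, heap_buffer):
        """Copy a heap buffer into the region (the staging copy)."""
        view = self.alloc(len(heap_buffer))
        view[:] = heap_buffer
        self.bytes_copied += len(heap_buffer)
        return view


def send_staged(region, payload):
    """Producer fills a heap buffer, which is then copied into DMA memory."""
    heap = bytearray(payload)
    return region.stage(heap)


def send_zero_copy(region, payload):
    """Producer writes straight into a buffer allocated in DMA memory."""
    view = region.alloc(len(payload))
    view[:] = payload
    return view
```

In the staged path every payload byte is written twice (once to the heap, once to the DMA region); in the zero-copy path it is written once, which is the intuition behind removing the staging copy.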

2026-06-09T17:40:13Z Accepted to the 7th Compilers for Machine Learning Workshop (C4ML), co-located with CGO 2026 Elam Cohavi Nicolas Bohm Agostini Jude Haris Antonino Tumeo David Kaeli José Cano http://arxiv.org/abs/2606.10920v1 Dynamic Software Updates using CRDTs 2026-06-09T14:28:34Z

This paper investigates how Conflict-free Replicated Data Types (CRDTs) can be used for dynamic software updates of distributed applications. We propose to model application updates as a new App CRDT that stores the application code associated with a semantic version, which defines a total order of the code updates. The App CRDT works with an API-compatible message delivery middleware, which allows applications to continue working with partially updated components in the face of backwards-incompatible software updates. We implemented our approach in AmbientTalk, an ambient-oriented programming language designed for distributed systems. We show how this CRDT can be integrated with existing AmbientTalk applications, requiring minimal changes. We also implemented our approach in LuAT, an ambient-oriented programming framework for Lua. This shows that our approach of using CRDTs to replicate code can be generalised to other programming languages.
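
One plausible reading of a code-carrying CRDT ordered by semantic version is a register whose merge keeps the highest version. The sketch below assumes this reading; `AppState`, `parse_version`, and `merge` are hypothetical names, and it is written in Python rather than AmbientTalk or Lua, so it should not be taken as the paper's App CRDT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppState:
    version: tuple  # semantic version as (major, minor, patch)
    code: str       # application code associated with that version


def parse_version(text):
    """Turn "1.10.0" into (1, 10, 0) so comparison is numeric, not textual."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def merge(a, b):
    """Keep the replica state with the greater version.

    Ties on version are broken by comparing the code string, an assumption
    made here so that merge stays commutative even if two replicas disagree.
    """
    return max(a, b, key=lambda state: (state.version, state.code))
```

Because `merge` is a maximum over a total order, it is commutative, associative, and idempotent, so replicas that receive updates in different orders converge on the same code.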

2026-06-09T14:28:34Z 13 pages, 1 figure, 6 listings Seppe Wyns Jim Bauwens Elisa Gonzalez Boix http://arxiv.org/abs/2606.10717v1 Max-Policy Iteration, Revisited 2026-06-09T11:24:07Z

Max-policy iteration is an approach to computing precise numeric program invariants by successive attempts at resolving maximum operators and reduction to mathematical optimization. Mathematical optimization, though, may be expensive. Here, we show, for max-policy iteration on systems of equations over integers as well as over floating point numbers, that mathematical optimization can be replaced by plain value iteration -- which is still guaranteed to terminate. As an application, a precise bound analysis for integer or floating point variables is obtained, avoiding widening operators altogether. We also consider max-policy iteration over the rational numbers, where the right-hand sides are maxima of minima of affine combinations of unknowns. We propose min-policy iteration as an alternative to linear programming for solving the optimization problems posed by max-policy iteration. We prove that max-min policy iteration is guaranteed to return the least solution for bounded systems. We also show how to extend this algorithm to unbounded systems, and how to construct certificates of soundness as well as of optimality of the computed results.
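
As a rough illustration of plain value iteration on an integer equation system, the sketch below computes the upper bound of `i` at the head of the loop `i = 0; while (i < 10) i += 1`. It is not the paper's algorithm: the equation, the function name `value_iteration`, and the `max_rounds` safeguard are assumptions chosen for this example.

```python
NEG_INF = float("-inf")  # bottom element: "no value reaches this point yet"


def value_iteration(equations, max_rounds=10_000):
    """Iterate all right-hand sides from bottom until nothing changes."""
    values = {name: NEG_INF for name in equations}
    for _ in range(max_rounds):
        updated = {name: rhs(values) for name, rhs in equations.items()}
        if updated == values:
            return values
        values = updated
    raise RuntimeError("no fixed point reached within max_rounds")


# Upper bound u of i at the loop head: either the initial value 0, or the
# value after one more iteration, where the guard i < 10 caps i at 9.
equations = {
    "u": lambda v: max(0, min(v["u"], 9) + 1),
}

bounds = value_iteration(equations)
```

The iteration climbs through 0, 1, 2, and so on, and stabilises at the exact bound without any widening step, because the guard keeps the system bounded.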

2026-06-09T11:24:07Z 35th European Symposium on Programming (ESOP 2026), Apr 2026, Torino, Italy. pp.94-124 David Monniaux VERIMAG - IMAG, INS2I-CNRS Helmut Seidl TUM http://arxiv.org/abs/2606.10644v1 Answer Set Programming for Egg Extraction and More 2026-06-09T09:49:39Z

Three years ago, Philip Zucker posted an attempt to use answer set programming (ASP) for term extraction from e-graphs. Although the task is NP-hard and ASP offers a natural modelling of e-graph terms, the initial attempt did not yield convincing results. From the perspective of practical ASP users, we first pinpoint how to make ASP work, and work well, on the task of e-graph extraction. The initial results show that the naïve ASP encoding is comparable in efficiency to the well-optimised ILP-based exact DAG extraction in the extraction-gym, and finds several additional optimal extractions on the complex instances. This leads us to a further agenda: with the "better together of egg+Datalog", is there a better "better together" by having ASP as a more powerful Datalog? We discuss how each could benefit from the other.

2026-06-09T09:49:39Z To be presented at EGRAPHS 2026 Ziyi Yang Ilya Sergey http://arxiv.org/abs/2605.31569v2 A Datalog Framework for Conflict-Free Replicated Data Types 2026-06-08T15:41:42Z

Distributed applications increasingly support local-first collaboration over shared data, allowing multiple users to perform updates concurrently without global coordination. Such collaboration requires careful design to capture the intended semantics of the concurrent interactions. We introduce a declarative framework for specifying and reasoning about the semantics of conflict-free replicated data types (CRDTs) and CRDT-based applications in Datalog. The framework models CRDT semantics as executable logic programs over operation contexts, making concurrency explicit and compositional, and thus amenable to automated analysis. As one application, we use property-based testing to compare implementations. To the best of our knowledge, this is the first work to systematically use Datalog as a foundation for prototyping and analyzing complex CRDTs and their compositions. We evaluate our methodology using a collaborative graph data editing case study and report experimentation results assessing correctness validation and scalability with an increasing number of operations and replicas.

2026-05-29T17:36:29Z Paper presented at the 42nd International Conference on Logic Programming (ICLP 2026), Lisbon, Portugal, July 20 to July 23, 2026 Elena Yanakieva Annette Bieniusa Stefania Dumbrava http://arxiv.org/abs/2606.09645v1 Modeling Components and Connections in Cyber-Physical Systems 2026-06-08T15:39:43Z

Text-based configuration files for cyber-physical systems show the hierarchy of component modules well but often hide the details of connections and interfaces between modules. A model-based visual approach to these configuration files can better capture this information. The XML structure of Robot Operating System (ROS) launch files can be improved using a modeling approach. This paper presents ROSLaunchVisual, a model-integrated environment built on WebGME for designing, visualizing, and managing ROS launch files. The tool raises the level of abstraction by allowing developers to create and modify launch files using a graphical interface that represents nodes, publishers, subscribers, and arguments as interconnected components. The tool provides a dynamic system analysis that can then be used in the static development and analysis of new and existing launch files. ROSLaunchVisual incorporates features such as metamodel-driven validation, automatic import/export of launch files, and visual communication mapping. Plugins further enhance functionality by updating libraries, checking for semantic errors, and managing remaps. By making launch file creation more intuitive and less error-prone, ROSLaunchVisual improves development efficiency and system understanding, especially in collaborative or large-scale robotics projects.

2026-06-08T15:39:43Z Kate Sanborn Tanuj Kenchannavar Vakul Nath Jonathan Sprinkle http://arxiv.org/abs/2606.09526v1 When Types Intersect and Effects Get Handled 2026-06-08T14:12:22Z

We introduce a novel intersection type system for a $λ$-calculus with algebraic effects and handlers. The system, inherently behavioral in nature, enjoys the classical properties of intersection type systems, in particular subject reduction and expansion. It thus characterizes the set of terms whose evaluation process terminates and, at the same time, allows reducing the reachability problem to type inference. This new system, the first with these features for a calculus with handlers, induces a system of simple types which, although not guaranteeing termination, is type sound and admits a decidable HOMC problem, unlike similar type systems like Dal Lago and Ghyselen's HEPCF.

2026-06-08T14:12:22Z Ugo Dal Lago Taro Sekiyama Stefano Catozi http://arxiv.org/abs/2606.09312v1 Toward Compiler World Models: Learning Latent Dynamics for Efficient Tensor Program Search 2026-06-08T10:17:27Z

Tensor program optimization is essential for modern machine learning systems, but its search space is enormous. Existing auto-schedulers reduce measurement cost with learned cost models, yet they usually evaluate each candidate as a static code snapshot, ignoring the schedule trajectory that produced it. This makes them insensitive to action dependencies and vulnerable to superficial code variations. We propose a \emph{world-model-inspired} evaluator that models schedule evaluation as action-conditioned latent dynamics over program states. Starting from the initial program, it rolls out scheduling actions in a continuous latent space with a lightweight transition model, avoiding expensive AST mutation and repeated code encoding. The final dynamic representation is combined with action and hardware features to rank candidates. Implemented in TVM AutoScheduler, our method improves representative-subgraph latency over Ansor by 1.37$\times$ on GPU and 1.54$\times$ on CPU under the same 64-trial budget. It also matches Ansor-10K within 2.2% geometric mean using 10$\times$ fewer measurements, and accelerates full-model inference over PyTorch/PyTorch-opt(cuDNN) by 4.61$\times$/3.67$\times$ geometric mean.

2026-06-08T10:17:27Z Haolin Pan Lianghong Huang Xvlin Zhou Mingjie Xing Yanjun Wu http://arxiv.org/abs/2606.09213v1 SNN-MLIR: An MLIR Dialect for Compiling Neuromorphic SNNs from NIR to Bare-Metal C 2026-06-08T08:47:48Z

Spiking neural networks (SNNs) are increasingly trained in a wide range of frameworks (SnnTorch, Lava, Norse, and others), each with its own model format. The Neuromorphic Intermediate Representation (NIR) addresses this fragmentation by providing a common, framework-independent format for exchanging trained SNN models. NIR solves the exchange problem, but it stops there. It provides a description of a network, not a path to running one. Each backend is still left to implement deployment on its own, with no shared, transformable compiler representation in between. This paper presents snn-mlir, an out-of-tree MLIR dialect for SNNs together with a NIR-MLIR-C compilation bridge. The dialect provides a small set of type-polymorphic operations that work identically on floating-point (f32/f64) and quantized data, so a single intermediate representation serves both simulation and hardware-oriented deployment. A Python front end reads any NIR file and emits dialect IR, automatically inserting rescaling operations to keep quantization scales consistent across layers. A reference lowering pass converts the dialect to standard linalg and arith operations, from which the toolchain produces self-contained, dependency-free C11 code that compiles and runs on any C-capable CPU or embedded target. We evaluate numerical fidelity against reference outputs, portability across CPU targets, and the cost of quantization. The current scope is feedforward, fully-connected networks with a CPU backend. snn-mlir is released as open source under the Apache-2.0 license with LLVM-exception and is already available on GitHub.

2026-06-08T08:47:48Z 8 pages, 5 figures, 5 tables Alejandro García Gener Alvaro Rollón de Pinedo http://arxiv.org/abs/2606.08944v1 LongRTL: Graph-Similarity-Guided LLM-driven Long Context RTL Optimization 2026-06-08T02:41:20Z

Large Language Models (LLMs) show great promise in RTL code generation and optimization. However, real-world RTL designs are typically long, entangled, and poorly modularized, posing a major challenge due to context-length limitations and lack of structure. To overcome these obstacles, we propose a scalable LLM-based RTL optimization framework guided by graph similarity. Our method introduces three collaborative agents: (1) a Partition Agent that decomposes RTL designs into semantically meaningful AST subtrees, guided by AST graph similarity to reusable design templates; (2) an Optimization Agent that generates RTL submodule code based on partitioned subtrees using multi-modal Retrieval-Augmented Generation (RAG) with both AST and RTL guidance; and (3) a Reconstruction Agent that reassembles optimized submodules based on logic-aware ordering and Graph-RAG prompting, ensuring global functional equivalence. Together, these components enable robust, structure-aware optimization of long-context RTL designs, bridging the gap between toy examples and industrial-scale hardware codebases.

2026-06-08T02:41:20Z 7 pages, 6 figures, 5 tables, conference Yuyang Ye Che-Kuan Shen Xiangfei Hu Yuchen Liu Shuo Yin Xufeng Yao Bei Yu Tsung-Yi Ho http://arxiv.org/abs/2508.11874v2 Discovering Expert-Level Nash Equilibrium Algorithms with Large Language Models 2026-06-08T00:24:06Z

Designing polynomial-time algorithms for approximate Nash equilibria (ANE) with provable worst-case guarantees is a fundamental open problem in algorithmic game theory. While large language models (LLMs) can generate candidate algorithms at scale, certifying worst-case guarantees requires formal analysis over all game instances -- a task for which no automated system previously existed. Here, we present LegoNE, a framework encoding expert proof strategies into a symbolic language that automatically compiles any candidate algorithm into a finite optimization problem certifying its worst-case guarantee. Integrating LegoNE with a reasoning LLM, we rediscovered an algorithm matching the best polynomial-time guarantee for two-player games, and discovered a three-player algorithm improving the best guarantee from $0.6+δ$ to $0.5+δ$ -- provably beyond the reach of the extension technique, the only previously known multi-player ANE design paradigm. These results show that encoding domain-specific proof strategies into a machine-tractable language can support LLM-driven discovery of algorithms outside known human design paradigms.

2025-08-16T02:18:43Z accepted by Nature Communications Hanyu Li Dongchen Li Xiaotie Deng 10.1038/s41467-026-74003-1 http://arxiv.org/abs/2510.15747v3 GLP: A Grassroots, Multiagent, Concurrent, Logic Programming Language for AI (Full Version) 2026-06-07T20:02:43Z

A grassroots platform is a multiagent distributed system in which multiple independent instances can form and operate independently of each other and of any global resource, yet may coalesce into ever larger instances, possibly resulting in a single global instance. Grassroots platforms aim to offer an egalitarian/democratic alternative to centralised/autocratic and decentralised/plutocratic global platforms. Here, we present Grassroots Logic Programs (GLP), a multiagent concurrent logic programming language designed for the implementation of grassroots platforms: we recall the standard operational semantics of logic programs; introduce the concurrent operational semantics of GLP as its restriction; recall multiagent atomic transactions; use them to introduce a multiagent operational semantics of GLP; and prove multiagent GLP to be grassroots. The grassroots social graph -- the foundational grassroots platform on which all others are based -- serves as a GLP programming example. These mathematical foundations are being used by AI to implement GLP as well as to program in GLP: a workstation-based implementation of concurrent GLP in Dart was derived from the concurrent operational semantics of GLP; a multiagent smartphone-based implementation of GLP in Dart/Flutter is being developed based on the multiagent operational semantics of GLP; a moded type system for GLP was designed (and implemented by AI in Dart) to facilitate collaborative human-AI development of GLP programs, where AI derives working GLP programs from human-approved type definitions and declarations; GLP implementations of grassroots platforms for the social graph, social networks, currencies and bonds, and more, have been derived by AI from mathematical specifications written as volitional multiagent atomic transactions.

2025-10-17T15:34:27Z Ehud Shapiro http://arxiv.org/abs/2602.06934v4 Implementing Grassroots Logic Programs with Multiagent Transition Systems and AI (Full Version) 2026-06-07T18:21:02Z

Grassroots Logic Programs (GLP) is a concurrent logic programming language in which logic variables are partitioned into paired readers and writers. An assignment is produced at most once via a writer and consumed at most once via its paired reader, and may contain additional readers and/or writers. This enables the concise expression of rich multidirectional communication modalities. The language was introduced together with concurrent (cGLP) and multiagent (maGLP) operational semantics. Here, we derive from these (i) dGLP, a deterministic counterpart of cGLP, and (ii) madGLP, a counterpart of maGLP in which deterministic agents communicate solely by asynchronous message passing, and prove them correct against their abstract counterparts. maGLP shared variable pairs spanning agents can be implemented as local variables paired by \emph{global links}, with correctness following from disjoint substitution commutativity (a consequence of GLP's single-occurrence invariant). We further prove that madGLP is grassroots. Both dGLP and madGLP serve as formal specifications for an AI-driven implementation discipline (math $\to$ informal spec $\to$ Dart) employed and described here: from dGLP, AI (Claude) developed a workstation-based GLP implementation in Dart, and from madGLP it is developing a smartphone-based multiagent one.
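
The paired reader/writer discipline described above (one assignment per writer, one consumption per reader, values that may carry further readers or writers) can be mimicked in ordinary code. The sketch below is a loose Python analogy, not GLP and not its Dart implementation; `new_pair`, `Reader`, `Writer`, and `SingleUseError` are hypothetical names, and raising an error on an unassigned reader is a simplification of how a concurrent language would treat a goal waiting on that reader.

```python
class SingleUseError(RuntimeError):
    """Raised when a writer or reader is used more than once."""


class _Cell:
    """Shared slot linking one writer to its paired reader."""

    def __init__(self):
        self.assigned = False
        self.consumed = False
        self.value = None


class Writer:
    def __init__(self, cell):
        self._cell = cell

    def assign(self, value):
        """Produce the assignment; allowed at most once."""
        if self._cell.assigned:
            raise SingleUseError("writer already assigned")
        self._cell.value = value
        self._cell.assigned = True


class Reader:
    def __init__(self, cell):
        self._cell = cell

    def ready(self):
        return self._cell.assigned

    def consume(self):
        """Consume the assignment; allowed at most once."""
        if not self._cell.assigned:
            raise SingleUseError("no assignment available yet")
        if self._cell.consumed:
            raise SingleUseError("reader already consumed")
        self._cell.consumed = True
        return self._cell.value


def new_pair():
    """Create a fresh paired reader and writer."""
    cell = _Cell()
    return Reader(cell), Writer(cell)
```

A stream can be built by assigning a tuple of an item and the reader of the next pair, for example `writer.assign(("hello", next_reader))`, which is one way an assigned value can contain additional readers.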

2026-02-06T18:30:11Z Ehud Shapiro http://arxiv.org/abs/2601.17957v4 Moded Types for Grassroots Logic Programs, by AI, for AI (Full Version) 2026-06-07T15:28:48Z

Grassroots Logic Programs (GLP) is a concurrent logic programming language in which logic variables are partitioned into paired readers and writers. An assignment is produced at most once via a writer and consumed at most once via its paired reader, and may contain additional readers and/or writers. This enables the concise expression of rich multidirectional communication modalities. ``Logic Programs as Types for Logic Programs'' (LICS'91) defined types as regular sets of paths over the Herbrand atom semantics of a logic program. Here, we develop a \emph{moded-atom semantics} that extends the standard Herbrand atom semantics in two ways: (i) each atom subterm carries a \emph{mode}, recording whether it is consumed from or produced to the environment; and (ii) partial computations, including those that deadlock, fail, or never terminate, also contribute moded atoms to the semantics. We define types to be regular sets of \emph{moded paths} over this semantics, give a syntactic definition of GLP well-typing, and prove that a well-typed program is sound: every output path in its well-typed moded-atom semantics conforms to its declared output type. A type checker for GLP was implemented \emph{by} AI (Claude) in Dart, starting from the mathematical specification of Typed GLP (this paper), deriving from it an English+pseudocode spec (written by AI), and from the spec deriving Dart code (by AI). While GLP is naturally untyped, the motivation for typing it was \emph{for} AI: tasking AI to program complex communication modalities and hoping for the best turned out to be a tenuous strategy. The discipline we developed with Typed GLP is for the human designer and AI to jointly develop formal GLP type definitions and declarations, together with informal intent of the declared procedures, and only then let AI write the GLP code.

2026-01-25T19:18:27Z Ehud Shapiro http://arxiv.org/abs/2606.09930v1 Compile Once, Differentiate Everywhere: A Differentiable Meta-Circular Interpreter 2026-06-07T15:13:41Z

The boundary between program execution and gradient-based optimization has long limited the use of code itself as a learnable scientific model. We present a compiler that translates a self-hosting subset of Scheme into differentiable computation graphs for autograd backends. Because the subset can compile its own evaluator, this yields differentiable meta-circular interpretation (DMCI): a compiled Scheme interpreter executes programs supplied as data, while reverse-mode autodiff propagates gradients to continuous constants embedded in those programs. The interpreter is compiled once, so new programs inherit differentiability without recompilation or custom gradient machinery, while retaining closures, recursion, and data structures. We prove that gradients through the compiled interpreter are correct almost everywhere and show that they match direct compilation to numerical precision across 171 recursive and higher-order program-seed pairs. We then use DMCI for program-and-parameter co-search, where a large language model proposes Scheme programs and exact gradients calibrate their continuous parameters through a single frozen interpreter. This enables OpenEvolve-style program search in which an outer loop proposes discrete program structures and DMCI supplies exact gradient-based calibration of each candidate's continuous parameters. On battery capacity-fade data, the search recovers a knee-like degradation structure and improves held-out extrapolation over hand-crafted baselines on the harder early-extrapolation split, matching them on the later split. On a high-dimensional El Nino inverse problem, DMCI optimizes an interpreted Kalman-filter likelihood where gradient-free search fails. These results extend symbolic regression and neurosymbolic search from closed-form expressions to executable, stateful programs, making model-generated code directly optimizable against data.

2026-06-07T15:13:41Z Lucas Sheneman