https://arxiv.org/api/qn/V6gOYxJu9EmjT50t5qKijWxM 2026-06-14T22:33:45Z 9885 510 15 http://arxiv.org/abs/2509.21550v2 A Target-Agnostic Protocol-Independent Interface for the Transport Layer 2026-03-14T22:13:13Z

Transport protocols continue to evolve to meet the demands of new applications, workloads, and network environments, yet implementing and evolving transport protocols remains difficult and costly. High-performance transport stacks tightly interweave protocol behavior with system-level mechanisms such as packet I/O, memory management, and concurrency control, resulting in large code bases where protocol logic is scattered and hard to modify -- an issue exacerbated by modern heterogeneous execution environments. This paper introduces transport programs, a target-independent abstraction that precisely and centrally captures a transport protocol's reactions to relevant transport events using abstract instructions for key transport operations such as data reassembly, packet generation and scheduling, and timer manipulation, while leaving execution strategy and low-level mechanisms to the target. We show that transport programs can express a diverse set of transport protocols, be efficiently realized on targets built over DPDK and Linux XDP, achieve performance comparable to hand-optimized implementations, and enable protocol changes and portability across targets without modifying underlying infrastructure.

2025-09-25T20:34:52Z Pedro Mizuno Kimiya Mohammadtaheri Linfan Qian Joshua Johnson Danny Akbarzadeh Chris Neely Mario Baldi Nachiket Kapre Mina Tahmasbi Arashloo http://arxiv.org/abs/2603.14019v1 MapReplay: Trace-Driven Benchmark Generation for Java HashMap 2026-03-14T16:46:09Z

Hash-based maps, particularly java.util.HashMap, are pervasive in Java applications and the JVM, making their performance critical. Evaluating optimizations is challenging because performance depends on factors such as operation patterns, key distributions, and resizing behavior. Microbenchmarks are fast and repeatable but often oversimplify workloads, failing to capture the realistic usage patterns. Application benchmarks (e.g., DaCapo, Renaissance) provide realistic usages but are more expensive to run, prone to variability, and dominated by non-HashMap computations, making map-related performance changes difficult to observe. To address this challenge, we propose MapReplay, a benchmarking methodology that combines the realism of application benchmarks with the efficiency of microbenchmarks. MapReplay traces HashMap API usages generating a replay workload that reproduces the same operation sequence while faithfully reconstructing internal map states. This enables realistic and efficient evaluation of alternative implementations under realistic usage patterns. Applying MapReplay to DaCapo-Chopin and Renaissance, the resulting suite, MapReplayBench, reproduces application-level performance trends while reducing experimentation time and revealing insights difficult to obtain from full benchmarks.

2026-03-14T16:46:09Z Filippo Schiavio Andrea Rosà Júnior Löff Lubomír Bulej Petr Tůma Walter Binder 10.1145/3777884.3797010 http://arxiv.org/abs/2603.13514v1 Executable Archaeology: Reanimating the Logic Theorist from its IPL-V Source 2026-03-13T18:47:31Z

The Logic Theorist (LT), created by Allen Newell, J. C. Shaw, and Herbert Simon in 1955-1956, is widely regarded as the first artificial intelligence program. While the original conceptual model was described in 1956, it underwent several iterations as the underlying Information Processing Language (IPL) evolved. Here I describe the construction of a new IPL-V interpreter, written in Common Lisp, and the faithful reanimation of the Logic Theorist from code transcribed directly from Stefferud's 1963 RAND technical report. Stefferud's version represents a pedagogical re-coding of the original heuristic logic into the standardized IPL-V. The reanimated LT successfully proves 16 of 23 attempted theorems from Chapter 2 of Principia Mathematica, results that are historically consistent with the original system's behavior within its search limits. To the author's knowledge, this is the first successful execution of the original Logic Theorist code in over half a century.

2026-03-13T18:47:31Z Jeff Shrager http://arxiv.org/abs/2603.20264v1 Can LLMs Perform Synthesis? 2026-03-13T13:35:10Z

How do LLMs compare with symbolic tools on program synthesis tasks? We investigate this question on several synthesis domains: LTL reactive synthesis, syntax-guided synthesis, distributed protocol synthesis, and recursive function synthesis. For each domain, we choose a state-of-the-art symbolic tool and compare it to an open-source, 32 billion parameter version of the Qwen LLM and the proprietary, frontier LLM GPT-5. We couple Qwen with a symbolic verifier and run it repeatedly until it either produces a solution that passes the verifier, or until there is a timeout, for each benchmark. We run GPT-5 once per benchmark and verify the generated output. In all domains, the symbolic tools solve more benchmarks than Qwen and either outperform or are about on par with GPT-5. In terms of execution time, the symbolic tools outperform GPT-5 in all domains, and either outperform or are very close to Qwen, despite the fact that the LLMs are run on significantly more powerful hardware.

2026-03-13T13:35:10Z Derek Egolf Yuhao Zhou Stavros Tripakis http://arxiv.org/abs/2603.13443v1 NormCode Canvas: Making LLM Agentic Workflows Development Sustainable via Case-Based Reasoning 2026-03-13T12:59:22Z

We present NormCode Canvas (v1.1.3), a deployed system realizing Case-Based Reasoning at two levels for multi-step LLM workflows. The foundation is NormCode, a semi-formal planning language whose compiler-verified scope rule ensures every execution checkpoint is a genuinely self-contained case -- eliminating the implicit shared state that makes retrieval unreliable and failure non-localizable in standard orchestration frameworks. Level 1 treats each checkpoint as a concrete case (suspended runtime); Fork implements retrieve-and-reuse, Value Override implements revision with automatic stale-boundary propagation. Level 2 treats each compiled plan as an abstract case; the compilation pipeline is itself a NormCode plan, enabling recursive case learning. Three structural properties follow: (C1) direct checkpoint inspection; (C2) pre-execution review via compiler-generated narrative; (C3) scope-bounded selective re-execution. Four deployed plans serve as structured evidence: PPT Generation produces presentation decks at ~40s per slide on commercial APIs; Code Assistant carries out multi-step software-engineering tasks spanning up to ten reasoning cycles; NC Compilations converts natural-language specifications into executable NormCode plans; and Canvas Assistant, when connected to an external AI code editor, automates plan debugging. Together these plans form a self-sustaining ecosystem in which plans produce, debug, and refine one another -- realizing cumulative case-based learning at system scale.

2026-03-13T12:59:22Z 16 pages, 5 figures. Submitted to ICCBR 2025 Xin Guan Yunshan Li Ze Wang http://arxiv.org/abs/2604.13046v1 A Domain-Specific Language for LLM-Driven Trigger Generation in Multimodal Data Collection 2026-03-13T11:53:42Z

Data-driven systems depend on task-relevant data, yet data collection pipelines remain passive and indiscriminate. Continuous logging of multimodal sensor streams incurs high storage costs and captures irrelevant data. This paper proposes a declarative framework for intent-driven, on-device data collection that enables selective collection of multimodal sensor data based on high-level user requests. The framework combines natural language interaction with a formally specified domain-specific language (DSL). Large language models translate user-defined requirements into verifiable and composable DSL programs that define conditional triggers across heterogeneous sensors, including cameras, LiDAR, and system telemetry. Empirical evaluation on vehicular and robotic perception tasks shows that the DSL-based approach achieves higher generation consistency and lower execution latency than unconstrained code generation while maintaining comparable detection performance. The structured abstraction supports modular trigger composition and concurrent deployment on resource-constrained edge platforms. This approach replaces passive logging with a verifiable, intent-driven mechanism for multimodal data collection in real-time systems.

2026-03-13T11:53:42Z Version submitted to the IEEE International Conference on Intelligent Transportation Systems (ITSC 2026) Philipp Reis Philipp Rigoll Martin Zehetner Jacqueline Henle Stefan Otten Eric Sax http://arxiv.org/abs/2602.23389v2 CIll: CTI-Guided Invariant Generation via LLMs for Model Checking 2026-03-13T09:45:01Z

Inductive invariants are crucial in model checking, yet generating effective inductive invariants automatically and efficiently remains challenging. A common approach is to iteratively analyze counterexamples to induction (CTIs) and derive invariants that rule them out, as in IC3. However, IC3's clause-based learning is limited to a CNF representation. For some designs, the resulting invariants may require a large number of clauses, which hurts scalability. We present CIll, a CTI-guided framework that leverages LLMs to synthesize invariants for model checking. CIll alternates between (bounded) correctness checking and inductiveness checking for the generated invariants. In correctness checking, CIll uses BMC to validate whether the generated invariants hold on reachable states within a given bound. In inductiveness checking, CIll checks whether the generated invariants, together with the target property, become inductive under the accumulated strengthening. When inductiveness fails, CIll extracts CTIs and provides them to the LLM. The LLM inspects the design and the CTI to propose new invariants that invalidate the CTIs. The proposed invariants are then re-validated through correctness and inductiveness checks, and the loop continues until the original property strengthened by the generated invariants becomes inductive. CIll also employs IC3 to work with the LLM for automatically discovering invariants, and uses K-Induction as a complementary engine. To improve performance, CIll applies local proof and reuses invariants learned by IC3, reducing redundant search and accelerating convergence. In our evaluation, CIll proved full compliance within RISCV-Formal framework and full accuracy of all non-M instructions in NERV and PicoRV32, whereas M extensions are proved against the RVFI ALTOPS substitute semantics provided by RISCV-Formal.

2026-02-21T00:19:48Z Yuheng Su Tianjun Bu Qiusong Yang Yiwei Ci Enyuan Tian http://arxiv.org/abs/2508.04115v2 Weak Memory Model Formalisms: Introduction and Survey 2026-03-13T02:53:35Z

Memory consistency models define the order in which accesses to shared memory in a concurrent system may be observed to occur. Such models are a necessity since program order is not a reliable indicator of execution order, due to microarchitectural features or compiler transformations. Concurrent programming, already a challenging task, is thus made even harder when weak memory effects must be addressed. A rigorous specification of weak memory models is therefore essential to make this problem tractable for developers of safety- and security-critical, low-level software. In this paper we survey the field of formalisations of weak memory models, including their specification, their effects on execution, and tools and inference systems for reasoning about code. To assist the discussion we also provide an introduction to two styles of formal representation found commonly in the literature (using a much simplified version of Intel's x86 as the example): a step-by-step construction of traces of the system (operational semantics); and with respect to relations between memory events (axiomatic semantics). The survey covers some long-standing hardware features that lead to observable weak behaviours, a description of historical developments in practice and in theory, an overview of computability and complexity results, and outlines current and future directions in the field.

2025-08-06T06:25:29Z New to this version: Publisher's self-archiving notice, and updated email address Concurrency and Computation: Practice and Experience 38(2), 2026 Roger C. Su Robert J. Colvin 10.1002/cpe.70484 http://arxiv.org/abs/2503.08530v2 A Probabilistic Choreography Language for PRISM 2026-03-12T15:36:51Z

We present a choreographic framework for modelling and analysing concurrent probabilistic systems based on the PRISM model-checker. This is achieved through the development of a choreography language, which is a specification language that allows to describe the desired interactions within a concurrent system from a global viewpoint. Using choreographies gives a clear and complete view of system interactions, making it easier to understand the process flow and identify potential errors, which helps ensure correct execution and improves system reliability. We equip our language with a probabilistic semantics and then define a formal encoding into the PRISM language and discuss its correctness. Properties of programs written in our choreographic language can be model-checked by the PRISM model-checker via their translation into the PRISM language. Finally, we implement a compiler for our language and demonstrate its practical applicability via examples drawn from the use cases featured in the PRISM website.

2025-03-11T15:20:18Z Marco Carbone Adele Veschetti http://arxiv.org/abs/2603.11489v1 AutoVeriFix+: High-Correctness RTL Generation via Trace-Aware Causal Fix and Semantic Redundancy Pruning 2026-03-12T03:15:57Z

Large language models (LLMs) have demonstrated impressive capabilities in generating software code for high-level programming languages such as Python and C++. However, their application to hardware description languages, such as Verilog, is challenging due to the scarcity of high-quality training data. Current approaches to Verilog code generation using LLMs often focus on syntactic correctness, resulting in code with functional errors. To address these challenges, we propose AutoVeriFix+, a novel three-stage framework that integrates high-level semantic reasoning with state-space exploration to enhance functional correctness and design efficiency. In the first stage, an LLM is employed to generate high-level Python reference models that define the intended circuit behavior. In the second stage, another LLM generates initial Verilog RTL candidates and iteratively fixes syntactic errors. In the third stage, we introduce a Concolic testing engine to exercise deep sequential logic and identify corner-case vulnerabilities. With cycle-accurate execution traces and internal register snapshots, AutoVeriFix+ provides the LLM with the causal context necessary to resolve complex state-transition errors. Furthermore, it will generate a coverage report to identify functionally redundant branches, enabling the LLM to perform semantic pruning for area optimization. Experimental results demonstrate that AutoVeriFix+ achieves over 80% functional correctness on rigorous benchmarks, reaching a pass@10 score of 90.2% on the VerilogEval-machine dataset. In addition, it eliminates an average of 25% redundant logic across benchmarks through trace-aware optimization.

2026-03-12T03:15:57Z arXiv admin note: text overlap with arXiv:2509.08416 Yan Tan Xiangchen Meng Zijun Jiang Yangdi Lyu http://arxiv.org/abs/2603.11104v1 Type-safe Monitoring of Parameterized Streams 2026-03-11T10:03:57Z

Stream-based monitoring is a real-time safety assurance mechanism for complex cyber-physical systems such as unmanned aerial vehicles. The monitor aggregates streams of input data from sensors and other sources to give real-time statistics and assessments of the system's health. Since the monitor is a safety-critical component, it is mandatory to ensure the absence of runtime errors in the monitor. Providing such guarantees is particularly challenging when the monitor must handle unbounded data domains, like an unlimited number of airspace participants, requiring the use of dynamic data structures. This paper provides a type-safe integration of parameterized streams into the stream-based monitoring framework RTLola. Parameterized streams generalize individual streams to sets of an unbounded number of stream instances and provide a systematic mechanism for memory management. We show that the absence of runtime errors is, in general, undecidable but can be effectively ensured with a refinement type system that guarantees all memory references are either successful or backed by a default value. We report on the performance of the type analysis on example specifications from a range of benchmarks, including specifications from the monitoring of autonomous aircraft.

2026-03-11T10:03:57Z Jan Baumeister Bernd Finkbeiner Florian Kohn http://arxiv.org/abs/2603.10196v1 Fully Symbolic Analysis of Loop Locality: Using Imaginary Reuse to Infer Real Performance 2026-03-10T19:51:31Z

This paper presents a new theory of locality and its compiler support. The theory is fully symbolic and derives locality as polynomials, and the compiler analysis supports affine loop nests. They derive cache-performance scaling in quadratic and reciprocal expressions and are more general and precise than empirical scaling rules. Evaluated on a benchmark suite of 41 scientific kernels and tensor operations, the compiler requires an average of 41 seconds to derive the locality polynomials. After derivation, predicting the cache miss count for any given input size and cache configuration takes less than a millisecond. Across all tests--with and without loop fusion--the accuracy in the data movement prediction is 99.6\%, compared to simulated set-associative L1 data cache.

2026-03-10T19:51:31Z Yifan Zhu Yekai Pan Chen Ding Yanghui Wu http://arxiv.org/abs/2507.16051v3 Getting Python Types Right with RightTyper 2026-03-10T19:33:43Z

Python type annotations enable static type checking, but most code remains untyped because manual annotation is time-consuming and tedious. Past approaches to automatic type inference fall short: static methods struggle with dynamic features and infer overly broad types; AI-based methods are unsound and miss rare types; and dynamic methods impose extreme overheads (up to 270x), lack important language support such as inferring variable types, or produce annotations that cause runtime errors. This paper presents RightTyper, a novel hybrid approach for Python that produces accurate and precise type annotations grounded in actual program behavior. RightTyper grounds inference in types observed during actual program execution and combines these observations with static analysis and name resolution to produce substantially higher-quality type annotations than prior approaches. Through principled, statistically guided adaptive sampling, RightTyper balances runtime overhead with the need to observe sufficient execution behavior to infer high-quality type annotations. We evaluate RightTyper against static, dynamic, and AI-based systems on both synthetic benchmarks and real-world code, and find that it consistently achieves higher semantic similarity to ground-truth and developer-written annotations, respectively, while incurring only approximately 27% runtime overhead.

2025-07-21T20:37:35Z Juan Altmayer Pizzorno Emery D. Berger http://arxiv.org/abs/2603.06731v2 PolyBlocks: A Compiler Infrastructure for AI Chips and Programming Frameworks 2026-03-10T17:20:46Z

We present the design and implementation of PolyBlocks, a modular and reusable MLIR-based compiler infrastructure for AI programming frameworks and AI chips. PolyBlocks is based on pass pipelines that compose transformations on loop nests and SSA, primarily relying on lightweight affine access analysis; the transformations are stitched together in specialized ways to realize high-performance code automatically by the use of analytical cost models and heuristics. The optimizations in these passes include multi-level tiling, fusion, on-chip scratchpad usage, mapping matmuls and convolutions to matrix units, fusing the attention layer, and several other transformations for parallelism and locality. They have been developed in a way that makes it easy to build PolyBlocks-based compilers to target new chips, reusing much of the infrastructure. PolyBlocks' design and architecture enable fully automatic code generation from high-level frameworks to low-level target-specific intrinsics. Experimental results from evaluating PolyBlocks-powered just-in-time compilation for PyTorch and JAX targeting NVIDIA GPUs show that it is able to match or outperform Torch Inductor and XLA in several cases, although the latter rely on a combination of vendor libraries and code generation. For individual operators like matmuls and convolutions, PolyBlocks-generated code is competitive with the best vendor-tuned libraries or hand-written kernels.

2026-03-06T03:45:50Z Fixed the "Acknowledgments" section that was missing phrases Uday Bondhugula Akshay Baviskar Navdeep Katel Vimal Patel Anoop JS Arnab Dutta http://arxiv.org/abs/2603.09726v1 Idempotent Slices with Applications to Code-Size Reduction 2026-03-10T14:32:33Z

Given a value computed within a program, an idempotent backward slice with respect to this value is a maximal subprogram that computes it. An informal notion of an idempotent slice has previously been used by Guimaraes et al. to transform eager into strict evaluation in the LLVM intermediate representation. However, that algorithm is insufficient to be correctly applied to general control-flow graphs. This paper addresses these omissions by formalizing the notion of idempotent backward slices and presenting a sound and efficient algorithm for extracting them from programs in Gated Static Single Assignment (GSA) form. As an example of their practical use, the paper describes how identifying and extracting idempotent backward slices enables a sparse code-size reduction optimization; that is, one capable of merging non-contiguous sequences of instructions within the control-flow graph of a single function or across functions. Experiments with the LLVM test suite show that, in specific benchmarks, this new algorithm achieves code-size reductions up to -7.24% on programs highly optimized by the -Os sequence of passes from clang 17.

2026-03-10T14:32:33Z 23 pages, 4 tables, 12 figures Rafael Alvarenga de Azevedo Daniel Augusto Costa de Sa Rodrigo Caetano Rocha Fernando Magno Quintão Pereira