Actegories, Copowers, and Higher-Order Message Passing Semantics

2026-03-10T09:44:16Z

In this paper we prove that giving a right actegory with hom-objects is equivalent to giving a right-enriched category with copowers. While this result is known in the closed symmetric setting, our contribution extends the equivalence to non-closed and non-symmetric monoidal bases. This generalization is motivated by the semantics of higher-order message passing in the Categorical Message Passing Language (CaMPL), a concurrent language whose semantics is given by a linear actegory. A desirable feature for this language is the support of higher-order processes: processes that are passed as first class citizens between processes. While this ability is already present in any closed linear type systems -- such as CaMPL's -- to support arbitrary recursive process definitions requires the ability to reuse passed processes. Concurrent resources in CaMPL, however, cannot be duplicated, thus, passing processes as linear closures does not provide the required flexibility. This means processes must be passed as sequential data and the concurrent side must be enriched in the sequential side, motivating the technical result of this paper.

Floating-Point Usage on GitHub: A Large-Scale Study of Statically Typed Languages

2026-03-10T07:33:33Z

Reasoning about floating-point arithmetic is notoriously hard. While static and dynamic analysis techniques or program repair have made significant progress, more work is still needed to make them relevant to real-world code. On the critical path to that goal is understanding what real-world floating-point code looks like. To close that knowledge gap, this paper presents the first large-scale empirical study of floating-point arithmetic usage across public GitHub repositories. We focus on statically typed languages to allow our study to scale to millions of repositories. We follow state-of the art mining practices including random sampling and filtering based on only intrinsic properties to avoid bias, and identify floating-point usage by searching for keywords in the source code, and programming language constructs (e.g., loops) by parsing the code. Our evaluation supports the claim often made in papers that floating-point arithmetic is widely used. Comparing statistics such as size and usage of certain constructs and functions, we find that benchmarks used in literature to evaluate automated reasoning techniques for floating-point arithmetic are in certain aspects representative of 'real-world' code, but not in all. We publish a dataset of 10 million real-world floating-point functions extracted from our study. We demonstrate in a case study how it may be used to identify new floating-point benchmarks and help future techniques for floating-point arithmetic to be designed and evaluated to match actual users' expectations.

Arbiter: Detecting Interference in LLM Agent System Prompts

2026-03-09T22:29:47Z

System prompts for LLM-based coding agents are software artifacts that govern agent behavior, yet lack the testing infrastructure applied to conventional software. We present Arbiter, a framework combining formal evaluation rules with multi-model LLM scouring to detect interference patterns in system prompts. Applied to three major coding agent system prompts: Claude Code (Anthropic), Codex CLI (OpenAI), and Gemini CLI (Google), we identify 152 findings across the undirected scouring phase and 21 hand-labeled interference patterns in directed analysis of one vendor. We show that prompt architecture (monolithic, flat, modular) strongly correlates with observed failure class but not with severity, and that multi-model evaluation discovers categorically different vulnerability classes than single-model analysis. One scourer finding was structural data loss in Gemini CLI's memory system was consistent with an issue filed and patched by Google, which addressed the symptom without addressing the schema-level root cause identified by the scourer. Total cost of cross-vendor analysis: \$0.27 USD.

EAGLE-Pangu: Accelerator-Safe Tree Speculative Decoding on Ascend NPUs

2026-03-09T08:30:04Z

Autoregressive decoding remains a primary bottleneck in large language model (LLM) serving, motivating speculative decoding methods that reduce expensive teacher-model invocations by verifying multiple candidate tokens per step. Tree-structured speculation further increases parallelism, but is often brittle when ported across heterogeneous backends and accelerator stacks, where attention masking, KV-cache layouts, and indexing semantics are not interchangeable. We present EAGLE-Pangu, a reproducible system that ports EAGLE-3-style tree speculative decoding to a Pangu teacher backend on Ascend NPUs. EAGLE-Pangu contributes (i) an explicit branch/commit cache manager built on the Cache API, (ii) accelerator-safe tree tensorization that removes undefined negative indices by construction and validates structural invariants, and (iii) a fused-kernel-compatible teacher verification path with a debuggable eager fallback. On 240 turns from MT-Bench and HumanEval-style prompts, EAGLE-Pangu improves end-to-end decoding throughput by 1.27x on average, up to 2.46x at p99, over teacher-only greedy decoding in the fused-kernel performance path. We also provide a fused-kernel-free reference path with structured traces and invariant checks to support reproducible debugging and ablation across execution modes and tree budgets.

From State Changes to Creative Decisions: Documenting and Interpreting Traces Across Creative Domains

2026-03-07T12:50:42Z

Analyzing creative activity traces requires capturing activity at appropriate granularity and interpreting it in ways that reflect the structure of creative practice. However, existing approaches record state changes without preserving the intent or relationships that define higher-level creative moves. This decoupling manifests differently across domains: GenAI tools lose non-linear exploration structure, visualization authoring obscures representational intent, and programmatic environments flatten interaction boundaries. We present three complementary approaches: a node-based interface for stateful GenAI artifact management, a vocabulary of visual cues as higher-level creative moves in visualization authoring, and a programming model that embeds semantic histories directly into interaction state.

Turn: A Language for Agentic Computation

2026-03-07T09:10:09Z

We present \textbf{Turn}, a compiled, actor-based programming language -- statically typed for schema inference, dynamically typed at the value level -- for agentic software: programs that reason and act autonomously by delegating inference to large language models (LLMs). Existing approaches augment general-purpose languages with frameworks, encoding critical invariants (bounded context, typed inference output, credential isolation, durable state) as application-level conventions rather than language guarantees. Turn introduces five language-level constructs that address this gap. \emph{Cognitive Type Safety} makes LLM inference a typed primitive: the compiler generates a JSON Schema from a struct definition and the VM validates model output before binding. The \emph{confidence operator} enables deterministic control flow gated on model certainty. Turn's \emph{actor-based process model}, derived from Erlang, gives each agent an isolated context window, persistent memory, and mailbox. A \emph{capability-based identity system} returns opaque, unforgeable handles from the VM host, ensuring raw credentials never enter agent memory. Finally, \emph{compile-time schema absorption} (\texttt{use schema::}) synthesizes typed API bindings from external specifications at compile time; the \texttt{openapi} adapter is shipped with \texttt{graphql}, \texttt{fhir}, and \texttt{mcp} in active development. We describe the language design, type rules, schema semantics, and a Rust-based bytecode VM, and evaluate Turn against representative agentic workloads. Turn is open source at https://github.com/ekizito96/Turn.

Linear Layouts: Robust Code Generation of Efficient Tensor Computation Using $\mathbb{F}_2$

2026-03-06T16:48:24Z

Efficient tensor computation is a cornerstone of modern deep learning (DL) workloads, yet existing approaches struggle to achieve flexible and performant design and implementation of tensor layouts -- mappings between logical tensors and hardware resources. The increasing complexity of DL algorithms and hardware demands a generic and systematic approach to handling tensor layouts. In this work, we introduce Linear Layouts, a novel approach that models tensor layouts using linear algebra over $\mathbb{F}_2$. By representing tensor layouts as binary matrices acting on the bits of the hardware representation, our approach enables a generic layout definition -- as opposed to the classical case-by-case approach -- and allows for generic layout-to-layout conversions, eliminating the quadratic explosion that plagues existing solutions. We integrate linear layouts with Triton and demonstrate their effectiveness in optimizing individual Triton operators as well as kernels written in Triton. We also show that linear layouts reduce engineering effort in the compiler backend while fixing several bugs in Triton's legacy layout system.

Model2Kernel: Model-Aware Symbolic Execution For Safe CUDA Kernels

2026-03-06T04:13:28Z

The widespread adoption of large language models (LLMs) has made GPU-accelerated inference a critical part of modern computing infrastructure. Production inference systems rely on CUDA kernels to implement core transformer operations, yet these kernels are highly susceptible to memory-safety bugs due to model-dependent tensor layouts, intricate memory indexing, and massive thread-level parallelism. Such bugs can corrupt model weights, crash inference services, or even enable adversarial attacks. Existing techniques either depend on unavailable hardware, incur high overhead, or fail to handle kernel inputs with variable lengths, and none can effectively detect CUDA memory bugs in LLM inference systems. This paper presents Model2Kernel, the first practical system for automatically verifying the memory safety of CUDA kernels used in LLM inference. Model2Kernel performs model-aware dynamic analysis to determine how each model invokes kernels and to classify kernel arguments as either fixed by the model architecture or controlled by model users. Using this information, Model2Kernel then applies CUDA-specialized symbolic execution, supported by new abstractions for dynamic tensor memory and thread identifiers, to accurately pinpoint memory bugs in kernels. In the evaluation on CUDA kernels and models from vLLM, Hugging Face, and recent LLM research papers, Model2Kernel discovers 353 previously unknown bugs while producing only nine false positives, demonstrating its effectiveness.

Category Theory for Programming

2026-03-05T20:40:36Z

In these lecture notes, we give a brief introduction to some elements of category theory. The choice of topics is guided by applications to functional programming. Firstly, we study initial algebras, which provide a mathematical characterization of datatypes and recursive functions on them. Secondly, we study monads, which give a mathematical framework for effects in functional languages. The notes include many problems and solutions.

Efficient Selection of Type Annotations for Performance Improvement in Gradual Typing

2026-03-05T19:58:35Z

Gradual typing has gained popularity as a design choice for integrating static and dynamic typing within a single language. Several practical languages have adopted gradual typing to offer programmers the flexibility to annotate their programs as needed. Meanwhile there is a key challenge of unexpected performance degradation in partially typed programs. The execution speed may significantly decrease when simply adding more type annotations. Prior studies have investigated strategies of selectively adding type annotations for better performance. However, they are restricted in substantial compilation time, which impedes the practical usage. This paper presents a new technique to select a subset of type annotations derived by type inference for improving the execution performance of gradually typed programs. The advantage of the proposal is shorter compilation time by employing a lightweight, amortized approach. It selects type annotations along the data flows, which is expected to avoid expensive runtime casts caused by a value repeatedly crossing the boundaries between untyped and typed code. We demonstrate the applicability of our proposal, and conduct experiments to validate its effectiveness of improving the execution time on Reticulated Python. Our implementation supports a Python subset to select type annotations derived by an implemented, external type inference engine. Experiment results show that our proposal outperforms a naive strategy of using all type annotations derived by type inference among the benchmark programs. In comparison with an existing approach, the proposal achieves comparable execution speed and shows advantage of maintaining a more stable compilation time of deriving and selecting type annotations. Our results empirically indicate that the proposed technique is practical within Reticulated Python for mitigating the performance bottleneck of gradually typed programs.

Mining Beyond the Bools: Learning Data Transformations and Temporal Specifications

2026-03-05T19:56:25Z

Mining specifications from execution traces presents an automated way of capturing characteristic system behaviors. However, existing approaches are largely restricted to Boolean abstractions of events, limiting their ability to express data-aware properties. In this paper, we extend mining procedures to operate over richer datatypes. We first establish candidate functions in our domain that cover the set of traces by leveraging Syntax Guided Synthesis (SyGuS) techniques. To capture these function applications temporally, we formalize the semantics of TSL$_f$, a finite-prefix interpretation of Temporal Stream Logic (TSL) that extends LTL$_f$ with support for first-order predicates and functional updates. This allows us to unify a corresponding procedure for learning the data transformations and temporal specifications of a system. We demonstrate our approach synthesizing reactive programs from mined specifications on the OpenAI-Gymnasium ToyText environments, finding that our method is more robust and orders of magnitude more sample-efficient than passive learning baselines on generalized problem instances.

JoinActors: A Modular Library for Actors with Join Patterns

2026-03-05T19:56:02Z

Join patterns are a high-level programming construct for message-passing applications. They offer an intuitive and declarative approach for specifying how concurrent and distributed components coordinate, possibly depending on complex conditions over combinations of messages. Join patterns have inspired many implementations -- but most of them are not available as libraries: rather, they are domain-specific languages that can be hard to integrate into pre-existing ecosystems. Moreover, all implementations ship with a predefined matching algorithm, which may not be optimal depending on the application requirements. These limitations are addressed by `JoinActors`, a recently published library which integrates join patterns in the off-the-shelf Scala 3 programming language, and is designed to be modular w.r.t. the matching algorithm in use. In this work we address the problem of designing, developing, and evaluating a modular join pattern matching toolkit that (1) can be used as a regular library with a developer-friendly syntax within a pre-existing programming language, and (2) has an extensible design that supports the use and comparison of different matching algorithms. We analyse how `JoinActors` achieves goals (1) and (2) above. The paper that introduced `JoinActors` only briefly outlined its design and implementation (as its main goal was formalising its novel fair matching semantics*). In this work we present and discuss in detail an improved version of `JoinActors`, focusing on its use of metaprogramming (which enables an intuitive API resembling standard pattern matching) and on its modular design. We show how this enables the integration of multiple matching algorithms with different optimisations and we evaluate their performance via benchmarks covering different workloads. We illustrate a sophisticated use of Scala 3's metaprogramming for the integration of an advanced concurrent programming construct within a pre-existing language. In addition, we discuss the insights and "lessons learned" in optimising join pattern matching, and how they are facilitated by `JoinActors`'s modularity -- which allows for the systematic comparison of multiple matching algorithm implementations. We adopt the fair join pattern matching semantics and the benchmark suite from the paper that originally introduced `JoinActors`. Through extensive testing we ensure that our new optimised matching algorithms produce exactly the same matches as the original `JoinActors` library, while achieving significantly better performance. The improved version of `JoinActors` is the companion artifact of this paper. This work showcases the expressiveness, effectiveness, and usability of join patterns for implementing complex coordination patterns in distributed message-passing systems, within a pre-existing language. It also demonstrates promising performance results, with significant improvements over previous work. Besides the practical promise, `JoinActors`'s modular design offers a research playground for exploring and comparing new join pattern matching algorithms, possibly based on entirely different semantics.

Evaluating LLMs in the Context of a Functional Programming Course: A Comprehensive Study

2026-03-05T19:55:41Z

Large-Language Models (LLMs) are changing the way learners acquire knowledge outside the classroom setting. Previous studies have shown that LLMs seem effective in generating to short and simple questions in introductory CS courses using high-resource programming languages such as Java or Python. In this paper, we evaluate the effectiveness of LLMs in the context of a low-resource programming language -- OCaml, in an educational setting. In particular, we built three benchmarks to comprehensively evaluate 9 state-of-the-art LLMs: 1) $λ$CodeGen (a benchmark containing natural-language homework programming problems); 2) $λ$Repair (a benchmark containing programs with syntax, type, and logical errors drawn from actual student submissions); 3) $λ$Explain (a benchmark containing natural language questions regarding theoretical programming concepts). We grade each LLMs responses with respect to correctness using the OCaml compiler and an autograder. And our evaluation goes beyond common evaluation methodology by using manual grading to assess the quality of the responses. Our study shows that the top three LLMs are effective on all tasks within a typical functional programming course, although they solve much fewer homework problems in the low-resource setting compared to their success on introductory programming problems in Python and Java. The strength of LLMs lies in correcting syntax and type errors as well as generating answers to basic conceptual questions. While LLMs may not yet match dedicated language-specific tools in some areas, their convenience as a one-stop tool for multiple programming languages can outweigh the benefits of more specialized systems. We hope our benchmarks can serve multiple purposes: to assess the evolving capabilities of LLMs, to help instructors raise awareness among students about the limitations of LLM-generated solutions, and to inform programming language researchers about opportunities to integrate domain-specific reasoning into LLMs and develop more powerful code synthesis and repair tools for low-resource languages.

Pitfalls in VM Implementation on CHERI: Lessons from Porting CRuby

2026-03-05T19:55:24Z

CHERI (Capability Hardware Enhanced RISC Instructions) is a novel hardware designed to address memory safety issues. By replacing traditional pointers with hardware capabilities, it enhances security in modern software systems. A Virtual Machine (VM) is one such system that can benefit from CHERI's protection, as it may contain latent memory vulnerabilities. However, developing and porting VMs to CHERI is a non-trivial task. There are many subtle pitfalls from the assumptions on the undefined behaviors of the C language made based on conventional architectures. Those assumptions conflict with CHERI's stricter memory safety model, causing unexpected failures. Although several prior works have discussed the process of porting VMs, they focus on the overall porting process instead of the pitfalls for VM implementation on CHERI. The guide for programming in CHERI exists, but it is for general programming, not addressing VM-specific issues. We have ported CRuby to CHERI as a case study and surveyed previous works on porting VMs to CHERI. We categorized and discussed the issues found based on their causes. In this paper, we illustrate the VM-specific pitfalls for each category. Most of the pitfalls arise from the undefined behaviors in the C language; in particular, implementation techniques and idioms of VMs often assume behaviors of traditional architectures that are invalid on CHERI. We also discuss workarounds for them and the impacts of those workarounds. We verified the validity of the workarounds by applying them to our CRuby port and by surveying the codebases of prior case studies. This work contributes to the body of knowledge on developing and porting VMs to CHERI and will help guide efforts toward constructing safer VMs.

Hybrid Structured Editing: Structures for Tools, Text for Users

2026-03-05T19:55:07Z

In programming, better tools often yield better results. For that, modern programming environments offer mechanisms to allow for their extensibility. The closer those tools are to the code, the easier it is for programmers to map the information provided by a tool to the code this information is about. However, existing extension mechanisms do not facilitate the close integration of tools with textual source code. Tools must be able to track program structures across edits to appear at the right positions but the parsing step of text complicates tracking structures. We propose hybrid structured editing, an approach that supports tool builders by providing structural guarantees while providing tool users with a familiar and consistent text editing interface. Hybrid structured editing allows tool builders to declare constraints on the structure that a program must conform to and ensures their observance. We present an implementation and several case studies of tools based on hybrid structured editing to demonstrate its effectiveness. Hybrid structured editing supports the safe extension of programming environments with tools that work on a structured representation of code and provide a consistent and reliable user experience.