https://arxiv.org/api/fj7BPM8DucORTv0iP4E055+Hi+Y2026-06-15T06:35:29Z988861515http://arxiv.org/abs/2602.14717v1Optimal Program Synthesis via Abstract Interpretation2026-02-16T13:02:53ZWe consider the problem of synthesizing programs with numerical constants that optimize a quantitative objective, such as accuracy, over a set of input-output examples. We propose a general framework for optimal synthesis of such programs in a given domain specific language (DSL), with provable optimality guarantees. Our framework enumerates programs in a general search graph, where nodes represent subsets of concrete programs. To improve scalability, it uses A* search in conjunction with a search heuristic based on abstract interpretation; intuitively, this heuristic establishes upper bounds on the value of subtrees in the search graph, enabling the synthesizer to identify and prune subtrees that are provably suboptimal. In addition, we propose a natural strategy for constructing abstract transformers for monotonic semantics, which is a common property for components in DSLs for data classification. Finally, we implement our approach in the context of two such existing DSLs, demonstrating that our algorithm is more scalable than existing optimal synthesizers.2026-02-16T13:02:53ZProc. ACM Program. Lang. 8, POPL, Article 16 (January 2024)Stephen MellSteve ZdancewicOsbert Bastani10.1145/3632858http://arxiv.org/abs/2602.14573v1Polar: An Algebraic Analyzer for (Probabilistic) Loops2026-02-16T09:05:42ZWe present the Polar framework for fully automating the analysis of classical and probabilistic loops using algebraic reasoning. The central theme in Polar comes with handling algebraic recurrences that precisely capture the loop semantics. To this end, our work implements a variety of techniques to compute exact closed-forms of recurrences over higher-order moments of variables, infer invariants, and derive loop sensitivities with respect to unknown parameters. Polar can analyze probabilistic loops containing if-statements, polynomial arithmetic, and common probability distributions. By translating loop analysis into linear recurrence solving, Polar uses the derived closed-forms of recurrences to compute the strongest polynomial invariant or to infer parameter sensitivity. Polar is both sound and complete within well-defined programming model restrictions. Lifting any of these restrictions results in significant hardness limits of computation. To overcome computational burdens for the sake of efficiency, Polar also provides incomplete but sound techniques to compute moments of combinations of variables.2026-02-16T09:05:42ZPublished in "Principles of Verification: Cycling the Probabilistic Landscape"Marcel MoosbruggerJulian MüllnerEzio BartocciLaura Kovács10.1007/978-3-031-75783-9_8http://arxiv.org/abs/2602.15078v1Computer Science as Infrastructure: the Spine of the Lean Computer Science Library (CSLib)2026-02-16T07:41:16ZFollowing in the footsteps of the success of Mathlib - the centralised library of formalised mathematics in Lean - CSLib is a rapidly-growing centralised library of formalised computer science and software. In this paper, we present its founding technical principles, operation, abstractions, and semantic framework. We contribute reusable semantic interfaces (reduction and labelled transition systems), proof automation, CI/testing support for maintaining automation and compatibility with Mathlib, and the first substantial developments of languages and models.2026-02-16T07:41:16ZChristopher HensonFabrizio Montesihttp://arxiv.org/abs/2605.04050v1LCM: Lossless Context Management2026-02-14T20:46:47ZWe introduce Lossless Context Management (LCM), a deterministic architecture for LLM memory that outperforms Claude Code on long-context tasks. When benchmarked using Opus 4.6, our LCM-augmented coding agent, Volt, achieves higher scores than Claude Code on the OOLONG long-context eval, including at every context length between 32K and 1M tokens.
LCM may be considered both a vindication and extension of the recursive paradigm pioneered by Recursive Language Models (RLMs). Our results demonstrate that recursive context manipulation can outperform not just conventional LLMs, but frontier coding agents with native file-system access.
LCM departs from RLM by decomposing symbolic recursion into two deterministic, engine-managed mechanisms: recursive context compression, in which a hierarchical summary DAG automatically compacts older messages while retaining lossless pointers to every original; and recursive task partitioning, in which engine-managed parallel primitives like LLM-Map replace model-written loops. This trade-off, analogous to the move from GOTO to structured control flow in program-ming language design, sacrifices maximal flexibility for termination guarantees, zero-cost continuity on short tasks, and lossless retrievability of all prior state.2026-02-14T20:46:47ZClint EhrlichTheodore Blackmanhttp://arxiv.org/abs/2602.07672v2Debugging code world models2026-02-14T20:46:17ZCode World Models (CWMs) are language models trained to simulate program execution by predicting explicit runtime state after every executed command. This execution-based world modeling enables internal verification within the model, offering an alternative to natural language chain-of-thought reasoning. However, the sources of errors and the nature of CWMs' limitations remain poorly understood. We study CWMs from two complementary perspectives: local semantic execution and long-horizon state tracking. On real-code benchmarks, we identify two dominant failure regimes. First, dense runtime state reveals produce token-intensive execution traces, leading to token-budget exhaustion on programs with long execution histories. Second, failures disproportionately concentrate in string-valued state, which we attribute to limitations of subword tokenization rather than program structure. To study long-horizon behavior, we use a controlled permutation-tracking benchmark that isolates state propagation under action execution. We show that long-horizon degradation is driven primarily by incorrect action generation: when actions are replaced with ground-truth commands, a Transformer-based CWM propagates state accurately over long horizons, despite known limitations of Transformers in long-horizon state tracking. These findings suggest directions for more efficient supervision and state representations in CWMs that are better aligned with program execution and data types.2026-02-07T19:32:15Z8 pages, 4 figures, under review in conferenceBabak Rahmanihttp://arxiv.org/abs/2602.13400v1InEx-Bug: A Human Annotated Dataset of Intrinsic and Extrinsic Bugs in the NPM Ecosystem2026-02-13T19:03:28ZUnderstanding the causes of software defects is essential for reliable software maintenance and ecosystem stability. However, existing bug datasets do not distinguish between issues originating within a project from those caused by external dependencies or environmental factors. In this paper we present InEx-Bug, a manually annotated dataset of 377 GitHub issues from 103 NPM repositories, categorizing issues as Intrinsic (internal defect), Extrinsic (dependency/environment issue), Not-a-Bug, or Unknown. Beyond labels, the dataset includes rich temporal and behavioral metadata such as maintainer participation, code changes, and reopening patterns. Analyses show Intrinsic bugs resolve faster (median 8.9 vs 10.2 days), are close more often (92% vs 78%), and require code changes more frequently (57% vs 28%) compared to Extrinsic bugs. While Extrinsic bugs exhibit higher reopen rates (12% vs 4%) and delayed recurrence (median 157 vs 87 days). The dataset provides a foundation for further studying Intrinsic and Extrinsic defects in the NPM ecosystem.2026-02-13T19:03:28ZTanner WrightAdams ChenGema Rodríguez-Pérezhttp://arxiv.org/abs/2602.11481v1Compiler-Guided Inference-Time Adaptation: Improving GPT-5 Programming Performance in Idris2026-02-12T01:48:52ZGPT-5, a state of the art large language model from OpenAI, demonstrates strong performance in widely used programming languages such as Python, C++, and Java; however, its ability to operate in low resource or less commonly used languages remains underexplored. This work investigates whether GPT-5 can effectively acquire proficiency in an unfamiliar functional programming language, Idris, through iterative, feedback driven prompting. We first establish a baseline showing that with zero shot prompting the model solves only 22 out of 56 Idris exercises using the platform Exercism, substantially underperforming relative to higher resource languages (45 out of 50 in Python and 35 out of 47 in Erlang). We then evaluate several refinement strategies, including iterative prompting based on platform feedback, augmenting prompts with documentation and error classification guides, and iterative prompting using local compilation errors and failed test cases. Among these approaches, incorporating local compilation errors yields the most substantial improvements. Using this structured, error guided refinement loop, GPT-5 performance increased to an impressive 54 solved problems out of 56. These results suggest that while large language models may initially struggle in low resource settings, structured compiler level feedback can play a critical role in unlocking their capabilities.2026-02-12T01:48:52ZMinda LiBhaskar Krishnamacharihttp://arxiv.org/abs/2602.11232v1Yaksha-Prashna: Understanding eBPF Bytecode Network Function Behavior2026-02-11T16:15:24ZMany cloud infrastructure organizations increasingly rely on third-party eBPF-based network functions for use cases like security, observability, and load balancing, so that not everyone requires a team of highly skilled eBPF experts. However, the network functions from third parties (e.g., F5, Palo Alto) are available in bytecode format to cloud operators, giving little or no understanding of their functional correctness and interaction with other network functions in a chain. Also, eBPF developers want to provide proof of functional correctness for their developed network functions without disclosing the source code to the operators. We design Yaksha-Prashna, a system that allows operators/developers to assert and query bytecode's conformance to its specification and dependencies on other bytecodes. Our work builds domain-specific models that enable us to employ scalable program analysis to extract and model eBPF programs. Using Yaksha-Prashna language, we express 24 properties on standard and non-standard eBPF-based network functions with 200-1000x speedup over the state-of-the-art work.2026-02-11T16:15:24ZAnimesh SinghK Shiv KumarS. VenkataKeerthyPragna MamidipakaR V B R N AaseeshSayandeep SenPalanivel KodeswaranTheophilus A. BensonRamakrishna UpadrastaPraveen Tammanahttp://arxiv.org/abs/2604.09558v1VTC: DNN Compilation with Virtual Tensors for Data Movement Elimination2026-02-11T06:23:10ZWith the widening gap between compute and memory operation latencies, data movement optimizations have become increasingly important for DNN compilation. Current optimizations such as layout transformations and operator fusion only target a subset of tensor operators and consequently miss important opportunities for reducing data movement in contemporary DNN workloads, including large language models.
We introduce VTC, a novel tensor compilation framework that for the first time eliminates all unnecessary data movement by targeting the full spectrum of data movement operators. VTC proposes the concept of virtual tensors to track data movement between compute operators via index mappings rather than expensive physical data transfers to and from global memory, which can seamlessly interoperate with existing computation kernels and handle arbitrary tensor operator compositions. We also introduce a novel data movement elimination algorithm to automatically identify a profitable virtual tensor creation strategy. Evaluation on a variety of DNNs shows that VTC can outperform existing ML compilers by up to 1.93x (1.28x on average) on NVIDIA GPUs with up to 60% (17.5% on average) inference memory savings.2026-02-11T06:23:10ZAccepted to OSDI'26Muyan HuAhan GuptaJiachen YuanVima GuptaTaeksang KimXin XuJanardhan KulkarniOfer DekelVikram AdveCharith Mendishttp://arxiv.org/abs/2512.02371v2Pushing Tensor Accelerators Beyond MatMul in a User-Schedulable Language2026-02-11T03:04:20ZTensor accelerators now represent a growing share of compute resources in modern CPUs and GPUs. However, they are hard to program, leading developers to use vendor-provided kernel libraries that support tensor accelerators. As a result, the usage of tensor accelerators is limited to the provided interface, mainly designed for traditional ML and scientific computing workloads.
In this paper, we show that tensor accelerators can improve the performance of applications beyond simple variants of MatMul. For example, many image processing pipelines are linear transformations over matrices in disguise and can therefore utilize such specialized hardware. This is nonetheless hindered by the difficulties in programming tensor accelerators. We tackle this problem with compiler-based techniques. We use the Halide user-schedulable language and express operations as Halide algorithms succinctly. To this end, we implement a flexible tensor instruction selector based on equality saturation. The tensor instruction selector supports both CPU- and GPU-attached tensor accelerators and works with existing scheduling operations (e.g., producer-consumer fusion). Together, this enables developers to write diverse accelerator-leveraging applications in a few dozen lines.
Using our system, we demonstrate the potential of tensor accelerators beyond their traditional domains. We implement several image processing pipelines (e.g., filtering, resampling, and denoising) in our system and evaluate them against non-accelerator-leveraging baselines. We show that these pipelines can achieve significant speedups. For example, a downsampling routine is sped up by $6.1\times$ by utilizing Tensor Cores on an Nvidia RTX 4070 GPU.2025-12-02T03:22:21ZCGO 2026Yihong ZhangDerek GerstmannAndrew AdamsMaaz Bin Safeer Ahmadhttp://arxiv.org/abs/2602.10320v1Implementability of Global Distributed Protocols modulo Network Architectures2026-02-10T21:58:45ZGlobal protocols specify distributed, message-passing protocols from a birds-eye view, and are used as a specification for synthesizing local implementations. Implementability asks whether a given global protocol admits a distributed implementation. We present the first comprehensive investigation of global protocol implementability modulo network architectures. We propose a set of network-parametric Coherence Conditions, and exhibit sufficient assumptions under which it precisely characterizes implementability. We further reduce these assumptions to a minimal set of operational axioms describing insert and remove behavior of individual message buffers. Our reduction immediately establishes that five commonly studied asynchronous network architectures, namely peer-to-peer FIFO, mailbox, senderbox, monobox and bag, are instances of our network-parametric result. We use our characterization to derive optimal complexity results for implementability modulo networks, relationships between classes of implementable global protocols, and symbolic algorithms for deciding implementability modulo networks. We implement the latter in the first network-parametric tool Sprout(A), and show that it achieves network generality without sacrificing performance and modularity.2026-02-10T21:58:45ZElaine LiThomas Wieshttp://arxiv.org/abs/2412.15768v2Complete Fusion for Stateful Streams: Equational Theory of Stateful Streams and Fusion as Normalization-by-Evaluation2026-02-10T11:56:16ZProcessing large amounts of data fast, in constant and small space is the point of stream processing and the reason for its increasing use. Alas, the most performant, imperative processing code tends to be almost impossible to read, let alone modify, reuse -- or write correctly.
We present both a stream compilation theory and its implementation as a portable stream processing library Strymonas that lets us assemble complex stream pipelines just by plugging in simple combinators, and yet attain the performance of hand-written imperative loops and state machines. The library supports finite and infinite streams and offers a rich set of combinators. They may be freely composed, and yet the resulting convoluted imperative code has no traces of combinator abstractions: no closures or intermediate objects. The high-performance is portable and statically guaranteed, without relying on compiler or black-box optimizations. We greatly exceed in performance the available stream processing libraries in OCaml. The library generates C and OCaml code.
The declaratively built Strymonas pipelines are all stateful. The stream state introduced in the library is not directly observable. Therefore, the Strymonas API looks like the familiar interface of `pure functional' combinators. Programmers may introduce their own stream state and share it across the pipeline.
Strymonas has been developed in tandem with the equational theory of stateful streams. Our theoretical model represents all desired pipelines and guarantees the existence of unique normal forms, which are mappable to (fused) state machines. We describe the normalization algorithm, as a form of normalization-by-evaluation. The equational theory lets us state and prove the correctness of the complete fusion optimization.2024-12-20T10:40:27ZRevised based on many commentsOleg KiselyovTomoaki KobayashiNick Palladinoshttp://arxiv.org/abs/2602.09197v1Global Protocols under Rendezvous Synchrony: From Realizability to Type Checking2026-02-09T21:05:13ZGlobal protocol specifications are the starting point of top-down verification methodologies, and serve as a blueprint for synthesizing local specifications that guarantee the correctness of distributed implementations. In this work, we study global protocol specifications targeting distributed processes that communicate via rendezvous synchrony. We obtain the following positive results for the synchronous realizability problem: (a) realizability is decidable for global protocols over a transitive concurrency alphabet in 2-EXPTIME in the size of the protocol, and in 3-EXPTIME in the size of the alphabet, and (b) realizability is decidable in EXPTIME for global protocols that unambiguously represent their trace language. Unambiguous global protocols encompass fragments of directed and sender-driven choice protocols studied in prior work. Further, our reductions admit the corollary that the synchronous verification problem is solvable with the same complexity, where a candidate realization is included as part of the input. We then prove a negative result stating that synchronous realizability is undecidable in general. Finally, we propose a type system to check pi-calculus processes against local specifications in the form of synchronous communicating state machines. Our type system is the first to support processes with local mixed choice in the presence of session interleaving and~delegation.2026-02-09T21:05:13ZElaine LiFelix Stutzhttp://arxiv.org/abs/2602.08846v1Impredicativity in Linear Dependent Type Theory2026-02-09T16:12:57ZWe construct a realizability model of linear dependent type theory from a linear combinatory algebra. Our model motivates a number of additions to the type theory. In particular, we add a universe with two decoding operations: one takes codes to cartesian types and the other takes codes to linear types. The universe is impredicative in the sense that it is closed under both large cartesian dependent products and large linear dependent products. We also add a rule for injectivity of the modality turning linear terms into cartesian terms. With all of the additions, we are able to encode (linear) inductive types. As a case study, we consider the type of lists over a linear type, and demonstrate that our encoding has the relevant uniqueness principle. The construction of the realizability model is fully formalized in the proof assistant Rocq.2026-02-09T16:12:57Z20 pages, 2 figuresSam SpeightNiels van der Weidehttp://arxiv.org/abs/2601.02563v3Compressed code: the hidden effects of quantization and distillation on programming tokens2026-02-08T20:13:10ZLarge Language Models (LLMs) have demonstrated exceptional code generation capabilities, yet their token-level mechanisms remain underexplored, particularly in compressed models. Through systematic analysis of programming language token representations, we characterize how programming languages are encoded in LLM tokenizers by analyzing their vocabulary distribution and keyword coverage patterns. We introduce a novel cold-start probability analysis method that provides insights into model behavior without requiring explicit prompts. Additionally, we present a comprehensive evaluation of how different model optimization techniques - including quantization, distillation, model scaling, and task-specific fine-tuning - affect token-level representations and code generation quality. Our experiments, supported by comprehensive probability distribution analysis and evaluation metrics, reveal critical insights into token-level behavior and provide empirically-validated guidelines for maintaining code generation quality under various optimization constraints. These findings advance both theoretical understanding of LLM code generation and practical implementation of optimized models in production environments.2026-01-05T21:32:47Z18 pages, 1 figure and 6 tablesViacheslav SiniaevIaroslav ChelombitkoAleksey Komissarov