http://arxiv.org/abs/2601.20783v1 The Monotone Priority System: Foundations of Contract-Specific Sequencing 2026-01-28T17:16:51Z

Modern blockchain applications benefit from the ability to specify sequencing constraints on the transactions that interact with them. This paper proposes a principled and axiomatically justified way of adding sequencing constraints on smart contract function calls that balances expressivity with the tractability of block production. Specifically, we propose a system in which contract developers are allowed to set an integer global priority for each of their calls, so long as the call's chosen priority is no higher than the priority of any of its referenced calls. Block builders must then simply sequence transactions in priority order (from high to low priority), breaking ties however they would like. We show that this system is the unique system that satisfies five independent axioms.
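
The rule described in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the paper's formalism: the `Call` record, the call names, and the assumption that each transaction is a single named call are all hypothetical. Ties are left in input order here, standing in for whatever tie-breaking a builder prefers.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    name: str                 # hypothetical call identifier
    priority: int             # integer global priority chosen by the developer
    references: list = field(default_factory=list)  # names of calls it references


def check_monotone(calls):
    """Return (call, referenced) pairs where a call's priority exceeds
    the priority of a call it references, i.e. violations of the rule."""
    by_name = {c.name: c for c in calls}
    return [
        (c.name, r)
        for c in calls
        for r in c.references
        if c.priority > by_name[r].priority
    ]


def sequence(txs, priority_of):
    """Order transactions from high to low priority.
    sorted() is stable, so ties keep their input order."""
    return sorted(txs, key=lambda tx: -priority_of[tx])


# Hypothetical example data.
calls = [
    Call("oracle_update", 10),
    Call("liquidate", 7, ["oracle_update"]),
    Call("swap", 5, ["oracle_update"]),
]
violations = check_monotone(calls)
bad_violations = check_monotone(calls + [Call("bad", 12, ["oracle_update"])])
priority_of = {c.name: c.priority for c in calls}
block = sequence(["swap", "oracle_update", "liquidate"], priority_of)
```

In this sketch `violations` is empty, `bad_violations` flags the call whose priority exceeds that of the call it references, and `block` lists `oracle_update` first and `swap` last.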

2026-01-28T17:16:51Z Naveen Durvasula http://arxiv.org/abs/2511.07776v2 Streaming Tensor Programs: A Streaming Abstraction for Dynamic Parallelism 2026-01-28T06:17:06Z

Dynamic behaviors are becoming prevalent in tensor applications, like machine learning, where many widely used models contain data-dependent tensor shapes and control flow. However, the limited expressiveness of prior programming abstractions for spatial dataflow accelerators (SDAs) forces these dynamic behaviors to be implemented statically and/or unoptimized. To address these challenges, we present Streaming Tensor Programs (STeP), a streaming abstraction that enables dynamic tensor workloads to run efficiently on SDAs. STeP introduces flexible routing operators, an explicit memory hierarchy, and symbolic-shape semantics that expose dynamic data rates and tensor dimensions. These capabilities unlock new optimizations, like dynamic tiling, dynamic parallelization, and configuration time-multiplexing, that adapt SDA execution to dynamic behaviors while preserving dataflow efficiency. Using a cycle-approximate simulator on representative LLM layers and a full model with real-world traces, STeP enables: dynamic tiling that breaks the Pareto-optimal frontier from prior work, dynamic parallelization that improves latency by ~2.72x, and configuration time-multiplexing that increases compute utilization by ~2.64x over prior SDA abstractions and their implementations.

2025-11-11T02:49:10Z Gina Sohn Genghan Zhang Konstantin Hossfeld Jungwoo Kim Nathan Sobotka Nathan Zhang Olivia Hsu Kunle Olukotun 10.1145/3779212.3790229 http://arxiv.org/abs/2601.19426v1 For Generalised Algebraic Theories, Two Sorts Are Enough 2026-01-27T10:05:11Z

Generalised algebraic theories (GATs) allow multiple sorts indexed over each other. For example, the theories of categories or Martin-Löf type theories form GATs. Categories have two sorts, objects and morphisms, and the latter are double-indexed over the former. Martin-Löf type theory has four sorts: contexts, substitutions, types and terms. For example, types are indexed over contexts, and terms are indexed over both contexts and types. In this paper we show that any GAT can be reduced to a GAT with only two sorts, and there is a section-retraction correspondence (formally, a strict coreflection) between models of the original and the reduced GAT. In particular, any model of the original GAT can be turned into a model of the reduced (two-sorted) GAT and back, and this roundtrip is the identity. The reduced GAT is simpler than the original GAT in the following aspects: it does not have sort equalities; it does not have interleaved sorts and operations; if the original GAT did not have interleaved sorts and operations, then the reduced GAT won't have operations interleaved between different sorts. In a type-theoretic metatheory, the initial algebra of a GAT is called a quotient inductive-inductive type (QIIT). Our reduction provides a way to implement QIITs with sort equalities or interleaved constructors which are not allowed by Cubical Agda. An instance of our reduction is the well-known method of reducing mutual inductive types to a single indexed family. Our approach is semantic in that it does not rely on a syntactic description of GATs, but instead, on Uemura's bi-initial characterisation of the category of (finite) GATs in the 2-category of finitely complete categories with a chosen exponentiable morphism.

2026-01-27T10:05:11Z Samy Avrillon Ambrus Kaposi Ambroise Lafont Niyousha Najmaei Johann Rosain http://arxiv.org/abs/2506.23281v2 BugLens: Leveraging Bisection for Lightweight Compiler Bug Deduplication 2026-01-27T08:02:32Z

Random testing has proven to be an effective technique for compiler validation. However, the debugging of bugs identified through random testing presents a significant challenge due to the frequent occurrence of duplicate test programs that expose identical compiler bugs. The process to identify duplicates is a practical research problem known as bug deduplication. Prior methodologies for compiler bug deduplication primarily rely on program analysis to extract bug-related features for duplicate identification, which can result in substantial computational overhead and limited generalizability. This paper investigates the feasibility of employing bisection, a standard debugging procedure largely overlooked in prior research on compiler bug deduplication, for this purpose. Our study demonstrates that the utilization of bisection to locate failure-inducing commits provides a valuable criterion for deduplication, albeit one that requires supplementary techniques for more accurate identification. Building on these results, we introduce BugLens, a novel deduplication method that primarily uses bisection, enhanced by the identification of bug-triggering optimizations to minimize false negatives. Empirical evaluations conducted on four real-world datasets demonstrate that BugLens significantly outperforms the state-of-the-art analysis-based methodologies Tamer and D3 by saving an average of 26.98% and 9.64% of human effort, respectively, to identify the same number of distinct bugs. Given the inherent simplicity and generalizability of bisection, it presents a highly practical solution for compiler bug deduplication in real-world applications.
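
The core idea of using bisection as a deduplication criterion can be sketched as follows. This is a simplified illustration and not BugLens itself: it groups test programs only by their failure-inducing commit and omits the bug-triggering-optimization step the abstract mentions. The commit names, test names, and the `fails_on` oracle are hypothetical, and the search assumes each test passes before some commit and fails from that commit onward, including on the last commit.

```python
from collections import defaultdict


def first_bad_commit(commits, fails):
    """Binary search for the earliest commit where fails(commit) is True.
    Assumes failures are monotone in history and the last commit fails."""
    lo, hi = 0, len(commits) - 1  # invariant: the answer lies in commits[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(commits[mid]):
            hi = mid          # first bad commit is at mid or earlier
        else:
            lo = mid + 1      # first bad commit is after mid
    return commits[lo]


def group_by_culprit(tests, commits, fails_on):
    """Group test programs by the commit that bisection blames for them."""
    groups = defaultdict(list)
    for t in tests:
        culprit = first_bad_commit(commits, lambda c, t=t: fails_on(t, c))
        groups[culprit].append(t)
    return dict(groups)


# Hypothetical history: each test starts failing at a known commit index.
commits = [f"c{i}" for i in range(8)]
introduced_at = {"t1": 3, "t2": 3, "t3": 6}


def fails_on(test, commit):
    return commits.index(commit) >= introduced_at[test]


groups = group_by_culprit(["t1", "t2", "t3"], commits, fails_on)
```

Here `t1` and `t2` land in the same group because bisection blames the same commit for both, so under this criterion they would be treated as likely duplicates.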

2025-06-29T15:12:57Z Xintong Zhou Zhenyang Xu Yongqiang Tian Chengnian Sun http://arxiv.org/abs/2601.19207v1 Refactoring and Equivalence in Rust: Expanding the REM Toolchain with a Novel Approach to Automated Equivalence Proofs 2026-01-27T05:17:59Z

Refactoring tools are central to modern development, with extract-function refactorings used heavily in day-to-day work. For Rust, however, ownership, borrowing, and advanced type features make automated extract-function refactoring challenging. Existing tools either rely on slow compiler-based analysis, support only restricted language fragments, or provide little assurance beyond "it still compiles." This paper presents REM2.0, a new extract-function and verification toolchain for Rust. REM2.0 works atop rust-analyzer as a persistent daemon, providing low-latency refactorings with a VSCode front-end. It adds a repairer that automatically adjusts lifetimes and signatures when extraction exposes borrow-checker issues, and an optional verification pipeline connecting to CHARON and AENEAS to generate Coq equivalence proofs for a supported Rust subset. The architecture is evaluated on three benchmark suites. On the original REM artefact, REM2.0 achieves 100% compatibility while reducing latency from ~1000ms to single-digit milliseconds in the daemon. On 40 feature-focused extractions from 20 highly starred GitHub repositories, REM2.0 handles most examples involving async/await, const fn, non-local control flow, generics, and higher-ranked trait bounds. On twenty verification benchmarks, the CHARON/AENEAS pipeline constructs end-to-end equivalence proofs for cases within its current subset. Overall, results show that a rust-analyzer-based design can provide fast, feature-rich extract-function refactoring for real Rust programs, while opt-in verification delivers machine-checked behaviour preservation.

2026-01-27T05:17:59Z Matthew Britton Sasha Pak Alex Potanin http://arxiv.org/abs/2503.04003v3 Understanding and Detecting Platform-Specific Violations in Android Auto Apps 2026-01-27T02:18:24Z

Despite over 3.5 million Android apps and 200+ million Android Auto-compatible vehicles, only a few hundred apps support Android Auto due to platform-specific compliance requirements. Android Auto mandates service-based architectures in which the vehicle system invokes app callbacks to render the UI and handle interactions, which is fundamentally different from standard Activity-based Android development. Through an empirical analysis of 98 issues across 14 Android Auto app repositories, we identified three major compliance failure categories: media playback errors, UI rendering issues, and voice command integration failures, in line with the mandatory requirements for integrating Android Auto support. We introduce AutoComply, a static analysis framework capable of detecting these compliance violations through the specialized analysis of platform-specific requirements. AutoComply constructs a Car-Control Flow Graph (CCFG) extending traditional control flow analysis to model the service-based architecture of Android Auto apps. Evaluated on 31 large-scale open-source apps, AutoComply detected 27 violations (13x more than Android Lint) with no false positives observed, while achieving 2x faster analysis. Developers have acknowledged 14 of these violations with 8 fixes already implemented, validating AutoComply's practical effectiveness.

2025-03-06T01:37:02Z 11 pages, 2 tables, 4 figures 7th ACM/IEEE International Conference on Automation of Software Test (AST 2026) (AST '26) Moshood Fakorede Umar Farooq 10.1145/3793654.3793745 http://arxiv.org/abs/2601.18793v1 Handling Scope Checks (Extended Version) 2026-01-26T18:55:58Z

Metaprogramming and effect handlers interact in unexpected, and sometimes undesirable, ways. One example is scope extrusion: the generation of ill-scoped code. Scope extrusion can either be preemptively prevented, via static type systems, or retroactively detected, via dynamic checks. Static type systems exist in theory, but struggle with a range of implementation and usability problems in practice. In contrast, dynamic checks exist in practice (e.g. in MetaOCaml), but are understudied in theory. Designers of metalanguages are thus given little guidance regarding the design and implementation of checks. We present the first formal study of dynamic scope extrusion checks, introducing a calculus ($λ_{\langle\langle\text{op}\rangle\rangle}$) for describing and evaluating checks. Further, we introduce a novel dynamic check, the "Cause-for-Concern" check, which we prove correct, characterise without reference to its implementation, and argue combines the advantages of existing dynamic checks. Finally, we extend our framework with refined environment classifiers, which statically prevent scope extrusion, and compare their expressivity with the dynamic checks.

2026-01-26T18:55:58Z Extended version of Handling Scope Checks (POPL'26): includes appendices, fixes minor typos, and tweaks phrasing for readability Proceedings of the ACM on Programming Languages (PACMPL), Volume 10, POPL 2026, Article 39 Michael Lee Ningning Xie Oleg Kiselyov Jeremy Yallop 10.1145/3776681 http://arxiv.org/abs/2601.18745v1 Symmetric Proofs of Parameterized Programs 2026-01-26T18:06:05Z

We investigate the problem of safety verification of infinite-state parameterized programs that are formed based on a rich class of topologies. We introduce a new proof system, called parametric proof spaces, which exploits the underlying symmetry in such programs. This is a local notion of symmetry which enables the proof system to reuse proof arguments for isomorphic neighbourhoods in program topologies. We prove a sophisticated relative completeness result for the proof system with respect to a class of universally quantified invariants. We also investigate the problem of algorithmic construction of these proofs. We present a construction, inspired by classic results in model theory, where an infinitary limit program can be soundly and completely verified in place of the parameterized family, under some conditions. Furthermore, we demonstrate how these proofs can be constructed and checked against these programs without the need for axiomatization of the underlying topology for proofs or the programs. Finally, we present conditions under which our algorithm becomes a decision procedure.

2026-01-26T18:06:05Z Ruotong Cheng Azadeh Farzan http://arxiv.org/abs/2601.18067v1 EvolVE: Evolutionary Search for LLM-based Verilog Generation and Optimization 2026-01-26T01:53:54Z

Verilog's design cycle is inherently labor-intensive and necessitates extensive domain expertise. Although Large Language Models (LLMs) offer a promising pathway toward automation, their limited training data and intrinsic sequential reasoning fail to capture the strict formal logic and concurrency inherent in hardware systems. To overcome these barriers, we present EvolVE, the first framework to analyze multiple evolution strategies on chip design tasks, revealing that Monte Carlo Tree Search (MCTS) excels at maximizing functional correctness, while Idea-Guided Refinement (IGR) proves superior for optimization. We further leverage Structured Testbench Generation (STG) to accelerate the evolutionary process. To address the lack of complex optimization benchmarks, we introduce IC-RTL, targeting industry-scale problems derived from the National Integrated Circuit Contest. Evaluations establish EvolVE as the new state-of-the-art, achieving 98.1% on VerilogEval v2 and 92% on RTLLM v2. Furthermore, on the industry-scale IC-RTL suite, our framework surpasses reference implementations authored by contest participants, reducing the Power, Performance, Area (PPA) product by up to 66% in Huffman Coding and 17% in the geometric mean across all problems. The source code of the IC-RTL benchmark is available at https://github.com/weiber2002/ICRTL.

2026-01-26T01:53:54Z 17 pages, 6 figures, 8 tables Wei-Po Hsin Ren-Hao Deng Yao-Ting Hsieh En-Ming Huang Shih-Hao Hung http://arxiv.org/abs/2401.02570v2 Parameterized Hardware Design with Latency-Abstract Interfaces 2026-01-25T23:12:39Z

Hardware designs must use latency-insensitive (LI) interfaces when timing is input-dependent. When timing is input-independent, designs should use latency-sensitive (LS) interfaces for maximum performance. However, designs commonly use LI interfaces to integrate with externally generated LS modules--from, e.g., IP generators, high-level synthesis, or domain specific languages. In every fully integrated design, such uses of LI represent pure overhead. The challenge is that generators can dramatically change timing interfaces of the modules to meet performance objectives, and LI interfaces act as a useful design abstraction and enable timing adaptation. We define latency-abstract (LA) interfaces, a new design abstraction, which provide the timing adaptability of LI interfaces at design-time and the efficient integration of LS interfaces. LA interfaces use output parameters, a novel compile-time mechanism for child modules to return values to parent modules, to abstract and encapsulate timing behaviors at design time. During design elaboration, LA interfaces are compiled into efficient LS interfaces based on parameter values. While an attractive option, LA interfaces inherit the complexities of parameterized hardware design: the user must reason how parameters influence timing behaviors of modules and ensure that designs adapt to interface changes. To address this challenge and demonstrate the utility of LA interfaces, we design Lilac, a parameterized HDL that uses a type system to track the influence of parameters on timing behaviors and formally guarantee that every parameterization of an LA design results in a circuit without structural hazards.

2024-01-04T23:11:24Z To appear at ASPLOS '26 Rachit Nigam Ethan Gabizon Edmund Lam Carolyn Zech Jonathan Balkind Adrian Sampson 10.1145/3779212.3790199 http://arxiv.org/abs/2511.04768v2 FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow 2026-01-25T22:55:01Z

As deep learning models scale, sparse computation and specialized dataflow hardware have emerged as powerful solutions to address efficiency. We propose FuseFlow, a compiler that converts sparse machine learning models written in PyTorch to fused sparse dataflow graphs for reconfigurable dataflow architectures (RDAs). FuseFlow is the first compiler to support general cross-expression fusion of sparse operations. In addition to fusion across kernels (expressions), FuseFlow also supports optimizations like parallelization, dataflow ordering, and sparsity blocking. It targets a cycle-accurate dataflow simulator for microarchitectural analysis of fusion strategies. We use FuseFlow for design-space exploration across four real-world machine learning applications with sparsity, showing that full fusion (entire cross-expression fusion across all computation in an end-to-end model) is not always optimal for sparse models: fusion granularity depends on the model itself. FuseFlow also provides a heuristic to identify and prune suboptimal configurations. Using FuseFlow, we achieve performance improvements, including a ~2.7x speedup over an unfused baseline for GPT-3 with BigBird block-sparse attention.

2025-11-06T19:40:20Z Rubens Lacouture Nathan Zhang Ritvik Sharma Marco Siracusa Fredrik Kjolstad Kunle Olukotun Olivia Hsu 10.1145/3779212.3790165 http://arxiv.org/abs/2509.06752v2 Termination Analysis of Linear-Constraint Programs 2026-01-25T21:15:08Z

This Survey provides an overview of techniques in termination analysis for programs with numerical variables and transitions defined by linear constraints. This subarea of program analysis is challenging due to the existence of undecidable problems, and this Survey systematically explores approaches that mitigate this inherent difficulty. These include foundational decidability results, the use of ranking functions, and disjunctive well-founded transition invariants. The Survey also discusses non-termination witnesses, used to prove that a program will not halt. We examine the algorithmic and complexity aspects of these methods, showing how different approaches offer a trade-off between expressive power and computational complexity. The Survey does not discuss how termination analysis is performed on real-world programming languages, nor does it consider more expressive abstract models that include non-linear arithmetic, probabilistic choice, or term rewriting systems.
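
As a small illustration of the ranking-function technique mentioned above, the sketch below checks a candidate linear ranking function for a toy loop whose guard and update are linear. The loop, the candidate functions, and the finite grid of states are all assumptions chosen for illustration. Checking a finite sample is only a sanity test; the methods surveyed establish the conditions for all states, typically by constraint solving.

```python
from itertools import product


def guard(x, y):
    """Loop condition of the toy program: while x - y >= 1."""
    return x - y >= 1


def step(x, y):
    """Loop body of the toy program: x := x - 1, y unchanged."""
    return x - 1, y


def is_ranking_on(rank, states):
    """Check the two ranking-function conditions on the given states:
    on every state satisfying the guard, rank is non-negative and
    decreases by at least 1 across one loop iteration."""
    for x, y in states:
        if not guard(x, y):
            continue
        nx, ny = step(x, y)
        if rank(x, y) < 0 or rank(x, y) - rank(nx, ny) < 1:
            return False
    return True


def iterations(x, y):
    """Run the toy loop and count how many times the body executes."""
    n = 0
    while guard(x, y):
        x, y = step(x, y)
        n += 1
    return n


grid = list(product(range(-5, 6), repeat=2))
good = is_ranking_on(lambda x, y: x - y, grid)   # candidate f(x, y) = x - y
bad = is_ranking_on(lambda x, y: y - x, grid)    # wrong sign: negative on guard states
```

For this toy loop the candidate `x - y` passes both conditions on the sampled grid, and its initial value bounds the number of iterations, which is the sense in which a ranking function witnesses termination.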

2025-09-08T14:40:34Z Final version for FNT-PL Amir M. Ben-Amram Samir Genaim Joël Ouaknine James Worrell http://arxiv.org/abs/2601.17888v1 iResolveX: Multi-Layered Indirect Call Resolution via Static Reasoning and Learning-Augmented Refinement 2026-01-25T15:42:34Z

Indirect call resolution remains a key challenge in reverse engineering and control-flow graph recovery, especially for stripped or optimized binaries. Static analysis is sound but often over-approximates, producing many false positives, whereas machine-learning approaches can improve precision but may sacrifice completeness and generalization. We present iResolveX, a hybrid multi-layered framework that combines conservative static analysis with learning-based refinement. The first layer applies a conservative value-set analysis (BPA) to ensure high recall. The second layer adds a learning-based soft-signature scorer (iScoreGen) and selective inter-procedural backward analysis with memory inspection (iScoreRefine) to reduce false positives. The final output, p-IndirectCFG, annotates indirect edges with confidence scores, enabling downstream analyses to choose appropriate precision-recall trade-offs. Across SPEC CPU2006 and real-world binaries, iScoreGen reduces predicted targets by 19.2% on average while maintaining BPA-level recall (98.2%). Combined with iScoreRefine, the total reduction reaches 44.3% over BPA with 97.8% recall (a 0.4% drop). iResolveX supports both conservative, recall-preserving and F1-optimized configurations and outperforms state-of-the-art systems.
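
How a downstream analysis might use confidence-annotated indirect edges can be sketched as below. This is an assumption about the general shape of such an output, not the actual p-IndirectCFG format: the edge tuples, addresses, target names, scores, and ground truth are all made up for illustration.

```python
def select_targets(scored_edges, threshold):
    """Keep the (call site, target) pairs whose confidence meets the threshold."""
    return {(site, tgt) for site, tgt, score in scored_edges if score >= threshold}


def precision_recall(predicted, truth):
    """Precision and recall of a predicted edge set against ground truth."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall


# Hypothetical scored edges: (call site, candidate target, confidence).
edges = [
    ("0x401000", "f", 0.95),
    ("0x401000", "g", 0.40),
    ("0x401000", "h", 0.05),
    ("0x402000", "g", 0.80),
    ("0x402000", "k", 0.10),
]
truth = {("0x401000", "f"), ("0x401000", "g"), ("0x402000", "g")}

conservative = precision_recall(select_targets(edges, 0.0), truth)  # keep every edge
strict = precision_recall(select_targets(edges, 0.5), truth)        # keep confident edges
```

With the made-up data, keeping every edge preserves full recall at lower precision, while the stricter threshold removes all false positives but drops one true target. That is the kind of trade-off a consumer of scored edges gets to choose.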

2026-01-25T15:42:34Z Monika Santra Bokai Zhang Mark Lim Vishnu Asutosh Dasu Dongrui Zeng Gang Tan http://arxiv.org/abs/2503.03008v3 MoSE: Hierarchical Self-Distillation Enhances Early Layer Embeddings 2026-01-25T09:44:20Z

Deploying language models often requires navigating accuracy vs. performance trade-offs to meet latency constraints while preserving utility. Traditional model distillation reduces size but incurs substantial costs through training separate models. We introduce ModularStarEncoder (MoSE), a 1-billion-parameter multi-exit encoder for code retrieval and classification that employs a novel Self-Distillation mechanism. This approach significantly enhances lower-layer representations, enabling flexible deployment of different model portions with favorable performance trade-offs. Our architecture improves text-to-code and code-to-code search by targeting specific encoder layers as exit heads, where higher layers guide earlier ones during training, thereby improving intermediate representations at minimal additional cost. We further enhance MoSE with a repository-level contextual loss that maximizes training context window utilization. Additionally, we release a new dataset created through code translation that extends text-to-code benchmarks with cross-language code-to-code pairs. Evaluations demonstrate the effectiveness of Self-Distillation as a principled approach to trading inference cost for accuracy across various code understanding tasks.

2025-03-04T21:08:17Z Accepted in the AAAI 2026 Main Technical Track Andrea Gurioli Federico Pennino João Monteiro Maurizio Gabbrielli http://arxiv.org/abs/2601.17754v1 An MLIR Lowering Pipeline for Stencils at Wafer-Scale 2026-01-25T08:58:07Z

The Cerebras Wafer-Scale Engine (WSE) delivers performance at an unprecedented scale of over 900,000 compute units, all connected via a single-wafer on-chip interconnect. Initially designed for AI, the WSE architecture is also well-suited for High Performance Computing (HPC). However, its distributed asynchronous programming model diverges significantly from the simple sequential or bulk-synchronous programs that one would typically derive for a given mathematical program description. Targeting the WSE requires a bespoke re-implementation when porting existing code. The absence of WSE support in compilers such as MLIR meant that there was little hope for automating this process. Stencils are ubiquitous in HPC, and in this paper we explore the hypothesis that domain specific information about stencils can be leveraged by the compiler to automatically target the WSE without requiring application-level code changes. We present a compiler pipeline that transforms stencil-based kernels into highly optimized CSL code for the WSE, bridging the semantic gap between the mathematical representation of the problem and the WSE's asynchronous execution model. Based upon five benchmarks across three HPC programming technologies, running on both the Cerebras WSE2 and WSE3, our approach delivers comparable, if not slightly better, performance than manually optimized code. Furthermore, without requiring any application-level code changes, performance on the WSE3 is around 14 times faster than 128 Nvidia A100 GPUs and 20 times faster than 128 nodes of a CPU-based Cray-EX supercomputer when using our approach.
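
For readers unfamiliar with the term, a stencil updates each grid point from a fixed pattern of neighbouring points. The sketch below is a generic sequential 3-point averaging stencil in one dimension, written in the simple style the abstract contrasts with the WSE's asynchronous model. It is not one of the paper's benchmarks and does not use its pipeline.

```python
def jacobi_step(u):
    """One sweep of a 3-point averaging stencil over a 1D grid.
    Each interior point becomes the mean of itself and its two
    neighbours; the two boundary values are held fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


def jacobi(u, sweeps):
    """Apply the stencil repeatedly, reading only from the previous sweep."""
    for _ in range(sweeps):
        u = jacobi_step(u)
    return u


spike = jacobi_step([0.0, 3.0, 0.0])
flat = jacobi([1.0, 1.0, 1.0, 1.0], 5)
```

Every interior point depends only on its immediate neighbours, so the data-access pattern is fixed and known ahead of time. This is the kind of domain-specific information the abstract says a compiler can exploit.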

2026-01-25T08:58:07Z Paper in ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '26) Nicolai Stawinoga David Katz Anton Lydike Justs Zarins Nick Brown George Bisbas Tobias Grosser 10.1145/3779212.3790124