https://arxiv.org/api/eBW1/m/H0x24aLfGmAiaDOM2xK8 2026-06-23T09:21:02Z 9934 945 15 http://arxiv.org/abs/2511.15819v1 Filling the Gaps of Polarity: Implementing Dependent Data and Codata Types with Implicit Arguments 2025-11-19T19:22:43Z

The expression problem describes a fundamental tradeoff between two types of extensibility: extending a type with new operations, such as by pattern matching on an algebraic data type in functional programming, and extending a type with new constructors, such as by adding a new object implementing an interface in object-oriented programming. Most dependently typed languages have good support for the former style through inductive types, but support for the latter style through coinductive types is usually much poorer. Polarity is a language that treats both kinds of types symmetrically and allows the developer to switch between type representations.However, it currently lacks several features expected of a state-of-the-art dependently typed language, such as implicit arguments. The central aim of this paper is to provide an algorithmic type system and inference algorithm for implicit arguments that respect the core symmetry of the language. Our work provides two key contributions: a complete algorithmic description of the type system backing Polarity, and a comprehensive description of a unification algorithm that covers arbitrary inductive and coinductive types. We give rules for reduction semantics, conversion checking, and a unification algorithm for pattern-matching, which are essential for a usable implementation. A work-in-progress implementation of the algorithms in this paper is available at https://polarity-lang.github.io/. We expect that the comprehensive account of the unification algorithm and our design decisions can serve as a blueprint for other dependently typed languages that support inductive and coinductive types symmetrically.

2025-11-19T19:22:43Z The Art, Science, and Engineering of Programming, 2025, Vol. 10, Issue 3, Article 19 Bohdan Liesnikov Delft University of Technology, Netherlands David Binder University of Kent, Canterbury, UK Tim Süberkrüb University of Tübingen, Germany 10.22152/programming-journal.org/2025/10/19 http://arxiv.org/abs/2511.15767v1 TB or Not TB: Coverage-Driven Direct Preference Optimization for Verilog Stimulus Generation 2025-11-19T17:23:06Z

With the rapid advancement of Large Language Models (LLMs), there is growing interest in applying them to hardware design and verification. Among these stages, design verification remains the most time-consuming and resource-intensive phase, where generating effective stimuli for the design under test (DUT) is both critical and labor-intensive. We present {\it TB or not TB}, a framework for automated stimulus generation using LLMs fine-tuned through Coverage-Driven Direct Preference Optimization (CD-DPO). To enable preference-based training, we introduce PairaNet, a dataset derived from PyraNet that pairs high- and low-quality testbenches labeled using simulation-derived coverage metrics. The proposed CD-DPO method integrates quantitative coverage feedback directly into the optimization objective, guiding the model toward generating stimuli that maximize verification coverage. Experiments on the CVDP CID12 benchmark show that {\it TB or not TB} outperforms both open-source and commercial baselines, achieving up to 77.27\% improvement in code coverage, demonstrating the effectiveness of Coverage-driven preference optimization for LLM-based hardware verification.

2025-11-19T17:23:06Z Bardia Nadimi Khashayar Filom Deming Chen Hao Zheng http://arxiv.org/abs/2511.15589v1 EPSO: A Caching-Based Efficient Superoptimizer for BPF Bytecode 2025-11-19T16:21:20Z

Extended Berkeley Packet Filter (eBPF) allows developers to extend Linux kernel functionality without modifying its source code. To ensure system safety, an in-kernel safety checker, the verifier, enforces strict safety constraints (for example, a limited program size) on eBPF programs loaded into the kernel. These constraints, combined with eBPF's performance-critical use cases, make effective optimization essential. However, existing compilers (such as Clang) offer limited optimization support, and many semantics-preserving transformations are rejected by the verifier, which makes handcrafted optimization rule design both challenging and limited in effectiveness. Superoptimization overcomes the limitations of rule-based methods by automatically discovering optimal transformations, but its high computational cost limits scalability. To address this, we propose EPSO, a caching-based superoptimizer that discovers rewrite rules via offline superoptimization and reuses them to achieve high-quality optimizations with minimal runtime overhead. We evaluate EPSO on benchmarks from the Linux kernel and several eBPF-based projects, including Cilium, Katran, hXDP, Sysdig, Tetragon, and Tracee. EPSO discovers 795 rewrite rules and achieves up to 68.87 percent (average 24.37 percent) reduction in program size compared to Clang's output, outperforming the state-of-the-art BPF optimizer K2 on all benchmarks and Merlin on 92.68 percent of them. Additionally, EPSO reduces program runtime by an average of 6.60 percent, improving throughput and lowering latency in network applications.

2025-11-19T16:21:20Z Qian Zhu Yuxuan Liu Ziyuan Zhu Shangqing Liu Lei Bu http://arxiv.org/abs/2511.15581v1 Graph Rewriting Language as a Platform for Quantum Diagrammatic Calculi 2025-11-19T16:12:56Z

Systematic discovery of optimization paths in quantum circuit simplification remains a challenge. Today, ZX-calculus, a computing model for quantum circuit transformation, is attracting attention for its highly abstract graph-based approach. Whereas existing tools such as PyZX and Quantomatic offer domain-specific support for quantum circuit optimization, visualization and theorem-proving, we present a complementary approach using LMNtal, a general-purpose hierarchical graph rewriting language, to establish a diagrammatic transformation and verification platform with model checking. Our methodology shows three advantages: (1) manipulation of ZX-diagrams through native graph transformation rules, enabling direct implementation of basic rules; (2) quantified pattern matching via QLMNtal extensions, greatly simplifying rule specification; and (3) interactive visualization and validation of optimization paths through state space exploration. Through case studies, we demonstrate how our framework helps understand optimization paths and design new algorithms and strategies. This suggests that the declarative language LMNtal and its toolchain could serve as a new platform to investigate quantum circuit transformation from a different perspective.

2025-11-19T16:12:56Z 27 pages, 27 figures. Extended version (with Appendices) of the paper to be presented at the 28th International Symposium on Practical Aspects of Declarative Languages (PADL 2026), January 2026 Kayo Tei Haruto Mishina Naoki Yamamoto Kazunori Ueda http://arxiv.org/abs/2511.15323v1 SkyEgg: Joint Implementation Selection and Scheduling for Hardware Synthesis using E-graphs 2025-11-19T10:39:45Z

Hardware synthesis from high-level descriptions remains fundamentally limited by the sequential optimization of interdependent design decisions. Current methodologies, including state-of-the-art high-level synthesis (HLS) tools, artificially separate implementation selection from scheduling, leading to suboptimal designs that cannot fully exploit modern FPGA heterogeneous architectures. Implementation selection is typically performed by ad-hoc pattern matching on operations, a process that does not consider the impact on scheduling. Subsequently, scheduling algorithms operate on fixed selection solutions with inaccurate delay estimates, which misses critical optimization opportunities from appropriately configured FPGA blocks like DSP slices. We present SkyEgg, a novel hardware synthesis framework that jointly optimizes implementation selection and scheduling using the e-graph data structure. Our key insight is that both algebraic transformations and hardware implementation choices can be uniformly represented as rewrite rules within an e-graph, modeling the complete design space of implementation candidates to be selected and scheduled together. First, SkyEgg constructs an e-graph from the input program. It then applies both algebraic and implementation rewrites through equality saturation. Finally, it formulates the joint optimization as a mixed-integer linear programming (MILP) problem on the saturated e-graph. We provide both exact MILP solving and an efficient ASAP heuristic for scalable synthesis. Our evaluation on benchmarks from diverse applications targeting Xilinx Kintex UltraScale+ FPGAs demonstrates that SkyEgg achieves an average speedup of 3.01x over Vitis HLS, with improvements up to 5.22x for complex expressions.

2025-11-19T10:39:45Z Youwei Xiao Yuyang Zou Yun Liang http://arxiv.org/abs/2501.08478v4 Modular Compilation for Quantum Chiplet Architectures 2025-11-19T05:08:00Z

As quantum computing technology matures, industry is adopting modular quantum architectures to keep quantum scaling on the projected path and meet performance targets. However, the complexity of chiplet-based quantum devices, coupled with their growing size, presents an imminent scalability challenge for quantum compilation. Contemporary compilation methods are not well-suited to chiplet architectures - in particular, existing qubit allocation methods are often unable to contend with inter-chiplet links, which don't necessarily support a universal basis gate set. Furthermore, existing methods of logical-to-physical qubit placement, swap insertion (routing), unitary synthesis, and/or optimization, are typically not designed for qubit links of significantly varying latency or fidelity. In this work, we propose SEQC, a hierarchical parallelized compilation pipeline optimized for chiplet-based quantum systems, including several novel methods for qubit placement, qubit routing, and circuit optimization. SEQC attains a $9.3\%$ average increase in circuit fidelity (up to $49.99\%$). Additionally, owing to its ability to parallelize compilation, SEQC achieves $3.27\times$ faster compilation on average (up to $6.74\times$) over a chiplet-unaware Qiskit baseline.

2025-01-14T22:41:29Z Mingyoung Jessica Jeng Nikola Vuk Maruszewski Connor Selna Michael Gavrincea Kaitlin N. Smith Nikos Hardavellas http://arxiv.org/abs/2511.15073v1 Cement2: Temporal Hardware Transactions for High-Level and Efficient FPGA Programming 2025-11-19T03:26:56Z

Hardware design faces a fundamental challenge: raising abstraction to improve productivity while maintaining control over low-level details like cycle accuracy. Traditional RTL design in languages like SystemVerilog composes modules through wiring-style connections that provide weak guarantees for behavioral correctness. While high-level synthesis (HLS) and emerging abstractions attempt to address this, they either introduce unpredictable overhead or restrict design generality. Although transactional HDLs provide a promising foundation by lifting design abstraction to atomic and composable rules, they solely model intra-cycle behavior and do not reflect the native temporal design characteristics, hindering applicability and productivity for FPGA programming scenarios. We propose temporal hardware transactions, a new abstraction that brings cycle-level timing awareness to designers at the transactional language level. Our approach models temporal relationships between rules and supports the description of rules whose actions span multiple clock cycles, providing intuitive abstraction to describe multi-cycle architectural behavior. We implement this in Cement2, a transactional HDL embedded in Rust, enabling programming hardware constructors to build both intra-cycle and temporal transactions. Cement2's synthesis framework lowers description abstraction through multiple analysis and optimization phases, generating efficient hardware. With Cement2's abstraction, we program a RISC-V soft-core processor, custom CPU instructions, linear algebra kernels, and systolic array accelerators, leveraging the high-level abstraction for boosted productivity. Evaluation shows that Cement2 does not sacrifice performance and resources compared to hand-coded RTL designs, demonstrating the high applicability for general FPGA design tasks.

2025-11-19T03:26:56Z Youwei Xiao Zizhang Luo Weijie Peng Yuyang Zou Yun Liang http://arxiv.org/abs/2511.14953v1 Compiling to recurrent neurons 2025-11-18T22:26:27Z

Discrete structures are currently second-class in differentiable programming. Since functions over discrete structures lack overt derivatives, differentiable programs do not differentiate through them and limit where they can be used. For example, when programming a neural network, conditionals and iteration cannot be used everywhere; they can break the derivatives necessary for gradient-based learning to work. This limits the class of differentiable algorithms we can directly express, imposing restraints on how we build neural networks and differentiable programs more generally. However, these restraints are not fundamental. Recent work shows conditionals can be first-class, by compiling them into differentiable form as linear neurons. Similarly, this work shows iteration can be first-class -- by compiling to linear recurrent neurons. We present a minimal typed, higher-order and linear programming language with iteration called $\textsf{Cajal}\scriptstyle(\mathbb{\multimap}, \mathbb{2}, \mathbb{N})$. We prove its programs compile correctly to recurrent neurons, allowing discrete algorithms to be expressed in a differentiable form compatible with gradient-based learning. With our implementation, we conduct two experiments where we link these recurrent neurons against a neural network solving an iterative image transformation task. This determines part of its function prior to learning. As a result, the network learns faster and with greater data-efficiency relative to a neural network programmed without first-class iteration. A key lesson is that recurrent neurons enable a rich interplay between learning and the discrete structures of ordinary programming.

2025-11-18T22:26:27Z Joey Velez-Ginorio Nada Amin Konrad Kording Steve Zdancewic http://arxiv.org/abs/2509.13429v2 Catalpa: GC for a Low-Variance Software Stack 2025-11-18T19:33:30Z

The performance of an application/runtime is usually conceptualized as a continuous function where, the lower the amount of memory/time used on a given workload, then the better the compiler/runtime is. However, in practice, good performance of an application is viewed as more of a binary function - either the application responds in under, say 100 ms, and provides a good user experience, or it takes a noticeable amount of time, leaving the user waiting and potentially abandoning the task. Thus, performance really means how often the application is fast enough to meet user expectations, leading industrial developers to focus on the 95th and 99th percentile tail-latencies as heavily, or moreso, than average response time. Our vision is to create a software stack that actively supports these needs via programming language and runtime system design. In this paper we present a novel garbage-collector design, the Catalpa collector, for the Bosque programming language and runtime. This allocator is designed to minimize latency and tail-latency variability while maintaining high-throughput and incurring small memory overheads. To achieve these goals we leverage various features of the Bosque language, including immutability and reference-cycle freedom, to construct a collector that has provably bounded collection pauses, incurs a fixed-constant memory overhead, and ensures starvation freedom for the application!

2025-09-16T18:07:21Z Anthony Arnold Mark Marron http://arxiv.org/abs/2505.20302v3 VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification 2025-11-18T19:04:33Z

This paper introduces VeriThoughts, a novel dataset designed for reasoning-based Verilog code generation. We establish a new benchmark framework grounded in formal verification methods to evaluate the quality and correctness of generated hardware descriptions. Additionally, we present a suite of specialized small-scale models optimized specifically for Verilog generation. Our work addresses the growing need for automated hardware design tools that can produce verifiably correct implementations from high-level specifications, potentially accelerating the hardware development process while maintaining rigorous correctness guarantees. Our code and data are available at \href{https://github.com/wilyub/VeriThoughts}{this URL}.

2025-05-16T21:33:14Z Patrick Yubeaton Andre Nakkab Weihua Xiao Luca Collini Ramesh Karri Chinmay Hegde Siddharth Garg http://arxiv.org/abs/2507.00595v2 The Secrets Must Not Flow: Scaling Security Verification to Large Codebases (extended version) 2025-11-18T19:02:00Z

Existing program verifiers can prove advanced properties about security protocol implementations, but are difficult to scale to large codebases because of the manual effort required. We develop a novel methodology called *Diodon* that addresses this challenge by splitting the codebase into the protocol implementation (the *Core*) and the remainder (the *Application*). This split allows us to apply powerful semi-automated verification techniques to the security-critical Core, while fully-automatic static analyses scale the verification to the entire codebase by ensuring that the Application cannot invalidate the security properties proved for the Core. The static analyses achieve that by proving *I/O independence*, i.e., that the I/O operations within the Application are independent of the Core's security-relevant data (such as keys), and that the Application meets the Core's requirements. We have proved Diodon sound by first showing that we can safely allow the Application to perform I/O independent of the security protocol, and second that manual verification and static analyses soundly compose. We evaluate Diodon on two case studies: an implementation of the signed Diffie-Hellman key exchange and a large (100k+ LoC) production Go codebase implementing a key exchange protocol for which we obtained secrecy and injective agreement guarantees by verifying a Core of about 1% of the code with the auto-active program verifier Gobra in less than three person months.

2025-07-01T09:25:54Z Linard Arquint Samarth Kishor Jason R. Koenig Joey Dodds Daniel Kroening Peter Müller 10.1109/SP63933.2026.00026 http://arxiv.org/abs/2511.14198v1 DiverseClaire: Simulating Students to Improve Introductory Programming Course Materials for All CS1 Learners 2025-11-18T07:09:19Z

Although CS programs are booming, introductory courses like CS1 still adopt a one-size-fits-all formats that can exacerbate cognitive load and discourage learners with autism, ADHD, dyslexia and other neurological conditions. These call for compassionate pedagogies and Universal Design For Learning (UDL) to create learning environments and materials where cognitive diversity is welcomed. To address this, we introduce DiverseClaire a pilot study, which simulates students including neurodiverse profiles using LLMs and diverse personas. By leveraging Bloom's Taxonomy and UDL, DiverseClaire compared UDL-transformed lecture slides with traditional formats. To evaluate DiverseClaire controlled experiments, we used the evaluation metric the average score. The findings revealed that the simulated neurodiverse students struggled with learning due to lecture slides that were in inaccessible formats. These results highlight the need to provide course materials in multiple formats for diverse learner preferences. Data from our pilot study will be made available to assist future CS1 instructors.

2025-11-18T07:09:19Z 2 pages Wendy Wong Yuchao Jiang Yuekang Li http://arxiv.org/abs/2511.12638v2 Equivalence Checking of ML GPU Kernels 2025-11-18T05:20:44Z

With the rapid progress of deep learning and large language models (LLMs), companies now spend enormous sums executing GPU kernels. These kernels have, therefore, become prime targets for aggressive optimization. Recent efforts increasingly leverage LLMs to generate GPU kernels, but make no formal guarantees about the generated kernels. We present the first equivalence checker for GPU kernels and use it to formally verify the correctness of machine learning (ML) kernels optimized by hand, by LLMs, and by compilers. We show that our equivalence checker is sound and, for a well-defined class of GPU kernels which includes the programs of interest, complete. Our implementation, VOLTA, can verify ML computations such as convolutions, matrix multiplications, and various attention mechanisms.

2025-11-16T15:09:14Z Kshitij Dubey Benjamin Driscoll Anjiang Wei Neeraj Kayal Rahul Sharma Alex Aiken http://arxiv.org/abs/2511.14002v1 FlakyGuard: Automatically Fixing Flaky Tests at Industry Scale 2025-11-18T00:10:51Z

Flaky tests that non-deterministically pass or fail waste developer time and slow release cycles. While large language models (LLMs) show promise for automatically repairing flaky tests, existing approaches like FlakyDoctor fail in industrial settings due to the context problem: providing either too little context (missing critical production code) or too much context (overwhelming the LLM with irrelevant information). We present FlakyGuard, which addresses this problem by treating code as a graph structure and using selective graph exploration to find only the most relevant context. Evaluation on real-world flaky tests from industrial repositories shows that FlakyGuard repairs 47.6 % of reproducible flaky tests with 51.8 % of the fixes accepted by developers. Besides it outperforms state-of-the-art approaches by at least 22 % in repair success rate. Developer surveys confirm that 100 % find FlakyGuard's root cause explanations useful.

2025-11-18T00:10:51Z To appear in ASE 2025 Chengpeng Li Farnaz Behrang August Shi Peng Liu http://arxiv.org/abs/2503.10868v2 From Semantics to Syntax: A Type Theory for Comprehension Categories 2025-11-17T11:28:09Z

Recent models of intensional type theory have been constructed in algebraic weak factorization systems (AWFSs). AWFSs give rise to comprehension categories that feature non-trivial morphisms between types; these morphisms are not used in the standard interpretation of Martin-Löf type theory in comprehension categories. We develop a type theory that internalizes morphisms between types, reflecting this semantic feature back into syntax. Our type theory comes with $Π$-, $Σ$-, and identity types. We discuss how it can be viewed as an extension of Martin-Löf type theory with coercive subtyping, as sketched by Coraglia and Emmenegger. We furthermore define semantic structure that interprets our type theory and prove a soundness result. Finally, we exhibit many examples of the semantic structure, yielding a plethora of interpretations.

2025-03-13T20:34:37Z Niyousha Najmaei Niels van der Weide Benedikt Ahrens Paige Randall North