https://arxiv.org/api/KIcac2LRdzdwgOynMG//EDy1J/o2026-06-22T00:10:25Z991885515http://arxiv.org/abs/2507.13774v2AdapTT: Functoriality for Dependent Type Casts2025-12-08T12:40:13ZThe ability to cast values between related types is a leitmotiv of many flavors of dependent type theory, such as observational type theories, subtyping, or cast calculi for gradual typing. These casts all exhibit a common structural behavior that boils down to the pervasive functoriality of type formers. We propose and extensively study a type theory, called AdapTT, which makes systematic and precise this idea of functorial type formers, with respect to an abstract notion of adapters relating types. Leveraging descriptions for functorial inductive types in AdapTT, we derive structural laws for type casts on general inductive type formers.2025-07-18T09:34:10ZExtended version, with appendices, of a paper published at POPL '26Arthur AdjedjMeven Lennon-BertrandThibaut BenjaminKenji Maillard10.1145/3776664http://arxiv.org/abs/2512.07299v1PIP: Making Andersen's Points-to Analysis Sound and Practical for Incomplete C Programs2025-12-08T08:42:06ZCompiling files individually lends itself well to parallelization, but forces the compiler to operate on incomplete programs. State-of-the-art points-to analyses guarantee sound solutions only for complete programs, requiring summary functions to describe any missing program parts. Summary functions are rarely available in production compilers, however, where soundness and efficiency are non-negotiable. This paper presents an Andersen-style points-to analysis that efficiently produces sound solutions for incomplete C programs. The analysis accomplishes soundness by tracking memory locations and pointers that are accessible from external modules, and efficiency by performing this tracking implicitly in the constraint graph. We show that implicit pointee tracking makes the constraint solver 15$\times$ faster than any combination of five different state-of-the-art techniques using explicit pointee tracking. We also present the Prefer Implicit Pointees (PIP) technique that further reduces the use of explicit pointees. PIP gives an additional speedup of 1.9$\times$, compared to the fastest solver configuration not benefiting from PIP. The precision of the analysis is evaluated in terms of an alias-analysis client, where it reduces the number of MayAlias-responses by 40% compared to LLVM's BasicAA pass alone. Finally, we show that the analysis is scalable in terms of memory, making it suitable for optimizing compilers in practice.2025-12-08T08:42:06Z11 pages, 10 figures. To be published in CGO 2026Håvard Rognebakke KrogstieHelge BahmannMagnus SjälanderNico Reissmannhttp://arxiv.org/abs/2510.19296v4QiMeng-SALV: Signal-Aware Learning for Verilog Code Generation2025-12-08T07:05:30ZThe remarkable progress of Large Language Models (LLMs) presents promising opportunities for Verilog code generation which is significantly important for automated circuit design. The lacking of meaningful functional rewards hinders the preference optimization based on Reinforcement Learning (RL) for producing functionally correct Verilog code. In this paper, we propose Signal-Aware Learning for Verilog code generation (QiMeng-SALV) by leveraging code segments of functionally correct output signal to optimize RL training. Considering Verilog code specifies the structural interconnection of hardware gates and wires so that different output signals are independent, the key insight of QiMeng-SALV is to extract verified signal-aware implementations in partially incorrect modules, so as to enhance the extraction of meaningful functional rewards. Roughly, we verify the functional correctness of signals in generated module by comparing with that of reference module in the training data. Then abstract syntax tree (AST) is employed to identify signal-aware code segments which can provide meaningful functional rewards from erroneous modules. Finally, we introduce signal-aware DPO which is optimized on the correct signal-level code segments, thereby preventing noise and interference from incorrect signals. The proposed QiMeng-SALV underscores the paradigm shift from conventional module-level to fine-grained signal-level optimization in Verilog code generation, addressing the issue of insufficient functional rewards. Experiments demonstrate that our method achieves state-of-the-art performance on VerilogEval and RTLLM, with a 7B parameter model matching the performance of the DeepSeek v3 671B model and significantly outperforming the leading open-source model CodeV trained on the same dataset. Our code is available at https://github.com/QiMeng-IPRC/QiMeng-SALV.2025-10-22T06:58:07ZAccepted to NeurIPS 2025Yang ZhangRui ZhangJiaming GuoLei HuangDi HuangYunpu ZhaoShuyao ChengPengwei JinChongxiao LiZidong DuXing HuQi GuoYunji Chenhttp://arxiv.org/abs/2512.06836v1Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs2025-12-07T13:17:37ZSoftware languages evolve over time for various reasons, such as the addition of new features. When the language's grammar definition evolves, textual instances that originally conformed to the grammar become outdated. For DSLs in a model-driven engineering context, there exists a plethora of techniques to co-evolve models with the evolving metamodel. However, these techniques are not geared to support DSLs with a textual syntax -- applying them to textual language definitions and instances may lead to the loss of information from the original instances, such as comments and layout information, which are valuable for software comprehension and maintenance. This study explores the potential of Large Language Model (LLM)-based solutions in achieving grammar and instance co-evolution, with attention to their ability to preserve auxiliary information when directly processing textual instances. By applying two advanced language models, Claude-3.5 and GPT-4o, and conducting experiments across seven case languages, we evaluated the feasibility and limitations of this approach. Our results indicate a good ability of the considered LLMs for migrating textual instances in small-scale cases with limited instance size, which are representative of a subset of cases encountered in practice. In addition, we observe significant challenges with the scalability of LLM-based solutions to larger instances, leading to insights that are useful for informing future research.2025-12-07T13:17:37ZWeixing ZhangRegina HebigDaniel Strüberhttp://arxiv.org/abs/2504.20964v2OSVBench: Benchmarking LLMs on Specification Generation Tasks for Operating System Verification2025-12-07T08:44:10ZWe introduce OSVBench, a new benchmark for evaluating Large Language Models (LLMs) on the task of generating complete formal specifications for verifying the functional correctness of operating system kernels. This benchmark is built upon a real-world operating system kernel, Hyperkernel, and consists of 245 complex specification generation tasks in total, each of which is a long-context task of about 20k-30k tokens. The benchmark formulates the specification generation task as a program synthesis problem confined to a domain for specifying states and transitions. This formulation is provided to LLMs through a programming model. The LLMs must be able to understand the programming model and verification assumptions before delineating the correct search space for syntax and semantics and generating formal specifications. Guided by the operating system's high-level functional description, the LLMs are asked to generate a specification that fully describes all correct states and transitions for a potentially buggy code implementation of the operating system. Experimental results with 12 state-of-the-art LLMs indicate limited performance of existing LLMs on the specification generation task for operating system verification. Significant disparities in their performance highlight differences in their ability to handle long-context code generation tasks. The code are available at https://github.com/lishangyu-hkust/OSVBench2025-04-29T17:34:49ZShangyu LiJuyong JiangTiancheng ZhaoJiasi Shenhttp://arxiv.org/abs/2512.06442v1Nice to Meet You: Synthesizing Practical MLIR Abstract Transformers2025-12-06T14:09:51ZStatic analyses play a fundamental role during compilation: they discover facts that are true in all executions of the code being compiled, and then these facts are used to justify optimizations and diagnostics. Each static analysis is based on a collection of abstract transformers that provide abstract semantics for the concrete instructions that make up a program. It can be challenging to implement abstract transformers that are sound, precise, and efficient, and in fact both LLVM and GCC have suffered from miscompilations caused by unsound abstract transformers. Moreover, even after more than 20 years of development, LLVM lacks abstract transformers for hundreds of instructions in its intermediate representation (IR). We developed NiceToMeetYou, a program synthesis framework for abstract transformers that are aimed at the kinds of non-relational integer abstract domains that are heavily used by today's production compilers. It exploits a simple but novel technique for breaking the synthesis problem into parts: each of our transformers is the meet of a collection of simpler, sound transformers that are synthesized such that each new piece fills a gap in the precision of the final transformer. Our design point is bulk automation: no sketches are required. Transformers are verified by lowering to a previously created SMT dialect of MLIR. Each of our synthesized transformers is provably sound and some (17 percent) are more precise than those provided by LLVM.2025-12-06T14:09:51ZXuanyu PengDominic KennedyYuyou FanBen GreenmanJohn RegehrLoris D'Antonihttp://arxiv.org/abs/2512.05887v1Bootstrapping Fuzzers for Compilers of Low-Resource Language Dialects Using Language Models2025-12-05T17:00:47ZModern extensible compiler frameworks-such as MLIR-enable rapid creation of domain-specific language dialects. This flexibility, however, makes correctness harder to ensure as the same extensibility that accelerates development also complicates maintaining the testing infrastructure. Extensible languages require automated test generation that is both dialect-agnostic (works across dialects without manual adaptation) and dialect-effective (targets dialect-specific features to find bugs). Existing approaches typically sacrifice one of these goals by either requiring manually constructed seed corpora for each dialect, or by failing to be effective. We present a dialect-agnostic and dialect-effective grammar-based and coverage-guided fuzzing approach for extensible compilers that combines two key insights from existing work: (i) the grammars of dialects, which already encode the structural and type constraints, can often be extracted automatically from the dialect specification; and (ii) these grammars can be used in combination with pre-trained large language models to automatically generate representative and diverse seed inputs from the full dialect space without requiring any manual input or training data. These seeds can then be used to bootstrap coverage-guided fuzzers. We built this approach into a tool, Germinator. When evaluated on six MLIR projects spanning 91 dialects, Germinator generated seeds improve line coverage by 10-120% over grammar-based baselines. We compare against grammar-based baselines because they are the only class of existing automatic seed generators that can be applied uniformly across MLIR's heterogeneous dialect ecosystem. Germinator discovers 88 previously unknown bugs (40 confirmed), including 23 in dialects with no prior automated test generators, demonstrating effective and controllable testing of low-resource dialects at scale.2025-12-05T17:00:47ZSairam VaidyaMarcel BöhmeLoris D'Antonihttp://arxiv.org/abs/2509.23229v2Precise Reasoning About Container-Internal Pointers with Logical Pinning2025-12-05T16:09:13ZMost separation logics hide container-internal pointers for modularity. This makes it difficult to specify container APIs that temporarily expose those pointers to the outside, and to verify programs that use these APIs.
We present logical pinning, a lightweight borrowing model for sequential programs that allows users to selectively track container-internal pointers at the logical level. Our model generalizes the magic-wand operator for representing partial data structures, making it easy to write and prove precise specifications, including pointer-stability properties. Because it only changes the way representation predicates and specifications are written, our approach is compatible with most separation logic variants.
We demonstrate the practicality of logical pinning by verifying small but representative pointer-manipulating programs, and deriving more precise versions of common container specifications. In doing so, we show that our approach subsumes some well-known proof patterns, simplifies some complex proofs, and enables reasoning about program patterns not supported by traditional specifications. All of our results are mechanized in the Rocq proof assistant, using the CFML library.2025-09-27T10:30:33ZThis article will appear in CPP 2026Yawen GuanClément Pit-Claudel10.1145/3779031.3779096http://arxiv.org/abs/2508.04829v2Consistent Updates for Scalable Microservices2025-12-05T15:21:33ZOnline services are commonly implemented with a scalable microservice architecture, where isomorphic workers process client requests, recording persistent state in a backend data store. To maintain service, modifications to service functionality must be made on the fly -- i.e., as the service continues to process client requests -- but doing so is challenging. The central difficulty is that of avoiding inconsistencies from mixed-mode operation, caused by workers of current and new versions interacting via the data store. Some update methods avoid mixed-mode altogether, but only at the cost of substantial inefficiency -- by doubling resources (memory and compute), or by halving throughput. The alternative is an uncontrolled ``rolling'' update, which runs the risk of serious service failures arising from inconsistent mixed-mode behavior.
Ideally, it should appear to every client that a service update takes effect atomically; this ensures that a client is not exposed to inconsistent mixed-mode behavior. In this paper, we introduce a framework that formalizes this intuition and develop foundational theory for reasoning about update consistency. We apply this theory to derive the first algorithms that guarantee consistency for mixed-mode updates. The algorithms rely on semantic properties of service actions, such as commutativity. We show that this is unavoidable, by proving that any semantically oblivious mixed-mode update method must allow inconsistencies.2025-08-06T19:14:16ZDevora Chait-RothKedar S. NamjoshiThomas Wies10.1145/3776700http://arxiv.org/abs/2512.05555v1Compiling Away the Overhead of Race Detection2025-12-05T09:26:08ZDynamic data race detectors are indispensable for flagging concurrency errors in software, but their high runtime overhead limits their adoption. This overhead stems primarily from pervasive instrumentation of memory accesses - a significant fraction of which is redundant. We addresses this inefficiency through a static, compiler-integrated approach that identifies and eliminates redundant instrumentation, drastically reducing the runtime cost of dynamic data race detectors. We introduce a suite of interprocedural static analyses reasoning about memory access patterns, synchronization, and thread creation to eliminate instrumentation for provably race-free accesses and show that the completeness properties of the data race detector are preserved. We further observe that many inserted checks flag a race if and only if a preceding check has already flagged an equivalent race for the same memory location - albeit potentially at a different access. We characterize this notion of equivalence and show that, when limiting reporting to at least one representative for each equivalence class, a further class of redundant checks can be eliminated. We identify such accesses using a novel dominance-based elimination analysis. Based on these two insights, we have implemented five static analyses within the LLVM, integrated with the instrumentation pass of the race detector ThreadSanitizer. Our experimental evaluation on a diverse suite of real-world applications demonstrates that our approach significantly reduces race detection overhead, achieving a geomean speedup of 1.34x, with peak speedups reaching 2.5x under high thread contention. This performance is achieved with a negligible increase in compilation time and, being fully automatic, places no additional burden on developers. Our optimizations have been accepted by the ThreadSanitizer maintainers and are in the process of being upstreamed.2025-12-05T09:26:08Z35 pagesAlexey PaznikovAndrey KogutenkoYaroslav OsipovMichael SchwarzUmang Mathurhttp://arxiv.org/abs/2512.05516v1Compiler-supported reduced precision and AoS-SoA transformations for heterogeneous hardware2025-12-05T08:19:43ZThis study evaluates AoS-to-SoA transformations over reduced-precision data layouts for a particle simulation code on several GPU platforms: We hypothesize that SoA fits particularly well to SIMT, while AoS is the preferred storage format for many Lagrangian codes. Reduced-precision (below IEEE accuracy) is an established tool to address bandwidth constraints, although it remains unclear whether AoS and precision conversions should execute on a CPU or be deployed to a GPU if the compute kernel itself must run on an accelerator. On modern superchips where CPUs and GPUs share (logically) one data space, it is also unclear whether it is advantageous to stream data to the accelerator prior to the calculation, or whether we should let the accelerator transform data on demand, i.e.~work in-place logically. We therefore introduce compiler annotations to facilitate such conversions and to give the programmer the option to orchestrate the conversions in combination with GPU offloading. For some of our compute kernels of interest, Nvidia's G200 platforms yield a speedup of around 2.6 while AMD's MI300A exhibits more robust performance yet profits less. We assume that our compiler-based techniques are applicable to a wide variety of Lagrangian codes and beyond.2025-12-05T08:19:43ZPawel K. RadtkeTobias Weinzierlhttp://arxiv.org/abs/2512.05262v1Verified VCG and Verified Compiler for Dafny2025-12-04T21:26:23ZDafny is a verification-aware programming language that comes with a compiler and static program verifier. However, neither the compiler nor the verifier is proved correct; in fact, soundness bugs have been found in both tools. This paper shows that the aforementioned Dafny tools can be developed with foundational correctness guarantees. We present a functional big-step semantics for an imperative subset of Dafny and, based on this semantics, a verified verification condition generator (VCG) and a verified compiler for Dafny. The subset of Dafny we have formalized includes mutually recursive method calls, while loops, and arrays -- these language features are significant enough to cover challenging examples such as McCarthy's 91 function and array-based programs that are used when teaching Dafny. The verified VCG allows one to prove functional correctness of annotated Dafny programs, while the verified compiler can be used to compile verified Dafny programs to CakeML programs. From there, one can obtain executable machine code via the (already verified) CakeML compiler, all while provably maintaining the functional correctness guarantees that were proved for the source-level Dafny programs. Our work has been mechanized in the HOL4 theorem prover.2025-12-04T21:26:23Z16 pages, 4 figures. To be published in CPP 2026. For mechanization, see https://github.com/CakeML/cakeml/tree/751ecd45c16b11ee0c3fd1280be7a6d798b5c457/compiler/dafnyDaniel NezamabadiMagnus O. MyreenYong Kiam Tan10.1145/3779031.3779092http://arxiv.org/abs/2512.05224v1NVLang: Unified Static Typing for Actor-Based Concurrency on the BEAM2025-12-04T19:52:42ZActor-based systems like Erlang/OTP power critical infrastructure -- from telecommunications to messaging platforms -- handling millions of concurrent connections with legendary reliability. Yet these systems lack static guarantees about message protocols: processes communicate by sending arbitrary messages that pattern-matched at runtime, deferring protocol violations to production failures.
We present NVLang, a statically typed functional language that brings comprehensive type safety to the BEAM virtual machine while preserving actor model's simplicity and power. NVLang's central contribution that algebraic data types (ADTs) naturally encode actor message protocols: each actor declares the sum type representing its message vocabulary, and the type system enforces protocol conformance at compile time. We introduce typed process identifiers (Pid[T]) that encode the protocol an actor expects, and typed futures (Future[T]) that provide type-safe request-reply patterns.
By extending Hindley-Milner type inference to track message protocols, NVLang eliminates an entire class of message-passing errors while maintaining clean syntax that rivals dynamically typed alternatives. Our implementation compiles to Core Erlang, enabling seamless interoperability with the existing Erlang ecosystem. We formalize the type system and provide proof sketches for type soundness, demonstrating that well-typed NVLang programs cannot send messages that violate actor protocols.2025-12-04T19:52:42Z4 figures, 2 tablesMiguel de Oliveira GuerreiroInstituto Superior Tecnico, University of Lisbonhttp://arxiv.org/abs/2504.12158v2What is a monoid?2025-12-04T16:32:49ZIn many situations one encounters an entity that resembles a monoid. It consists of a carrier and two operations that resemble a unit and a multiplication, subject to three equations that resemble associativity and left and right unital laws. The question then arises whether this entity is, in fact, a monoid in a suitable sense.
Category theorists have answered this question by providing a notion of monoid in a monoidal category, or more generally in a multicategory. While these encompass many examples, there remain cases which do not fit into these frameworks, such as the notion of relative monad and the modelling of call-by-push-value sequencing. In each of these examples, the leftmost and/or the rightmost factor of a multiplication or associativity law seems to be distinguished.
To include such examples, we generalize the multicategorical framework in two stages.
Firstly, we move to the framework of a left-skew multicategory (due to Bourke and Lack), which generalizes both multicategory and left-skew monoidal category. The notion of monoid in this framework encompasses examples where only the leftmost factor is distinguished, such as the notion of relative monad.
Secondly, we consider monoids in the novel framework of a bi-skew multicategory. This encompasses examples where both the leftmost and the rightmost factor are distinguished, such as the notion of a category on a span, and the modelling of call-by-push-value sequencing.
In the bi-skew framework (which is the most general), we give a coherence result saying that a monoid corresponds to an unbiased monoid, i.e. a map from the terminal bi-skew multicategory.2025-04-16T15:04:48Z26 pages. Version accepted to POPL 2026Paul Blain LevyMorgan Rogers10.1145/3776727http://arxiv.org/abs/2512.04876v1Optimizations and extensions for fair join pattern matching2025-12-04T15:05:09ZJoin patterns are an underexplored approach for the programming of concurrent and distributed systems. When applied to the actor model, join patterns offer the novel capability of matching combinations of messages in the mailbox of an actor. Previous work by Philipp Haller et al. in the paper "Fair Join Pattern Matching for Actors" (ECOOP 2024) explored join patterns with conditional guards in an actor-based setting with a specification of fair and deterministic matching semantics. Nevertheless, the question of time efficiency in fair join pattern matching has remained underexplored. The stateful tree-based matching algorithm of Haller et al. performs worse than an implementation that adapts the Rete algorithm to the regular version of a join pattern matching benchmark, while outperforming on a variant with heavy conditional guards, which take longer to evaluate. Nevertheless, conforming Rete to the problem of join pattern matching requires heavy manual adaptation.
In this thesis, we enhance and optimize the stateful tree-based matching algorithm of Haller et al. to achieve up to tenfold performance improvements on certain benchmarks, approaching the performance of Rete on regular benchmarks while maintaining the advantages of versatility and performance with heavy guards. We also enhance the benchmark suite, adding new features and enhancing its extensibility and user-friendliness. We extend the join pattern implementation with a less ambiguous syntax as well as dynamic pattern switching. Finally, we present a new complex model use case for join patterns, showing their applicability in a microservice web architecture.2025-12-04T15:05:09ZThis is a Master's thesis for the Master's in Computer Science and Engineering at DTU (Technical University of Denmark)Ioannis Karras