https://arxiv.org/api/sQIuRePY9oE7lUVCqpNhsbMnurM2026-06-27T01:14:41Z9951144015http://arxiv.org/abs/2507.06360v1Pyrosome: Verified Compilation for Modular Metatheory2025-07-08T19:47:55ZWe present Pyrosome, a generic framework for modular language metatheory that embodies a novel approach to extensible semantics and compilation, implemented in Coq. Common techniques for semantic reasoning are often tied to the specific structures of the languages and compilers that they support. In Pyrosome, verified compilers are fully extensible, meaning that to extend a language (even with a new kind of effect) simply requires defining and verifying the compilation of the new feature, reusing the old correctness theorem for all other cases. The novel enabling idea is an inductive formulation of equivalence preservation that supports the addition of new rules to the source language, target language, and compiler.
Pyrosome defines a formal, deeply embedded notion of programming languages with semantics given by dependently sorted equational theories, so all compiler-correctness proofs boil down to type-checking and equational reasoning. We support vertical composition of any compilers expressed in our framework in addition to feature extension. As a case study, we present a multipass compiler from System F with simple references, through CPS translation and closure conversion. Specifically, we demonstrate how we can build such a compiler incrementally by starting with a compiler for simply typed lambda-calculus and adding natural numbers, the unit type, recursive functions, and a global heap, then extending judgments with a type environment and adding type abstraction, all while reusing the original theorems. We also present a linear version of the simply typed CPS pass and compile a small imperative language to the simply typed target to show how Pyrosome handles substructural typing and imperative features.2025-07-08T19:47:55ZDustin JamnerGabriel KammerRitam NagAdam Chlipalahttp://arxiv.org/abs/2506.04544v2hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation2025-07-08T19:43:08ZLarge language models (LLMs) are playing an increasingly large role in domains such as code generation, including hardware code generation, where Verilog is the key language. However, the amount of publicly available Verilog code pales in comparison to the amount of code available for software languages like Python. In this work, we present hdl2v ("HDL-to-Verilog"), a dataset which seeks to increase the amount of available human-written Verilog data by translating or compiling three other hardware description languages - VHDL, Chisel, and PyMTL3 - to Verilog. Furthermore, we demonstrate the value of hdl2v in enhancing LLM Verilog generation by improving performance of a 32 billion-parameter open-weight model by up to 23% (pass@10) in VerilogEvalV2, without utilizing any data augmentation or knowledge distillation from larger models. We also show hdl2v's ability to boost the performance of a data augmentation-based fine-tuning approach by 63%. Finally, we characterize and analyze our dataset to better understand which characteristics of HDL-to-Verilog datasets can be expanded upon in future work for even better performance.2025-06-05T01:29:18ZPublished at ACM/IEEE International Symposium on Machine Learning for CAD (MLCAD) 2025Charles HongBrendan RobertsHuijae AnAlex UmAdvay RatanYakun Sophia Shaohttp://arxiv.org/abs/2507.05886v1Current Practices for Building LLM-Powered Reasoning Tools Are Ad Hoc -- and We Can Do Better2025-07-08T11:19:09ZThere is growing excitement about building software verifiers, synthesizers, and other Automated Reasoning (AR) tools by combining traditional symbolic algorithms and Large Language Models (LLMs). Unfortunately, the current practice for constructing such neurosymbolic AR systems is an ad hoc programming model that does not have the strong guarantees of traditional symbolic algorithms, nor a deep enough synchronization of neural networks and symbolic reasoning to unlock the full potential of LLM-powered reasoning. I propose Neurosymbolic Transition Systems as a principled computational model that can underlie infrastructure for building neurosymbolic AR tools. In this model, symbolic state is paired with intuition, and state transitions operate over symbols and intuition in parallel. I argue why this new paradigm can scale logical reasoning beyond current capabilities while retaining the strong guarantees of symbolic algorithms, and I sketch out how the computational model I propose can be reified in a logic programming language.2025-07-08T11:19:09Z6 pages, 4 figuresAaron BembenekThe University of Melbournehttp://arxiv.org/abs/2411.14330v2Datalog with First-Class Facts2025-07-07T03:06:37ZDatalog is a popular logic programming language for deductive reasoning tasks in a wide array of applications, including business analytics, program analysis, and ontological reasoning. However, Datalog's restriction to flat facts over atomic constants leads to challenges in working with tree-structured data, such as derivation trees or abstract syntax trees. To ameliorate Datalog's restrictions, popular extensions of Datalog support features such as existential quantification in rule heads (Datalog$^\pm$, Datalog$^\exists$) or algebraic data types (Soufflé). Unfortunately, these are imperfect solutions for reasoning over structured and recursive data types, with general existentials leading to complex implementations requiring unification, and ADTs unable to trigger rule evaluation and failing to support efficient indexing.
We present DL$^{\exists!}$, a Datalog with first-class facts, wherein every fact is identified with a Skolem term unique to the fact. We show that this restriction offers an attractive price point for Datalog-based reasoning over tree-shaped data, demonstrating its application to databases, artificial intelligence, and programming languages. We implemented DL$^{\exists!}$ as a system \slog{}, which leverages the uniqueness restriction of DL$^{\exists!}$ to enable a communication-avoiding, massively-parallel implementation built on MPI. We show that Slog outperforms leading systems (Nemo, Vlog, RDFox, and Soufflé) on a variety of benchmarks, with the potential to scale to thousands of threads.2024-11-21T17:22:44ZarXiv admin note: text overlap with arXiv:2211.11573Thomas GilrayArash SahebolamriYihao SunSowmith KunapaneniSidharth KumarKristopher Micinskihttp://arxiv.org/abs/2507.04316v1Retargeting an Abstract Interpreter for a New Language by Partial Evaluation2025-07-06T09:50:16ZIt is well-known that abstract interpreters can be systematically derived from their concrete counterparts using a "recipe," but developing sound static analyzers remains a time-consuming task. Reducing the effort required and mechanizing the process of developing analyzers continues to be a significant challenge. Is it possible to automatically retarget an existing abstract interpreter for a new language?
We propose a novel technique to automatically derive abstract interpreters for various languages from an existing abstract interpreter. By leveraging partial evaluation, we specialize an abstract interpreter for a source language. The specialization is performed using the semantics of target languages written in the source language. Our approach eliminates the need to develop analyzers for new targets from scratch. We show that our method can effectively retarget an abstract interpreter for one language into a correct analyzer for another language.2025-07-06T09:50:16ZPresented at the Student Research Competition (SRC) at PLDI 2025 (https://pldi25.sigplan.org/details/pldi-2025-src/1/)Jay Leehttp://arxiv.org/abs/2507.03773v1RVISmith: Fuzzing Compilers for RVV Intrinsics2025-07-04T18:45:46ZModern processors are equipped with single instruction multiple data (SIMD) instructions for fine-grained data parallelism. Compiler auto-vectorization techniques that target SIMD instructions face performance limitations due to insufficient information available at compile time, requiring programmers to manually manipulate SIMD instructions. SIMD intrinsics, a type of built-in function provided by modern compilers, enable programmers to manipulate SIMD instructions within high-level programming languages. Bugs in compilers for SIMD intrinsics can introduce potential threats to software security, producing unintended calculation results, data loss, program crashes, etc.
To detect bugs in compilers for SIMD intrinsics, we propose RVISmith, a randomized fuzzer that generates well-defined C programs that include various invocation sequences of RVV (RISC-V Vector Extension) intrinsics. We design RVISmith to achieve the following objectives: (i) achieving high intrinsic coverage, (ii) improving sequence variety, and (iii) without known undefined behaviors. We implement RVISmith based on the ratified RVV intrinsic specification and evaluate our approach with three modern compilers: GCC, LLVM, and XuanTie. Experimental results show that RVISmith achieves 11.5 times higher intrinsic coverage than the state-of-the-art fuzzer for RVV intrinsics. By differential testing that compares results across different compilers, optimizations, and equivalent programs, we detect and report 13 previously unknown bugs of the three compilers under test to date. Of these bugs, 10 are confirmed and another 3 are fixed by the compiler developers.2025-07-04T18:45:46ZTo appear in ACM CCS 2025Yibo HeCunjian HuangXianmiao QuHongdeng ChenWei YangTao Xiehttp://arxiv.org/abs/2507.03629v1Towards Automatic Error Recovery in Parsing Expression2025-07-04T14:53:00ZError recovery is an essential feature for a parser that should be plugged in Integrated Development Environments (IDEs), which must build Abstract Syntax Trees (ASTs) even for syntactically invalid programs in order to offer features such as automated refactoring and code completion.
Parsing Expressions Grammars (PEGs) are a formalism that naturally describes recursive top-down parsers using a restricted form of backtracking. Labeled failures are a conservative extension of PEGs that adds an error reporting mechanism for PEG parsers, and these labels can also be associated with recovery expressions to also be an error recovery mechanism. These expressions can use the full expressivity of PEGs to recover from syntactic errors.
Manually annotating a large grammar with labels and recovery expressions can be difficult. In this work, we present an algorithm that automatically annotates a PEG with labels, and builds their corresponding recovery expressions. We evaluate this algorithm by adding error recovery to the parser of the Titan programming language. The results shown that with a small amount of manual intervention our algorithm can be used to produce error recovering parsers for PEGs where most of the alternatives are disjoint.2025-07-04T14:53:00ZarXiv admin note: substantial text overlap with arXiv:1905.02145Sérgio Queiroz de MedeirosFabio Mascarenhas10.1145/3264637.3264638http://arxiv.org/abs/2509.16205v1A 200-Line Python Micro-Benchmark Suite for NISQ Circuit Compilers2025-07-04T10:25:59ZWe present microbench.py, a compact (approx. 200 lines) Python script that automates the collection of key compiler metrics, i.e., gate depth, two-qubit-gate count, wall-clock compilation time, and memory footprint, across multiple open-source quantum circuit transpilers. The suite ships with six didactic circuits (3 to 8 qubits) implementing fundamental quantum algorithms and supports Qiskit, tket, Cirq, and the Qiskit-Braket provider; in this paper we showcase results for Qiskit 0.46 and Braket 1.16. The entire run completes in under three minutes on a laptop, emits a single CSV plus publisheable plot, and reproduces the figure here with one command. We release the code under the MIT licence to serve as a quick-start regression harness for NISQ compiler research.2025-07-04T10:25:59Z9 pages, 1 figure. Includes reproducibility instructions and code artifacts. Companion repository: https://github.com/juhanimerilehto/microbenchJuhani Merilehtohttp://arxiv.org/abs/2411.06094v3Generically Automating Separation Logic by Functors, Homomorphisms and Modules2025-07-04T06:46:33ZFoundational verification considers the functional correctness of programming languages with formalized semantics and uses proof assistants (e.g., Coq, Isabelle) to certify proofs. The need for verifying complex programs compels it to involve expressive Separation Logics (SLs) that exceed the scopes of well-studied automated proof theories, e.g., symbolic heap. Consequently, automation of SL in foundational verification relies heavily on ad-hoc heuristics that lack a systematic meta-theory and face scalability issues. To mitigate the gap, we propose a theory to specify SL predicates using abstract algebras including functors, homomorphisms, and modules over rings. Based on this theory, we develop a generic SL automation algorithm to reason about any data structures that can be characterized by these algebras. In addition, we also present algorithms for automatically instantiating the algebraic models to real data structures. The instantiation reuses the algebraic models of component structures and preserves their data abstractions. Case studies on formalized imperative semantics show our algorithm can instantiate the algebraic models automatically for a variety of complex data structures. Experimental results indicate the automatically instantiated reasoners from our generic theory show similar results to the state-of-the-art systems made of specifically crafted reasoning rules. The presented theories, proofs, and the verification framework are formalized in Isabelle/HOL.2024-11-09T07:13:33ZAccepted by POPL'25Qiyuan XuDavid SananZhe HouXiaokun LuanConrad WattYang Liu10.1145/3704903http://arxiv.org/abs/2504.17460v3A Lightweight Method for Generating Multi-Tier JIT Compilation Virtual Machine in a Meta-Tracing Compiler Framework2025-07-03T10:02:51ZMeta-compiler frameworks, such as RPython and Graal/Truffle, generate high-performance virtual machines (VMs) from interpreter definitions. Although they generate VMs with high-quality just-in-time (JIT) compilers, they still lack an important feature that dedicated VMs (i.e., VMs that are developed for specific languages) have, namely \emph{multi-tier compilation}. Multi-tier compilation uses light-weight compilers at early stages and highly-optimizing compilers at later stages in order to balance between compilation overheads and code quality.
We propose a novel approach to enabling multi-tier compilation in the VMs generated by a meta-compiler framework. Instead of extending the JIT compiler backend of the framework, our approach drives an existing (heavyweight) compiler backend in the framework to quickly generate unoptimized native code by merely embedding directives and compile-time operations into interpreter definitions.
As a validation of the approach, we developed 2SOM, a Simple Object Machine with a two-tier JIT compiler based on RPython. 2SOM first applies the tier-1 threaded code generator that is generated by our proposed technique, then, to the loops that exceed a threshold, applies the tier-2 tracing JIT compiler that is generated by the original RPython framework. Our performance evaluation that runs a program with a realistic workload showed that 2SOM improved, when compared against an RPython-based VM, warm-up performance by 15\%, with merely a 5\% reduction in peak performance.2025-04-24T11:51:28ZECOOP 2025. Fixed DOIYusuke IzawaHidehiko MasuharaCarl Friedrich Bolz-Tereickhttp://arxiv.org/abs/2507.02226v1DecoRTL: A Run-time Decoding Framework for RTL Code Generation with LLMs2025-07-03T01:17:44ZAs one of their many applications, large language models (LLMs) have recently shown promise in automating register transfer level (RTL) code generation. However, conventional LLM decoding strategies, originally designed for natural language, often fail to meet the structural and semantic demands of RTL, leading to hallucinated, repetitive, or invalid code outputs. In this paper, we first investigate the root causes of these decoding failures through an empirical analysis of token-level entropy during RTL generation. Our findings reveal that LLMs exhibit low confidence in regions of structural ambiguity or semantic complexity, showing that standard decoding strategies fail to differentiate between regions requiring determinism (syntax-critical regions) and those that benefit from creative exploratory variability (design-critical regions). Then, to overcome this, we introduce DecoRTL, a novel run-time decoding strategy, that is both syntax-aware and contrastive for RTL code generation. DecoRTL integrates two complementary components: (i) self-consistency sampling, which generates multiple candidates and re-ranks them based on token-level agreement to promote correctness while maintaining diversity; and (ii) syntax-aware temperature adaptation, which classifies tokens by their syntactical and functional roles and adjusts the sampling temperature accordingly, enforcing low temperature for syntax-critical tokens and higher temperature for exploratory ones. Our approach operates entirely at inference time without requiring any additional model fine-tuning. Through evaluations on multiple open-source LLMs using the VerilogEval benchmark, we demonstrate significant improvements in syntactic validity, functional correctness, and output diversity, while the execution overhead (performance overhead) is imperceptible.2025-07-03T01:17:44ZAccepted to the International Conference on Computer-Aided Design (ICCAD 2025)Mohammad AkyashKimia AzarHadi Kamalihttp://arxiv.org/abs/2301.08148v2OblivIO: Securing reactive programs by oblivious execution with bounded traffic overheads2025-07-02T22:09:25ZTraffic analysis attacks remain a significant problem for online security. Communication between nodes can be observed by network level attackers as it inherently takes place in the open. Despite online services increasingly using encrypted traffic, the shape of the traffic is not hidden. To prevent traffic analysis, the shape of a system's traffic must be independent of secrets. We investigate adapting the data-oblivious approach the reactive setting and present OblivIO, a secure language for writing reactive programs driven by network events. Our approach pads with dummy messages to hide which program sends are genuinely executed. We use an information-flow type system to provably enforce timing-sensitive noninterference. The type system is extended with potentials to bound the overhead in traffic introduced by our approach. We address challenges that arise from joining data-oblivious and reactive programming and demonstrate the feasibility of our resulting language by developing an interpreter that implements security critical operations as constant-time algorithms.2023-01-19T15:59:33Z40 pages, 16 figures, Technical report for paper submitted to CSF 2023Jeppe Fredsgaard BlaabjergAslan Askarovhttp://arxiv.org/abs/2501.08496v3Quantifying the Importance of Data Alignment in Downstream Model Performance2025-07-02T21:19:42ZContrary to the conventional emphasis on dataset size, we explore the role of data alignment -- an often overlooked aspect of data quality -- in training capable Large Language Models (LLMs). To do so, we use the Task2Vec-based alignment coefficient, a quantitative measure of the similarity between two datasets, to quantify the impact of alignment between training data and evaluation data on downstream performance. In particular, we conduct controlled \textit{interventional} experiments for two settings: 1. the impact of increased alignment coefficients between various pre-training (pt) against evaluation datasets, and 2. the impact of increased alignment coefficients between domain specific fine-tuning (ft) against domain specific evaluation. The domain specific task we explore is Autoformalization -- the machine translation task between natural language and code for formal verification. In both settings, we find a strong, predictable negative correlation between the alignment coefficient of a model's training and evaluation data and the model's loss/perplexity on the respective downstream task. These findings suggest a re-evaluation of LLM training approaches, demonstrating the relevance of data alignment compared to data quantity, especially in specialized downstream tasks such as Autoformalization.2025-01-14T23:59:23ZICLR DMLR Data-centric Machine Learning Research (2024), ICML DataWorld (2025)Krrish ChawlaAryan SahaiMario DePaviaSudharsan SundarBrando MirandaElyas ObbadSanmi Koyejohttp://arxiv.org/abs/2507.02107v1Structural Code Search using Natural Language Queries2025-07-02T19:42:37ZSearching code is a common task that developers perform to understand APIs, learn common code patterns, and navigate code. Currently, developers most commonly search using keywords and regular expressions that are easy to use and widely available. Beyond keywords and regular expressions, structural code search tools allow developers to search for code based on its syntactic structure. This has numerous applications ranging from bug finding to systematically refactoring code. However, these structural code search tools operate on queries expressed in domain-specific languages (DSL) that can be difficult to learn and write. We propose to allow developers to use natural language to search for code structurally. Expressing queries in natural language provides an intuitive way to search for code and lowers the barrier to entry.
In this work, we develop a novel general approach that combines the reasoning capabilities of an LLM to interpret natural language search queries with the power of structural search tools to efficiently and accurately retrieve relevant code. We then instantiate this approach for two structural code search DSLs: Semgrep and GQL. In our evaluation, we construct a new benchmark for structural code search consisting of 400 queries over 10 Java projects. We show that our approach for structural code search based on translating NL queries to DSL queries using an LLM is effective and robust, achieving a high precision and recall ranging from 55% - 70%. Further, our approach significantly outperforms baselines based on semantic code search and LLM retrievals by up to 57% and 14% on F1 scores.2025-07-02T19:42:37ZBen LimpanukornYanjun WangZach PattersonPranav GargMurali Krishna RamanathanXiaofei MaAnoop DeorasMiryung Kimhttp://arxiv.org/abs/2507.01780v1LeanLTL: A unifying framework for linear temporal logics in Lean2025-07-02T15:07:47ZWe propose LeanLTL, a unifying framework for linear temporal logics in Lean 4. LeanLTL supports reasoning about traces that represent either infinite or finite linear time. The library allows traditional LTL syntax to be combined with arbitrary Lean expressions, making it straightforward to define properties involving numerical or other types. We prove that standard flavors of LTL can be embedded in our framework. The library also provides automation for reasoning about LeanLTL formulas in a way that facilitates using Lean's existing tactics. Finally, we provide examples illustrating the utility of the library in reasoning about systems that come from applications.2025-07-02T15:07:47Z9 pages, 3 figures; for associated project files see https://github.com/UCSCFormalMethods/LeanLTL; to be published in LIPIcs for ITP '25Eric VinKyle A. MillerDaniel J. Fremont