https://arxiv.org/api/JtJVG72dWtfm0N739C+811E3q5Q2026-06-14T13:37:32Z988537515http://arxiv.org/abs/2604.10399v1Vanilla Object Orientation (VOO): A Value-Semantics Approach to Classes in Tcl2026-04-12T01:12:32ZI present Vanilla Object Orientation (VOO), a framework that composes classes from Tcl's native data structures -- lists and dictionaries -- rather than introducing additional framework infrastructure. VOO objects are plain Tcl lists with automatic memory management through copy-on-write semantics, eliminating the destructor burden inherent in TclOO and Itcl. Benchmarks on Tcl 8.6.13 and Tcl 9.0 show VOO achieves 7--18x faster object creation and 4--6x superior memory efficiency compared to TclOO. A companion C++ migration path (VOO C++) further improves field-access speed (setter 2.3--2.6x faster) and memory (6.8--9.8x lighter than TclOO), while preserving an identical Tcl call-site API. Cross-version analysis confirms that VOO's compositional design scales better than framework-based approaches as the interpreter evolves.2026-04-12T01:12:32Z41 pages (11 pages of main content and 30 pages of appendices), 15 tables, 24 code examplesAlan Araujohttp://arxiv.org/abs/2604.10392v1Intent-aligned Formal Specification Synthesis via Traceable Refinement2026-04-12T00:52:27ZLarge language models are increasingly used to generate code from natural language, but ensuring correctness remains challenging. Formal verification offers a principled way to obtain such guarantees by proving that a program satisfies a formal specification. However, specifications are frequently missing in real-world codebases, and writing high-quality specifications remains expensive and expertise-intensive. We present VeriSpecGen, a traceable refinement framework that synthesizes intent-aligned specifications in Lean through requirement-level attribution and localized repair. VeriSpecGen decomposes natural language into atomic requirements and generates requirement-targeted tests with explicit traceability maps to validate generated specifications. When validation fails, traceability maps attribute failures to specific requirements, enabling targeted clause-level repairs. VeriSpecGen achieve 86.6% on VERINA SpecGen task using Claude Opus 4.5, improving over baselines by up to 31.8 points across different model families and scales. Beyond inference-time gains, we generate 343K training examples from VeriSpecGen refinement trajectories and demonstrate that training on these trajectories substantially improves specification synthesis by 62-106% relative and transfers gains to general reasoning abilities.2026-04-12T00:52:27ZZhe YeAidan Z. H. YangHuangyuan SuZhenyu LiaoSamuel TenkaZhizhen QinUdaya GhaiDawn SongSoonho Konghttp://arxiv.org/abs/2511.15028v3Decoupling Data Layouts from Bounding Volume Hierarchies2026-04-10T21:22:30ZBounding volume hierarchies are ubiquitous acceleration structures in graphics, scientific computing, and data analytics. Their performance depends critically on data layout choices that affect cache utilization, memory bandwidth, and vectorization -- increasingly dominant factors in modern computing. Yet, in most programming systems, these layout choices are hopelessly entangled with the traversal logic. This entanglement prevents developers from independently optimizing data layouts and algorithms across different contexts, perpetuating a false dichotomy between performance and portability. We introduce Scion, a domain-specific language and compiler for specifying the data layouts of bounding volume hierarchies independent of tree traversal algorithms. We show that Scion can express a broad spectrum of layout optimizations used in high-performance computing while remaining architecture-agnostic. We demonstrate empirically that Pareto-optimal layouts (along performance and memory footprint axes) vary across algorithms, architectures, and workload characteristics. Through systematic design exploration, we also identify a novel ray tracing layout that combines optimization techniques from prior work, achieving Pareto-optimality across diverse architectures and scenes.2025-11-19T01:49:42ZCamera-ready version to be published in PACMPL Volume 10, Number PLDI (2026). Received 2025-11-07, accepted 2026-04-03Proceedings of the ACM on Programming Languages 10, PLDI, Article 175 (2026)Christophe GyurgyikAlexander J RootFredrik Kjolstad10.1145/3808253http://arxiv.org/abs/2604.02598v2Explorable Theorems: Making Written Theorems Explorable by Grounding Them in Formal Representations2026-04-10T14:00:37ZLLM-generated explanations can make technical content more accessible, but there is a ceiling on what they can support interactively. Because LLM outputs are static text, they cannot be executed or stepped through. We argue that grounding explanations in a formalized representation enables interactive affordances beyond what static text supports. We instantiate this idea for mathematical proof comprehension with explorable theorems, a system that uses LLMs to translate a theorem and its written proof into Lean, a programming language for machine-checked proofs, and links the written proof with the Lean code. Readers can work through the proof at a step-level granularity, test custom examples or counterexamples, and trace the logical dependencies bridging each step. Each worked-out step is produced by executing the Lean proof on that example and extracting its intermediate state. A user study ($n = 16$) shows potential advantages of this approach: in a proof-reading task, participants who had access to the provided explorability features gave better, more correct, and more detailed answers to comprehension questions, demonstrating a stronger overall understanding of the underlying mathematics.2026-04-03T00:16:52ZHita KambhamettuWill CrichtonSean WelleckHarrison GoldsteinAndrew Headhttp://arxiv.org/abs/2604.09318v1CIR+CVN: Bridging LLM Semantic Understanding and Petri-Net Verification for Concurrent Programs2026-04-10T13:40:58ZRecovering concurrency structure directly from source code is difficult because shared-resource identity and protection relations are often obscured by aliasing, ownership, and API-specific idioms. We therefore study a specification-driven, model-first verification architecture for LLM-assisted concurrent program construction. Instead of verifying arbitrary source code, a large language model first synthesizes a verification-oriented concurrency artifact from a natural-language requirement or system specification. The first formalism, the Concurrency Intermediate Representation (Cir), is a statement-level, alias-free model in which shared resources are globally named, protection relations are explicit, and each statement carries a stable identifier. The second formalism, the Concurrency Verification Net (Cvn), is a weighted place/transition Petri net with a finite global store and three-valued guards for data-dependent branching. A validated Cir artifact is translated mechanically to Cvn, explored exhaustively, and any counterexample is mapped back to statement identifiers to guide targeted repair. To reduce the risk of bug-free but behavior-dropping repairs, acceptance additionally applies a lightweight goal-reachability check over designated critical outcomes. We formalize both representations, prove translation-correspondence results for deadlock and signal-loss analysis, define a two-layer checking architecture with 61 static rules and 5 analysis predicates, and evaluate the pipeline on 9 representative bounded-concurrency patterns. The results show that the method supports iterative bug detection and repair on Cir artifacts and that goal reachability helps filter semantically incomplete repairs. The trust boundary of the present work is the generated Cir artifact rather than arbitrary source code.2026-04-10T13:40:58ZKaiwen ZhangGuanjun Liuhttp://arxiv.org/abs/2604.09301v1Tracers for debugging and program exploration2026-04-10T13:08:49ZProgrammers often use an iterative process of hypothesis generation ("perhaps this function is called twice?") and hypothesis testing ("let's count how many times this breakpoint fires") to understand the behavior of unfamiliar or malfunctioning software. Existing debugging tools are much better suited to testing hypotheses than to generating them. Step debuggers, for example, present isolated snapshots of the program's state, leaving it to the programmer to mentally reconstruct the evolution of that state over time. We advocate for a different approach: building a debugging and program-exploration tool around a *trace*, or complete history, of the program's execution. Our key claim is that the user should see every line *as executed* (in time order) rather than *as written* (in syntax order). We discuss design choices, preliminary results, and interesting challenges.2026-04-10T13:08:49Z13 pages; presented at the 16th annual workshop on the intersection of HCI and PL (PLATEAU 2026), Pittsburgh, PA, USAShardul ChiplunkarClément Pit-Claudelhttp://arxiv.org/abs/2603.26215v2SuperDP: Differential Privacy Refutation via Supermartingales2026-04-10T11:42:07ZDifferential privacy (DP) has established itself as one of the standards for ensuring privacy of individual data. However, reasoning about DP is a challenging and error-prone task, hence methods for formal verification and refutation of DP properties have received significant interest in recent years. In this work, we present a novel method for automated formal refutation of $ε$-DP. Our method refutes $ε$-DP by searching for a pair of inputs together with a non-negative function over outputs whose expected value on these two inputs differs by a significant amount. The two inputs and the non-negative function over outputs are computed simultaneously, by utilizing upper expectation supermartingales and lower expectation submartingales from probabilistic program analysis, which we leverage to introduce a sound and complete proof rule for $ε$-DP refutation. To the best of our knowledge, our method is the first method for $ε$-DP refutation to offer the following four desirable features: (1)~it is fully automated, (2)~it is applicable to stochastic mechanisms with sampling instructions from both discrete and continuous distributions, (3)~it provides soundness guarantees, and (4)~it provides semi-completeness guarantees. Our experiments show that our prototype tool SuperDP achieves superior performance compared to the state of the art and manages to refute $ε$-DP for a number of challenging examples collected from the literature, including ones that were out of the reach of prior methods.2026-03-27T09:39:06ZKrishnendu ChatterjeeEhsan Kafshdar GoharshadyĐorđe Žikelićhttp://arxiv.org/abs/2604.09165v1A Deductive System for Contract Satisfaction Proofs2026-04-10T09:51:24ZHardware-software contracts are abstract specifications of a CPU's leakage behavior. They enable verifying the security of high-level programs against side-channel attacks without having to explicitly reason about the microarchitectural details of the CPU. Using the abstraction powers of a contract requires proving that the targeted CPU satisfies the contract in the sense that the contract over-approximates the CPU's leakage. Besides pen-and-paper reasoning, proving contract satisfaction has been approached mostly from the model-checking perspective, with approaches based on a (semi-)automated search for the necessary invariants.
As an alternative, this paper explores how such proofs can be conducted in interactive proof assistants. We start by observing that contract satisfaction is an instance of a more general problem we call relative trace equality, and we introduce relative bisimulation as an associated proof technique. Leveraging recent advances in the field of coinductive proofs, we develop a deductive proof system for relative trace equality. Our system is provably sound and complete, and it enables a modular and incremental proof style. It also features several reasoning principles to simplify proofs by exploiting symmetries and transitivity properties. We formalized our deductive system in the Rocq proof assistant and applied it to two challenging contract satisfaction proofs.2026-04-10T09:51:24ZArthur CorrensonHaoyi ZengJana Hofmannhttp://arxiv.org/abs/2507.04736v2ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning2026-04-10T04:48:11ZLarge Language Models have emerged as powerful tools for automating Register-Transfer Level (RTL) code generation, yet they face critical limitations: existing approaches typically fail to simultaneously optimize functional correctness and hardware efficiency metrics such as Power, Performance, and Area (PPA). Methods relying on supervised fine-tuning commonly produce functionally correct but suboptimal designs due to the lack of inherent mechanisms for learning hardware optimization principles. Conversely, external post-processing techniques aiming to refine PPA performance after generation often suffer from inefficiency and do not improve the LLMs' intrinsic capabilities.
To overcome these challenges, we propose ChipSeek, a novel hierarchical reward based reinforcement learning framework designed to encourage LLMs to generate RTL code that is both functionally correct and optimized for PPA metrics. Our approach integrates direct feedback from EDA simulators and synthesis tools into a hierarchical reward mechanism, facilitating a nuanced understanding of hardware design trade-offs. Through Curriculum-Guided Dynamic Policy Optimization (CDPO), ChipSeek enhances the LLM's ability to generate high-quality, optimized RTL code. Evaluations on standard benchmarks demonstrate ChipSeek's superior performance, achieving state-of-the-art functional correctness and PPA performance. Furthermore, it excels in specific optimization tasks, consistently yielding highly efficient designs when individually targeting fine-grained optimization goals such as power, delay, and area. The artifact is open-source in https://github.com/rong-hash/chipseek.2025-07-07T08:08:20ZAccepted by ACL 2026 Main ConferenceZhirong ChenKaiyan ChangZhuolin LiCangyuan LiXinyang HeChujie ChenMengdi WangHaobo XuYinhe HanHuawei LiYing Wanghttp://arxiv.org/abs/2604.08792v1Choose, Don't Label: Multiple-Choice Query Synthesis for Program Disambiguation2026-04-09T22:05:41ZHigh-level specifications of code are inherently ambiguous, and prior systems have explored interactive techniques to help users clarify their intent and resolve such ambiguities. However, most existing approaches elicit supervision through labeled examples, which are often error-prone and may fail to capture user intent. This paper introduces a new active learning paradigm for program disambiguation based on multiple-choice queries. In this paradigm, the system presents a small set of high-level behaviors as multiple-choice options, and the user simply selects the intended one. Technically, each answer option corresponds to a Hoare triple that characterizes a cluster of semantically similar candidate programs. This formulation enables formal reasoning about the informativeness and interpretability of queries, and supports systematic construction of optimal queries. Building on this insight, we develop a new active learning algorithm and implement it in a tool called Socrates, which automatically synthesizes informative multiple-choice queries for program disambiguation. We evaluate Socrates across four domains spanning both symbolic and neurosymbolic settings and show that it produces intuitive, easy-to-answer queries and achieves efficient convergence. Most importantly, Socrates identifies the intended program more reliably than existing methods, while maintaining competitive runtime performance.2026-04-09T22:05:41ZCeleste BarnabyDanny DingOsbert BastaniIsil Dillig10.1145/3808279http://arxiv.org/abs/2604.08445v1PG-MDP: Profile-Guided Memory Dependence Prediction for Area-Constrained Cores2026-04-09T16:41:49ZMemory Dependence Prediction (MDP) is a speculative technique to determine which stores, if any, a given load will depend on. Area-constrained cores are increasingly relevant in various applications such as energy-efficient or edge systems, and often have limited space for MDP tables. This leads to a high rate of false dependencies as memory independent loads alias with unrelated predictor entries, causing unnecessary stalls in the processor pipeline.
The conventional way to address this problem is with greater predictor size or complexity, but this is unattractive on area-constrained cores. This paper proposes that targeting the predictor working set is as effective as growing the predictor, and can deliver performance competitive with large predictors while still using very small predictors. This paper introduces profile-guided memory dependence prediction (PG-MDP), a software co-design to label consistently memory independent loads via their opcode and remove them from the MDP working set. These loads bypass querying the MDP when dispatched and always issue as soon as possible. Across SPEC2017 CPU intspeed, PG-MDP reduces the rate of MDP queries by 79%, false dependencies by 77%, and improves geomean IPC for a small simulated core by 1.47% (to within 0.5% of using 16x the predictor entries), with no area cost and no additional instruction bandwidth.2026-04-09T16:41:49ZLuke PanayiJohan JinoSebastian S. KimAlberto RosAlexandra JimboreanJim WhittakerMartin BergerPaul Kellyhttp://arxiv.org/abs/2604.06273v2CobbleDB: Modelling Levelled Storage by Composition2026-04-09T12:08:07ZWe present a composition-based approach to building correctby-construction database backing stores. In previous work, we specified the behaviour of several store variants and proved their correctness and equivalence. Here, we derive a Java implementation: the simplicity of the specification makes manual construction straightforward. We leverage spec-guaranteed store equivalence to compose performance features, then demonstrate practical value with CobbleDB, a reimplementation of RocksDB's levelled storage.2026-04-07T07:23:31ZWorkshop on Principles and Practice of Consistency for Distributed Data, Apr 2026, Edinburgh, United KingdomEmilie MaUBCAyush PandeyTSPAnnette BieniusaRPTUMarc ShapiroDELYShttp://arxiv.org/abs/2604.07902v1Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets2026-04-09T07:18:31ZGranlund and Montgomery proposed an optimization method for unsigned integer division by constants [3]. Their method (called the GM method in this paper) was further improved in part by works such as [1] and [7], and is now adopted by major compilers including GCC, Clang, Microsoft Compiler, and Apple Clang. However, for example, for x/7, the generated code is designed for 32-bit CPUs and therefore does not fully exploit 64-bit capabilities. This paper proposes an optimization method for 32-bit unsigned division by constants targeting 64-bit CPUs. We implemented patches for LLVM/GCC and achieved speedups of 1.67x on Intel Xeon w9-3495X (Sapphire Rapids) and 1.98x on Apple M4 (Apple M-series SoC) in the microbenchmark described later. The LLVM patch has already been merged into llvm:main [6], demonstrating the practical applicability of the proposed method.2026-04-09T07:18:31ZShigeo MitsunariTakashi Hoshinohttp://arxiv.org/abs/2603.18030v2Quine: Realizing LLM Agents as Native POSIX Processes2026-04-09T01:29:42ZCurrent LLM agent frameworks often implement isolation, scheduling, and communication at the application layer, even though these mechanisms are already provided by mature operating systems. Instead of introducing another application-layer orchestrator, this paper presents Quine, a runtime architecture and reference implementation that realizes LLM agents as native POSIX processes. The mapping is explicit: identity is PID, interface is standard streams and exit status, state is memory, environment variables, and filesystem, and lifecycle is fork/exec/exit. A single executable implements this model by recursively spawning fresh instances of itself. By grounding the agent abstraction in the OS process model, Quine inherits isolation, composition, and resource control directly from the kernel, while naturally supporting recursive delegation, context renewal via exec, and shell-native composition. The design also exposes where the POSIX process model stops: processes provide a robust substrate for execution, but not a complete runtime model for cognition. In particular, the analysis points toward two immediate extensions beyond process semantics: task-relative worlds and revisable time. A reference implementation of Quine is publicly available on GitHub.2026-03-08T05:32:46ZMinor revision clarifying exec semanticsHao Kehttp://arxiv.org/abs/2311.11571v4VyZX: Formal Verification of a Graphical Quantum Language2026-04-08T17:21:31ZGraphical languages are a convenient shorthand to represent computation, with rewrite rules relating one graph to another. In contrast, proof assistants rely heavily on inductive datatypes, particularly when giving semantics to embedded languages. This creates obstacles to formally reasoning about graphical languages, since imposing an inductive structure obfuscates the diagrammatic nature of graphical languages, along with their corresponding equational theories. To address this gap, we present VyZX, a verified library for reasoning about inductively defined graphical languages. These inductive constructs arise naturally from category-theoretic definitions. We developed VyZX to Verify the ZX-calculus, a graphical langauge for reasoning about quantum computation. The ZX-calculus comes with a collection of diagrammatic rewrite rules that preserve the graph's semantic interpretation. We show how inductive graphs in VyZX are used to prove the soundness of the ZX-calculus rewrite rules and apply them in practice using standard proof assistant techniques. We also provide an IDE-integrated visualizer for proof engineers to directly reason about diagrams in graphical form.2023-11-20T07:12:22Z29 pages + 10 page appendix, 36 figuresAdrian LehmannBen CaldwellBhakti ShahWilliam SpencerRobert Rand