https://arxiv.org/api/vOYVkzR4g2zv0flA6wbxLUk6Cu8 2026-03-22T08:34:58Z 9537 0 15 http://arxiv.org/abs/2411.14802v3 Enhancing a Hierarchical Graph Rewriting Language based on MELL Cut Elimination 2026-03-19T15:11:05Z

Hierarchical graph rewriting is a highly expressive computational formalism that manipulates graphs enhanced with box structures for representing hierarchies. It has provided the foundations of various graph-based modeling tools, but the design of high-level declarative languages based on hierarchical graph rewriting is still a challenge. For a solid design choice, well-established formalisms with backgrounds other than graph rewriting would provide useful guidelines. Proof nets of Multiplicative Exponential Linear Logic (MELL) is such a framework because its original formulation of cut elimination is essentially graph rewriting involving box structures, where so-called Promotion Boxes with an indefinite number of non-local edges may be cloned, migrated and deleted. This work builds on LMNtal as a declarative language based on hierarchical (port) graph rewriting, and discusses how it can be extended to support the above operations on Promotion Boxes of MELL proof nets. LMNtal thus extended turns out to be a practical graph rewriting language that has strong affinity with MELL proof nets. The language features provided are general enough to encode other well-established models of concurrency. Using the toolchain of LMNtal that provides state-space search and model checking, we implemented cut elimination rules of MELL proof nets in extended LMNtal and demonstrated that the platform could serve as a useful workbench for proof nets.

2024-11-22T09:03:09Z Extended version of the PADL 2025 paper (LNCS, Springer, 2025). This version incorporates into the main text details omitted from the conference version and includes some additional material Kento Takyu Kazunori Ueda http://arxiv.org/abs/2511.08462v3 QLCoder: A Query Synthesizer For Static Analysis of Security Vulnerabilities 2026-03-19T10:53:05Z

Static analysis tools provide a powerful means to detect security vulnerabilities by specifying queries that encode vulnerable code patterns. However, writing such queries is challenging and requires diverse expertise in security and program analysis. To address this challenge, we present QLCoder - an agentic framework that automatically synthesizes queries in CodeQL, a powerful static analysis engine, directly from a given CVE metadata. QLCode embeds an LLM in a synthesis loop with execution feedback, while constraining its reasoning using a custom MCP interface that allows structured interaction with a Language Server Protocol (for syntax guidance) and a RAG database (for semantic retrieval of queries and documentation). This approach allows QLCoder to generate syntactically and semantically valid security queries. We evaluate QLCode on 176 existing CVEs across 111 Java projects. Building upon the Claude Code agent framework, QLCoder synthesizes correct queries that detect the CVE in the vulnerable but not in the patched versions for 53.4% of CVEs. In comparison, using only Claude Code synthesizes 10% correct queries.

2025-11-11T17:06:04Z Claire Wang Ziyang Li Saikat Dutta Mayur Naik http://arxiv.org/abs/2510.06296v2 VeriEquivBench: An Equivalence Score for Ground-Truth-Free Evaluation of Formally Verifiable Code 2026-03-19T09:53:36Z

Formal verification is the next frontier for ensuring the correctness of code generated by Large Language Models (LLMs). While methods that co-generate code and formal specifications in formal languages, like Dafny, can, in principle, prove alignment with user intent, progress is bottlenecked by specification quality evaluation. Current benchmarks rely on matching against ground-truth specifications, a manual and expertise-intensive process that has limited existing datasets to a few hundred simple problems and also suffers from a reliability issue. To address this, we introduce VeriEquivBench, a new benchmark with $2,389$ complex algorithmic problems that probe the limitations of current models in both code generation and formal reasoning. Our evaluation framework replaces ground-truth matching with a formally grounded metric, the equivalence score, and rigorously verifies the quality of generated specifications and code. Our results show that generating formally verifiable code remains a profound challenge for state-of-the-art LLMs. This underscores both the difficulty of the task and the need for benchmarks like VeriEquivBench to drive progress toward scalable and reliable coding agents.

2025-10-07T13:19:05Z Lingfei Zeng Fengdi Che Xuhan Huang Fei Ye Xu Xu Binhang Yuan Jie Fu http://arxiv.org/abs/2603.18477v1 Leveraging Large Language Models for Generalizing Peephole Optimizations 2026-03-19T04:19:51Z

Peephole optimizations are a core component of modern optimizing compilers. It rewrites specific instruction into semantically equivalent but more efficient forms. In practice, creating a new peephole optimization often starts from a concrete optimization instance and requires lifting it into a more general rewrite rule that matches a wider range of instruction patterns. This generalization step is critical to optimization effectiveness, but it is also difficult: producing rules that are both correct and sufficiently general typically demands substantial manual effort and domain expertise. Existing approaches such as Hydra attempt to automate this task with program synthesis, but their generalization capability is often limited by search-space explosion, under-generalization, and restricted support for diverse instruction domains. We present LPG, large language model aided peephole optimization generalization, a framework that uses large language models (LLMs) to generalize peephole optimizations. The design of LPG is motivated by the observation that LLMs are effective at semantic abstraction and exploratory reasoning, while formal analyses are necessary to ensure that generated rules are sound and profitable. Based on this observation, LPG adopts a closed-loop workflow that integrates LLM-driven symbolic constant generalization, structural generalization, constraint relaxation, and bitwidth/precision generalization with feedback from syntactic validation, semantic verification, and profitability checking. We evaluate LPG on real-world peephole optimization issues drawn from the LLVM ecosystem. Overall, LPG successfully generalizes 90 out of 102 optimizations. On the integer-focused subset that is directly comparable to Hydra, LPG generalizes 74 out of 81 optimizations, whereas Hydra generalizes 35.

2026-03-19T04:19:51Z Chunhao Liao Hongxu Xu Xintong Zhou Zhenyang Xu Chengnian Sun http://arxiv.org/abs/2603.18372v1 TENSURE: Fuzzing Sparse Tensor Compilers (Registered Report) 2026-03-19T00:13:14Z

Sparse Tensor Compilers (STCs) have emerged as critical infrastructure for optimizing high-dimensional data analytics and machine learning workloads. The STCs must synthesize complex, irregular control flow for various compressed storage formats directly from high-level declarative specifications, thereby making them highly susceptible to subtle correctness defects. Existing testing frameworks, which rely on mutating computation graphs restricted to a standard vocabulary of operators, fail to exercise the arbitrary loop synthesis capabilities of these compilers. Furthermore, generic grammar-based fuzzers struggle to generate valid inputs due to the strict rules governing how indices are reused across multiple tensors. In this paper, we present TENSURE, the first extensible black-box fuzzing framework specifically designed for the testing of STCs. TENSURE leverages Einstein Summation (Einsum) notation as a general input abstraction, enabling the generation of complex, unconventional tensor contractions that expose corner cases in the code-generation phases of STCs. We propose a novel constraint-based generation algorithm that guarantees 100% semantic validity of synthesized kernels, significantly outperforming the ~3.3% validity rate of baseline grammar fuzzers. To enable metamorphic testing without a trusted reference, we introduce a set of semantic-preserving mutation operators that exploit algebraic commutativity and heterogeneity in storage formats. Our evaluation on two state-of-the-art systems, TACO and Finch, reveals widespread fragility, particularly in TACO, where TENSURE exposed crashes or silent miscompilations in a majority of generated test cases. These findings underscore the critical need for specialized testing tools in the sparse compilation ecosystem.

2026-03-19T00:13:14Z Kabilan Mahathevan Yining Zhang Muhammad Ali Gulzar Kirshanthan Sundararajah 10.14722/fuzzing.2026.23006 http://arxiv.org/abs/2505.06193v4 Ohana trees, linear approximation and multi-types for the $λ$I-calculus: No variable gets left behind or forgotten! 2026-03-18T17:08:35Z

Although the $λ$I-calculus is a natural fragment of the $λ$-calculus, obtained by forbidding the erasure of arguments, its equational theories did not receive much attention. The reason is that all proper denotational models studied in the literature equate all non-normalizable $λ$I-terms, whence the associated theory is not very informative. The goal of this paper is to introduce a previously unknown theory of the $λ$I-calculus, induced by a notion of evaluation trees that we call "Ohana trees". The Ohana tree of a $λ$I-term is an annotated version of its Böhm tree, remembering all free variables that are hidden within its meaningless subtrees, or pushed into infinity along its infinite branches. We develop the associated theories of program approximation: the first approach -- more classic -- is based on finite trees and continuity, the second adapts Ehrhard and Regnier's Taylor expansion. We then prove a Commutation Theorem stating that the normal form of the Taylor expansion of a $λ$I-term coincides with the Taylor expansion of its Ohana tree. As a corollary, we obtain that the equality induced by Ohana trees is compatible with abstraction and application. Subsequently, we introduce a denotational model designed to capture the equality induced by Ohana trees. Although presented as a non-idempotent type system, our model is based on a suitably modified version of the relational semantics of the $λ$-calculus, which is known to yield proper models of the $λ$I-calculus when restricted to non-empty finite multisets. To track variables occurring in subterms that are hidden or pushed to infinity in the evaluation trees, we generalize the system in two ways: first, we reintroduce annotated versions of the empty multiset indexed by sets of variables; second, (...)

2025-05-09T17:13:18Z This is the (submitted) long version of the (published) conference version v2, see https://doi.org/10.4230/LIPIcs.FSCD.2025.12 Rémy Cerda Giulio Manzonetto Alexis Saurin http://arxiv.org/abs/2603.18122v1 Don't Vibe Code, Do Skele-Code: Interactive No-Code Notebooks for Subject Matter Experts to Build Lower-Cost Agentic Workflows 2026-03-18T16:37:29Z

Skele-Code is a natural-language and graph-based interface for building workflows with AI agents, designed especially for less or non-technical users. It supports incremental, interactive notebook-style development, and each step is converted to code with a required set of functions and behavior to enable incremental building of workflows. Agents are invoked only for code generation and error recovery, not orchestration or task execution. This agent-supported, but code-first approach to workflows, along with the context-engineering used in Skele-Code, can help reduce token costs compared to the multi-agent system approach to executing workflows. Skele-Code produces modular, easily extensible, and shareable workflows. The generated workflows can also be used as skills by agents, or as steps in other workflows.

2026-03-18T16:37:29Z Main paper 9 pages. Topics: Agentic Coding, HCI, LLMs, Workflows Sriram Gopalakrishnan http://arxiv.org/abs/2502.05310v5 Oracular Programming: A Modular Foundation for Building LLM-Enabled Software 2026-03-18T11:56:17Z

Large Language Models (LLMs) can solve previously intractable tasks given only natural-language instructions and a few examples, but they remain difficult to steer precisely and lack a key capability for building reliable software at scale: the modular composition of computations under enforceable contracts. As a result, they are often embedded in larger software pipelines that use domain-specific knowledge to decompose tasks and improve reliability through validation and search. Yet the complexity of writing, tuning, and maintaining such pipelines has so far limited their sophistication. We propose oracular programming: a foundational paradigm for integrating traditional, explicit computations with inductive oracles such as LLMs. It rests on two directing principles: the full separation of core and search logic (allowing the latter to freely evolve without breaking the former), and the treatment of few-shot examples as grounded and evolvable program components. Within this paradigm, programmers express high-level problem-solving strategies as programs with unresolved choice points. These choice points are resolved at runtime by LLMs, which generalize from user-provided examples of correct and incorrect decisions. An oracular program is composed of three orthogonal components: a strategy that consists of a nondeterministic program with choice points that can be reified into a search tree, a policy that specifies how to navigate this tree with the help of LLM oracles, and a set of demonstrations that describe successful and unsuccessful tree navigation scenarios across diverse problem instances. Each component is expressed in a dedicated programming language. We address the key programming language design challenges of modularly composing oracular programs and enforcing consistency between their components as they evolve.

2025-02-07T20:24:43Z Jonathan Laurent André Platzer http://arxiv.org/abs/2603.17627v1 The Program Hypergraph: Multi-Way Relational Structure for Geometric Algebra, Spatial Compute, and Physics-Aware Compilation 2026-03-18T11:47:15Z

The Program Semantic Graph (PSG) introduced in prior work on Dimensional Type Systems and Deterministic Memory Management encodes compilation-relevant properties as binary edge relations between computation nodes. This representation is adequate for scalar and tensor computations, but becomes structurally insufficient for two classes of problems central to heterogeneous compute: tile co-location and routing constraints in spatial dataflow architectures, which are inherently multi-way; and geometric algebra computation, where graded multi-way products cannot be faithfully represented as sequences of binary operations without loss of algebraic identity. This paper introduces the Program Hypergraph (PHG) as a principled generalization of the PSG that promotes binary edges to hyperedges of arbitrary arity. We demonstrate that grade in Clifford algebra is a natural dimension axis within the existing DTS abelian group framework, that grade inference derives geometric product sparsity eliminating the primary performance objection to Clifford algebra neural networks without manual specialization, and that the k-simplex structure of mesh topology is a direct instance of the hyperedge formalism. We assess the existing geometric algebra library ecosystem, identify the consistent type-theoretic gap that no current system addresses, and show that the PHG closes it within the Fidelity compilation framework. The result is a compilation framework where geometric correctness, memory placement, numerical precision selection, and hardware partitioning are jointly derivable from a single graph structure exposed as interactive design-time feedback.

2026-03-18T11:47:15Z 29 pages, 1 figure, 2 tables Houston Haynes http://arxiv.org/abs/2603.17613v1 VeriAgent: A Tool-Integrated Multi-Agent System with Evolving Memory for PPA-Aware RTL Code Generation 2026-03-18T11:25:40Z

LLMs have recently demonstrated strong capabilities in automatic RTL code generation, achieving high syntactic and functional correctness. However, most methods focus on functional correctness while overlooking critical physical design objectives, including Power, Performance, and Area. In this work, we propose a PPA-aware, tool-integrated multi-agent framework for high-quality verilog code generation. Our framework explicitly incorporates EDA tools into a closed-loop workflow composed of a \textit{Programmer Agent}, a \textit{Correctness Agent}, and a \textit{PPA Agent}, enabling joint optimization of functional correctness and physical metrics. To support continuous improvement without model retraining, we introduce an \textit{Evolved Memory Mechanism} that externalizes optimization experience into structured memory nodes. A dedicated memory manager dynamically maintains the memory pool and allows the system to refine strategies based on historical execution trajectories. Extensive experiments demonstrate that our approach achieves strong functional correctness while delivering significant improvements in PPA metrics. By integrating tool-driven feedback with structured and evolvable memory, our framework transforms RTL generation from one-shot reasoning into a continual, feedback-driven optimization process, providing a scalable pathway for deploying LLMs in real-world hardware design flows.

2026-03-18T11:25:40Z Yaoxiang Wang Qi Shi ShangZhan Li Qingguo Hu Xinyu Yin Bo Guo Xu Han Maosong Sun Jinsong Su http://arxiv.org/abs/2603.17208v1 SYMDIREC: A Neuro-Symbolic Divide-Retrieve-Conquer Framework for Enhanced RTL Synthesis and Summarization 2026-03-17T23:15:24Z

Register-Transfer Level (RTL) synthesis and summarization are central to hardware design automation but remain challenging for Large Language Models (LLMs) due to rigid HDL syntax, limited supervision, and weak alignment with natural language. Existing prompting and retrieval-augmented generation (RAG) methods have not incorporated symbolic planning, limiting their structural precision. We introduce SYMDIREC, a neuro-symbolic framework that decomposes RTL tasks into symbolic subgoals, retrieves relevant code via a fine-tuned retriever, and assembles verified outputs through LLM reasoning. Supporting both Verilog and VHDL without LLM fine-tuning, SYMDIREC achieves ~20% higher Pass@1 rates for synthesis and 15-20% ROUGE-L improvements for summarization over prompting and RAG baselines, demonstrating the benefits of symbolic guidance in RTL tasks.

2026-03-17T23:15:24Z EACL 2026 Prashanth Vijayaraghavan Apoorva Nitsure Luyao Shi Charles Mackin Ashutosh Jadhav David Beymer Ehsan Degan Vandana Mukherjee http://arxiv.org/abs/2603.17204v1 CODMAS: A Dialectic Multi-Agent Collaborative Framework for Structured RTL Optimization 2026-03-17T23:10:07Z

Optimizing Register Transfer Level (RTL) code is a critical step in Electronic Design Automation (EDA) for improving power, performance, and area (PPA). We present CODMAS (Collaborative Optimization via a Dialectic Multi-Agent System), a framework that combines structured dialectic reasoning with domain-aware code generation and deterministic evaluation to automate RTL optimization. At the core of CODMAS are two dialectic agents: the Articulator, inspired by rubber-duck debugging, which articulates stepwise transformation plans and exposes latent assumptions; and the Hypothesis Partner, which predicts outcomes and reconciles deviations between expected and actual behavior to guide targeted refinements. These agents direct a Domain-Specific Coding Agent (DCA) to generate architecture-aware Verilog edits and a Code Evaluation Agent (CEA) to verify syntax, functionality, and PPA metrics. We introduce RTLOPT, a benchmark of 120 Verilog triples (unoptimized, optimized, testbench) for pipelining and clock-gating transformations. Across proprietary and open LLMs, CODMAS achieves ~25% reduction in critical path delay for pipelining and ~22% power reduction for clock gating, while reducing functional and compilation failures compared to strong prompting and agentic baselines. These results demonstrate that structured multi-agent reasoning can significantly enhance automated RTL optimization and scale to more complex designs and broader optimization tasks.

2026-03-17T23:10:07Z EACL 2026 Che-Ming Chang Prashanth Vijayaraghavan Ashutosh Jadhav Charles Mackin Vandana Mukherjee Hsinyu Tsai Ehsan Degan http://arxiv.org/abs/2603.17170v1 PAuth - Precise Task-Scoped Authorization For Agents 2026-03-17T22:05:03Z

The emerging agentic web envisions AI agents that reliably fulfill users' natural-language (NL)-based tasks by interacting with existing web services. However, existing authorization models are misaligned with this vision. In particular, today's operator-scoped authorization, exemplified by OAuth, grants broad permissions tied to operators (e.g., the transfer operator) rather than to the specific operations (e.g., transfer $100 to Bob) implied by a user's task. This will inevitably result in overprivileged agents. We introduce Precise Task-Scoped Implicit Authorization (PAuth), a fundamentally different model in which submitting an NL task implicitly authorizes only the concrete operations required for its faithful execution. To make this enforceable at servers, we propose NL slices: symbolic specifications of the calls each service expects, derived from the task and upstream results. Complementing this, we also propose envelopes: special data structure to bind each operand's concrete value to its symbolic provenance, enabling servers to verify that all operands arise from legitimate computations. PAuth is prototyped in the agent-security evaluation framework AgentDojo. We evaluate it in both benign settings and attack scenarios where a spurious operation is injected into an otherwise normal task. In all benign tests, PAuth executes the tasks successfully without requiring any additional permissions. In all attack tests, PAuth correctly raises warnings about missing permissions. These results demonstrate that PAuth's reasoning about permissions is indeed precise. We further analyze the characteristics of these tasks and measure the associated token costs.

2026-03-17T22:05:03Z Reshabh K Sharma Linxi Jiang Zhiqiang Lin Shuo Chen http://arxiv.org/abs/2603.17150v1 Intent Formalization: A Grand Challenge for Reliable Coding in the Age of AI Agents 2026-03-17T21:28:59Z

Agentic AI systems can now generate code with remarkable fluency, but a fundamental question remains: \emph{does the generated code actually do what the user intended?} The gap between informal natural language requirements and precise program behavior -- the \emph{intent gap} -- has always plagued software engineering, but AI-generated code amplifies it to an unprecedented scale. This article argues that \textbf{intent formalization} -- the translation of informal user intent into a set of checkable formal specifications -- is the key challenge that will determine whether AI makes software more reliable or merely more abundant. Intent formalization offers a tradeoff spectrum suitable to the reliability needs of different contexts: from lightweight tests that disambiguate likely misinterpretations, through full functional specifications for formal verification, to domain-specific languages from which correct code is synthesized automatically. The central bottleneck is \emph{validating specifications}: since there is no oracle for specification correctness other than the user, we need semi-automated metrics that can assess specification quality with or without code, through lightweight user interaction and proxy artifacts such as tests. We survey early research that demonstrates the \emph{potential} of this approach: interactive test-driven formalization that improves program correctness, AI-generated postconditions that catch real-world bugs missed by prior methods, and end-to-end verified pipelines that produce provably correct code from informal specifications. We outline the open research challenges -- scaling beyond benchmarks, achieving compositionality over changes, metrics for validating specifications, handling rich logics, designing human-AI specification interactions -- that define a research agenda spanning AI, programming languages, formal methods, and human-computer interaction.

2026-03-17T21:28:59Z 10 pages Shuvendu K. Lahiri http://arxiv.org/abs/2603.18049v1 Conditional Execution of Transpiler Passes Based on Per-Script Feature Detection 2026-03-17T19:54:22Z

As the ECMAScript specification evolves, industrial-scale JavaScript compilers face the challenge of supporting modern language syntax while maintaining compatibility for diverse execution environments. Traditionally, compilers solve this by running transpilation passes in a monolithic pipeline, where the transpilation passes are chosen to execute strictly based on a target language level. This results in significant computational waste, as compilers perform expensive Abstract Syntax Tree (AST) traversals to lower features that may not exist in the actual input source code. We present a compiler improvement that conditionally executes transpiler passes based on accurately tracking and dynamically maintaining the exact set of language features present in the compilation unit throughout the transpilation process. It is implemented in the production Google Closure Compiler. By populating and maintaining a FeatureSet at every JavaScript script-level, it dynamically skips running unnecessary lowering passes. We detail the architectural safeguards -- including strategic pass ordering and dynamic validation of the transpiled code for feature-correctness. Evaluation of this improvement on large-scale production monorepos produced a considerable reduction in compilation time and saved compute and memory usage.

2026-03-17T19:54:22Z Preprint. Under review at SOAP 2026. Implementation available in Google Closure Compiler Rishipal Singh Bhatia