https://arxiv.org/api/1tt5dvOJOKOhPNAkoVZ/OW62Ip42026-06-13T23:48:25Z988518015http://arxiv.org/abs/2605.12694v1Agentic Interpretation: Lattice-Structured Evidence for LLM-Based Program Analysis2026-05-12T19:46:24ZLarge language models can consult information that fixed static analyzers cannot, such as documentation, current security advisories, version-specific metadata, and informal API contracts. This makes LLMs a compelling option for program analyses that depend on information beyond the source program, or that are otherwise not amenable to conventional static analyzers. However, directly asking an LLM for a one-shot whole-program analysis is brittle because it compresses many evidence-dependent judgments into a single opaque answer, rather than exposing which conclusions are supported or disputed and using intermediate findings to guide later, more focused searches. In this paper, we propose agentic interpretation, a framework that brings the discipline of lattice-based static analysis to LLM-driven program reasoning. At a high level, agentic interpretation decomposes a high-level analysis goal into localized claims, and tracks the LLM's judgment about each claim in a finite-height lattice. A worklist algorithm governs how claims and their judgments evolve during the analysis. We introduce a formal model of agentic interpretation, explore the design space it opens, and illustrate the approach with a worked example analyzing code that depends on opaque third-party components.2026-05-12T19:46:24Z27 pages, 6 figuresJacqueline L. MitchellChao Wanghttp://arxiv.org/abs/2605.12275v1Minimalistic Terminal Editor for Julia Programming -- MinTEJ: A Friendly Approach for a Scientific Programmer2026-05-12T15:36:29ZDevelopers rely on lightweight, terminal-centric workflows for rapid code iteration. However, within a unified environment for Julia programming language, existing tools provide limited support for integrated workflow such as editing, execution, file management, and debugging. As a result, developers frequently incur context-switching overhead and fragmented tool interactions. Therefore, the proposed work predominantly focuses on the minimalistic approach for developing native terminal editor for Julia programming language. This paper introduces MinTEJ, a terminal-based editor built in Julia, and proposes a Sequential Modal Interaction Architecture (SMIA) that unifies file management, code editing, execution, and debugging through a command-oriented workflow. The presented work formalizes model interaction and reduces cognitive load & errors while transitioning among different modes. In SMIA, buffer is the central data structure that persists across all modes. Each mode interprets and manipulates the buffer according to mode-specific rules. The central controller mediates access to the buffer and enforces sequential transitions between modes. To evaluate the approach, the performance benchmarking of MinTEJ is compared against existing tools i.e., VS code and Notepad++. The effectiveness of the proposed MinTEJ is evaluated based on memory consumption and CPU utilization demonstrating that it has less resource overhead. Findings suggest that integrated terminal-based editor environment is a practical lightweight software tool enabling efficient iterative development.2026-05-12T15:36:29Z15 Pages, 42 figuresPoornachandratejasvi Laxman BhattarPayal V. DahiwaleKrishnarjunulu ThotaAnurag Sharmahttp://arxiv.org/abs/2605.12239v1Harness Engineering as Categorical Architecture2026-05-12T15:09:46ZThe agent harness, the system layer comprising prompts, tools, memory, and orchestration logic that surrounds the model, has emerged as the central engineering abstraction for LLMbased agents. Yet harness design remains ad hoc, with no formal theory governing composition, preservation of properties under compilation, or systematic comparison across frameworks. We show that the categorical Architecture triple (G, Know, Phi) from the ArchAgents framework provides exactly this formalization. The four pillars of agent externalization (Memory, Skills, Protocols, Harness Engineering) map onto the triple's components: Memory as coalgebraic state, Skills as operad-composed objects, Protocols as syntactic wiring G, and the full Harness as the Architecture itself. Structural guarantees-integrity gates, quality-based escalation, supported convergence checks-are Know-level certificates whose preservation is structural replay: our compiler checks identity and verifier replay, not output-layer correctness or model behavior. We validate this correspondence with a reference implementation featuring compiler functors targeting Swarms, DeerFlow, Ralph, Scion, and LangGraph: the four configuration compilers preserve three named certificate types by identity or replay, and LangGraph preserves the same certificates through its shared per-stage execution path. The LangGraph compiler creates one node per stage using the same per-stage method as the native runtime, providing LangGraph-native observability without reimplementing harness logic. An end-to-end escalation experiment with real LLM agents confirms that the quality-based escalation control path is model-parametric in this two-model, one-task experiment. The result positions categorical architecture as the formal theory behind harness engineering.2026-05-12T15:09:46ZBogdan Banuhttp://arxiv.org/abs/2605.12576v1Divergent Multi-Version Execution (DME): Canonical Instruction-Trace Fault Detection via Structural Address-Space Decorrelation2026-05-12T13:12:18ZTraditional redundancy (lockstep, TMR) executes identical binaries with identical memory layouts. A single correlated fault - for example, an arbitrary program counter value or a perturbation delta-PC in all replicas - redirects all replicas along the same incorrect path. The same applies to corruption of data pointers. Both types of faults, regardless of their origin (deliberate tampering, software bug, compilation bug, or physical disturbance), cause silent data corruption and erroneous program execution. This work presents Divergent Multi-Version Execution (DME), a runtime semantic consistency verifier for diversified executions. Each replica is compiled independently, producing different code and data memory layouts while preserving identical semantics. Faults are detected by comparing canonical instruction traces, which include opcodes, register identifiers, loaded/stored values, and results, while discarding layout-dependent addresses.2026-05-12T13:12:18ZPetro Baran Yrievichhttp://arxiv.org/abs/2605.12091v1Automated Amortised Analysis of Skew Heaps and Leftist Heaps (Extended Version)2026-05-12T13:12:07ZWe study the fully automated amortised analysis of purely functional data structures like skew heaps, as well as weight- and rank-biased leftist heaps. For that we generalise earlier works on automated amortised resource analysis by developing a type inference based approach with a generic type system. This allows for modular reasoning and the inference of precise and optimal cost bounds.
More specifically, we extend the work on the ATLAS system by Leutgeb et al. which was developed to cover the analysis of splay trees and some closely related data structures. To enable the analysis of skew heaps, however, and the even more challenging (amortised) analysis of leftist heaps, we have developed a range of new techniques for type-based automated analysis. By introducing a generic type system we allow for arbitrary (classes of) potential functions, compared to the use of hard-coded potential functions in ATLAS, which we have implemented in Haskell in an entirely modular way. We have also greatly enhanced the existing type inference algorithm by extensions in multiple directions, including path-sensitive reasoning, data structure invariants, and template parameters for piecewise defined potential functions. We show how our newly developed system supports the use of all known potential functions for analysing skew heaps and leftist heaps, confirming the known bounds.2026-05-12T13:12:07Z41 Pages, 12 Figures. Extended version of the paper to appear in the proceedings of the 38th International Conference on Computer Aided Verification (CAV 2026)Armin WalchGeorg MoserBerry SchoenmakersFlorian Zulegerhttp://arxiv.org/abs/2605.13896v1Neural Code Translation of Legacy Code: APL to C#2026-05-12T12:11:33ZAutomatic translation between programming languages remains a challenging problem, particularly when the source language is highly concise and specialized. This paper investigates the translation of APL into C# using large language models. The task is difficult due to APL's sparse syntax, the scarcity of large-scale parallel corpora, and the requirement for specialized knowledge to interpret APL programs. To address these challenges, we introduce a novel framework for APL-to-C# translation by comparing three guided strategies, namely natural language description-mediated, retrieval-augmented, and iterative refinement, against a baseline direct translation model. We constructed multiple datasets of functionally equivalent code pairs spanning various levels of complexity, and to rigorously assess translation quality, we developed an automated evaluation pipeline that verifies both syntactic compilation and functional execution of the generated C# code. Our results demonstrate that neural code translation can successfully bridge the gap between APL and C# for a wide range of programs, and that incorporating additional context and guidance significantly improves model performance.2026-05-12T12:11:33ZAbdulrahman RamadanHanen BorchaniIben LilholmMikkel AlmindAllan Peter Engsig-Karuphttp://arxiv.org/abs/2512.22168v2TileLoom: Automatic Dataflow Planning for Tile-Based Languages on Spatial Dataflow Accelerators2026-05-12T10:39:26ZSpatial dataflow accelerators are a promising direction for next-generation computer systems because they can reduce the memory bottlenecks of traditional von Neumann machines such as CPUs and GPUs. They organize computation around explicit, compiler-managed data movement over on-chip networks, allowing operands to be forwarded directly between processing elements and reducing reliance on high-latency, bandwidth-limited global shared memory. However, their performance depends strongly on how workloads are mapped to hardware. Naive mappings can perform poorly, and most users rely on hand-tuned vendor libraries. Thus, despite their potential for high performance, energy efficiency, and cost efficiency, limited programmability remains a major barrier to wider adoption.
This paper presents TileLoom, an MLIR-based end-to-end framework that compiles tile-based programs, such as Triton kernels, onto spatial dataflow architectures. Unlike compiler frameworks that focus on optimizing code generation within a single tile, TileLoom distributes tile instances across spatially distributed cores and exploits the on-chip network and distributed memories to increase data reuse and reduce communication. TileLoom introduces a hardware representation that captures interconnect topology, memory hierarchy, and compute capabilities, enabling both architecture-specific optimizations and support for diverse spatial dataflow targets. In experiments on two generations of Tenstorrent systems, TileLoom achieves performance comparable to vendor libraries on various kernels.2025-12-17T11:26:58ZWei LiZhenyu BaiHeru WangPranav DangiZhiqiang ZhangCheng TanHuiying LanWeng-Fai WongTulika Mitrahttp://arxiv.org/abs/2605.12563v1OverrideFuzz: Semantic-Aware Grammar Fuzzing for Script-Runtime Vulnerabilities2026-05-12T03:57:35ZScript-language runtimes such as Python, Lua, and JavaScript are widely deployed in security sensitive contexts, yet they remain difficult to test because valid inputs must satisfy syntax, dynamic type constraints, and object-level semantics. Existing grammar and reflection-based fuzzers improve syntactic validity and interface reachability, but they rarely model override hooks, dynamic rebinding, and attribute-resolution behavior that can redirect built-in operations across the script-native boundary and trigger use-after-free or type-confusion bugs. We present OverrideFuzz, a two-phase, semantic-aware grammar fuzzer for script-language runtimes. Its declaration phase constructs objects with overriding methods, while its execution phase generates operations that route through those hooks. Active reflection tracks runtime types, and passive reflection learns from error messages to remove invalid operation shapes, allowing generation to approach semantic correctness without manual API specification. We evaluate OverrideFuzz on CPython, Lua, and QuickJS. All three targets show consistent coverage growth, with rapid early expansion followed by slower incremental gains, and Lua benefits most from its pervasive metamethod dispatch mechanism. Although OverrideFuzz did not discover novel vulnerabilities during the bounded evaluation period, corpus analysis shows that it reconstructs inputs matching known vulnerability patterns, which suggests that semantic-aware generation reaches the intended script-native boundary behaviors.2026-05-12T03:57:35Z37 pages, 7 figures, Bachelor Thesis, prepared with TypstYiran Qiuhttp://arxiv.org/abs/2605.07782v3CktFormalizer: Autoformalization of Natural Language into Circuit Representations2026-05-12T03:24:43ZLLMs can generate hardware descriptions from natural language specifications, but the resulting Verilog often contains width mismatches, combinational loops, and incomplete case logic that pass syntax checks yet fail in synthesis or silicon. We present CktFormalizer, a framework that redirects LLM-driven hardware generation through a dependently-typed HDL embedded in Lean 4. Lean serves three roles: (i) type checker:dependent types encode bit-width constraints, case coverage, and acyclicity, turning hardware defects into compile-time errors that guide iterative repair; (ii) correctness firewall:compiled designs are structurally free of defects that cause silent backend failures (the baseline loses 20% of correct designs during synthesis and routing; CktFormalizer preserves all of them); (iii) proof assistant:the agent constructs machine-checked equivalence proofs over arbitrary input sequences and parameterized widths, beyond the reach of bounded SMT-based checking. On VerilogEval (156 problems), RTLLM (50 problems), and ResBench (56 problems), CktFormalizer achieves simulation pass rates competitive with direct Verilog generation while delivering substantially higher backend realizability: 95--100% of compiled designs complete the full synthesis, place-and-route, DRC, and LVS flow. A closed-loop PPA optimization stage yields up to 35% area reduction and 30% power reduction through validated architecture exploration, with automated theorem proof ensuring that each optimized variant remains functionally equivalent to its formal specification.2026-05-08T14:20:06ZJing XiongQi HanChenchen DingHe XiaoZunhai SuChaofan TaoNgai Wonghttp://arxiv.org/abs/2603.04334v3SpotIt+: Verification-based Text-to-SQL Evaluation with Database Constraints2026-05-11T20:28:49ZWe present SpotIt+, an open-source tool for evaluating Text-to-SQL systems via bounded equivalence verification. Given a generated SQL query and the ground truth, SpotIt+ actively searches for database instances that differentiate the two queries. To ensure that the generated counterexamples reflect practically relevant discrepancies, we introduce a best-effort constraint-mining pipeline that combines rule-based specification mining with LLM-based validation over example databases. Experimental results on the BIRD dataset show that the mined constraints enable SpotIt+ to generate more realistic differentiating databases, while preserving its ability to efficiently uncover numerous discrepancies between generated and gold SQL queries that are missed by standard test-based evaluation.2026-03-04T17:51:42ZAndrew TremanteYang HeRocky KlopfensteinYuepeng WangNina NarodytskaHaoze Wuhttp://arxiv.org/abs/2605.13885v1Quantitative Symbolic Patch Impact Analysis2026-05-11T18:38:58ZTraditional equivalence checking classifies programs as equivalent or non-equivalent, providing insufficient information for tasks like patch impact analysis where it is expected the patched version of the program to be non-equivalent to the original program. When two program versions are non-equivalent, determining under what conditions they differ and what percentage of inputs are affected remains an open challenge. In this work, we introduce quantitative partial equivalence analysis, an approach for assessing software patches by quantifying behavioral differences between the original (vulnerable) code and the patched code. Using symbolic analysis, we identify input conditions under which patched and original programs exhibit identical or divergent behaviors. Our approach refines non-equivalence by measuring the extent of behavioral divergence across the input domain. For efficient quantitative analysis of numerical domains, we propose a range-based search heuristic that provides a sound lower bound on equivalence. We demonstrate our approach on 90 CVE patches from widely used open-source projects (Linux, Qemu, FFmpeg), as well as on a Juliet Test Suite-based dataset containing programs with CWEs. Our results show that quantitative partial equivalence analysis effectively characterizes and quantifies patch impact. Additionally, experiments on the EqBench benchmark reveal five C program pairs that are mislabeled as equivalent, and we identify the input conditions under which their behaviors diverge.2026-05-11T18:38:58Z23 pages, 6 Algorithms, 4 figures, 5 ListingsNASA Formal Methods Symposium (NFM) 2026Laboni SarkerAbdus SatterTevfik Bultanhttp://arxiv.org/abs/2502.20070v3Partial Orders for Precise and Efficient Dynamic Deadlock Prediction2026-05-11T18:11:06ZDeadlocks are a major source of bugs in concurrent programs. They are hard to predict, because they may only occur under specific scheduling conditions. Dynamic analysis attempts to identify potential deadlocks by examining a single execution trace of the program. A standard approach involves monitoring sequences of lock acquisitions in each thread, with the goal of identifying deadlock patterns. A deadlock pattern is characterized by a cyclic chain of lock acquisitions, where each lock is held by one thread while being requested by the next. However, it is well known that not all deadlock patterns identified in this way correspond to true deadlocks, as they may be impossible to manifest under any schedule.
We tackle this deficiency by proposing a new method based on partial orders to eliminate false positives: lock acquisitions must be unordered under a given partial order, and not preceded by other deadlock patterns. We prove soundness (no falsely predicted deadlocks) for the novel TRW partial order, and completeness (no deadlocks missed) for a slightly weakened variant of TRW. Both partial orders can be computed efficiently and report the same deadlocks for an extensive benchmark suite.2025-02-27T13:28:13ZThere's a corrigendum ("Analysis of unsoundness") that comes after the paper. The latest version of the corrigendum contains now a deeper investigation of another reason for unsoundness (counter example 2 already part of the earlier submission but somehow the text explaining the issue got commented out)Bas van den HeuvelMartin SulzmannPeter Thiemannhttp://arxiv.org/abs/2510.18651v2CPSLint: A Domain-Specific Language Providing Data Validation and Sanitisation for Industrial Cyber-Physical Systems2026-05-11T15:20:38ZIndustrial cyber-physical systems generate vast amounts of semi-structured time-series data that require careful preprocessing before they can be effectively used for machine learning applications such as fault detection and identification. Raw sensor datasets are often corrupted or incomplete, making it challenging to develop reliable solutions without proper data preparation and validation. In this paper, we introduce CPSLint, a domain-specific language for data validation and sanitisation. We present the design, implementation and evaluation of CPSLint, demonstrating its ability to automatically detect and correct common data corruption patterns while enabling non-programming domain experts to effectively prepare their data for analysis. We report evaluation results on a representative dataset, tracking memory consumption and CPU-time for sanitisation activities. Our approach offers several advantages over traditional methods, including reduced manual effort, guaranteed consistency and broader applicability across time-series datasets and projects.2025-10-21T13:59:56Z19th ACM SIGPLAN International Conference on Software Language Engineering (SLE 2026)Uraz OdyurtÖmer SayilirMariëlle StoelingaVadim Zaytsev10.1145/3806383.3815519http://arxiv.org/abs/2605.10625v1Verifying Sequential Consistency under Bounded Preemptions2026-05-11T14:17:42ZGibbons and Korach studied a fundamental problem in 1997: given an observed sequence of reads and writes of a multi-threaded program, does there exist an interleaving which is sequentially consistent? Apart from applications in testing shared memory implementations, a procedure for this problem is employed in Dynamic Partial-Order-Reduction (DPOR) algorithms. The problem is known to be NP-hard even when different syntactic parameters are kept bounded. In this paper, we consider a restriction on the kind of interleaving required: does there exist a sequentially-consistent interleaving with at most π preemptions? Empirical evidence suggests that several bugs manifest within a few preemptive switches. This motivates us to investigate the problem under bounded preemptions. Our results exhibit a trichotomy: the problem lends to a polynomial-time algorithm for the class of single-writer programs where for each variable, there is a single thread writing to it; it becomes NP-hard for two-writer programs and finally, for three-writer programs, we get a conditional lower bound under the Exponential-Time-Hypothesis. When the number of preemptions π is not bounded, we show the problem to be W[1]-hard, and hence unlikely to be fixed-parameter-tractable with parameter π.2026-05-11T14:17:42ZA shorter version has been accepted at NETYS 2026 - 14th edition of the International Conference on Networked SystemsR. GovindS. KrishnaSanchari SilB. Srivathsanhttp://arxiv.org/abs/2311.14347v4Typed compositional quantum computation with lenses2026-05-11T11:46:16ZWe propose a type-theoretic framework for describing and proving properties of quantum computations, in particular those presented as quantum circuits. Our proposal is based on an observation that, in the polymorphic type system of Coq, currying on quantum states allows us to apply quantum gates directly inside a complex circuit. By introducing a discrete notion of lens to control this currying, we are further able to separate the combinatorics of the circuit structure from the computational content of gates. We apply our development to define quantum circuits recursively from the bottom up, and prove their correctness compositionally.2023-11-24T08:48:00ZAccepted by Journal of Automated ReasoningJacques GarrigueTakafumi Saikawa