https://arxiv.org/api/MZp/KpD10HcPte6kxziA4jl0BrQ 2026-06-14T05:16:18Z 9885 255 15 http://arxiv.org/abs/2604.27000v1 Adaptive and AI-Augmented Security Testing: A Systematic Survey of Program Analysis, Feedback-Driven Testing, and Hybrid Learning-Based Approaches 2026-04-29T03:32:59Z

Modern software systems are increasingly developed within rapid continuous integration and deployment (CI/CD) pipelines, where ensuring security prior to release presents significant technical and organizational challenges. Traditional static and dynamic analysis tools provide valuable structural and behavioral insights, yet they often operate in non-adaptive workflows and produce large volumes of warnings requiring manual triage. Feedback-driven fuzzing and search-based testing approaches have demonstrated the power of iterative input refinement guided by execution signals, while large language models (LLMs) have shown promise in automated test generation but frequently lack semantic grounding in program structure. This paper presents a systematic survey of adaptive and AI-augmented security testing research across five domains: (1) structural program analysis for vulnerability detection, (2) DevSecOps and continuous security testing, (3) feedback-driven fuzzing and search-based testing, (4) LLM-based automated test generation, and (5) emerging hybrid systems integrating program analysis with adaptive learning. We analyze fifty-five peer-reviewed studies drawn from a systematic search of four major databases yielding 22,088 raw records. Our analysis reveals a persistent disconnect between structural program representations (ASTs, CFGs, and CPGs) and adaptive testing mechanisms. We characterize this as structural-adaptive fragmentation: a systematic separation that neither paradigm individually addresses. No existing system incorporates human triage signals as feedback for refining structural models. We conclude by identifying five open research challenges and outlining a unified agenda for semantically grounded, feedback-driven, polyglot security testing frameworks.

2026-04-29T03:32:59Z 29 pages, submitted for review Michael Wienczkowski http://arxiv.org/abs/2512.22417v2 Open-World Assertion Checking for Smart Contracts via Game Semantics 2026-04-28T23:42:11Z

We present a game semantics framework for open-world safety analysis of Ethereum smart contracts. We model the interaction between a contract and its environment as a two-player game between the contract and the environment, and prove up to gas model approximations soundness: every assertion violation found corresponds to a real execution; and completeness: every open-world execution is captured. To our knowledge, this provides the first formal open-world interaction semantics for Ethereum smart contracts with mathematical guarantees of soundness and completeness. We implement this framework in YulTracer, an assertion reachability tool for real-world Solidity contracts, built on Yul, the intermediate language of the Solidity compiler. YulTracer uses concrete execution and exhaustively explores game traces within user-specified bounds. We evaluate it on reentrancy benchmarks, where YulTracer achieves 100% recall and precision -- the only tool to do so from those we examined -- and on two large real-world exploits (the DAO and PredyPool), where it detects the known vulnerabilities and produces no false positives on fixed versions. To our knowledge, YulTracer is the first tool to achieve this level of precision on real-world contracts without false positives. We additionally demonstrate generality of the approach via the examination of access control benchmarks.

2025-12-27T00:21:42Z 49 pages, 8 figures, 6 tables, 3 listings Vasileios Koutavas Yu-Yang Lin Nikos Tzevelekos http://arxiv.org/abs/2604.26161v1 Finite Functional Programming 2026-04-28T22:49:28Z

We unify functional and logic programming by treating predicatesas functions equipped with their support: the set of inputs whose output is nonzero. Datalog, for instance, is a language of finitely supported boolean functions. Finite support allows representing functions as input-output tables. Generalizing from boolean functions to other pointed sets neatly handles aggregation and weighted logic programming. We refer to the combination of finitely supported functions, represented as data, with higher order functions, represented as code, as finite functional programming. We give a simple type system to check finite support, using graded effects to check variable grounding and relevance types to model pointed sets.

2026-04-28T22:49:28Z 16 pages, 6 figures Michael Arntzenius Max Willsey http://arxiv.org/abs/2604.23035v2 Remote Concolic Multiverse Debugging -- Extended Version with Additional Appendices 2026-04-28T17:59:31Z

Debugging nondeterministic programs is inherently difficult, particularly in microcontroller environments where execution paths can diverge unpredictably due to external sensor inputs. Traditional debugging techniques often fail to capture or reproduce this nondeterministic behavior effectively. Multiverse debugging has emerged as a compelling technique to debug nondeterministic programs, allowing developers to systematically explore all possible execution paths. Unfortunately, current multiverse debuggers are snapshot-based and most operate over a model of the program, limiting their use for debugging resource-constrained microcontrollers. Additionally, current multiverse debuggers, even ones specifically designed for microcontrollers suffer from state explosion making the state space overwhelming during debugging. To address these challenges, we introduce a trace-based multiverse debugger with a novel state-space reduction technique based on concolic execution. Our approach interleaves concolic analysis with live debugging to identify input values that define unique program paths. This hybrid technique efficiently prunes redundant paths from the state space while ensuring full code coverage. Unlike MIO, a recently published multiverse debugger for microcontrollers that focuses on IO consistency, our approach directly targets state explosion by leveraging concolic execution and uses a trace-based approach, significantly reducing the memory and communication overhead. We implemented a prototype using the WARDuino WebAssembly VM, demonstrating the feasibility and efficiency of our approach in real-world scenarios. Our results highlight substantial reductions in the state space compared to traditional multiverse debugging. This makes multiverse debugging more accessible and efficient for developers working with complex, nondeterministic programs running on microcontrollers.

2026-04-24T21:48:10Z Maarten Steevens Tom Lauwaerts Christophe Scholliers http://arxiv.org/abs/2210.13387v5 Towards a Higher-Order Mathematical Operational Semantics 2026-04-28T15:38:43Z

Compositionality proofs in higher-order languages are notoriously involved, and general semantic frameworks guaranteeing compositionality are hard to come by. In particular, Turi and Plotkin's bialgebraic abstract GSOS framework, which has been successfully applied to obtain off-the-shelf compositionality results for first-order languages, so far does not apply to higher-order languages. In the present work, we develop a theory of abstract GSOS specifications for higher-order languages, in effect transferring the core principles of Turi and Plotkin's framework to a higher-order setting. In our theory, the operational semantics of higher-order languages is represented by certain dinatural transformations that we term pointed higher-order GSOS laws. We give a general compositionality result that applies to all systems specified in this way and discuss how compositionality of the SKI calculus and the $λ$-calculus w.r.t. a strong variant of Abramsky's applicative bisimilarity are obtained as instances.

2022-10-24T16:38:55Z This version restores arXiv:2210.13387v2. arXiv:2210.13387v3 was a mistaken replacement, which was withdrawn. The replacement belonged to another submission, namely arXiv:2405.16708 Sergey Goncharov Stefan Milius Lutz Schröder Stelios Tsampas Henning Urbat http://arxiv.org/abs/2604.25478v1 Practical Insights into Fair Comparison and Evaluation Frame for Neutral-Atom Compilers 2026-04-28T10:30:25Z

Neutral-atom quantum computing is among the most promising platforms for scalable quantum computation, and compilation toolchains are crucial for leveraging capabilities such as qubit shuttling and parallel gate execution. An important challenge, however, is that existing neutral-atom compilers are often evaluated using metrics computed over different parts of the toolchain and under non-equivalent assumptions. Consequently, fair quantification and comparison of compiler performance remain difficult. Reported metrics may depend on inconsistent transpilation optimization levels, different movement-duration models, different sets of considered fidelity sources, and even minor implementation bugs or undocumented representation choices. To address this problem, we present a unified and reproducible evaluation framework for neutral-atom compilers. Our framework introduces RSQASM (Routed and Scheduled QASM), a QASM-inspired post-compilation representation that captures mapped, routed, and scheduled circuits, including explicit parallel gate execution and shuttling operations. As part of the framework, we provide adapter scripts that translate existing compiler outputs and intermediate artifacts into RSQASM. As a case study, we compare three well-known neutral-atom compilation toolchains: HybridMapper, DasAtom, and Enola, motivated by the large performance differences reported in prior work. Using our framework and representation, we perform a new evaluation and show that several previously claimed performance gaps become substantially smaller and, in some cases, are not reproduced once evaluation inconsistencies are removed.

2026-04-28T10:30:25Z 11 pages, 2 figures, submitted to IEEE Quantum Week 2026 Emil Khusainov Yanbin Chen Jonas Winklmann Helmut Seidl Christian B. Mendl http://arxiv.org/abs/2605.00034v1 Symbolic Execution Meets Multi-LLM Orchestration: Detecting Memory Vulnerabilities in Incomplete Rust CVE Snippets 2026-04-28T01:27:09Z

This paper presents a system combining symbolic execution (KLEE) with a 4-agent multi-LLM architecture for detecting memory vulnerabilities in Rust unsafe code. A central challenge we address is the incomplete-code problem: CVE database entries provide only isolated code snippets that lack struct definitions, imports, and Cargo manifests, causing all existing formal verification tools to fail at compilation with zero output. Our system resolves this through four specialized agents -- an Oracle/Validator for strategic planning, a Safety Checker for vulnerability analysis, a Code Specialist for FFI wrapper generation, and a Fast Filter for execution optimization -- that collaboratively synthesize KLEE-compatible harnesses from otherwise uncompilable fragments. KLEE's output is then ingested by graph_klee.py, which constructs a Graph Database linking CVE files, CWE categories, error types, and symbolic execution paths as typed nodes and labelled edges, enabling structured cross-CVE vulnerability queries. We evaluated our system on 31 real-world Rust CVEs spanning 11 CWE categories, achieving 90.3% wrapper compilation success where all state-of-the-art formal verification tools achieve 0%. Our system detected 1,206 critical errors across 26 files (83.9% detection rate), compared to 14 warnings across 11 files for Clippy (35.5%) and generic labels for Miri. The 4-agent architecture reduced wrapper compilation failures from 42% (single-agent baseline) to 9.7% and increased detected errors from 487 to 1,206, confirming that role specialization and structured context passing produce measurably better results than a single general-purpose model. Our replication package is publicly available at https://github.com/Zeyad-Ab/Symbolic-Execution-with-Multi-LLM-Architecture-for-Rust-Security

2026-04-28T01:27:09Z 11 pages, 1 figure, to be published in : Ease 2026 The 6th International Workshop on Software Security Engineering Zeyad Abdelrazek Young Lee http://arxiv.org/abs/2604.25960v1 Large Language Models for Multilingual Code Intelligence: A Survey 2026-04-27T20:20:26Z

Large language models have transformed AI-assisted software engineering, but current research remains biased toward high-resource languages such as Python, with weaker performance in languages like Rust and OCaml. Since real-world systems are inherently polyglot, robust multilingual code intelligence is crucial. This survey focuses on two key tasks: multilingual code generation from shared natural-language requirements, and multilingual code translation that preserves semantics across languages. It reviews representative methods, benchmarks, and evaluation metrics, and highlights challenges and opportunities for trustworthy cross-language generalization.

2026-04-27T20:20:26Z Chao Jiang Dugang Liu Cheng Wen Zhiwu Xu Hua Zheng Muhammad Sadiq Jawwad Ahmed Shamsi Shengchao Qin Zhong Ming http://arxiv.org/abs/2603.16437v6 Dimensional Type Systems and Deterministic Memory Management: Design-Time Semantic Preservation in Native Compilation 2026-04-27T17:48:52Z

We present a compilation framework in which dimensional type annotations persist through multi-stage MLIR lowering, enabling the compiler to jointly resolve numeric representation selection and deterministic memory management as coeffect properties of a single program semantic graph (PSG). Dimensional inference determines value ranges, which in turn determine representation selection, word width, memory footprint, allocation strategy, and cross-target transfer fidelity. The Dimensional Type System (DTS) extends Hindley-Milner unification with constraints drawn from finitely generated abelian groups, yielding inference that is decidable in polynomial time, complete (no annotations required), and principal. Where conventional systems erase dimensional annotations before code generation, DTS carries them as compilation metadata through each lowering stage, making them available where representation and memory placement decisions occur. Deterministic Memory Management (DMM), formalized as a coeffect discipline within the same graph, classifies value lifetimes into four escape categories, each mapping to a verified allocation strategy. The dimensional algebra is closed under the chain rule, and forward-mode gradient computation exhibits a specific coeffect signature that the framework can verify at compile time. The practical consequence is a development environment where escape diagnostics, allocation strategy, representation fidelity, and cache locality estimation are design-time views over the compilation graph.

2026-03-17T12:19:07Z 33 pages, 8 tables, 3 appendices with extended examples Houston Haynes http://arxiv.org/abs/2604.13351v3 Optimal Predicate Pushdown Synthesis 2026-04-27T16:20:12Z

Predicate pushdown is a long-standing performance optimization that filters data as early as possible in a computational workflow. In modern data pipelines, this transformation is especially important because much of the computation occurs inside user-defined functions (UDFs) written in general-purpose languages such as Python and Scala. These UDFs capture rich domain logic and complex aggregations and are among the most expensive operations in a pipeline. Moving filters ahead of such UDFs can yield substantial performance gains, but doing so requires semantic reasoning. This paper introduces a general semantic foundation for predicate pushdown over stateful fold-based computations. We view pushdown as a correspondence between two programs that process different subsets of input data, with correctness witnessed by a bisimulation invariant relating their internal states. Building on this foundation, we develop a sound and relatively complete framework for verification, alongside a synthesis algorithm that automatically constructs optimal pushdown decompositions by finding the strongest admissible pre-filters and weakest residual post-filters. We implement this approach in a tool called Pusharoo and evaluate it on 150 real-world pandas and Spark data-processing pipelines. Our evaluation shows that Pusharoo is significantly more expressive than prior work, producing optimal pushdown transformations with a median synthesis time of 1.6 seconds per benchmark. Furthermore, our experiments demonstrate that the discovered pushdown optimizations speed up end-to-end execution by an average of 2.4$\times$ and up to two orders of magnitude.

2026-04-14T23:22:34Z 45 pages, 9 figures, 7 tables. Extended version (with appendices) of the PLDI 2026 paper Proc. ACM Program. Lang. 10, PLDI, Article 234 (June 2026), 25 pages Robert Zhang Eric Hayden Campbell Dixin Tang Isil Dillig 10.1145/3808312 http://arxiv.org/abs/2604.24578v1 Hybrid Path-Sums for Hybrid Quantum Programs 2026-04-27T15:05:17Z

As quantum computing becomes an emerging reality, designing efficient quantum programming capabilities is becoming more and more important. Particularly, the debugging and validation of quantum programs is of paramount importance, as these programs are by definition hard to test. Static analysis and formal verification methods for quantum programs started to emerge a few years now, yet they often miss hybrid quantum/classical reasoning facilities with, e.g., generic quantum control, classical control and classical computation instructions. In this paper, we lay out the foundations of a framework for the automated formal verification of (full) hybrid quantum programs featuring both classical and quantum control, measurement and hybrid data structures. In particular, we propose: (1) a novel symbolic representation for describing and manipulating sets of hybrid quantum/classical states called Hybrid Path-Sums (HPS); (2) a set of rewriting rules providing a rich mechanism for simplifying and reasoning on these symbolic hybrid states, and (3) a core assertion language to specify equivalence of hybrid quantum programs, the satisfaction of properties on (parts of) hybrid states, and the extraction of probabilistic statements about the program behavior. We prove the correctness of the novel symbolic representation, of its rewriting system and of the specification system. Finally, we propose a full implementation of this framework as a dedicated symbolic execution engine for hybrid programs. We present an evaluation of a set of representative hybrid case-studies from the literature, showcasing the advantage of our approach and its efficiency compared to state-of-the-art solutions.

2026-04-27T15:05:17Z Christophe Chareton Jad Issa Mathieu Nguyen Nicolas Blanco Sébastien Bardin 10.1145/3808314 http://arxiv.org/abs/2604.19906v2 Demonstrating a Future for MLIR-native DSL Compilers on a NumPy-like Example 2026-04-27T12:23:59Z

Compilers for general-purpose languages have been shown to be at a disadvantage when it comes to specialized application domains as opposed to their Domain-Specific Language (DSL) counterparts. However, the field of DSL compilers features little consolidation in terms of compiler frameworks and adjacent software ecosystems. As a result, considerable work is duplicated, lost to maintenance issues, or remains undiscovered, and most DSLs are never considered "production-ready". One notable development is the introduction of the Multi-Level Intermediate Representation (MLIR), which promises a similar impact on DSL compilers as LLVM had on general-purpose tooling. In this work, we present a NumPy-like DSL made for offloading numeric tensor kernels that is entirely MLIR-native. In a first for open-source, it implements all frontend actions and semantic analyses directly within MLIR. Most notably, this is made possible by our new dialect-agnostic MLIR type checker, created for the future of DSLs in MLIR. We implement a simple, yet effective, parallel-first lowering scheme that connects our language to another MLIR dataflow dialect for seamless offloading. We show that our approach performs well in real-world use cases from the domain of weather modeling and Computational Fluid Dynamics (CFD) in Fortran.

2026-04-21T18:30:11Z 17 pages, 5 figures Karl F. A. Friebel Jascha A. Ohlmann Jeronimo Castrillon http://arxiv.org/abs/2511.09447v2 SpaDA: A Spatial Dataflow Architecture Programming Language 2026-04-27T09:15:08Z

Spatial dataflow architectures like the Cerebras Wafer-Scale Engine deliver exceptional performance in AI and scientific computing by distributing scratchpad memory across hundreds of thousands of processing elements (PEs). Yet programming these architectures remains difficult: with no shared memory, data movement requires explicit configuration, and asynchronous task management introduces substantial complexity. We present SpaDA, a programming language that offers precise control over data placement, dataflow patterns, and asynchronous operations while abstracting low-level architectural details. We design and implement a compiler targeting Cerebras CSL through multi-level lowering and unique optimization passes. SpaDA functions as a high-level programming interface and an intermediate representation for domain-specific languages (DSLs), demonstrated here with the GT4Py stencil DSL. SpaDA enables concise expression of operations with complex parallel patterns -- including pipelined collective operations, multi-dimensional stencils, and dense linear algebra -- in 14.09x fewer lines than CSL, achieving over 260 TFlop/s across 730,000 PEs on a single device.

2025-11-12T16:03:52Z Lukas Gianinazzi Tal Ben-Nun Torsten Hoefler http://arxiv.org/abs/2512.00127v3 Generating Verifiable Chain of Thoughts from Exection-Traces 2026-04-27T09:10:32Z

Getting language models to reason correctly about code requires training on data where each reasoning step can be checked. Current synthetic Chain-of-Thought (CoT) training data often consists of plausible-sounding explanations generated by teacher models, and not verifiable accounts of actual program behavior. Models trained on such data learn logically flawed reasoning patterns despite syntactic correctness. To address this, we build a pipeline that generates execution-trace-verified CoT rationales by instrumenting code to capture traces, narrating them into natural language, and cross-checking each narration against the original trace. We systematically create 54,000 verified, bi-directional rationales that teach models to reason both forward (input$\rightarrow$output) and backward (output$\rightarrow$input). Models fine-tuned on our verified data achieve substantial improvements, with a peak gain of +26.6 on LiveCodeBench-Exec, +22.2 on CruxEval, and +19.5 on HumanEval across our fine-tuned models, demonstrating that verification quality directly determines both reasoning and code generation capabilities. Complete synthesis pipeline is avilable as open-source: https://github.com/IBM/verified-code-cot/

2025-11-28T07:43:43Z Shailja Thakur Vaibhav Saxena Rohan Kulkarni Shivdeep Singh Parameswaran Selvam Hima Patel Hiroshi Kanayama http://arxiv.org/abs/2511.10135v4 Contextual Refinement of Higher-Order Concurrent Probabilistic Programs (Extended Version) 2026-04-27T08:45:27Z

We present Foxtrot, the first higher-order separation logic for proving contextual refinement of higher-order concurrent probabilistic programs with higher-order local state. From a high level, Foxtrot inherits various concurrency reasoning principles from standard concurrent separation logic, e.g. invariants and ghost resources, and supports advanced probabilistic reasoning principles for reasoning about complex probability distributions induced by concurrent threads, e.g. tape presampling and induction by error amplification. The integration of these strong reasoning principles is highly non-trivial due to the combination of probability and concurrency in the language and the complexity of the Foxtrot model; the soundness of the logic relies on a version of the axiom of choice within the Iris logic, which is not used in earlier work on Iris-based logics. We demonstrate the expressiveness of Foxtrot on a wide range of examples, including the adversarial von Neumann coin and the $\mathsf{randombytes\_uniform}$ function of the Sodium cryptography software library. All results have been mechanized in the Rocq proof assistant and the Iris separation logic framework.

2025-11-13T09:48:37Z Updated version (fix bibliography) Kwing Hei Li Alejandro Aguirre Joseph Tassarotti Lars Birkedal