https://arxiv.org/api/ehwHb2J12udlcVhafjGPxgspUog 2026-03-22T14:42:51Z 25302 75 15 http://arxiv.org/abs/2603.16349v1 SseRex: Practical Symbolic Execution of Solana Smart Contracts 2026-03-17T10:33:11Z Solana is rapidly gaining traction among smart contract developers and users. However, its growing adoption has been accompanied by a series of major security incidents, which have spurred research into automated analysis techniques for Solana smart contracts. Unfortunately, existing approaches do not address the unique and complex account model of Solana. In this paper, we propose SseRex, the first symbolic execution vulnerability detection approach for finding Solana-specific bugs such as missing owner checks, missing signer checks, and missing key checks, as well as arbitrary cross-program invocations. Our evaluation of 8,714 bytecode-only contracts shows that our approach outperforms existing approaches and identifies potential bugs in 467 different contracts. Additionally, we analyzed 120 open-source Solana projects and conducted in-depth case studies on four of them. Our findings reveal that subtle, easily overlooked issues often serve as the root cause of severe exploits, further highlighting the need for specialized analysis tools like SseRex. 2026-03-17T10:33:11Z This paper appeared at the 23rd Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA '26) in July 2026 Tobias Cloosters Pascal Winkler Jens-Rene Giesen Ghassan Karame Lucas Davi http://arxiv.org/abs/2603.16348v1 Prompts Blend Requirements and Solutions: From Intent to Implementation 2026-03-17T10:31:44Z AI coding assistants are reshaping software development by shifting focus from writing code to formulating prompts. In chat-focused approaches such as vibe coding, prompts become the primary arbiter between human intent and executable software.
While Requirements Engineering (RE) emphasizes capturing, validating, and evolving requirements, current prompting practices remain informal and ad hoc. We argue that prompts should be understood as lightweight, evolving requirement artifacts that blend requirements with solution guidance. We propose a conceptual model decomposing prompts into three interrelated components: Functionality and Quality (the requirement), General Solutions (architectural strategy and technology choices) and Specific Solutions (implementation-level constraints). We assess this model using existing prompts, examining how these components manifest in practice. Based on this model and the initial assessment, we formulate four hypotheses: prompts evolve toward specificity, evolution varies by user characteristics, engineers using prompting engage in increased requirement validation and verification, and progressive prompt refinement yields higher code quality. Our vision is to empirically evaluate these hypotheses through analysis of real-world AI-assisted development, with datasets, corpus analysis, and controlled experiments, ultimately deriving best practices for requirements-aware prompt engineering. By rethinking prompts through the lens of RE, we position prompting not merely as a technical skill, but as a central concern for software engineering's future. 2026-03-17T10:31:44Z 9 pages, 1 figure, 2 tables. Submitted to EASE 2026 Shalini Chakraborty Jan-Philipp Steghöfer http://arxiv.org/abs/2603.16325v1 A Human-Centred Architecture for Large Language Models-Cognitive Assistants in Manufacturing within Quality Management Systems 2026-03-17T09:58:34Z Large Language Models-Cognitive Assistants (LLM-CAs) can enhance Quality Management Systems (QMS) in manufacturing, fostering continuous process improvement and knowledge management. However, there is no human-centred software architecture focused on QMS that enables the integration of LLM-CAs into manufacturing in the current literature.
This study addresses this gap by designing a component-based architecture considering requirement analysis and software development process. Validation was conducted via iterative expert focus groups. The proposed architecture ensures flexibility, scalability, modularity, and work augmentation within QMS. Moreover, it paves the way for its operationalization with industrial partners, showcasing its potential for advancing manufacturing processes. 2026-03-17T09:58:34Z Marcos Galdino Johanna Grahl Tobias Hamann Anas Abdelrazeq Ingrid Isenhardt http://arxiv.org/abs/2603.16293v1 Results of the analysis of a survey for young scientists on training quality in HEP instrumentation software and machine learning 2026-03-17T09:30:19Z A 2021 study by the ECFA Early-Career Researchers Panel revealed that 71% of 334 respondents used open-source software tools in their instrumentation work, yet 70% reported receiving no training for these tools. In response, the Software and Machine Learning for Instrumentation group was formed in the ECFA Early-Career Researchers Panel to assess the accessibility and quality of training programs in machine learning and software for early-career researchers in experimental and applied physics. This group launched a new survey, reaching 174 participants. This report summarises the survey results in detail, and is intended to serve as a guiding document to improve the training programs that are available to early-career researchers. 2026-03-17T09:30:19Z Cecilia Borca for the ECFA ECR Panel Javier Jiménez Peña for the ECFA ECR Panel David Marckx for the ECFA ECR Panel Malgorzata Niemiec for the ECFA ECR Panel Elisabetta Spadaro Norella for the ECFA ECR Panel Marta Urbaniak for the ECFA ECR Panel http://arxiv.org/abs/2603.16265v1 GitOps for Capture the Flag Platforms 2026-03-17T08:55:01Z In this paper, we present CTF Pilot, a GitOps-based framework for the deployment and management of Capture The Flag (CTF) competitions. 
By leveraging Git repositories as the single source of truth for challenge definitions and infrastructure configurations, CTF Pilot enables automated, version-controlled deployments that enhance collaboration among challenge authors and organizers. We detail the design criteria and implementation of CTF Pilot and evaluate our approach through a real-world CTF event, demonstrating its cost efficiency and its effectiveness in handling high participant concurrency while ensuring robust isolation and ease of challenge development. Our results indicate that CTF Pilot improves the experience for organizers and participants, and we present the lessons learned, highlighting opportunities for future improvement. 2026-03-17T08:55:01Z Mikkel Bengtson Albrechtsen Jacopo Mauro Torben Worm http://arxiv.org/abs/2504.04537v2 ICCheck: A Portable, Language-Agnostic Tool for Synchronizing Code Clones 2026-03-17T07:53:48Z Inconsistent modifications to code clones can lead to software defects. Many approaches exist to support consistent modifications based on clone detection and/or change pattern extraction. However, no tool currently supports synchronization of code clones across diverse programming languages and development environments. We propose ICCheck, a tool designed to be language-agnostic and portable across various environments. By leveraging an existing language-agnostic clone search technique and limiting the tool's external dependency to an existing Git repository, we developed a tool that can assist in synchronizing code clones in diverse environments. We validated the tool's functionality in multiple open-source repositories, where ICCheck was able to detect overlooked clone modifications in over 30 programming and domain-specific languages and delivered interactive suggestions within a median of 0.27 seconds in editor environments, demonstrating its language independence and responsiveness. 
Furthermore, by supporting the Language Server Protocol, we confirmed that ICCheck can be integrated into multiple development environments with minimal effort. ICCheck is available at https://github.com/salab/iccheck 2025-04-06T16:28:14Z Accepted manuscript in Science of Computer Programming Motoki Abe Shinpei Hayashi http://arxiv.org/abs/2603.16208v1 SoK: Systematizing Software Artifacts Traceability via Associations, Techniques, and Applications 2026-03-17T07:39:31Z Software development relies heavily on traceability links between various software artifacts to ensure quality and facilitate maintenance. While automated traceability recovery techniques have advanced for different artifact pairs, the field remains fragmented with an incomplete overview of artifact associations, ambiguous linking techniques, and fragmented knowledge of application scenarios. To bridge these gaps, we conducted a systematic literature review on software traceability recovery to synthesize the linked artifacts, recovery tools, and usage scenarios across the traceability ecosystem. First, we constructed the first global artifacts traceability graph of 23 associations among 22 artifact types, exposing a severe research imbalance that heavily favors code-related links. Second, while recovery techniques are shifting toward deep semantic models, a reproducibility crisis persists (e.g., only 37% of studies released code); to address this, we provided a comprehensive evaluation framework including a technical decision map and standardized benchmarks. Finally, we quantified an industrial adoption gap (i.e., 95% of tools remain confined to academia) and proposed a role-centric framework to dynamically align artifact paths with concrete engineering activities. This review contributes a coherent knowledge framework for artifacts traceability research, identifies current trends, and provides directions for future work. 
2026-03-17T07:39:31Z Zhifei Chen Nanjing University of Science and Technology, China Lata Yi Nanjing University of Science and Technology, China Liming Nie Shenzhen Technology University, China Yangyang Zhao Zhejiang Sci-Tech University, China Hao Liu Shenzhen Technology University, China Yiqing Shi Nanjing University of Science and Technology, China Wei Song Nanjing University of Science and Technology, China http://arxiv.org/abs/2602.23592v2 KEEP: A KV-Cache-Centric Memory Management System for Efficient Embodied Planning 2026-03-17T07:20:39Z Memory-augmented Large Language Models (LLMs) have demonstrated remarkable capability for complex and long-horizon embodied planning. By keeping track of past experiences and environmental states, memory enables LLMs to maintain a global view, thereby avoiding repetitive exploration. However, existing approaches often store the memory as raw text, leading to excessively long prompts and high prefill latency. While it is possible to store and reuse the KV caches, the efficiency benefits are greatly undermined due to frequent KV cache updates. In this paper, we propose KEEP, a KV-cache-centric memory management system for efficient embodied planning. KEEP features 3 key innovations: (1) a Static-Dynamic Memory Construction algorithm that reduces KV cache recomputation through mixed-granularity memory groups; (2) a Multi-hop Memory Re-computation algorithm that dynamically identifies important cross-attention among different memory groups and reconstructs memory interactions iteratively; (3) a Layer-balanced Memory Loading that eliminates unbalanced KV cache loading and cross-attention computation across different layers. Extensive experimental results have demonstrated that KEEP achieves a 2.68x speedup with negligible accuracy loss compared with text-based memory methods on the ALFRED dataset.
Compared with the KV re-computation method CacheBlend (EuroSys'25), KEEP shows a 4.13% success rate improvement and a 1.90x time-to-first-token (TTFT) reduction. Our code is available at https://github.com/PKU-SEC-Lab/KEEP_Embodied_Memory. 2026-02-27T01:48:07Z DAC 2026 Zebin Yang Tong Xie Baotong Lu Shaoshan Liu Bo Yu Meng Li http://arxiv.org/abs/2603.16155v1 Dialect-Agnostic SQL Parsing via LLM-Based Segmentation 2026-03-17T06:18:37Z SQL is a widely adopted language for querying data, which has led to the development of various SQL analysis and rewriting tools. However, due to the diversity of SQL dialects, such tools often fail when encountering unrecognized dialect-specific syntax. While Large Language Models (LLMs) have shown promise in understanding SQL queries, their inherent limitations in handling hierarchical structures and hallucination risks limit their direct applicability in parsing. To address these limitations, we propose SQLFlex, a novel query rewriting framework that integrates grammar-based parsing with LLM-based segmentation to parse diverse SQL dialects robustly. Our core idea is to decompose hierarchical parsing into sequential segmentation tasks, which better aligns with the strengths of LLMs and improves output reliability through validation checks. Specifically, SQLFlex uses clause-level segmentation and expression-level segmentation as two strategies that decompose elements on different levels of a query. We extensively evaluated SQLFlex both on real-world use cases and in a standalone evaluation. In SQL linting, SQLFlex outperforms SQLFluff in ANSI mode by 63.68% in F1 score while matching its dialect-specific mode performance. In test-case reduction, SQLFlex outperforms SQLess by up to 10 times in simplification rate. In the standalone evaluation, it parses 91.55% to 100% of queries across eight distinct dialects, outperforming all baseline parsers. We believe SQLFlex can serve as a foundation for many query analysis and rewriting use cases.
2026-03-17T06:18:37Z Junwen An Kabilan Mahathevan Manuel Rigger 10.1145/3802038 http://arxiv.org/abs/2603.16124v1 SWE-QA-Pro: A Representative Benchmark and Scalable Training Recipe for Repository-Level Code Understanding 2026-03-17T05:12:48Z Agentic repository-level code understanding is essential for automating complex software engineering tasks, yet the field lacks reliable benchmarks. Existing evaluations often overlook long-tail topics and rely on popular repositories where Large Language Models (LLMs) can cheat via memorized knowledge. To address this, we introduce SWE-QA-Pro, a benchmark constructed from diverse, long-tail repositories with executable environments. We enforce topical balance via issue-driven clustering to cover under-represented task types and apply a rigorous difficulty calibration process: questions solvable by direct-answer baselines are filtered out. This results in a dataset where agentic workflows significantly outperform direct answering (e.g., a ~13-point gap for Claude Sonnet 4.5), confirming the necessity of agentic codebase exploration. Furthermore, to tackle the scarcity of training data for such complex behaviors, we propose a scalable synthetic data pipeline that powers a two-stage training recipe: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning from AI Feedback (RLAIF). This approach allows small open models to learn efficient tool usage and reasoning. Empirically, a Qwen3-8B model trained with our recipe surpasses GPT-4o by 2.3 points on SWE-QA-Pro and substantially narrows the gap to state-of-the-art proprietary models, demonstrating both the validity of our evaluation and the effectiveness of our agentic training workflow.
2026-03-17T05:12:48Z Songcheng Cai Zhiheng Lyu Yuansheng Ni Xiangchao Chen Baichuan Zhou Shenzhe Zhu Yi Lu Haozhe Wang Chi Ruan Benjamin Schneider Weixu Zhang Xiang Li Andy Zheng Yuyu Zhang Ping Nie Wenhu Chen http://arxiv.org/abs/2505.07243v2 A Black-box Testing Framework for Oracle Quantum Programs 2026-03-17T04:45:17Z Oracle quantum programs are a fundamental class of quantum programs that serve as a critical bridge between quantum computing and classical computing. Many important quantum algorithms are built upon oracle quantum programs, making it essential to ensure their correctness during development. Although software testing is a well-established approach for improving program reliability, no systematic method has been developed to test oracle quantum programs. This paper proposes a black-box testing framework designed for general oracle quantum programs. We formally define these programs, establish the foundational theory for their testing, and propose a detailed testing framework. We develop a prototype tool and conduct extensive experimental evaluations to evaluate the effectiveness of the framework. Our results demonstrate that the proposed framework significantly aids developers in testing oracle quantum programs, providing insights to enhance the reliability of quantum software. 2025-05-12T05:31:55Z 46 pages, 11 figures Peixun Long Jianjun Zhao http://arxiv.org/abs/2603.16107v1 RepoReviewer: A Local-First Multi-Agent Architecture for Repository-Level Code Review 2026-03-17T04:17:51Z Repository-level code review requires reasoning over project structure, repository context, and file-level implementation details. Existing automated review workflows often collapse these tasks into a single pass, which can reduce relevance, increase duplication, and weaken prioritization. 
We present RepoReviewer, a local-first multi-agent system for automated GitHub repository review with a Python CLI, FastAPI API, LangGraph orchestration layer, and Next.js user interface. RepoReviewer decomposes review into repository acquisition, context synthesis, file-level analysis, finding prioritization, and summary generation. We describe the system design, implementation tradeoffs, developer-facing interfaces, and practical failure modes. Rather than claiming benchmark superiority, we frame RepoReviewer as a technical systems contribution: a pragmatic architecture for repository-level automated review, accompanied by reusable evaluation and reporting infrastructure for future empirical study. 2026-03-17T04:17:51Z Peng Zhang http://arxiv.org/abs/2603.10249v2 DUCTILE: Agentic LLM Orchestration of Engineering Analysis in Product Development Practice 2026-03-17T02:22:06Z Engineering analysis automation in product development relies on rigid interfaces between tools, data formats and documented processes. When these interfaces change, as they routinely do as the product evolves in the engineering ecosystem, the automation support breaks. This paper presents a DUCTILE (Delegated, User-supervised Coordination of Tool- and document-Integrated LLM-Enabled) agentic orchestration, an approach for developing, executing and evaluating LLM-based agentic automation support of engineering analysis tasks. The approach separates adaptive orchestration, performed by the LLM agent, from deterministic execution, performed by verified engineering tools. The agent interprets documented design practices, inspects input data and adapts the processing path, while the engineer supervises and exercises final judgment. DUCTILE is demonstrated on an industrial structural analysis task at an aerospace manufacturer, where the agent handled input deviations in format, units, naming conventions and methodology that would break traditional scripted pipelines. 
Evaluation against expert-defined acceptance criteria and deployment with practicing engineers confirm that the approach produces correct, methodologically compliant results across 10 repeated independent runs. The paper discusses the paradigm shift and the practical consequences of adopting agentic automation, including unintended effects on the nature of engineering work when removing mundane tasks and creating an exhausting supervisory role. 2026-03-10T22:00:47Z 22 pages, including supplemental material. 9 Figures Alejandro Pradas-Gomez Arindam Brahma Ola Isaksson http://arxiv.org/abs/2603.16057v1 Toward Reliable Scientific Visualization Pipeline Construction with Structure-Aware Retrieval-Augmented LLMs 2026-03-17T01:52:11Z Scientific visualization pipelines encode domain-specific procedural knowledge with strict execution dependencies, making their construction sensitive to missing stages, incorrect operator usage, or improper ordering. Thus, generating executable scientific visualization pipelines from natural-language descriptions remains challenging for large language models, particularly in web-based environments where visualization authoring relies on explicit code-level pipeline assembly. In this work, we investigate the reliability of LLM-based scientific visualization pipeline generation, focusing on vtk.js as a representative web-based visualization library. We propose a structure-aware retrieval-augmented generation workflow that provides pipeline-aligned vtk.js code examples as contextual guidance, supporting correct module selection, parameter configuration, and execution order. We evaluate the proposed workflow across multiple multi-stage scientific visualization tasks and LLMs, measuring reliability in terms of pipeline executability and human correction effort. To this end, we introduce correction cost as a metric for the amount of manual intervention required to obtain a valid pipeline.
Our results show that structured, domain-specific context substantially improves pipeline executability and reduces correction cost. We additionally provide an interactive analysis interface to support human-in-the-loop inspection and systematic evaluation of generated visualization pipelines. 2026-03-17T01:52:11Z Guanghui Zhao Zhe Wang Yu Dong Guan Li GuiHua Shan http://arxiv.org/abs/2603.16012v1 Making Software Metrics Useful 2026-03-16T23:42:44Z Most engineers use measurements to make decisions. However, measurements are rarely used for decisions about constructing software products. While many approaches to measuring attributes of software (``metrics'') have been developed, they are rarely used to answer useful questions such as ``Do I need to refactor this class?'' or ``Are these integration tests sufficient?'' Practitioners therefore question the value of software metrics. We argue that this situation arose because software metrics were developed without understanding metrology (the science of measurement) and suggest directions software metrics research should take. 2026-03-16T23:42:44Z IEEE Computer. In Press Ewan Tempero Paul Ralph 10.1109/MC.2026.3666634