https://arxiv.org/api/xIpCztkygGe4xTi6b2D11QDW6oQ 2026-03-18T10:14:43Z 46472 0 15 http://arxiv.org/abs/2603.16745v1 Persistent Device Identity for Network Access Control in the Era of MAC Address Randomization: A RADIUS-Based Framework 2026-03-17T16:21:57Z Modern operating systems increasingly randomize Media Access Control (MAC) addresses to protect user privacy, fundamentally disrupting Network Access Control (NAC) systems that have relied on MAC addresses as persistent device identifiers for over two decades. This disruption affects critical enterprise environments including federal government agencies operating under FISMA, healthcare organizations subject to HIPAA, financial institutions governed by PCI-DSS, and educational networks managing large-scale BYOD deployments. This paper presents a comprehensive framework for maintaining persistent device identity in NAC environments through a RADIUS protocol-based approach that assigns and distributes a Globally Unique Identifier (GUID) to endpoints via RADIUS Access-Accept messages. The proposed architecture addresses the complete device lifecycle including initial enrollment, re-authentication across randomized addresses, device management integration, certificate-based identity binding, and device attribute correlation. We describe the framework's design across six distinct use cases -- BYOD, managed devices, VPN-based posture assessment, non-VPN posture, guest access, and IoT device profiling -- and analyze its effectiveness in maintaining device visibility, accurate license counting, and regulatory compliance under continuous MAC address randomization. The approach is compatible with existing 802.1X and MAB infrastructure, requires no client-side operating system modifications, and aligns with the recently published RFC 9797 and IEEE 802.11bh-2024 standards. 
Our framework enables organizations to maintain regulatory compliance while preserving the privacy benefits that MAC address randomization was designed to provide. 2026-03-17T16:21:57Z 26 pages, 4 figures, 3 tables. Preprint Premanand Seralathan http://arxiv.org/abs/2603.16735v1 Ember: A Serverless Peer-to-Peer End-to-End Encrypted Messaging System over an IPv6 Mesh Network 2026-03-17T16:18:03Z This paper presents Ember, a serverless peer-to-peer messaging system providing end-to-end encrypted communication over a decentralised IPv6 mesh network. Ember operates without central servers, enforces data minimisation through ciphertext-only local storage and time-based message expiration, and prioritises architectural clarity, explicit trust boundaries, and practical deployability on Android. The paper describes the system architecture, cryptographic design, network model, and security properties -- including dynamic testing results demonstrating that no plaintext is recoverable from captured network traffic -- and discusses limitations and future work. 2026-03-17T16:18:03Z 54 pages Hamish Alsop Leandros Maglaras Naghmeh Moradpoor http://arxiv.org/abs/2412.15004v4 From Vulnerabilities to Remediation: A Systematic Literature Review of LLMs in Code Security 2026-03-17T16:08:05Z Large Language Models (LLMs) have emerged as powerful tools for automating programming tasks, including security-related ones. However, they can also introduce vulnerabilities during code generation, fail to detect existing vulnerabilities, or report nonexistent ones. This systematic literature review investigates the security benefits and drawbacks of using LLMs for code-related tasks. In particular, it focuses on the types of vulnerabilities introduced by LLMs when generating code. Moreover, it analyzes the capabilities of LLMs to detect and fix vulnerabilities, and examines how prompting strategies impact these tasks. 
Finally, it examines how data poisoning attacks impact LLM performance in the aforementioned tasks. 2024-12-19T16:20:22Z Enna Basic Alberto Giaretta http://arxiv.org/abs/2603.16694v1 SynthChain: A Synthetic Benchmark and Forensic Analysis of Advanced and Stealthy Software Supply Chain Attacks 2026-03-17T15:50:59Z Advanced software supply chain (SSC) attacks are increasingly runtime-only and leave fragmented evidence across hosts, services, and build/dependency layers, so any single telemetry stream is inherently insufficient to reconstruct full compromise chains under realistic access and budget limits. We present SynthChain, a near-production testbed and a multi-source runtime dataset with chain-level ground truth, derived from real-world malicious packages and exploit campaigns. SynthChain covers seven representative supply-chain exploit scenarios across PyPI, npm, and a native C/C++ supply-chain case, spanning Windows and Linux, and involving four hosts and one containerized environment. Scenarios span realistic time windows from minutes to hours and are annotated with 14 MITRE ATT&CK tactics and 161 techniques (29-104 techniques per scenario). Beyond releasing the data, we quantify observability constraints by mapping each chain step to the minimum evidence needed for detection and cross-source correlation. With realistic trace availability, no single source is chain-complete: the best single source reaches only 0.391 weighted tag/step coverage and 0.403 mean chain reconstruction. Even minimal two-source fusion boosts coverage to 0.636 and reconstruction to 0.639 (approximately 1.6x gain), with consistent chain coverage/recall improvements (0.545). The corpus contains approximately 0.58M raw multi-source events and 1.50M evaluation rows, enabling controlled studies of detection under constrained telemetry. 
We release the dataset, ground truth, and artifacts to support reproducible, forensic-aware runtime defenses and to guide efficient detection for software supply chains. 2026-03-17T15:50:59Z Zhuoran Tan Wenbo Guo Taylor Brierley Jiewen Luo Jeremy Singer Christos Anagnostopoulos 10.5281/zenodo.18481571 http://arxiv.org/abs/2603.13517v2 CTI-REALM: Benchmark to Evaluate Agent Performance on Security Detection Rule Generation Capabilities 2026-03-17T15:49:46Z CTI-REALM (Cyber Threat Real World Evaluation and LLM Benchmarking) is a benchmark designed to evaluate AI agents' ability to interpret cyber threat intelligence (CTI) and develop detection rules. The benchmark provides a realistic environment that replicates the security analyst workflow. This enables agents to examine CTI reports, execute queries, understand schema structures, and construct detection rules. Evaluation involves emulated attacks of varying complexity across Linux systems, cloud platforms, and Azure Kubernetes Service (AKS), with ground truth data for accurate assessment. Agent performance is measured through both final detection results and trajectory-based rewards that capture decision-making effectiveness. This work demonstrates the potential of AI agents to support labor-intensive aspects of detection engineering. Our comprehensive evaluation of 16 frontier models shows that Claude Opus 4.6 (High) achieves the highest overall reward (0.637), followed by Claude Opus 4.5 (0.624) and the GPT-5 family. An ablation study confirms that CTI-specific tools significantly improve agent performance, and a variance analysis across repeated runs demonstrates result stability. Finally, a memory augmentation study shows that seeded context can close 33% of the performance gap between smaller and larger models. 
2026-03-13T18:48:40Z 11 pages, 5 figures, 4 tables Arjun Chakraborty Sandra Ho Adam Cook Manuel Meléndez http://arxiv.org/abs/2406.07714v4 LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing 2026-03-17T15:37:48Z Greybox fuzzing has achieved success in revealing bugs and vulnerabilities in programs. However, randomized mutation strategies have limited the fuzzer's performance on structured data. Specialized fuzzers can handle complex structured data, but require additional effort in grammar and suffer from low throughput. In this paper, we explore the potential of utilizing Large Language Models (LLMs) to enhance greybox fuzzing for structured data. We utilize the pre-trained knowledge of the LLM about data conversion and format to generate new valid inputs. We further fine-tune it with paired mutation seeds to learn structured format and mutation strategies effectively. Our LLM-based fuzzer, LLAMAFUZZ, integrates the LLM's ability to understand and mutate structured data into fuzzing. We conduct experiments on the standard bug-based benchmark Magma and a wide variety of real-world programs. LLAMAFUZZ outperforms our top competitor by 41 bugs on average. We also identified 47 unique bugs across all trials. Moreover, LLAMAFUZZ demonstrated consistent performance on both bug trigger and bug reached. Compared to AFL++, LLAMAFUZZ achieved 27.19% more branches in real-world program sets on average. We also present a case study to explain how LLMs enhance the fuzzing process in terms of code coverage. 2024-06-11T20:48:28Z The 7th ACM/IEEE International Conference on Automation of Software Test (AST 2026) Hongxiang Zhang Yuyang Rong Yifeng He Hao Chen http://arxiv.org/abs/2509.10206v2 Feature Attribution in 5G Intrusion Detection: A Statistical vs. 
Logic-Based Comparison 2026-03-17T14:54:45Z With the rise of fifth-generation (5G) networks in critical applications, it is urgent to move from detection of malicious activity to systems capable of providing a reliable verdict suitable for mitigation. In this regard, understanding and interpreting machine learning (ML) models' security alerts is crucial for enabling actionable incident response orchestration. Explainable Artificial Intelligence (XAI) techniques are expected to enhance trust by providing insights into why alerts are raised. Under the umbrella of XAI, interpretability of outcomes is crucially dependent on understanding the influence of specific inputs, referred to as feature attribution. A dominant approach to feature attribution statistically associates feature sets that can be correlated to a given alert. This paper investigates its merits against the backdrop of criticism from recent literature, in comparison with feature attribution based on logic. We extensively study two methods, SHAP and VoTE-XAI, as representatives of each feature attribution approach by analyzing their interpretations of alerts generated by an XGBoost model across three 5G-relevant datasets (5G-NIDD, MSA, and PFCP) covering multiple attack scenarios. We identify three metrics for assessing explanations: sparsity, how concise they are; stability, how consistent they are across samples from the same attack type; and efficiency, how fast an explanation is generated. Our results reveal that logic-based attributions are consistently more sparse and stable across alerts. More importantly, we found a significant divergence between features selected by SHAP and VoTE-XAI. However, none of the top-ranked features selected by SHAP were missed by VoTE-XAI. Finally, we analyze the efficiency of both methods, discussing their suitability for real-time security monitoring even in high-dimensional 5G environments (478 features). 
2025-09-12T12:55:48Z Federica Uccello Simin Nadjm-Tehrani http://arxiv.org/abs/2506.12846v9 VFEFL: Privacy-Preserving Federated Learning against Malicious Clients via Verifiable Functional Encryption 2026-03-17T14:54:19Z Federated learning is a promising distributed learning paradigm that enables collaborative model training without exposing local client data, thereby protecting data privacy. However, it also brings new threats and challenges. The advancement of model inversion attacks has rendered the plaintext transmission of local models insecure, while the distributed nature of federated learning makes it particularly vulnerable to attacks launched by malicious clients. To protect data privacy and prevent malicious client attacks, this paper proposes a privacy-preserving Federated Learning framework based on Verifiable Functional Encryption (VFEFL), without a non-colluding dual-server assumption or an additional trusted third party. Specifically, we propose a novel Cross-Ciphertext Decentralized Verifiable Functional Encryption (CC-DVFE) scheme that enables the verification of specific relationships over multi-dimensional ciphertexts. This scheme is formally treated, in terms of definition, security model and security proof. Furthermore, based on the proposed CC-DVFE scheme, we design a privacy-preserving federated learning framework that incorporates a novel robust aggregation rule to detect malicious clients, enabling the effective training of high-accuracy models under adversarial settings. Finally, we provide the formal analysis and empirical evaluation of VFEFL. The results demonstrate that our approach achieves the desired privacy protection, robustness, verifiability and fidelity, while eliminating the reliance on the non-colluding dual-server assumption or trusted third parties required by most existing methods. 
2025-06-15T13:38:40Z Nina Cai Jinguang Han Weizhi Meng http://arxiv.org/abs/2602.16564v2 A Scalable Approach to Solving Simulation-Based Network Security Games 2026-03-17T14:41:01Z We introduce MetaDOAR, a lightweight meta-controller that augments the Double Oracle / PSRO paradigm with a learned, partition-aware filtering layer and Q-value caching to enable scalable multi-agent reinforcement learning on very large cyber-network environments. MetaDOAR learns a compact state projection from per-node structural embeddings to rapidly score and select a small subset of devices (a top-k partition) on which a conventional low-level actor performs focused beam search utilizing a critic agent. Selected candidate actions are evaluated with batched critic forwards and stored in an LRU cache keyed by a quantized state projection and local action identifiers, dramatically reducing redundant critic computation while preserving decision quality via conservative k-hop cache invalidation. Empirically, MetaDOAR attains higher player payoffs than SOTA baselines on large network topologies, without significant scaling issues in terms of memory usage or training time. This contribution provides a practical, theoretically motivated path to efficient hierarchical policy learning for large-scale networked decision problems. 2026-02-18T16:07:01Z Michael Lanier Yevgeniy Vorobeychik http://arxiv.org/abs/2601.14455v2 Unpacking Security Scanners for GitHub Actions Workflows 2026-03-17T14:40:25Z GitHub Actions is a widely used platform to automate the build and deployment of software projects through configurable workflows. As the platform's popularity grows, it also becomes a target of choice for software supply chain attacks. These attacks exploit excessive permissions, ambiguous versions or the absence of artifact integrity checks to compromise the workflows. In response to these attacks, several security scanners have emerged to help developers harden their workflows. 
In this paper, we perform the first systematic comparison of 9 GitHub Actions Workflows security scanners. We compare them regarding scope (which security weaknesses they target), detection capabilities (how many weaknesses they detect), and performance (how long they take to scan a workflow). In order to compare the scanners on a common ground, we first establish a classification of 10 common security weaknesses that can be found in GitHub Actions Workflows. Then, we run the scanners against a curated set of 2722 workflows. Our study reveals that the landscape of GitHub Actions Workflows security scanners is very diverse, with both general purpose and focused scanners. More importantly, we provide evidence that these scanners implement fundamentally different analysis strategies, leading to major gaps regarding the nature and the number of reported security weaknesses. Based on this empirical evidence, we make actionable recommendations for developers to harden their GitHub Actions Workflows. 2026-01-20T20:25:11Z 16 pages, 3 figures, 5 tables Madjda Fares Yogya Gamage Benoit Baudry http://arxiv.org/abs/2603.16576v1 REFORGE: Multi-modal Attacks Reveal Vulnerable Concept Unlearning in Image Generation Models 2026-03-17T14:29:01Z Recent progress in image generation models (IGMs) enables high-fidelity content creation but also amplifies risks, including the reproduction of copyrighted content and the generation of offensive content. Image Generation Model Unlearning (IGMU) mitigates these risks by removing harmful concepts without full retraining. Despite growing attention, the robustness under adversarial inputs, particularly image-side threats in black-box settings, remains underexplored. To bridge this gap, we present REFORGE, a black-box red-teaming framework that evaluates IGMU robustness via adversarial image prompts. 
REFORGE initializes stroke-based images and optimizes perturbations with a cross-attention-guided masking strategy that allocates noise to concept-relevant regions, balancing attack efficacy and visual fidelity. Extensive experiments across representative unlearning tasks and defenses demonstrate that REFORGE significantly improves attack success rate while achieving stronger semantic alignment and higher efficiency than involved baselines. These results expose persistent vulnerabilities in current IGMU methods and highlight the need for robustness-aware unlearning against multi-modal adversarial attacks. Our code is at: https://github.com/Imfatnoily/REFORGE. 2026-03-17T14:29:01Z Accepted by ICME 2026 Yong Zou Haoran Li Fanxiao Li Shenyang Wei Yunyun Dong Li Tang Wei Zhou Renyang Liu http://arxiv.org/abs/2603.16572v1 Malicious Or Not: Adding Repository Context to Agent Skill Classification 2026-03-17T14:27:35Z Agent skills extend local AI agents, such as Claude Code or Open Claw, with additional functionality, and their popularity has led to the emergence of dedicated skill marketplaces, similar to app stores for mobile applications. Simultaneously, automated skill scanners were introduced, analyzing the skill description available in SKILL.md, to verify their benign behavior. The results for individual marketplaces mark up to 46.8% of skills as malicious. In this paper, we present the largest empirical security analysis of the AI agent skill ecosystem, questioning this high classification of malicious skills. Therefore, we collect 238,180 unique skills from three major distribution platforms and GitHub to systematically analyze their type and behavior. This approach substantially reduces the number of skills flagged as non-benign by security scanners to only 0.52%, which remain in repositories flagged as malicious. Consequently, our methodology substantially reduces false positives and provides a more robust view of the ecosystem's current risk surface. 
Beyond that, we extend the security analysis from the mere investigation of the skill description to a comparison of its congruence with the GitHub repository the skill is embedded in, providing additional context. Furthermore, our analysis also uncovers several so far undocumented real-world attack vectors, namely hijacking skills hosted on abandoned GitHub repositories. 2026-03-17T14:27:35Z 23 Pages, 10 Figures Florian Holzbauer David Schmidt Gabriel Gegenhuber Sebastian Schrittwieser Johanna Ullrich http://arxiv.org/abs/2603.16548v1 SAMSEM -- A Generic and Scalable Approach for IC Metal Line Segmentation 2026-03-17T14:13:21Z In light of globalized hardware supply chains, the assurance of hardware components has gained significant interest, particularly in cryptographic applications and high-stakes scenarios. Identifying metal lines on scanning electron microscope (SEM) images of integrated circuits (ICs) is one essential step in verifying the absence of malicious circuitry in chips manufactured in untrusted environments. Due to varying manufacturing processes and technologies, such verification usually requires tuning parameters and algorithms for each target IC. Often, a machine learning model trained on images of one IC fails to accurately detect metal lines on other ICs. To address this challenge, we create SAMSEM by adapting Meta's Segment Anything Model 2 (SAM2) to the domain of IC metal line segmentation. Specifically, we develop a multi-scale segmentation approach that can handle SEM images of varying sizes, resolutions, and magnifications. Furthermore, we deploy a topology-based loss alongside pixel-based losses to focus our segmentation on electrical connectivity rather than pixel-level accuracy. Based on a hyperparameter optimization, we then fine-tune the SAM2 model to obtain a model that generalizes across different technology nodes, manufacturing materials, sample preparation methods, and SEM imaging technologies. 
To this end, we leverage an unprecedented dataset of SEM images obtained from 48 metal layers across 14 different ICs. When fine-tuned on seven ICs, SAMSEM achieves an error rate as low as 0.72% when evaluated on other images from the same ICs. For the remaining seven unseen ICs, it still achieves error rates as low as 5.53%. Finally, when fine-tuned on all 14 ICs, we observe an error rate of 0.62%. Hence, SAMSEM proves to be a reliable tool that significantly advances the frontier in metal line segmentation, a key challenge in post-manufacturing IC verification. 2026-03-17T14:13:21Z Christian Gehrmann Jonas Ricker Simon Damm Deruo Cheng Julian Speith Yiqiong Shi Asja Fischer Christof Paar http://arxiv.org/abs/2502.05987v2 Simulating Virtual Players for UNO without Computers 2026-03-17T13:44:17Z UNO is a popular multiplayer card game. In each turn, a player has to play a card in their hand having the same number or color as the most recently played card. When there are few people, adding virtual players to the game can easily be done in UNO video games. However, this is a challenging task for physical UNO without computers. In this paper, we propose an unconventional protocol that can simulate virtual players using nothing but physical UNO cards. In particular, our protocol can uniformly select a valid card to play from each virtual player's hand at random, or report that none exists, without revealing the rest of its hand. The protocol can also be applied to simulate virtual players in other turn-based card or tile games where each player has to select a valid card or tile to play in each turn. 
2025-02-09T18:37:31Z This paper has appeared at UCNC 2025 Suthee Ruangwises Kazumasa Shinagawa 10.1007/978-3-032-15641-9_3 http://arxiv.org/abs/2501.12080v2 Balance-Based Cryptography: Physically Computing Any Boolean Function 2026-03-17T13:39:24Z Secure multi-party computation is an area in cryptography which studies how multiple parties can compare their private information without revealing it. Besides digital protocols, many unconventional protocols for secure multi-party computation using physical objects have also been developed. The vast majority of them use playing cards as the main tools. In 2024, Kaneko et al. introduced the use of a balance scale and coins in zero-knowledge proof protocols for pencil puzzles. In this paper, we extend the use of these tools to secure multi-party computation. In particular, we develop four protocols that can securely compute any $n$-variable Boolean function using a balance scale and coins. 2025-01-21T12:11:00Z This paper has appeared at UCNC 2025 Suthee Ruangwises 10.1007/978-3-032-15641-9_5