https://arxiv.org/api/y+A2eeQNXtGW7Fqpax5YT8/9R7g 2026-03-22T14:21:35Z 26796 135 15 http://arxiv.org/abs/2603.13358v1 Not All Prefills Are Equal: PPD Disaggregation for Multi-turn LLM Serving 2026-03-09T06:11:23Z

Prefill-Decode (PD) disaggregation has become the standard architecture for modern LLM inference engines because it alleviates interference between two distinct workloads. With the growing demand for multi-turn interactions in chatbots and agentic systems, we re-examine PD in this setting and find two fundamental inefficiencies: (1) every turn requires prefilling the new prompt together with the response from the previous turn, and (2) repeated KV transfers between prefill and decode nodes saturate the bandwidth, leading to high latency and even service degradation. Our key insight is that not all prefill operations are equally disruptive: append-prefill -- processing only the new input tokens while reusing cached KV states -- incurs substantially less decoding slowdown than full prefill. This motivates routing append-prefill to decode nodes locally. However, through comprehensive analysis, we show that no single fixed routing strategy satisfies all Service Level Objectives (SLOs) simultaneously. Based on this insight, we propose Prefill Prefill-capable Decode (PPD) disaggregation, a dynamic routing system that decides when to process Turn 2+ requests locally on decode nodes using cached KV states. PPD adapts to varying SLOs via configurable weights and seamlessly integrates with traditional PD deployments. Extensive evaluations show that PPD reduces Turn 2+ time-to-first-token (TTFT) by 68% while maintaining competitive time-per-output-token (TPOT), effectively alleviating KV transfer congestion under high load. We believe PPD represents a flexible and efficient paradigm for multi-turn LLM serving.
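
The routing decision can be pictured with a minimal Python sketch. Everything in it is an assumption for illustration: the `RouteOption` fields, the SLO-normalized weighted cost, the default SLO targets, and the availability of per-request TTFT/TPOT estimates. It is not PPD's actual policy, only the shape of a rule that sends full prefills to prefill nodes and lets configurable weights decide where a Turn 2+ append-prefill runs.

```python
from dataclasses import dataclass


@dataclass
class RouteOption:
    name: str            # hypothetical label: "prefill_node" or "decode_node"
    est_ttft_ms: float   # estimated TTFT of this request if prefilled here
    est_tpot_ms: float   # estimated TPOT of co-located decodes if prefilled here


def route_turn(is_first_turn, kv_cached_on_decode, options,
               w_ttft=1.0, w_tpot=1.0, ttft_slo_ms=500.0, tpot_slo_ms=50.0):
    """Return the name of the node that should prefill this turn (assumed cost model)."""
    # Turn 1, or a turn whose KV cache is not resident on the decode node,
    # needs a full prefill, so it goes to a dedicated prefill node as in plain PD.
    if is_first_turn or not kv_cached_on_decode:
        return "prefill_node"

    # Turn 2+: weigh SLO-normalized TTFT against SLO-normalized TPOT slowdown.
    def cost(opt):
        return w_ttft * opt.est_ttft_ms / ttft_slo_ms + w_tpot * opt.est_tpot_ms / tpot_slo_ms

    return min(options.values(), key=cost).name


opts = {
    "prefill_node": RouteOption("prefill_node", est_ttft_ms=400.0, est_tpot_ms=30.0),
    "decode_node": RouteOption("decode_node", est_ttft_ms=120.0, est_tpot_ms=38.0),
}
print(route_turn(False, True, opts))               # decode_node
print(route_turn(False, True, opts, w_tpot=10.0))  # prefill_node
```

Raising `w_tpot` expresses a stricter TPOT objective and pushes Turn 2+ work back to the prefill node, which mirrors the point that no single fixed routing strategy satisfies every SLO.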

2026-03-09T06:11:23Z 14 pages, 9 figures Zongze Li Jingyu Liu Zach Xu Yineng Zhang Tahseen Rabbani Ce Zhang http://arxiv.org/abs/2603.07987v1 PreHO: Predictive Handover for LEO Satellite Networks 2026-03-09T05:51:51Z

Low-Earth Orbit (LEO) Satellite Networks (LSNs) offer a promising solution for extending connectivity to areas not covered by Terrestrial Networks (TNs). However, the rapid movement, broad coverage, and high communication latency of LEO satellites pose significant challenges to conventional handover mechanisms, resulting in unacceptable signaling overhead and handover latency. To address these issues, this paper identifies a fundamental difference between the mobility patterns in LSNs and TNs: users are typically stationary relative to the fast-moving satellites, and channel states in LSNs are often stable and predictable. This observation enables handovers to be planned in advance rather than triggered reactively. Motivated by this insight, we propose PreHO, a predictive handover mechanism tailored for LSNs that proactively determines optimal handover strategies, thereby simplifying the handover process and enhancing overall efficiency. To optimize the pre-planned handover decisions, we further formulate the handover planning problem and develop an efficient iterative algorithm based on alternating optimization and dynamic programming. Extensive evaluations driven by real-world data demonstrate that PreHO significantly outperforms traditional handover schemes in terms of signaling overhead, handover latency, and user experience.
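
Because satellite trajectories make link quality predictable, handover planning can be posed offline over future time slots. The sketch below is a simplified stand-in, not PreHO's formulation: it assumes a hypothetical matrix `quality[t][k]` of predicted utility for satellite `k` in slot `t` and a fixed `handover_cost`, and solves the single-user case with a Viterbi-style dynamic program. PreHO's alternating-optimization layer is omitted.

```python
import math


def plan_handovers(quality, handover_cost):
    """Viterbi-style DP: maximize total predicted utility minus handover_cost per handover.

    quality[t][k] is the predicted utility of satellite k in slot t (-math.inf if not visible).
    Returns (serving satellite per slot, total objective).
    """
    T, K = len(quality), len(quality[0])
    best = [float(q) for q in quality[0]]   # best[k]: best objective of a plan ending on k
    back = [[0] * K for _ in range(T)]      # back[t][k]: predecessor satellite of k at slot t
    for t in range(1, T):
        new = [-math.inf] * K
        for k in range(K):
            for j in range(K):
                # Staying on the same satellite is free; switching costs handover_cost.
                cand = best[j] - (handover_cost if j != k else 0.0)
                if cand > new[k]:
                    new[k], back[t][k] = cand, j
            new[k] += quality[t][k]
        best = new
    k = max(range(K), key=lambda i: best[i])
    total, plan = best[k], [k]
    for t in range(T - 1, 0, -1):           # trace predecessors back to slot 0
        k = back[t][k]
        plan.append(k)
    return plan[::-1], total


quality = [[5, 1], [4, 3], [2, 4], [1, 5]]  # 4 slots, 2 satellites (made-up numbers)
print(plan_handovers(quality, 1.5))   # ([0, 0, 1, 1], 16.5)
print(plan_handovers(quality, 10.0))  # ([1, 1, 1, 1], 13.0)
```

A cheap handover lets the plan follow the better satellite; an expensive one makes it cheaper to stay on a single satellite for the whole window.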

2026-03-09T05:51:51Z Xingqiu He Zijie Ying Chaoqun You Yue Gao http://arxiv.org/abs/2603.07984v1 Energy-Efficient Online Scheduling for Wireless Powered Mobile Edge Computing Networks 2026-03-09T05:46:34Z

Wireless Powered Mobile Edge Computing (WP-MEC) integrates mobile edge computing (MEC) with wireless power transfer (WPT) to simultaneously extend the operational lifetime and enhance the computational capability of wireless devices (WDs). In WP-MEC systems, WPT and computation offloading compete for limited wireless resources, which makes their joint scheduling particularly challenging. In this paper, we investigate the energy-efficient online scheduling problem for WP-MEC networks with multiple WDs and multiple access points (APs). Based on Lyapunov optimization, we develop an online optimization framework that transforms the original stochastic problem into deterministic per-slot optimization problems. To reduce computational complexity, we introduce the concept of marginal energy efficiency and derive an associated optimality condition, based on which a relax-then-adjust approach is proposed to efficiently obtain feasible solutions. For the resulting non-convex computation offloading subproblem, we analyze the structural properties of its optimal solution and transform it into an assignment problem that can be solved efficiently. We further provide theoretical performance guarantees for both the per-slot and long-term solutions, establishing a fundamental trade-off between latency and energy consumption. To improve practical performance, additional mechanisms are introduced to balance the magnitudes of different queues and reduce latency without increasing energy consumption. Extensive simulation results demonstrate the effectiveness and robustness of the proposed algorithm under various system settings.
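
The Lyapunov step can be illustrated with the generic drift-plus-penalty rule: in each slot, pick the action minimizing `V * energy - Q * served`, then update the backlog queue `Q`. This is a single-queue, single-device sketch with made-up action tuples; the paper's multi-WD, multi-AP problem, its marginal-energy-efficiency relaxation, and its assignment subproblem are not modeled.

```python
def drift_plus_penalty_step(queue, actions, V):
    """Per-slot deterministic problem: minimize V * energy - queue * served."""
    return min(actions, key=lambda a: V * a[0] - queue * a[1])


def simulate(arrivals, actions, V):
    """Run the online policy; returns (total energy, backlog after each slot)."""
    queue, total_energy, backlog = 0.0, 0.0, []
    for arrival in arrivals:
        energy, served = drift_plus_penalty_step(queue, actions, V)
        total_energy += energy
        queue = max(queue - served, 0.0) + arrival  # standard queue update
        backlog.append(queue)
    return total_energy, backlog


actions = [(0.0, 0.0), (1.0, 2.0), (3.0, 4.0)]  # (energy, tasks served): idle, low, high power
arrivals = [1.5] * 200
for V in (0.1, 10.0):
    energy, backlog = simulate(arrivals, actions, V)
    print(V, energy, sum(backlog) / len(backlog))
```

A larger `V` weights energy more heavily, so the run spends less energy but carries a longer backlog: the same latency-energy trade-off the abstract describes.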

2026-03-09T05:46:34Z Xingqiu He Chaoqun You Yuzhi Yang Zihan Chen Yuhang Shen Tony Q. S. Quek Yue Gao http://arxiv.org/abs/2312.01049v3 Joint User Association and Resource Allocation for Adaptive Semantic Communication in 5G and Beyond Networks 2026-03-09T05:29:02Z

Semantic communication (SemCom) has emerged as a promising paradigm that leverages Deep Neural Networks (DNNs) to extract task-relevant information, thereby substantially reducing the volume of transmitted data. In existing implementations, the semantic transceiver is typically pre-trained for a specific task and uniformly adopted by all users. However, due to user heterogeneity in computational and communication capabilities, employing a single, fixed semantic transceiver may degrade coding efficiency and transmission robustness. To address this issue, we first demonstrate the feasibility of dynamically adjusting the computational and communication overhead of DNN-based semantic transceivers, enabling a more flexible paradigm referred to as Adaptive Semantic Communication (ASC). Building on this concept, we formulate a joint user association and resource allocation problem for ASC in 5G and beyond networks, aiming to maximize overall system utility under energy and latency constraints. This problem is challenging because of the inherent interdependencies among decision variables. To tackle this complexity, we decompose the original problem into three subproblems: (i) ASC scheme selection for each user, (ii) spectrum allocation at each Small-cell Base Station (SBS), and (iii) user association across SBSs. Each subproblem is solved sequentially based on the solutions of the preceding stages. The proposed algorithm efficiently yields near-optimal solutions with polynomial-time complexity. Simulation results demonstrate that our approach outperforms existing baselines under various settings.
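
Subproblem (i) can be sketched as a per-user choice among transceiver configurations. The `Scheme` fields, the utility metric, and the fixed energy and latency budgets are hypothetical; in the decomposition described above, the later spectrum-allocation and association stages would in practice shape these budgets.

```python
from typing import NamedTuple, Optional


class Scheme(NamedTuple):
    name: str
    utility: float     # task performance of this transceiver configuration (assumed metric)
    energy_j: float    # device energy per task
    latency_s: float   # computation + transmission latency per task


def select_scheme(schemes, energy_budget_j, latency_budget_s) -> Optional[Scheme]:
    """Highest-utility scheme that satisfies both budgets, or None if none is feasible."""
    feasible = [s for s in schemes
                if s.energy_j <= energy_budget_j and s.latency_s <= latency_budget_s]
    return max(feasible, key=lambda s: s.utility, default=None)


schemes = [Scheme("light", 0.80, 0.5, 0.05),
           Scheme("medium", 0.88, 1.2, 0.08),
           Scheme("heavy", 0.93, 2.5, 0.15)]
print(select_scheme(schemes, 1.0, 0.10).name)  # light  (weak device)
print(select_scheme(schemes, 3.0, 0.20).name)  # heavy  (capable device)
```

Heterogeneous users end up on different configurations, which is the flexibility ASC aims for compared with a single fixed transceiver.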

2023-12-02T07:17:42Z Xingqiu He Chaoqun You Zihan Chen Yao Sun Dongzhu Liu Tony Q. S. Quek Yue Gao http://arxiv.org/abs/2603.07932v1 Hard/Soft NLoS Detection via Combinatorial Data Augmentation for 6G Positioning 2026-03-09T03:56:16Z

A key enabler for meeting the stringent requirements of 6G positioning is the ability to exploit site-dependent information governing line-of-sight (LoS) and non-line-of-sight (NLoS) propagation. However, acquiring such environmental information in real time is challenging in practice. To address this issue, we propose a novel NLoS detection algorithm termed combinatorial data augmentation-guided NLoS detection (CDA-ND), which builds upon our prior work. CDA-ND generates numerous preliminary estimated locations (PELs) by applying multilateration over many gNodeB (gNB) combinations using a single snapshot of range measurements. When a target gNB is in NLoS, the resulting PELs split into two clusters: one derived using the target gNB's range measurement and the other derived without it. Their displacement is summarized by a single vector, called the NLoS evidence vector (NEV), which is used to compute an NLoS likelihood score. Based on this score, two modes of NLoS detection are developed. First, each gNB is classified as LoS or NLoS, termed hard decision (HD), using a simple threshold test. Second, each gNB's NLoS confidence is probabilistically quantified, termed soft decision (SD), which extends HD with weak site-survey priors, namely empirical NLoS-score samples and the average NLoS probability. We then design positioning algorithms tailored to these two modes by excluding gNBs deemed NLoS and re-weighting the remaining gNBs for SD. The proposed CDA-ND achieves high reliability in indoor factory environments under frequency range 1, attaining NLoS detection accuracies of 96.6% and 91.1% when the proportion of NLoS gNBs is approximately 18% and 56%, respectively. As a result, integrating CDA-ND into positioning significantly reduces mean absolute error by 20.04% and 65.99% in LoS- and NLoS-dominant environments, respectively.
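
The NEV construction can be made concrete with a small 2-D sketch. Assumptions: PELs come from exact three-gNB linearized trilateration, the NEV is taken as the difference between the centroids of the two PEL clusters, and its Euclidean length stands in for the NLoS likelihood score. The paper's actual scoring and the SD mode are not reproduced, and the gNB layout, UE position, and 4-unit NLoS range bias are invented.

```python
import itertools
import math


def trilaterate(anchors, ranges):
    """Linearized 2-D fix from exactly three anchors (subtract the first circle equation)."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = ranges
    # Rows of A p = b, obtained by subtracting circle 1 from circles 2 and 3.
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        return None  # collinear anchors give no unique fix
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)


def centroid(points):
    return (sum(p[0] for p in points) / len(points), sum(p[1] for p in points) / len(points))


def nlos_evidence(gnbs, ranges, target):
    """Return (NEV, score) for gNB `target`, using PELs from every 3-gNB combination."""
    with_target, without_target = [], []
    for combo in itertools.combinations(range(len(gnbs)), 3):
        pel = trilaterate([gnbs[i] for i in combo], [ranges[i] for i in combo])
        if pel is not None:
            (with_target if target in combo else without_target).append(pel)
    c_in, c_out = centroid(with_target), centroid(without_target)
    nev = (c_in[0] - c_out[0], c_in[1] - c_out[1])
    return nev, math.hypot(*nev)  # score: NEV length (stand-in for the paper's likelihood)


def hard_decision(score, threshold):
    """HD mode: a simple threshold test on the NLoS score."""
    return score > threshold


gnbs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, -5.0)]
ue = (3.0, 4.0)
los_ranges = [math.dist(ue, g) for g in gnbs]
nlos_ranges = list(los_ranges)
nlos_ranges[2] += 4.0  # gNB 2 is blocked: its range carries a positive excess delay
print(nlos_evidence(gnbs, los_ranges, 2)[1])   # ~0: both clusters coincide
print(nlos_evidence(gnbs, nlos_ranges, 2)[1])  # clearly nonzero
```

With clean ranges every PEL lands on the true position and the two clusters coincide; biasing one gNB's range displaces only the PELs that use it, which is exactly what the NEV picks up.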

2026-03-09T03:56:16Z 16 pages, 10 figures, 4 tables. An earlier version of this work will be presented in part at IEEE Wireless Communications and Networking Conference (WCNC) 2026 Sang-Hyeok Kim Inha University, South Korea Seung Min Yu Korea Railroad Research Institute, South Korea Jihong Park Singapore University of Technology and Design, Singapore Seung-Woo Ko Inha University, South Korea http://arxiv.org/abs/2511.01743v2 Towards Efficient Federated Learning of Networked Mixture-of-Experts for Mobile Edge Computing 2026-03-08T18:44:31Z

Recent advancements in large artificial intelligence models (LAMs) are driving significant innovations in mobile edge computing within next-generation wireless networks. However, the substantial computational resources and large-scale training data required to train LAMs conflict with the limited storage and computational capacity of edge devices, posing significant challenges to training and deploying LAMs at the edge. In this work, we introduce the Networked Mixture-of-Experts (NMoE) system, in which clients perform inference collaboratively by distributing tasks to suitable neighbors based on their expertise and aggregating the returned results. For training the NMoE, we propose a federated learning framework that integrates both supervised and self-supervised learning to balance personalization and generalization while preserving communication efficiency and data privacy. We conduct extensive experiments that demonstrate the efficacy of the proposed NMoE system and provide insights into NMoE training algorithms.
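
Collaborative inference in NMoE can be pictured as expertise-weighted routing and aggregation. In this hypothetical sketch, each neighbor advertises a static expertise score per task type and returns class probabilities; the top-`k` neighbors are queried and their outputs are averaged with expertise weights. NMoE's actual routing rule and its federated training are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Neighbor:
    expertise: Dict[str, float]                    # task type -> expertise score (assumed known)
    predict: Callable[[List[float]], List[float]]  # returns class probabilities for an input


def collaborative_infer(x, task_type, neighbors, k=2):
    """Query the k most expert neighbors and average their outputs, weighted by expertise."""
    ranked = sorted(neighbors.values(),
                    key=lambda n: n.expertise.get(task_type, 0.0), reverse=True)[:k]
    weights = [n.expertise.get(task_type, 0.0) for n in ranked]
    total = sum(weights)
    if total <= 0:
        raise ValueError("no neighbor has expertise for this task type")
    outputs = [n.predict(x) for n in ranked]
    return [sum(w * out[c] for w, out in zip(weights, outputs)) / total
            for c in range(len(outputs[0]))]


neighbors = {
    "a": Neighbor({"vision": 0.9}, lambda x: [0.8, 0.2]),
    "b": Neighbor({"vision": 0.1, "audio": 0.9}, lambda x: [0.0, 1.0]),
    "c": Neighbor({"vision": 0.6}, lambda x: [0.6, 0.4]),
}
print(collaborative_infer([0.0], "vision", neighbors))  # approx [0.72, 0.28]
```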

2025-11-03T16:54:06Z Song Gao Songyang Zhang Shusen Jing Shuai Zhang Xiangwei Zhou Yue Wang Zhipeng Cai http://arxiv.org/abs/2603.07750v1 Structured Gossip: A Partition-Resilient DNS for Internet-Scale Dynamic Networks 2026-03-08T17:54:36Z

Network partitions pose fundamental challenges to distributed name resolution in mobile ad-hoc networks (MANETs) and edge computing. Existing solutions either require active coordination that fails to scale or use unstructured gossip with excessive overhead. We present Structured Gossip DNS, which exploits DHT finger tables to achieve partition resilience through passive stabilization. Our approach reduces message complexity from $O(n)$ to $O(n/\log n)$ while maintaining $O(\log^2 n)$ convergence. Unlike active protocols that require synchronous agreement, our passive approach guarantees eventual consistency through commutative operations that converge regardless of message ordering. The system handles arbitrary concurrent partitions via version vectors, eliminating global coordination and enabling billion-node deployments.
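
The two building blocks named above, DHT finger tables and version vectors, can be sketched with their textbook definitions. This assumes Chord-style fingers on an `m`-bit identifier ring and integer per-node counters. It shows why a node gossips with only O(log n) partners and why pointwise-max merging converges regardless of message order; it is not the paper's full protocol.

```python
def finger_targets(node_id, m):
    """Chord-style finger i targets (n + 2^i) mod 2^m on an m-bit identifier ring."""
    return [(node_id + (1 << i)) % (1 << m) for i in range(m)]


def successor(key, ring):
    """First live node clockwise from `key` (ring: sorted live node ids)."""
    for n in ring:
        if n >= key:
            return n
    return ring[0]  # wrap around the ring


def gossip_peers(node_id, ring, m):
    """Distinct nodes behind this node's fingers: at most m = O(log n) gossip partners."""
    return sorted({successor(t, ring) for t in finger_targets(node_id, m)} - {node_id})


def vv_merge(a, b):
    """Pointwise max of two version vectors: commutative, associative, idempotent."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def vv_concurrent(a, b):
    """True if neither version vector dominates the other (concurrent partition updates)."""
    keys = a.keys() | b.keys()
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    return not a_le_b and not b_le_a


ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]      # example 6-bit ring
print(gossip_peers(8, ring, 6))                     # [14, 21, 32, 42]
side_a, side_b = {"n1": 2, "n2": 0}, {"n1": 1, "n2": 3}  # updates made in two partitions
print(vv_concurrent(side_a, side_b))                # True
print(vv_merge(side_a, side_b) == vv_merge(side_b, side_a))  # True
```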

2026-03-08T17:54:36Z Rejected from ACM SIGMOD 2026 Demo Track Priyanka Sinha Dilys Thomas http://arxiv.org/abs/2309.16680v3 Toward 6G Sidelink Reliability: MAC PRR Modeling for NR Mode 2 SPS and ns-3 Validation 2026-03-08T13:19:37Z

5G New Radio (NR) Sidelink (SL) Mode 2 has enabled decentralized, infrastructure-less direct communications and is evolving to serve reliability-critical services in 6G SL. In particular, channel access in NR SL Mode 2 relies on Sensing-based Semi-Persistent Scheduling (SPS), whose key features significantly influence the packet reception ratio (PRR). While SPS has been widely studied, existing analytical models typically abstract or omit several NR-specific SPS features standardized by the 3rd Generation Partnership Project (3GPP), limiting their ability to explain how SPS parameters shape MAC collision dynamics and PRR. This paper develops an analytical MAC-layer PRR model for broadcast NR SL Mode 2 by explicitly modeling SPS-driven MAC collision events. The model captures (i) collisions caused by simultaneous resource reselection and (ii) persistent collisions induced by resource keeping across resource reservation intervals (RRIs). Based on this event-level characterization, we derive closed-form expressions for the steady-state MAC collision probability and PRR. We further extend the analysis to incorporate under-explored SPS features, including duplicate transmissions per RRI and the minimum resource-availability requirement for reselection, and quantify their impact on PRR in under-saturated regimes. The analytical results are validated using ns-3 simulations based on the 5G-LENA framework, showing close agreement under under-saturation and revealing deviations as the system approaches saturation. The proposed model provides mechanistic insight and design guidance for tuning SPS parameters to improve 6G SL reliability.
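
As a crude baseline for the collision events the model captures, suppose `N` UEs each pick one of `R` resources uniformly and independently, with no sensing. A given UE's transmission is then collision-free with probability (1 - 1/R)^(N-1). The sketch checks this against a Monte Carlo run. It deliberately ignores sensing-based exclusion, resource keeping, and the other NR-specific features the paper models, so it is not the paper's PRR expression.

```python
import random


def analytic_collision_free(num_ues, num_resources):
    """P(no other UE picks my resource) under independent uniform selection."""
    return (1.0 - 1.0 / num_resources) ** (num_ues - 1)


def simulated_collision_free(num_ues, num_resources, trials=20000, seed=0):
    """Monte Carlo estimate of the same probability (one selection round per trial)."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        picks = [rng.randrange(num_resources) for _ in range(num_ues)]
        ok += sum(1 for p in picks if picks.count(p) == 1)
    return ok / (trials * num_ues)


print(analytic_collision_free(5, 20))    # 0.95**4, about 0.8145
print(simulated_collision_free(5, 20))   # close to the analytic value
```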

2023-07-27T03:48:52Z This work has been submitted to the IEEE for possible publication. 29 pages, 22 figures Liu Cao Zhaoyu Liu Lyutianyang Zhang http://arxiv.org/abs/2409.20306v2 Diagnosing and Repairing Distributed Routing Configurations Using Selective Symbolic Simulation 2026-03-08T10:38:41Z

Although substantial progress has been made in automatically verifying whether distributed routing configurations conform to certain requirements, diagnosing and repairing configuration errors remains manual and time-consuming. To fill this gap, we propose S^2Sim, a novel system for automatic routing configuration diagnosis and repair. Our key insight is that by selectively simulating variants of the given configuration in a symbolic way, we can find an intent-compliant variant whose differences from the given configuration reveal its errors and suggest patches. Building on this insight, we also design techniques to support complex scenarios (e.g., multiple protocol networks) and requirements (e.g., k-link failure tolerance). We implement a prototype of S^2Sim and evaluate its performance using networks of size O(10) to O(1000) with synthetic real-world configurations. Results show that S^2Sim diagnoses and repairs errors for 1) all WAN configurations within 10 s and 2) all DCN configurations within 20 minutes.
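
The idea of repairing a configuration by finding an intent-compliant variant can be illustrated with a toy that replaces selective symbolic simulation with brute-force search. Assumptions: the configuration is just a set of OSPF-like link costs, routing is shortest-path, the intent is "traffic from `src` to `dst` traverses `waypoint`", and the candidate cost values are hypothetical.

```python
import heapq
import itertools


def shortest_path(costs, src, dst):
    """Dijkstra over an undirected graph given as {(u, v): cost}; assumes dst is reachable."""
    adj = {}
    for (u, v), c in costs.items():
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, c in adj[u]:
            if d + c < dist.get(v, float("inf")):
                dist[v], prev[v] = d + c, u
                heapq.heappush(heap, (d + c, v))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def repair(costs, src, dst, waypoint, candidate_costs=(1, 5, 20)):
    """Smallest set of link-cost edits after which src->dst traverses `waypoint`."""
    if waypoint in shortest_path(costs, src, dst):
        return {}  # already intent-compliant
    links = sorted(costs)
    for k in range(1, len(links) + 1):  # try 1 edit, then 2, ...
        for subset in itertools.combinations(links, k):
            for values in itertools.product(candidate_costs, repeat=k):
                variant = dict(costs)
                variant.update(zip(subset, values))
                if waypoint in shortest_path(variant, src, dst):
                    # The diff between the variant and the original is the suggested patch.
                    return {link: (costs[link], variant[link])
                            for link in subset if costs[link] != variant[link]}
    return None


costs = {("A", "B"): 10, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 2}
print(shortest_path(costs, "A", "D"))   # ['A', 'C', 'D']  (violates the "via B" intent)
print(repair(costs, "A", "D", "B"))     # {('A', 'B'): (10, 1)}
```

The returned diff plays the role of the patch. S^2Sim explores variants selectively and symbolically rather than enumerating them one by one as this toy does.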

2024-09-30T14:05:50Z 13 pages, accepted by NSDI'26 Rulan Yang Gao Han Hanyang Shao Xiaoqiang Zheng Xing Fang Ziyi Wang Lizhao You Ruiting Zhou Linghe Kong Ennan Zhai Qiao Xiang Jiwu Shu http://arxiv.org/abs/2603.07560v1 Learning the APT Kill Chain: Temporal Reasoning over Provenance Data for Attack Stage Estimation 2026-03-08T09:48:37Z

Advanced Persistent Threats (APTs) evolve through multiple stages, each exhibiting distinct temporal and structural behaviors. Accurate stage estimation is critical for enabling adaptive cyber defense. This paper presents StageFinder, a temporal graph learning framework for multi-stage attack progression inference from fused host and network provenance data. Provenance graphs are encoded using a graph neural network to capture structural dependencies among processes, files, and connections, while a long short-term memory (LSTM) model learns temporal dynamics to estimate stage probabilities aligned with the MITRE ATT&CK framework. The model is pretrained on the DARPA OpTC dataset and fine-tuned on labeled DARPA Transparent Computing data. Experimental results demonstrate that StageFinder achieves a macro F1-score of 0.96 and reduces prediction volatility by 31 percent compared to state-of-the-art baselines (Cyberian, NetGuardian). These results highlight the effectiveness of fused provenance and temporal learning for accurate and stable APT stage inference.

2026-03-08T09:48:37Z Trung V. Phan Thomas Bauschert http://arxiv.org/abs/2603.07408v1 Toward Real-Time Mirrors Intelligence: System-Level Latency and Computation Evaluation in Internet of Mirrors (IoM) 2026-03-08T01:40:45Z

The Internet of Mirrors (IoM) is an emerging IoT ecosystem of interconnected smart mirrors designed to deliver personalised services across a three-tier node hierarchy spanning consumer, professional, and hub nodes. Determining where computation should reside within this hierarchy is a critical design challenge, as placement decisions directly affect end-to-end latency, resource utilisation, and user experience. This paper presents the first physical IoM testbed study, evaluating four computational placement strategies across the IoM tier hierarchy under real Wi-Fi and 5G network conditions. Results show that offloading classification to higher-tier nodes substantially reduces latency and consumer resource load, but introduces network overhead that scales with payload size and hop count. No single strategy is universally optimal: the best choice depends on the available network, node proximity, and concurrent user load. These findings empirically characterise the computation-communication trade-off space of the IoM and motivate the need for intelligent, adaptive task placement responsive to application requirements and live ecosystem conditions.
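
The computation-communication trade-off can be captured with a toy latency model in which offloading to a higher tier saves compute time but pays a per-hop cost that grows with payload size. All numbers and the linear model are assumptions for illustration, not measurements from the testbed, and concurrent user load is ignored.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    compute_ms: float  # classification time on this node (assumed)
    hops: int          # network hops from the consumer mirror (0 = run locally)


def end_to_end_ms(tier, payload_kb, per_hop_ms, kb_per_ms):
    # Each hop adds a fixed latency plus transfer time proportional to payload size.
    return tier.compute_ms + tier.hops * (per_hop_ms + payload_kb / kb_per_ms)


def best_placement(tiers, payload_kb, per_hop_ms=5.0, kb_per_ms=10.0):
    return min(tiers, key=lambda t: end_to_end_ms(t, payload_kb, per_hop_ms, kb_per_ms)).name


tiers = [Tier("consumer", 120.0, 0), Tier("professional", 40.0, 1), Tier("hub", 15.0, 2)]
for payload in (50, 300, 1000):
    print(payload, best_placement(tiers, payload))  # hub, professional, consumer
```

Even this crude model flips its answer as the payload grows, which is consistent with the finding that no single placement strategy is universally optimal.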

2026-03-08T01:40:45Z 6 pages, 6 figures, conference Haneen Fatima Muhammad Ali Imran Ahmad Taha Lina Mohjazi http://arxiv.org/abs/2508.08380v2 Experimental Validation of Provably Covert Communication Using Software-Defined Radio 2026-03-08T00:26:30Z

The fundamental information-theoretic limits of covert, or low probability of detection/intercept (LPD/LPI), communication have been extensively studied for over a decade, resulting in the square root law (SRL): only $L\sqrt{n}$ covert bits can be reliably transmitted over time-bandwidth product $n$, for constant $L>0$. Transmitting more results in either detection or decoding errors. The SRL imposes significant constraints on hardware realization of mathematically guaranteed covert communication; indeed, these constraints preclude the standard link-maintenance operations that are taken for granted in non-covert communication. As a result, experimental validation of covert communication is underexplored: to date, only two experimental studies of SRL-based covert communication are available, both focusing on optical channels. Here, we report a demonstration of provably secure covert radio-frequency (RF) communication using software-defined radios (SDRs). This validates theoretical predictions, opens practical avenues for implementing covert communication systems, and raises further research questions.
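
Taking the SRL statement above at face value, its key consequence is that the covert rate per channel use vanishes, and quadrupling the time-bandwidth product only doubles the covert payload:

```latex
M(n) = L\sqrt{n}
\quad\Longrightarrow\quad
\frac{M(n)}{n} = \frac{L}{\sqrt{n}} \xrightarrow{\;n \to \infty\;} 0,
\qquad
M(4n) = L\sqrt{4n} = 2\,M(n).
```

For example, $n = 10^6$ gives $M = 1000L$ covert bits, i.e. only $L \times 10^{-3}$ bits per channel use, which is why routine overhead such as link maintenance is hard to afford.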

2025-08-11T18:06:16Z Rohan Bali Trevor E. Bailey Michael S. Bullock Boulat A. Bash http://arxiv.org/abs/2603.07373v1 Scheduling Parallel Optical Circuit Switches for AI Training 2026-03-07T22:57:12Z

The rapid growth of AI training has dramatically increased datacenter traffic demand and energy consumption, which has motivated renewed interest in optical circuit switches (OCSes) as a high-bandwidth, energy-efficient alternative for AI fabrics. Deploying multiple parallel OCSes is a leading alternative. However, efficiently scheduling time-varying traffic matrices across parallel optical switches with non-negligible reconfiguration delays remains an open challenge. We consider the problem of scheduling a single AI traffic demand matrix $D$ over $s$ parallel OCSes while minimizing the makespan under reconfiguration delay $\delta$. Our algorithm Spectra relies on a three-step approach: Decompose $D$ into a minimal set of weighted permutations; Schedule these permutations across parallel switches using load-aware assignment; then Equalize the imbalanced loads on the switches via controlled permutation splitting. Evaluated on realistic AI training workloads (GPT model and Qwen MoE expert routing) as well as standard benchmarks, Spectra vastly outperforms a baseline based on state-of-the-art algorithms, reducing schedule makespan by an average factor of $1.4\times$ on GPT AI workloads, $1.9\times$ on MoE AI workloads, and $2.4\times$ on standard benchmarks. Further, the makespans achieved by Spectra consistently approach newly derived lower bounds.
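
The Decompose and Schedule steps can be sketched under simplifying assumptions: the demand matrix has equal row and column sums (so a Birkhoff-von Neumann decomposition exists), the decomposition is the standard greedy one rather than Spectra's minimal set, and scheduling is a largest-weight-first greedy in which each permutation costs its weight plus `delta` on whichever switch is least loaded. Spectra's Equalize step (permutation splitting) is omitted.

```python
def perfect_matching(mat):
    """Kuhn's augmenting-path matching on the positive entries; returns perm[row] = col."""
    n = len(mat)
    match_col = [-1] * n  # match_col[j] = row currently matched to column j

    def try_row(i, seen):
        for j in range(n):
            if mat[i][j] > 0 and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, [False] * n):
            return None
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm


def decompose(demand):
    """Greedy Birkhoff-von Neumann split of a matrix whose row and column sums are all equal."""
    rem = [row[:] for row in demand]
    parts = []
    while any(v > 0 for row in rem for v in row):
        perm = perfect_matching(rem)
        if perm is None:
            raise ValueError("row and column sums must all be equal")
        w = min(rem[i][perm[i]] for i in range(len(rem)))
        for i, j in enumerate(perm):
            rem[i][j] -= w  # at least one entry drops to zero each round
        parts.append((w, perm))
    return parts


def schedule(parts, num_switches, delta):
    """Largest weight first: give each permutation to the currently least-loaded switch."""
    loads = [0.0] * num_switches
    plan = [[] for _ in range(num_switches)]
    for w, perm in sorted(parts, key=lambda p: p[0], reverse=True):
        s = min(range(num_switches), key=lambda k: loads[k])
        loads[s] += delta + w  # every new circuit configuration pays delay delta
        plan[s].append((w, perm))
    return plan, max(loads)


D = [[3, 1, 0], [0, 3, 1], [1, 0, 3]]  # every row and column sums to 4
parts = decompose(D)
print(parts)                        # [(1, [1, 2, 0]), (3, [0, 1, 2])]
print(schedule(parts, 2, 1.0)[1])   # 4.0
```

With one switch the same two permutations would take 1 + 3 + 2 reconfigurations = 6.0 time units; spreading them over two switches cuts the makespan to 4.0, and fewer permutations means fewer `delta` penalties, which is why a minimal decomposition matters.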

2026-03-07T22:57:12Z Kevin Liang Litao Qiao Isaac Keslassy Bill Lin http://arxiv.org/abs/2603.07345v1 Uber's Failover Architecture: Reconciling Reliability and Efficiency in Hyperscale Microservice Infrastructure 2026-03-07T21:13:09Z

Operating a global, real-time platform at Uber's scale requires infrastructure that is both resilient and cost-efficient. Historically, reliability was ensured through a costly 2x capacity model, in which each service was provisioned to handle global traffic independently across two regions, leaving half the fleet idle. We present Uber's Failover Architecture (UFA), which replaces the uniform 2x model with a differentiated architecture aligned to business criticality. Critical services retain failover guarantees, while non-critical services opportunistically use the failover buffer capacity reserved for critical services during steady state. During rare "full-peak" failovers, non-critical services are selectively preempted and rapidly restored, with differentiated Service-Level Agreements (SLAs) using on-demand capacity. Automated safeguards, including dependency analysis and regression gates, ensure critical services continue to function even while non-critical services are unavailable. The quantitative impact is significant: UFA reduces steady-state provisioning from 2x to 1.3x, raising utilization from ~20% to ~30% while sustaining 99.97% availability. To date, UFA has hardened over 4,000 unsafe dependencies and eliminated over one million CPU cores from a baseline of about four million cores.
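
A back-of-envelope check of the reported figures, assuming steady-state demand is unchanged and utilization is simply demand divided by provisioned capacity:

```latex
u_{\text{new}} \approx u_{\text{old}} \cdot \frac{2}{1.3} = 0.20 \times 1.54 \approx 0.31,
\qquad
\frac{1\,\text{M cores eliminated}}{4\,\text{M cores baseline}} = 25\%
```

Under that assumption, the change from 2x to 1.3x provisioning alone accounts for the move from about 20% to about 30% utilization, and the cores eliminated correspond to roughly a quarter of the baseline fleet.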

2026-03-07T21:13:09Z Mayank Bansal Milind Chabbi Kenneth Bogh Srikanth Prodduturi Kevin Xu Amit Kumar David Bell Ranjib Dey Yufei Ren Sachin Sharma Juan Marcano Shriniket Kale Subhav Pradhan Ivan Beschastnikh Miguel Covarrubias Chien-Chih Liao Sandeep Koushik Sheshadri Wen Luo Kai Song Ashish Samant Sahil Rihan Nimish Sheth Uday Kiran Medisetty http://arxiv.org/abs/2603.07338v1 A Lightweight Digital-Twin-Based Framework for Edge-Assisted Vehicle Tracking and Collision Prediction 2026-03-07T20:56:04Z

Vehicle tracking, motion estimation, and collision prediction are fundamental components of traffic safety and management in Intelligent Transportation Systems (ITS). Many recent approaches rely on computationally intensive prediction models, which limits their practical deployment on resource-constrained edge devices. This paper presents a lightweight digital-twin-based framework for vehicle tracking and spatiotemporal collision prediction that relies solely on object detection, without requiring complex trajectory prediction networks. The framework is implemented and evaluated in Quanser Interactive Labs (QLabs), a high-fidelity digital twin of an urban traffic environment that enables controlled and repeatable scenario generation. A YOLO-based detector is deployed on simulated edge cameras to localize vehicles and extract frame-level centroid trajectories. Offline path maps are constructed from multiple traversals and indexed using K-D trees to support efficient online association between detected vehicles and road segments. During runtime, consistent vehicle identifiers are maintained, vehicle speed and direction are estimated from the temporal evolution of path indices, and future positions are predicted accordingly. Potential collisions are identified by analyzing both spatial proximity and temporal overlap of predicted future trajectories. Our experimental results across diverse simulated urban scenarios show that the proposed framework predicts approximately 88% of collision events prior to occurrence while maintaining low computational overhead suitable for edge deployment. Rather than introducing a computationally intensive prediction model, this work introduces a lightweight digital-twin-based solution for vehicle tracking and collision prediction, tailored for real-time edge deployment in ITS.
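
The spatial-and-temporal overlap test can be sketched with constant-velocity extrapolation of centroid tracks. As assumptions, velocity is taken from the last two centroids rather than from path indices, the K-D-tree path-map association is omitted, and `horizon` and `radius` are hypothetical parameters in frames and map units.

```python
import math


def predict(track, horizon):
    """Constant-velocity extrapolation from the last two centroids (one step = one frame)."""
    (x0, y0), (x1, y1) = track[-2], track[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, horizon + 1)]


def first_collision_step(track_a, track_b, horizon=30, radius=2.0):
    """Earliest future step at which both predicted positions are within `radius`, else None."""
    future = zip(predict(track_a, horizon), predict(track_b, horizon))
    for k, (pa, pb) in enumerate(future, start=1):
        if math.dist(pa, pb) <= radius:  # same time step AND spatially close
            return k
    return None


east = [(0.0, 0.0), (1.0, 0.0)]      # heading +x, one unit per frame
south = [(10.0, 10.0), (10.0, 9.0)]  # heading -y toward the same junction
parallel = [(0.0, 5.0), (1.0, 5.0)]  # same direction, 5 units apart
print(first_collision_step(east, south))     # 8
print(first_collision_step(east, parallel))  # None
```

Comparing positions only at matching time steps is what separates a true conflict from two paths that merely cross at different times.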

2026-03-07T20:56:04Z 6 pages, 2 figures, IEEE ICC 2026 Workshops (under submission) Murat Arda Onsu Poonam Lohan Burak Kantarci Aisha Syed Matthew Andrews Sean Kennedy