https://arxiv.org/api/80Jd6MbTkX3gHqN1s8tfzcCP4h4
2026-06-21T16:18:10Z
137916515

http://arxiv.org/abs/2601.21011v1
Meta-ROS: A Next-Generation Middleware Architecture for Adaptive and Scalable Robotic Systems
Updated: 2026-01-28T20:06:30Z
The field of robotics faces significant challenges related to the complexity and interoperability of existing middleware frameworks, like ROS2, which can be difficult for new developers to adopt. To address these issues, we propose Meta-ROS, a novel middleware solution designed to streamline robotics development by simplifying integration, enhancing performance, and ensuring cross-platform compatibility. Meta-ROS leverages modern communication protocols, such as Zenoh and ZeroMQ, to enable efficient and low-latency communication across diverse hardware platforms, while also supporting various data types like audio, images, and video. We evaluated Meta-ROS's performance through comprehensive testing, comparing it with existing middleware frameworks like ROS1 and ROS2. The results demonstrated that Meta-ROS outperforms ROS2, achieving up to 30% higher throughput, significantly reducing message latency, and optimizing resource usage. Additionally, its robust hardware support and developer-centric design facilitate seamless integration and ease of use, positioning Meta-ROS as an ideal solution for modern, real-time robotics AI applications.
Published: 2026-01-28T20:06:30Z
Comment: To be submitted to ACM Transactions on Autonomous and Adaptive Systems (TAAS). Python library: https://pypi.org/project/metaros/
Authors: Anshul Ranjan, Anoosh Damodar, Neha Chougule, Dhruva S Nayak, Anantharaman P. N, Shylaja S S

http://arxiv.org/abs/2601.20629v1
/dev/SDB: Software Defined Boot -- A novel standard for diskless booting anywhere and everywhere
Updated: 2026-01-28T14:07:18Z
A computer is nothing but a device that processes the instructions supplied to it. However, as computers evolved, the instructions or codes started to be more complicated.
As computers started to be used by non-technical people, it became imperative that the users be able to use the machine without having underlying knowledge of the code or the hardware. The operating system became the backbone for translating the inputs from the user to actual operation on the hardware. With the increasing complexity and the choices of operating system, it became clear that different groups of people, especially in an enterprise scenario, required different operating systems. Installing them all on a single machine for shared computers became a difficult task, giving rise to network-based booting. But network-based booting was confined to only wired connectivity, keeping it restricted to very small geographical areas. The proposed system, /dev/SDB, is aimed at creating a standard where any user, anyone on the globe, can access the operating system authorized to them without having to be on the corporate network. It aims to offer the same over Wi-Fi as well as cellular connectivity, ensuring employees can truly work from anywhere, while following the policies for operating systems and without redundant hardware.
Published: 2026-01-28T14:07:18Z
Authors: Aditya Mitra, Hamza Haroon, Amaan Rais Shah, Mohammad Elham Rasooli, Bogdan Itsam Dorantes Nikolaev, Tuğçe Ballı

http://arxiv.org/abs/2601.20435v1
Rethinking Thread Scheduling under Oversubscription: A User-Space Framework for Coordinating Multi-runtime and Multi-process Workloads
Updated: 2026-01-28T09:46:46Z
The convergence of high-performance computing (HPC) and artificial intelligence (AI) is driving the emergence of increasingly complex parallel applications and workloads. These workloads often combine multiple parallel runtimes within the same application or across co-located jobs, creating scheduling demands that place significant stress on traditional OS schedulers.
When oversubscribed (there are more ready threads than cores), OS schedulers rely on periodic preemptions to multiplex cores, often introducing interference that may degrade performance. In this paper, we present: (1) The User-space Scheduling Framework (USF), a novel seamless process scheduling framework completely implemented in user-space. USF enables users to implement their own process scheduling algorithms without requiring special permissions. We evaluate USF with its default cooperative policy, (2) SCHED_COOP, designed to reduce interference by switching threads only upon blocking. This approach mitigates well-known issues such as Lock-Holder Preemption (LHP), Lock-Waiter Preemption (LWP), and scalability collapse. We implement USF and SCHED_COOP by extending the GNU C library with the nOS-V runtime, enabling seamless coordination across multiple runtimes (e.g., OpenMP) without requiring invasive application changes. Evaluations show gains up to 2.4x in oversubscribed multi-process scenarios, including nested BLAS workloads, multi-process PyTorch inference with LLaMA-3, and Molecular Dynamics (MD) simulations.
Published: 2026-01-28T09:46:46Z
Authors: Aleix Roca, Vicenç Beltran
DOI: 10.1145/3774934.3786451

http://arxiv.org/abs/2601.07600v2
Peformance Isolation for Inference Processes in Edge GPU Systems
Updated: 2026-01-27T08:40:12Z
This work analyzes the main isolation mechanisms available in modern NVIDIA GPUs: MPS, MIG, and the recent Green Contexts, to ensure predictable inference time in safety-critical applications using deep learning models. The experimental methodology includes performance tests, evaluation of partitioning impact, and analysis of temporal isolation between processes, considering both the NVIDIA A100 and Jetson Orin platforms. It is observed that MIG provides a high level of isolation. At the same time, Green Contexts represent a promising alternative for edge devices by enabling fine-grained SM allocation with low overhead, albeit without memory isolation.
The study also identifies current limitations and outlines potential research directions to improve temporal predictability in shared GPUs.
Published: 2026-01-12T14:49:52Z
Authors: Juan José Martín, José Flich, Carles Hernández

http://arxiv.org/abs/2601.18216v1
Rhea: Detecting Privilege-Escalated Evasive Ransomware Attacks Using Format-Aware Validation in the Cloud
Updated: 2026-01-26T07:05:09Z
Ransomware variants increasingly combine privilege escalation with sophisticated evasion strategies such as intermittent encryption, low-entropy encryption, and imitation attacks. Such powerful ransomware variants, privilege-escalated evasive ransomware (PEER), can defeat existing solutions relying on I/O-pattern analysis by tampering with or obfuscating I/O traces. Meanwhile, conventional statistical content-based detection becomes unreliable as the encryption size decreases due to sampling noise. We present Rhea, a cloud-offloaded ransomware defense system that analyzes replicated data snapshots, so-called mutation snapshots. Rhea introduces Format-Aware Validation that validates the syntactic and semantic correctness of file formats, instead of relying on statistical or entropy-based indicators. By leveraging file-format specifications as detection invariants, Rhea can reliably identify fine-grained and evasive encryption even under elevated attacker privileges. Our evaluation demonstrates that Rhea significantly outperforms existing approaches, establishing its practical effectiveness against modern ransomware threats.
Published: 2026-01-26T07:05:09Z
Comment: 12 pages, 6 figures, under review (Jan 2026)
Authors: Beom Heyn Kim, Seok Min Hong, Mohammad Mannan

http://arxiv.org/abs/2601.16032v2
Sawtooth Wavefront Reordering: Enhanced CuTile FlashAttention on NVIDIA GB10
Updated: 2026-01-26T02:45:04Z
High-performance attention kernels are essential for Large Language Models. This paper presents an analysis of CuTile-based Flash Attention memory behavior and a technique to improve its cache performance.
In particular, our analysis on the NVIDIA GB10 (Grace Blackwell) identifies the main cause of L2 cache misses. Leveraging this insight, we introduce a new programming technique called Sawtooth Wavefront Reordering that reduces L2 misses. We validate it in both CUDA and CuTile, observing a 50% or greater reduction in L2 misses and up to a 60% increase in throughput on GB10.
Published: 2026-01-22T15:05:31Z
Authors: Yifan Zhu, Yekai Pan, Chen Ding

http://arxiv.org/abs/2601.17944v1
Credit Fairness: Online Fairness In Shared Resource Pools
Updated: 2026-01-25T18:44:24Z
We consider a setting in which a group of agents share resources that must be allocated among them in each discrete time period. Agents have time-varying demands and derive constant marginal utility from each unit of resource received up to their demand, with zero utility for any additional resources. In this setting, it is known that independently maximizing the minimum utility in each round satisfies sharing incentives (agents weakly prefer participating in the mechanism to not participating), strategyproofness (agents have no incentive to misreport their demands), and Pareto efficiency (Freeman et al. 2018). However, recent work (Vuppalapati et al. 2023) has shown that this max-min mechanism can lead to large disparities in the total resources received by agents, even when they have the same average demand. In this paper, we introduce credit fairness, a strengthening of sharing incentives that ensures agents who lend resources in early rounds are able to recoup them in later rounds. Credit fairness can be achieved in conjunction with either Pareto efficiency or strategyproofness, but not both.
We propose a mechanism that is credit fair and Pareto efficient, and we evaluate its performance in a computational resource-sharing setting.
Published: 2026-01-25T18:44:24Z
Authors: Seyed Majid Zahedi, Rupert Freeman

http://arxiv.org/abs/2601.16935v1
AERO: Adaptive and Efficient Runtime-Aware OTA Updates for Energy-Harvesting IoT
Updated: 2026-01-23T17:49:36Z
Energy-harvesting (EH) Internet of Things (IoT) devices operate under intermittent energy availability, which disrupts task execution and makes energy-intensive over-the-air (OTA) updates particularly challenging. Conventional OTA update mechanisms rely on reboots and incur significant overhead, rendering them unsuitable for intermittently powered systems. Recent live OTA update techniques reduce reboot overhead but still lack mechanisms to ensure consistency when updates interact with runtime execution. This paper presents AERO, an Adaptive and Efficient Runtime-Aware OTA update mechanism that integrates update tasks into the device's Directed Acyclic Graph (DAG) and schedules them alongside routine tasks under energy and timing constraints. By identifying update-affected execution regions and dynamically adjusting dependencies, AERO ensures consistent update integration while adapting to intermittent energy availability. Experiments on representative workloads demonstrate improved update reliability and efficiency compared to existing live update approaches.
Published: 2026-01-23T17:49:36Z
Comment: Accepted at DATE 2026
Authors: Wei Wei, Jingye Xu, Sahidul Islam, Dakai Zhu, Chen Pan, Mimi Xie

http://arxiv.org/abs/2601.05072v3
DAVOS: An Autonomous Vehicle Operating System in the Vehicle Computing Era
Updated: 2026-01-23T16:11:28Z
Vehicle computing represents a fundamental shift in how autonomous vehicles are designed and deployed, transforming them from isolated transportation systems into mobile computing platforms that support both safety-critical, real-time driving and data-centric services.
In this setting, vehicles simultaneously support real-time driving pipelines and a growing set of data-driven applications, placing increased responsibility on the vehicle operating system to coordinate computation, data movement, storage, and access. These demands highlight recurring system considerations related to predictable execution, data and execution protection, efficient handling of high-rate sensor data, and long-term system evolvability, commonly summarized as Safety, Security, Efficiency, and Extensibility (SSEE). Existing vehicle operating systems and runtimes address these concerns in isolation, resulting in fragmented software stacks that limit coordination between autonomy workloads and vehicle data services. This paper presents DAVOS, the Dependable Autonomous Vehicle Operating System, a unified vehicle operating system architecture designed for the vehicle computing context. DAVOS provides a cohesive operating system foundation that supports both real-time autonomy and extensible vehicle computing within a single system framework.
Published: 2026-01-08T16:17:48Z
Authors: Yuxin Wang, Yuankai He, Boyang Tian, Lichen Xian, Weisong Shi

http://arxiv.org/abs/2601.15084v2
DeLog: An Efficient Log Compression Framework with Pattern Signature Synthesis
Updated: 2026-01-22T11:59:24Z
Parser-based log compression, which separates static templates from dynamic variables, is a promising approach to exploit the unique structure of log data. However, its performance on complex production logs is often unsatisfactory. This performance gap coincides with a known degradation in the accuracy of its core log parsing component on such data, motivating our investigation into a foundational yet unverified question: does higher parsing accuracy necessarily lead to better compression ratio?
To answer this, we conduct the first empirical study quantifying this relationship and find that a higher parsing accuracy does not guarantee a better compression ratio. Instead, our findings reveal that compression ratio is dictated by achieving effective pattern-based grouping and encoding, i.e., the partitioning of tokens into low entropy, highly compressible groups.
Guided by this insight, we design DeLog, a novel log compressor that implements a Pattern Signature Synthesis mechanism to achieve efficient pattern-based grouping. On 16 public and 10 production datasets, DeLog achieves state-of-the-art compression ratio and speed.
Published: 2026-01-21T15:26:09Z
Comment: 23 pages, 11 figures
Authors: Siyu Yu, Yifan Wu, Junjielong Xu, Ying Fu, Ning Wang, Maoyin Liu, Pancheng Jiang, Xiang Zhang, Tong Jia, Pinjia He, Ying Li

http://arxiv.org/abs/2601.14555v1
WebAssembly Based Portable and Secure Sensor Interface for Internet of Things
Updated: 2026-01-21T00:36:58Z
As the expansion of IoT connectivity continues to provide quality-of-life improvements around the world, it simultaneously introduces increasing privacy and security concerns. The lack of a clear definition in managing shared and protected access to IoT sensors offers channels by which devices can be compromised and sensitive data can be leaked. In recent years, WebAssembly has received considerable attention for its efficient application sandboxing suitable for embedded systems, making it a prime candidate for exploring a secure and portable sensor interface. This paper introduces the first WebAssembly System Interface (WASI) extension offering a secure, portable, and low-footprint sandbox enabling multi-tenant access to sensor data across heterogeneous embedded devices. The runtime extensions provide application memory isolation, ensure appropriate resource privileges by intercepting sensor access, and offer an MQTT-SN interface enabling in-network access control. When targeting the WebAssembly byte-code with the associated runtime extensions implemented atop the Zephyr RTOS, our evaluation of sensor access indicates a latency overhead of 6% with an additional memory footprint of 5% when compared to native execution.
As MQTT-SN requests are dominated by network delays, the WASI-SN implementation of MQTT-SN introduces less than 1% additional latency with similar memory footprint.
Published: 2026-01-21T00:36:58Z
Authors: Botong Ou, Baijian Yang

http://arxiv.org/abs/2601.14129v1
"Range as a Key" is the Key! Fast and Compact Cloud Block Store Index with RASK
Updated: 2026-01-20T16:26:00Z
In cloud block store, indexing is on the critical path of I/O operations and typically resides in memory. With the scaling of users and the emergence of denser storage media, the index has become a primary memory consumer, causing memory strain. Our extensive analysis of production traces reveals that write requests exhibit a strong tendency to target continuous block ranges in cloud storage systems. Thus, compared to current per-block indexing, our insight is that we should directly index block ranges (i.e., range-as-a-key) to save memory.
In this paper, we propose RASK, a memory-efficient and high-performance tree-structured index that natively indexes ranges. While range-as-a-key offers the potential to save memory and improve performance, realizing this idea is challenging due to the range overlap and range fragmentation issues. To handle range overlap efficiently, RASK introduces the log-structured leaf, combined with range-tailored search and garbage collection. To reduce range fragmentation, RASK employs range-aware split and merge mechanisms. Our evaluations on four production traces show that RASK reduces memory footprint by up to 98.9% and increases throughput by up to 31.0x compared to ten state-of-the-art indexes.
Published: 2026-01-20T16:26:00Z
Authors: Haoru Zhao, Mingkai Dong, Erci Xu, Zhongyu Wang, Haibo Chen

http://arxiv.org/abs/2601.14021v1
OAMAC: Origin-Aware Mandatory Access Control for Practical Post-Compromise Attack Surface Reduction
Updated: 2026-01-20T14:40:26Z
Modern operating systems provide powerful mandatory access control mechanisms, yet they largely reason about who executes code rather than how execution originates. As a result, processes launched remotely, locally, or by background services are often treated equivalently once privileges are obtained, complicating security reasoning and enabling post-compromise abuse of sensitive system interfaces. We introduce origin-aware mandatory access control (OAMAC), a kernel-level enforcement model that treats execution origin -- such as physical user presence, remote access, or service execution -- as a first-class security attribute. OAMAC mediates access to security-critical subsystems based on execution provenance rather than identity alone, enabling centralized governance over multiple attack surfaces while significantly reducing policy complexity. We present a deployable prototype implemented entirely using the Linux eBPF LSM framework, requiring no kernel modifications.
OAMAC classifies execution origin using kernel-visible metadata, propagates origin across process creation, and enforces origin-aware policies on both sensitive filesystem interfaces and the kernel BPF control plane. Policies are maintained in kernel-resident eBPF maps and can be reconfigured at runtime via a minimal userspace tool. Our evaluation demonstrates that OAMAC effectively restricts common post-compromise actions available to remote attackers while preserving normal local administration and system stability. We argue that execution origin represents a missing abstraction in contemporary operating system security models, and that elevating it to a first-class concept enables practical attack surface reduction without requiring subsystem-specific expertise or heavyweight security frameworks.
Published: 2026-01-20T14:40:26Z
Authors: Omer Abdelmajeed Idris Mohammed, Ilhami M. Orak

http://arxiv.org/abs/2601.09258v2
LatencyPrism: Online Non-intrusive Latency Sculpting for SLO-Guaranteed LLM Inference
Updated: 2026-01-20T07:29:49Z
LLM inference latency critically determines user experience and operational costs, directly impacting throughput under SLO constraints. Even brief latency spikes degrade service quality despite acceptable average performance. However, distributed inference environments featuring diverse software frameworks and XPU architectures combined with dynamic workloads make latency analysis challenging. Constrained by intrusive designs that necessitate service restarts or even suspension, and by hardware-bound implementations that fail to adapt to heterogeneous inference environments, existing AI profiling methods are often inadequate for real-time production analysis.
We present LatencyPrism, the first zero-intrusion multi-platform latency sculpting system. It aims to break down the inference latency across the pipeline, proactively alert on inference latency anomalies, and guarantee adherence to SLOs, all without requiring code modifications or service restarts. LatencyPrism has been deployed across thousands of XPUs for over six months. It enables low-overhead real-time monitoring at batch level with alerts triggered in milliseconds. This approach distinguishes between workload-driven latency variations and anomalies indicating underlying issues with an F1-score of 0.98. We also conduct extensive experiments and investigations into root cause analysis to demonstrate LatencyPrism's capability. Furthermore, we introduce the first LLM anomaly simulation toolkit to facilitate future research in robust and predictable inference systems.
Published: 2026-01-14T07:46:59Z
Comment: 13 pages, 6 figures
Authors: Yin Du, Jiayi Ren, Xiayu Sun, Tianyao Zhou, Haizhu Zhou, Ruiyan Ma, Danyang Zhang

http://arxiv.org/abs/2601.13631v1
ContiguousKV: Accelerating LLM Prefill with Granularity-Aligned KV Cache Management
Updated: 2026-01-20T05:58:19Z
Efficiently serving Large Language Models (LLMs) with persistent Prefix Key-Value (KV) Cache is critical for applications like conversational search and multi-turn dialogue. Serving a request requires loading the pre-computed prefix KV cache and generating the first token, defined as the Re-Prefill Phase. Offloading this shared prefix cache to secondary storage is essential for memory scalability. Re-Prefill with offloading suffers from severe I/O bottlenecks in two aspects. First, semantic-aware KV cache pruning algorithms select important tokens in fine granularity, while systems manage I/O in coarse, fixed-size blocks, causing severe read amplification. Second, the sequential dependency between identifying important tokens and loading KV cache creates idle I/O and compute bubbles, under-utilizing system resources.
This paper proposes ContiguousKV, a high-performance prefix KV cache offloading system that bridges algorithmic semantics with I/O efficiency to accelerate the Re-Prefill phase. We first introduce ContiguousChunk, a unified data management granularity that aligns KV cache pruning with I/O operations. All the mechanisms critical for I/O performance are performed at the granularity of ContiguousChunk, thereby eliminating read amplification. By exploiting the high similarity in important ContiguousChunk indices across layers, we propose intra- and inter-period asynchronous prefetching to break the sequential dependency between I/O and compute, effectively eliminating idle bubbles. Finally, we propose attention-guided cache management to retain semantically critical prefix data in memory. Evaluations on Qwen2.5 series models show that ContiguousKV achieves a 3.85x speedup in the Re-Prefill phase over the state-of-the-art offloading system IMPRESS, while maintaining high output quality.
Published: 2026-01-20T05:58:19Z
Authors: Jing Zou, Shangyu Wu, Hancong Duan, Qiao Li, Chun Jason Xue