http://arxiv.org/abs/2502.13163v3
A Survey of Fuzzing Open-Source Operating Systems
Updated: 2026-01-19T18:59:22Z

Vulnerabilities in open-source operating systems (OSs) pose substantial security risks to software systems, making their detection crucial. While fuzzing has been an effective vulnerability detection technique in various domains, OS fuzzing (OSF) faces unique challenges due to OS complexity and multi-layered interaction, and has not been comprehensively reviewed. Therefore, this work systematically surveys the state-of-the-art OSF techniques, categorizes them based on the general fuzzing process, and investigates challenges specific to kernel, file system, driver, and hypervisor fuzzing. Finally, future research directions for OSF are discussed.

Published: 2025-02-17T02:53:02Z
Comment: 35 pages
Authors: Kun Hu, Qicai Chen, Wenzhuo Zhang, Zilong Lu, Bihuan Chen, You Lu, Haowen Jiang, Bingkun Sun, Xin Peng, Wenyun Zhao

http://arxiv.org/abs/2601.11743v1
Nixie: Efficient, Transparent Temporal Multiplexing for Consumer GPUs
Updated: 2026-01-16T19:52:19Z

Consumer machines are increasingly running large ML workloads such as large language models (LLMs), text-to-image generation, and interactive image editing. Unlike datacenter GPUs, consumer GPUs serve single-user, rapidly changing workloads, and each model's working set often nearly fills the GPU memory. As a result, existing sharing mechanisms (e.g., NVIDIA Unified Virtual Memory) perform poorly due to memory thrashing and excessive use of CPU pinned memory when multiple applications are active.
We design and implement Nixie, a system that enables efficient and transparent temporal multiplexing on consumer GPUs without requiring any application or driver changes. Nixie is a system service that coordinates GPU memory allocation and kernel launch behavior to efficiently utilize the CPU-GPU bi-directional bandwidth and CPU pinned memory. A lightweight scheduler in Nixie further improves responsiveness by automatically prioritizing latency-sensitive interactive jobs using MLFQ-inspired techniques. Our evaluations show that Nixie improves the latency of real interactive code-completion tasks by up to 3.8x and reduces CPU pinned memory usage by up to 66.8% under the same latency requirement.

Published: 2026-01-16T19:52:19Z
Authors: Yechen Xu, Yifei Wang, Nathanael Ren, Yiran Chen, Danyang Zhuo

http://arxiv.org/abs/2509.00105v2
AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving
Updated: 2026-01-15T21:12:02Z

Large language model (LLM) applications often reuse previously processed context, such as chat history and documents, which introduces significant redundant computation. Existing LLM serving systems address such redundant computation by storing the KV caches of processed context and loading the corresponding KV cache when a new request reuses the context. Further, as these LLM applications scale, the total size of KV caches becomes excessively large and requires both DRAM and SSD for full storage.
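The baseline described above, where stored KV caches are reused and spill from DRAM to SSD, can be pictured as a two-tier lookup whose SSD hits are the slow path. This is a minimal sketch under stated assumptions: the class name TieredKVStore, the SHA-256 context key, and the LRU spill policy are hypothetical, KV caches are modeled as opaque bytes, and AdaptCache's lossy compression is not modeled.

```python
from __future__ import annotations

import hashlib
from collections import OrderedDict


class TieredKVStore:
    """Hypothetical two-tier KV-cache store: a bounded LRU "DRAM" tier that spills to an unbounded "SSD" tier."""

    def __init__(self, dram_capacity: int) -> None:
        self.dram_capacity = dram_capacity
        self.dram: OrderedDict[str, bytes] = OrderedDict()  # least to most recently used
        self.ssd: dict[str, bytes] = {}

    @staticmethod
    def key(context: str) -> str:
        # Identify a processed context (chat history, document, ...) by a hash of its text.
        return hashlib.sha256(context.encode()).hexdigest()

    def put(self, context: str, kv: bytes) -> None:
        """Store a context's KV cache in DRAM, spilling LRU entries to SSD when DRAM is full."""
        k = self.key(context)
        self.ssd.pop(k, None)
        self.dram[k] = kv
        self.dram.move_to_end(k)
        while len(self.dram) > self.dram_capacity:
            old_key, old_kv = self.dram.popitem(last=False)  # evict least recently used
            self.ssd[old_key] = old_kv

    def get(self, context: str) -> tuple[str, bytes | None]:
        """Return (tier, kv) for a reused context; tier is "dram", "ssd", or "miss"."""
        k = self.key(context)
        if k in self.dram:
            self.dram.move_to_end(k)
            return "dram", self.dram[k]
        if k in self.ssd:
            return "ssd", self.ssd[k]  # slow path: loading from SSD dominates delay
        return "miss", None
```

With a one-entry DRAM tier, storing two documents pushes the first to SSD, so reusing it becomes an SSD hit; raising the DRAM hit rate is exactly what the per-entry compression decisions in AdaptCache aim for.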
However, prior work that stores KV caches in DRAM and SSD suffers from high loading delays, as most KV cache hits come from SSD, which is slow to load. To increase the KV cache hit rate on DRAM, we identify lossy KV cache compression as a promising approach. We design a lossy compression system that decides the compression algorithm, compression rate and device placement for each KV cache entry to maximise DRAM hits and minimise loading delay without significantly degrading generation quality. Compared to various static compression baselines across three tasks, our system AdaptCache achieves 1.43-2.4x delay savings at the same quality and 6-55% quality improvements at the same delay.

Published: 2025-08-28T00:46:51Z
Comment: Accepted at SOSP 2025 - The International Workshop on Big Memory (BigMem)
Authors: Shaoting Feng, Hanchen Li, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang

http://arxiv.org/abs/2509.22256v4
Secure and Efficient Access Control for Computer-Use Agents via Context Space
Updated: 2026-01-14T06:07:30Z

Large language model (LLM)-based computer-use agents represent a convergence of AI and OS capabilities, enabling natural language to control system- and application-level functions. However, due to LLMs' inherent uncertainty issues, granting agents control over computers poses significant security risks. When agent actions deviate from user intentions, they can cause irreversible consequences. Existing mitigation approaches, such as user confirmation and LLM-based dynamic action validation, still suffer from limitations in usability, security, and performance. To address these challenges, we propose CSAgent, a system-level, static policy-based access control framework for computer-use agents. To bridge the gap between static policy and dynamic context and user intent, CSAgent introduces intent- and context-aware policies, and provides an automated toolchain to assist developers in constructing and refining them.
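A static policy can still depend on intent and context if each rule names the user intents under which an action is allowed and a predicate over the current context, with everything else denied. This is a hypothetical sketch, not CSAgent's policy language or toolchain: the Rule and Policy types, the string-valued context, and the example action names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict

Context = Dict[str, str]  # assumed: observed context as simple key/value pairs


@dataclass(frozen=True)
class Rule:
    action: str                  # operation the agent wants to perform
    intents: frozenset[str]      # user intents under which the action is permitted
    condition: Callable[[Context], bool] = lambda ctx: True  # predicate over the current context


@dataclass
class Policy:
    rules: list[Rule] = field(default_factory=list)

    def allows(self, action: str, intent: str, context: Context) -> bool:
        # Default deny: the action runs only if some rule matches the action,
        # the user's current intent, and the observed context.
        return any(
            rule.action == action and intent in rule.intents and rule.condition(context)
            for rule in self.rules
        )
```

For example, a rule could allow a hypothetical send_email action only under a "reply_to_colleague" intent and only when the recipient's domain matches the user's own; the same action under another intent, or toward another domain, is denied.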
CSAgent enforces these policies through an optimized OS service, ensuring that agent actions can only be executed under specific user intents and contexts. CSAgent supports protecting agents that control computers through diverse interfaces, including API, CLI, and GUI. We implement and evaluate CSAgent, which successfully defends against all attacks in the benchmarks while introducing only 1.99% performance overhead and a 5.42% utility decrease.

Published: 2025-09-26T12:19:27Z
Authors: Haochen Gong, Chenxiao Li, Rui Chang, Wenbo Shen

http://arxiv.org/abs/2601.06331v1
Rethinking Inter-Process Communication with Memory Operation Offloading
Updated: 2026-01-09T22:08:15Z

As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although memory offloading is being explored through both hardware and software mechanisms, current IPC stacks lack a unified runtime model to coordinate them effectively.
This paper presents a unified IPC runtime suite that integrates both hardware- and software-based memory offloading into shared-memory communication. The system characterizes the interaction between offload strategies and IPC execution, including synchronization, cache visibility, and concurrency, and introduces multiple IPC modes that balance throughput, latency, and CPU efficiency.
Through asynchronous pipelining, selective cache injection, and hybrid coordination, the system turns offloading from a device-specific feature into a general system capability. Evaluations on real-world workloads show instruction count reductions of up to 22%, throughput improvements of up to 2.1x, and latency reductions of up to 72%, demonstrating that coordinated IPC offloading can deliver tangible end-to-end efficiency gains in modern data-intensive systems.

Published: 2026-01-09T22:08:15Z
Comment: Preprint. 12 pages, 15 figures
Authors: Misun Park, Richi Dubey, Yifan Yuan, Nam Sung Kim, Ada Gavrilovska

http://arxiv.org/abs/2512.24637v2
Towards Fully-fledged GPU Multitasking via Proactive Memory Scheduling
Updated: 2026-01-02T15:59:49Z

The limited HBM capacity has become the primary bottleneck for hosting an increasing number of larger-scale GPU tasks. While demand paging extends capacity via host DRAM, it incurs up to a 78x slowdown due to the massive working sets and poor locality of GPU workloads. We observe, however, that GPU memory access patterns are inherently predictable from kernel launch arguments and the asynchronous nature of kernel execution. Leveraging this, we propose MSched, an OS-level scheduler that extends GPU context switching to include proactive working set preparation, thereby coalescing fragmented, eventual, and expensive page faults into a single efficient migration. MSched employs a template-based approach to predict working sets with near-perfect accuracy and proposes a co-design between task scheduler and memory manager to enforce a globally optimal page placement policy.
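Because a kernel's launch arguments name the buffers it will touch, its working set can be computed before it runs and fetched as a few contiguous transfers instead of being faulted in page by page. This is a minimal sketch, assuming each pointer argument comes with a known buffer size and a fixed 2 MiB page size; the function names are hypothetical, and MSched's template-based prediction is not reproduced here.

```python
from __future__ import annotations

PAGE_SIZE = 2 * 1024 * 1024  # assumed migration granularity (2 MiB)


def pages_for(base: int, size: int, page_size: int = PAGE_SIZE) -> range:
    """Indices of the pages touched by the buffer [base, base + size)."""
    return range(base // page_size, (base + size - 1) // page_size + 1)


def plan_migration(
    buffers: list[tuple[int, int]], resident: set[int], page_size: int = PAGE_SIZE
) -> list[tuple[int, int]]:
    """Turn a kernel's (pointer, size) arguments into coalesced (first_page, page_count) transfers."""
    needed: set[int] = set()
    for base, size in buffers:
        if size > 0:
            needed.update(pages_for(base, size, page_size))
    runs: list[tuple[int, int]] = []
    for page in sorted(needed - resident):  # only pages not already in GPU memory
        if runs and runs[-1][0] + runs[-1][1] == page:
            first, count = runs[-1]
            runs[-1] = (first, count + 1)  # extend the current contiguous run
        else:
            runs.append((page, 1))  # start a new run
    return runs
```

With 4 KiB pages, buffers at [0, 8192), [8192, 8292), and [40960, 45056) need pages 0-2 and 10, which become two transfers rather than four separate faults.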
Evaluation demonstrates that MSched outperforms demand paging by up to 11.05x for scientific and deep learning workloads, and by up to 57.88x for LLM workloads under memory oversubscription.

Published: 2025-12-31T05:18:52Z
Authors: Weihang Shen, Yinqiu Chen, Rong Chen, Haibo Chen

http://arxiv.org/abs/2601.00252v1
Evolution of Android's Permission-based Security Model and Challenges
Updated: 2026-01-01T08:02:44Z

Android's permission model and application (app) analysis have remained a consistent focus of research groups and other stakeholders in the Android ecosystem since the platform launched in 2008. Even though the permission model of the Android smartphone operating system (OS) has evolved significantly from 'all-or-none access' to 'user-chosen dangerous resource access', specific challenges and issues remain unresolved 15 years after the OS's launch. This study addresses these issues and documents the research in this area through a comprehensive literature survey and comparative analysis.
The survey's focal point is the Android permission model and relevant research between 2010 and 2022. We systematize the knowledge on (i) the mapping from Android API calls to permissions, (ii) the evolution of Android permissions, and (iii) how permissions are checked. Furthermore, the survey identifies permission-related issues and the relevant research that addressed them during the last decade. We reference seminal work in these areas. We summarize the identified research gaps and present future directions for both early-career and experienced researchers.

Published: 2026-01-01T08:02:44Z
Authors: Rajendra Kumar Solanki, Vijay Laxmi, Manoj Singh Gaur

http://arxiv.org/abs/2410.08618v3
SwitchFS: Asynchronous Metadata Updates for Distributed Filesystems with In-Network Coordination
Updated: 2025-12-30T14:15:18Z

Distributed filesystem metadata updates are typically synchronous. This creates inherent challenges for access efficiency, load balancing, and directory contention, especially under dynamic and skewed workloads. This paper argues that synchronous updates are overly conservative. We propose SwitchFS, which uses asynchronous metadata updates that allow operations to return early and defer directory updates until reads, both hiding latency and amortizing overhead. The key challenge lies in efficiently maintaining the synchronous POSIX semantics of metadata updates. To address this, SwitchFS is co-designed with a programmable switch, leveraging the limited on-switch resources to track directory states with negligible overhead. This allows SwitchFS to aggregate and apply delayed updates efficiently, using batching and consolidation before directory reads. Evaluation shows that SwitchFS achieves up to 13.34x and 3.85x higher throughput, and 61.6% and 57.3% lower latency, than two state-of-the-art distributed filesystems, Emulated-InfiniFS and Emulated-CFS, respectively, under skewed workloads.
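Deferring directory updates until a read, then consolidating them into one batch, can be illustrated on a single in-memory directory. This is a toy model under stated assumptions: the Directory class, the last-operation-wins consolidation rule, and the batch counter are hypothetical, and it omits what the abstract identifies as the hard part, namely preserving POSIX semantics (for example, detecting that a name already exists) while updates are pending, which SwitchFS handles by tracking directory state on a programmable switch.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Directory:
    entries: set[str] = field(default_factory=set)
    pending: list[tuple[str, str]] = field(default_factory=list)  # (op, name); op is "create" or "unlink"
    applied_batches: int = 0

    def create(self, name: str) -> None:
        # Return immediately; the directory update itself is deferred.
        self.pending.append(("create", name))

    def unlink(self, name: str) -> None:
        self.pending.append(("unlink", name))

    def _consolidate(self) -> dict[str, str]:
        # Keep only the last pending operation per name, so a create followed
        # by an unlink of the same name collapses into a single removal.
        last: dict[str, str] = {}
        for op, name in self.pending:
            last[name] = op
        return last

    def readdir(self) -> list[str]:
        # A read forces all deferred updates to be applied first, as one batch.
        if self.pending:
            for name, op in self._consolidate().items():
                if op == "create":
                    self.entries.add(name)
                else:
                    self.entries.discard(name)
            self.pending.clear()
            self.applied_batches += 1
        return sorted(self.entries)
```

Four metadata operations issued between two reads cost one batched directory update here, which is the amortization the abstract describes.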
For real-world workloads, SwitchFS improves end-to-end throughput by 21.1x, 1.1x, and 0.3x over CephFS, Emulated-InfiniFS, and Emulated-CFS, respectively.

Published: 2024-10-11T08:33:58Z
Comment: Accepted by EuroSys'26
Authors: Jingwei Xu, Mingkai Dong, Qiulin Tian, Ziyi Tian, Tong Xin, Haibo Chen

http://arxiv.org/abs/2512.23380v1
A unified framework for detecting point and collective anomalies in operating system logs via collaborative transformers
Updated: 2025-12-29T11:18:34Z

Log anomaly detection is crucial for preserving the security of operating systems. Depending on where log data are collected, logs record different kinds of information, which can be regarded as log modalities. Unimodal methods often struggle because they ignore these different modalities, while multimodal methods fail to handle the interactions between them. Applying multimodal sentiment analysis techniques to log anomaly detection, we propose CoLog, a framework that collaboratively encodes logs utilizing various modalities. CoLog utilizes collaborative transformers and multi-head impressed attention to learn interactions among several modalities, ensuring comprehensive anomaly detection. To handle the heterogeneity caused by these interactions, CoLog incorporates a modality adaptation layer, which adapts the representations from different log modalities. This methodology enables CoLog to learn nuanced patterns and dependencies within the data, enhancing its anomaly detection capabilities. Extensive experiments demonstrate CoLog's superiority over existing state-of-the-art methods. Furthermore, in detecting both point and collective anomalies, CoLog achieves a mean precision of 99.63%, a mean recall of 99.59%, and a mean F1 score of 99.61% across seven benchmark datasets for log-based anomaly detection. The comprehensive detection capabilities of CoLog make it highly suitable for cybersecurity, system monitoring, and operational efficiency.
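Precision, recall, and F1 are tied together by a fixed formula, which gives a quick sanity check on the figures above. The sketch uses the standard definitions; note that a mean of per-dataset F1 scores need not equal the harmonic mean of the mean precision and mean recall, so the agreement shown here (about 99.61%) is a consistency check, not a derivation of the reported value.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of flagged log sequences that are truly anomalous."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of truly anomalous log sequences that were flagged."""
    return tp / (tp + fn)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Consistency check on the reported means (precision 99.63%, recall 99.59%).
print(round(f1(0.9963, 0.9959), 4))  # prints 0.9961
```

From raw counts, F1 simplifies to 2TP / (2TP + FP + FN); for example, 90 true positives, 10 false positives, and 5 false negatives give 180/195, about 0.923.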
CoLog represents a significant advancement in log anomaly detection, addressing both point and collective anomalies within a unified framework and tackling the complex challenges posed by automatic log data analysis. An implementation of CoLog is available at https://github.com/NasirzadehMoh/CoLog.

Published: 2025-12-29T11:18:34Z
Comment: 72 pages, 19 figures, 19 tables, accepted in Scientific Reports on 5 November 2025
Journal reference: Scientific Reports 15, 45698 (2025)
Authors: Mohammad Nasirzadeh, Jafar Tahmoresnezhad, Parviz Rashidi-Khazaee
DOI: 10.1038/s41598-025-27693-4

http://arxiv.org/abs/2512.21701v1
LEFT-RS: A Lock-Free Fault-Tolerant Resource Sharing Protocol for Multicore Real-Time Systems
Updated: 2025-12-25T14:52:59Z

Emerging real-time applications have driven the transition to multicore embedded systems, where tasks must share resources due to functional demands and limited availability. These resources, whether local or global, are protected within critical sections to prevent race conditions, with locking protocols ensuring both exclusive access and timing requirements. However, transient faults occurring within critical sections can disrupt execution and propagate errors across multiple tasks. Conventional locking protocols fail to address such faults, and integrating traditional fault tolerance techniques often increases blocking. Recent approaches improve fault recovery through parallel replica execution; however, challenges remain due to sequential access, coordination overhead, and susceptibility to common-mode faults. In this paper, we propose a Lock-frEe Fault-Tolerant Resource Sharing (LEFT-RS) protocol for multicore real-time systems. LEFT-RS allows tasks to concurrently access and read global resources while entering their critical sections in parallel. Each task can complete its access earlier upon successful execution if other tasks experience faults, thereby improving the efficiency of resource usage.
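Why parallel access helps when faults occur can be seen in an idealized timing model: under a lock, a fault in one critical section delays every task queued behind it, while with concurrent access only the faulted task pays for its own recovery. This sketch assumes one re-execution per fault, no write conflicts, and no coordination cost; it is not the LEFT-RS protocol or its worst-case response-time analysis, and the function names are hypothetical.

```python
from __future__ import annotations

# Each task: (name, critical-section length, whether a transient fault hits it).
Task = tuple[str, float, bool]


def lock_based(tasks: list[Task]) -> dict[str, float]:
    """Completion times when tasks take one lock in list order and a faulted
    critical section is re-executed while the lock is still held."""
    now = 0.0
    done: dict[str, float] = {}
    for name, length, faulted in tasks:
        now += length * (2 if faulted else 1)
        done[name] = now
    return done


def parallel_access(tasks: list[Task]) -> dict[str, float]:
    """Completion times when all tasks execute their critical sections concurrently,
    so only a faulted task pays for its own re-execution."""
    return {name: length * (2 if faulted else 1) for name, length, faulted in tasks}
```

If the first of three tasks faults, the two healthy tasks finish at times 1.0 under parallel access instead of 5.0 and 6.0 behind the lock, which is the "complete its access earlier" effect in miniature.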
Our design also limits the overhead and enhances fault resilience. We present a comprehensive worst-case response time analysis to ensure timing guarantees. Extensive evaluation results demonstrate that our method significantly outperforms existing approaches, achieving up to an 84.5% improvement in schedulability on average.

Published: 2025-12-25T14:52:59Z
Comment: Accepted by IEEE Real-Time Systems Symposium (RTSS 2025)
Authors: Nan Chen, Xiaotian Dai, Tong Cheng, Alan Burns, Iain Bate, Shuai Zhao

http://arxiv.org/abs/2512.20860v1
pokiSEC: A Multi-Architecture, Containerized Ephemeral Malware Detonation Sandbox
Updated: 2025-12-24T00:38:40Z

Dynamic malware analysis requires executing untrusted binaries inside strongly isolated, rapidly resettable environments. In practice, many detonation workflows remain tied to heavyweight hypervisors or dedicated bare-metal labs, limiting portability and automation. This challenge has intensified with the adoption of ARM64 developer hardware (e.g., Apple Silicon), where common open-source sandbox recipes and pre-built environments frequently assume x86_64 hosts and do not translate cleanly across architectures. This paper presents pokiSEC, a lightweight, ephemeral malware detonation sandbox that packages the full virtualization and access stack inside a Docker container. pokiSEC integrates QEMU with hardware acceleration (KVM when available) and exposes a browser-based workflow that supports bring-your-own Windows disk images. The key contribution is a Universal Entrypoint that performs runtime host-architecture detection and selects validated hypervisor configurations (machine types, acceleration modes, and device profiles), enabling a single container image and codebase to launch Windows guests on both ARM64 and x86_64 hosts.
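Runtime host-architecture detection of the kind described above can be sketched as a small dispatch on `platform.machine()`, with KVM used only when the container can see `/dev/kvm`. This is an illustration under assumptions, not pokiSEC's Universal Entrypoint: the function names, the q35/virt machine choices, and the TCG fallback are assumptions, and firmware, CPU model, disk, and display options that a Windows guest needs are omitted.

```python
from __future__ import annotations

import os
import platform


def qemu_profile(machine: str | None = None, kvm_available: bool | None = None) -> dict[str, str]:
    """Choose a QEMU binary, machine type, and accelerator for the host CPU architecture."""
    arch = (machine or platform.machine()).lower()  # e.g. "x86_64", "amd64", "aarch64", "arm64"
    if kvm_available is None:
        # KVM is usable inside a container only if /dev/kvm was passed through.
        kvm_available = os.path.exists("/dev/kvm")
    accel = "kvm" if kvm_available else "tcg"  # fall back to software emulation (TCG)
    if arch in ("x86_64", "amd64"):
        return {"binary": "qemu-system-x86_64", "machine": "q35", "accel": accel}
    if arch in ("aarch64", "arm64"):
        return {"binary": "qemu-system-aarch64", "machine": "virt", "accel": accel}
    raise ValueError(f"unsupported host architecture: {arch}")


def qemu_argv(profile: dict[str, str]) -> list[str]:
    """Build the leading QEMU arguments; disk, firmware, and display options are omitted."""
    return [profile["binary"], "-machine", profile["machine"], "-accel", profile["accel"]]
```

Keeping detection separate from argument building lets the same entrypoint be tested on either host by passing the architecture explicitly.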
We validate pokiSEC on Apple Silicon (ARM64) and Ubuntu (AMD64), demonstrating interactive performance suitable for analyst workflows and consistent teardown semantics via ephemeral container lifecycles.

Published: 2025-12-24T00:38:40Z
Comment: 12 pages
Authors: Alejandro Avina, Yashas Hariprasad, Naveen Kumar Chaudhary

http://arxiv.org/abs/2512.12615v2
gpu_ext: Extensible OS Policies for GPUs via eBPF
Updated: 2025-12-20T15:00:39Z

Performance in modern GPU-centric systems increasingly depends on resource management policies, including memory placement, scheduling, and observability. However, uniform policies typically yield suboptimal performance across diverse workloads. Existing approaches present a tradeoff: user-space runtimes provide programmability and flexibility but lack cross-tenant visibility and fine-grained control of hardware resources; meanwhile, modifications to the OS kernel introduce significant complexity and safety risks. To address this, we argue that the GPU driver and device layer should provide an extensible OS interface for policy enforcement. While the emerging eBPF technology shows potential, directly applying existing host-side eBPF is insufficient because it lacks visibility into and control over critical device-side events, and directly embedding policy code into GPU kernels could compromise safety and efficiency. We propose gpu_ext, an eBPF-based runtime that treats the GPU driver and device as a programmable OS subsystem. gpu_ext extends GPU drivers by exposing safe programmable hooks and introduces a device-side eBPF runtime capable of executing verified policy logic within GPU kernels, enabling coherent and transparent policies.
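The idea of a driver that consults an attachable policy at a fixed hook point, while never trusting that policy blindly, can be sketched in plain Python. This is only an analogy under stated assumptions: the Driver class, the eviction-hook signature, and the runtime bounds check are hypothetical stand-ins; gpu_ext itself uses eBPF programs whose safety is established by verification before they run, not by checking their results afterward.

```python
from __future__ import annotations

from typing import Callable, Optional, Sequence

Chunk = tuple[str, int]                       # (tenant, last access tick) of a GPU memory chunk
EvictHook = Callable[[Sequence[Chunk]], int]  # returns the index of the chunk to evict


def lru_policy(chunks: Sequence[Chunk]) -> int:
    """Built-in default: evict the least recently accessed chunk."""
    return min(range(len(chunks)), key=lambda i: chunks[i][1])


class Driver:
    """Stand-in for a driver that consults an optional policy at an eviction hook."""

    def __init__(self) -> None:
        self.evict_hook: Optional[EvictHook] = None

    def attach(self, hook: EvictHook) -> None:
        self.evict_hook = hook

    def choose_victim(self, chunks: Sequence[Chunk]) -> int:
        if self.evict_hook is not None:
            try:
                choice = self.evict_hook(chunks)
                if isinstance(choice, int) and 0 <= choice < len(chunks):
                    return choice
            except Exception:
                pass  # a faulty policy must not take down the driver
        return lru_policy(chunks)  # invalid or failed answers fall back to the default


def protect_interactive(chunks: Sequence[Chunk]) -> int:
    """Example policy: evict batch-tenant memory first, LRU within that set."""
    batch = [i for i, (tenant, _) in enumerate(chunks) if tenant == "batch"]
    pool = batch or list(range(len(chunks)))
    return min(pool, key=lambda i: chunks[i][1])
```

Swapping `protect_interactive` in for the default changes which chunk is evicted without touching the driver, which is the kind of workload-specific policy the abstract argues uniform policies cannot provide.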
Evaluation across realistic workloads including inference, training, and vector search demonstrates that gpu_ext improves throughput by up to 4.8x and reduces tail latency by up to 2x, with low overhead and without modifying or restarting applications.

Published: 2025-12-14T09:39:59Z
Authors: Yusheng Zheng, Tong Yu, Yiwei Yang, Minghui Jiang, Xiangyu Gao, Jianchang Su, Yanpeng Hu, Wenan Mao, Wei Zhang, Dan Williams, Andi Quinn

http://arxiv.org/abs/2512.16238v2
Trustworthy and Controllable Professional Knowledge Utilization in Large Language Models with TEE-GPU Execution
Updated: 2025-12-19T09:05:37Z

Future improvements in large language model (LLM) services increasingly hinge on access to high-value professional knowledge rather than more generic web data. However, the data providers of this knowledge face a skewed tradeoff between income and risk: they receive little share of downstream value yet retain copyright and privacy liability, making them reluctant to contribute their assets to LLM services. Existing techniques do not offer a trustworthy and controllable way to use professional knowledge, because they keep providers in the dark and combine knowledge parameters with the underlying LLM backbone.
In this paper, we present PKUS, the Professional Knowledge Utilization System, which treats professional knowledge as a first-class, separable artifact. PKUS keeps the backbone model on GPUs and encodes each provider's contribution as a compact adapter that executes only inside an attested Trusted Execution Environment (TEE). A hardware-rooted lifecycle protocol, adapter pruning, multi-provider aggregation, and split-execution scheduling together make this design practical at serving time. On SST-2, MNLI, and SQuAD with GPT-2 Large and Llama-3.2-1B, PKUS preserves model utility, matching the accuracy and F1 of full fine-tuning and plain LoRA, while achieving the lowest per-request latency, with an 8.1-11.9x speedup over CPU-only TEE inference and naive CPU-GPU co-execution.

Published: 2025-12-18T06:33:24Z
Authors: Yifeng Cai, Zhida An, Yuhan Meng, Houqian Liu, Pengli Wang, Hanwen Lei, Yao Guo, Ding Li

http://arxiv.org/abs/2512.14946v1
EVICPRESS: Joint KV-Cache Compression and Eviction for Efficient LLM Serving
Updated: 2025-12-16T22:21:55Z

Reusing KV caches is essential for the efficiency of large language model (LLM) inference systems. With more LLM users, the KV cache footprint can easily exceed GPU memory capacity, so prior work has proposed either evicting KV caches to lower-tier storage devices or compressing them so that more KV caches fit in fast memory. However, prior work misses an important opportunity: jointly optimizing the eviction and compression decisions across all KV caches to minimize average generation latency without hurting quality.
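One way to picture a joint eviction-compression decision is a per-context choice among (tier, compression ratio) options scored by a single utility, with a greedy pass that fills the fast tier. The sketch is hypothetical: the abstract does not give EVICPRESS's utility function or heuristic, so the utility (quality minus a weight lam times delay), the utility-gain-per-byte ordering, and the two-tier model are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    tier: str       # "fast" (e.g. GPU/DRAM) or "slow" (e.g. SSD)
    ratio: float    # compression ratio; 1.0 means uncompressed
    quality: float  # predicted generation quality with this option (0..1)
    delay: float    # predicted KV-cache loading delay in seconds


def utility(o: Option, lam: float) -> float:
    """Hypothetical utility: reward quality, penalize delay with weight lam."""
    return o.quality - lam * o.delay


def place(contexts: dict[str, tuple[float, list[Option]]],
          fast_capacity: float, lam: float) -> dict[str, Option]:
    """Greedily assign each context one option; contexts gaining the most utility
    per fast-tier byte claim fast-tier space first. Each context needs at least
    one "slow" option, which serves as its fallback."""
    choice: dict[str, Option] = {}
    candidates = []
    for name, (size, options) in contexts.items():
        slow = max((o for o in options if o.tier == "slow"), key=lambda o: utility(o, lam))
        fast = sorted((o for o in options if o.tier == "fast"),
                      key=lambda o: utility(o, lam), reverse=True)
        choice[name] = slow
        if fast:
            gain = utility(fast[0], lam) - utility(slow, lam)
            candidates.append((gain / (size / fast[0].ratio), name, size, slow, fast))
    used = 0.0
    for _, name, size, slow, fast in sorted(candidates, key=lambda c: c[0], reverse=True):
        for o in fast:  # take the best-scoring fast option that still fits
            stored = size / o.ratio
            if utility(o, lam) > utility(slow, lam) and used + stored <= fast_capacity:
                choice[name] = o
                used += stored
                break
    return choice
```

In a small example with a compression-tolerant "chat" context and a compression-sensitive "legal" context (10 units each, fast-tier capacity 12.5), the chat context is stored 4x compressed, which leaves room to keep the sensitive context uncompressed in the fast tier; with only 2.5 units of fast-tier space, the legal context falls back to the slow tier.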
We propose EVICPRESS, a KV-cache management system that applies lossy compression and adaptive eviction to KV caches across multiple storage tiers. Specifically, for each context's KV cache, EVICPRESS considers the effect of compressing and evicting that KV cache on the average generation quality and delay across all contexts as a whole. To achieve this, EVICPRESS uses a unified utility function that quantifies the quality and delay effects of lossy compression or eviction. EVICPRESS's profiling module periodically updates the utility scores of all possible eviction-compression configurations for all contexts, and EVICPRESS places KV caches using a fast heuristic that rearranges them across all storage tiers, with the goal of maximizing the utility scores on each tier. Compared to baselines that either evict or compress KV caches, EVICPRESS achieves higher KV-cache hit rates on fast devices, i.e., lower delay, while preserving high generation quality by applying conservative compression to contexts that are sensitive to compression errors. Evaluation on 12 datasets and 5 models demonstrates that EVICPRESS achieves up to 2.19x faster time-to-first-token (TTFT) at equivalent generation quality.

Published: 2025-12-16T22:21:55Z
Authors: Shaoting Feng, Yuhan Liu, Hanchen Li, Xiaokun Chen, Samuel Shen, Kuntai Du, Zhuohan Gu, Rui Zhang, Yuyang Huang, Yihua Cheng, Jiayi Yao, Qizheng Zhang, Ganesh Ananthanarayanan, Junchen Jiang

http://arxiv.org/abs/2512.12530v1
Principled Performance Tunability in Operating System Kernels
Updated: 2025-12-14T02:57:02Z

The Linux kernel source code contains numerous constant values that critically influence system performance. Many of these constants, which we term perf-consts, are magic numbers that encode brittle assumptions about hardware and workloads. As systems and workloads evolve, such constants often become suboptimal.
Unfortunately, deployed kernels lack support for safe and efficient in-situ tuning of perf-consts without a long and disruptive process of rebuilding and redeploying the kernel image.
This paper advocates principled OS performance tunability. We present KernelX, a system that provides a safe, efficient, and programmable interface for in-situ tuning of arbitrary perf-consts on a running kernel. KernelX transforms any perf-const into a tunable knob on demand using a novel mechanism called Scoped Indirect Execution (SIE). SIE precisely identifies the binary boundaries where a perf-const influences system state and redirects execution to synthesized instructions that update the state as if new values were used. KernelX goes beyond version atomicity to guarantee side-effect safety, a property not provided by existing kernel update mechanisms. KernelX also provides a programmable interface that allows policies to incorporate application hints, hardware heuristics, and fine-grained isolation, without modifying kernel source code or disrupting deployed OS kernels.
Case studies across multiple kernel subsystems demonstrate that KernelX enables significant performance improvements by making previously untunable perf-consts safely tunable at runtime, while supporting millisecond-scale policy updates.

Published: 2025-12-14T02:57:02Z
Comment: 12 pages
Authors: Zhongjie Chen, Wentao Zhang, Yulong Tang, Ran Shu, Fengyuan Ren, Tianyin Xu, Jing Liu