DynamicAdaptiveClimb: Adaptive Cache Replacement with Dynamic Resizing

2025-11-26T10:02:24Z

Efficient cache management is critical for optimizing the system performance, and numerous caching mechanisms have been proposed, each exploring various insertion and eviction strategies. In this paper, we present AdaptiveClimb and its extension, DynamicAdaptiveClimb, two novel cache replacement policies that leverage lightweight, cache adaptation to outperform traditional approaches. Unlike classic Least Recently Used (LRU) and Incremental Rank Progress (CLIMB) policies, AdaptiveClimb dynamically adjusts the promotion distance (jump) of the cached objects based on recent hit and miss patterns, requiring only a single tunable parameter and no per-item statistics. This enables rapid adaptation to changing access distributions while maintaining low overhead. Building on this foundation, DynamicAdaptiveClimb further enhances adaptability by automatically tuning the cache size in response to workload demands. Our comprehensive evaluation across a diverse set of real-world traces, including 1067 traces from 6 different datasets, demonstrates that DynamicAdaptiveClimb consistently achieves substantial speedups and higher hit ratios compared to other state-of-the-art algorithms. In particular, our approach achieves up to a 29% improvement in hit ratio and a substantial reduction in miss penalties compared to the FIFO baseline. Furthermore, it outperforms the next-best contenders, AdaptiveClimb and SIEVE [43], by approximately 10% to 15%, especially in environments characterized by fluctuating working set sizes. These results highlight the effectiveness of our approach in delivering efficient performance, making it well-suited for modern, dynamic caching environments.

MigGPT: Harnessing Large Language Models for Automated Migration of Out-of-Tree Linux Kernel Patches Across Versions

2025-11-26T06:32:12Z

Out-of-tree kernel patches are essential for adapting the Linux kernel to new hardware or enabling specific functionalities. Maintaining and updating these patches across different kernel versions demands significant effort from experienced engineers. Large language models (LLMs) have shown remarkable progress across various domains, suggesting their potential for automating out-of-tree kernel patch migration. However, our findings reveal that LLMs, while promising, struggle with incomplete code context understanding and inaccurate migration point identification. In this work, we propose MigGPT, a framework that employs a novel code fingerprint structure to retain code snippet information and incorporates three meticulously designed modules to improve the migration accuracy and efficiency of out-of-tree kernel patches. Furthermore, we establish a robust benchmark using real-world out-of-tree kernel patch projects to evaluate LLM capabilities. Evaluations show that MigGPT significantly outperforms the direct application of vanilla LLMs, achieving an average completion rate of 74.07 for migration tasks.

SARA: A Stall-Aware Memory Allocation Strategy for Mixed-Criticality Systems

2025-11-25T06:58:09Z

The memory capacity in edge devices is often limited due to constraints on cost, size, and power. Consequently, memory competition leads to inevitable page swapping in memory-constrained mixed-criticality edge devices, causing slow storage I/O and thus performance degradation. In such scenarios, inefficient memory allocation disrupts the balance between application performance, causing soft real-time (soft RT) tasks to miss deadlines or preventing non-real-time (non-RT) applications from optimizing throughput. Meanwhile, we observe unpredictable, long system-level stalls (called long stalls) under high memory and I/O pressure, which further degrade performance. In this work, we propose a Stall-Aware Real-Time Memory Allocator (SARA), which discovers opportunities for performance balance by allocating just enough memory to soft RT tasks to meet deadlines and, at the same time, optimizing the remaining memory for non-RT applications. To minimize the memory usage of soft RT tasks while meeting real-time requirements, SARA leverages our insight into how latency, caused by memory insufficiency and measured by our proposed PSI-based metric, affects the execution time of each soft RT job, where a job runs per period and a soft RT task consists of multiple periods. Moreover, SARA detects long stalls using our definition and proactively drops affected jobs, minimizing stalls in task execution. Experiments show that SARA achieves an average of 97.13% deadline hit ratio for soft RT tasks and improves non-RT application throughput by up to 22.32x over existing approaches, even with memory capacity limited to 60% of peak demand.

Crash-Consistent Checkpointing for AI Training on macOS/APFS

2025-11-23T07:29:06Z

Deep learning training relies on periodic checkpoints to recover from failures, but unsafe checkpoint installation can leave corrupted files on disk. This paper presents an experimental study of checkpoint installation protocols and integrity validation for AI training on macOS/APFS. We implement three write modes with increasing durability guarantees: unsafe (baseline, no fsync), atomic_nodirsync (file-level durability via fsync()), and atomic_dirsync (file + directory durability). We design a format-agnostic integrity guard using SHA-256 checksums with automatic rollback. Through controlled experiments including crash injection (430 unsafe-mode trials) and corruption injection (1,600 atomic-mode trials), we demonstrate that the integrity guard detects 99.8-100% of corruptions with zero false positives. Performance overhead is 56.5-108.4% for atomic_nodirsync and 84.2-570.6% for atomic_dirsync relative to the unsafe baseline. Our findings quantify the reliability-performance trade-offs and provide deployment guidance for production AI infrastructure.

eBPF-PATROL: Protective Agent for Threat Recognition and Overreach Limitation using eBPF in Containerized and Virtualized Environments

2025-11-22T18:51:36Z

With the increasing use and adoption of cloud and cloud-native computing, the underlying technologies (i.e., containerization and virtualization) have become foundational. However, strict isolation and maintaining runtime security in these environments has become increasingly challenging. Existing approaches like seccomp and Mandatory Access Control (MAC) frameworks offer some protection up to a limit, but often lack context awareness, syscall argument filtering, and adaptive enforcement, providing the ability to adjust decisions at runtime based on observed application behavior, workload changes, or detected anomalies rather than relying solely on static or predefined rules.This paper introduces eBPF-PATROL (eBPF-Protective Agent for Threat Recognition and Overreach Limitation), an extensible lightweight runtime security agent that uses extended Berkeley Packet Filter (eBPF) technology to monitor and enforce policies in containerized and virtualized environments. By intercepting system calls, analyzing execution context, and applying user-defined rules, eBPF-PATROL detects and prevents real-time boundary violations, such as reverse shells, privilege escalation, and container escape attempts. We describe the architecture, implementation, and evaluation of eBPF-PATROL, demonstrating its low overhead (< 2.5 percent) and high detection accuracy across real-world attack scenarios.

Valet: Efficient Data Placement on Modern SSDs

2025-11-20T21:42:45Z

The increasing demand for SSDs coupled with scaling difficulties has left manufacturers scrambling for newer SSD interfaces which promise better performance and durability. While these interfaces reduce the rigidity of traditional abstractions, they require application or system-level changes that can impact the stability, security, and portability of systems. To make matters worse, such changes are rendered futile with the introduction of next-generation interfaces. It is therefore no surprise that such interfaces have seen limited adoption, leaving behind a graveyard of experimental interfaces ranging from open-channel SSDs to stream SSDs. Our solution, Valet, leverages userspace shim layers to add placement hints for application data, delivering up to 2-4x write throughput over filesystems and comparable or better performance than application-specific solutions, with up to 6x lower tail latency. Valet generates dynamic placement hints, remapping application data to modern SSDs with zero modifications to the application, the filesystem, or the kernel. We demonstrate performance, efficiency, and multi-tenancy benefits of Valet across a set of widely-used applications: RocksDB, MongoDB, and CacheLib, presenting a solution that combines the performance of application-specific solutions with wide applicability to log-structured data-intensive applications.

AVS: A Computational and Hierarchical Storage System for Autonomous Vehicles

2025-11-19T15:46:48Z

Autonomous vehicles (AVs) are evolving into mobile computing platforms, equipped with powerful processors and diverse sensors that generate massive heterogeneous data, for example 14 TB per day. Supporting emerging third-party applications calls for a general-purpose, queryable onboard storage system. Yet today's data loggers and storage stacks in vehicles fail to deliver efficient data storage and retrieval. This paper presents AVS, an Autonomous Vehicle Storage system that co-designs computation with a hierarchical layout: modality-aware reduction and compression, hot-cold tiering with daily archival, and a lightweight metadata layer for indexing. The design is grounded with system-level benchmarks on AV data that cover SSD and HDD filesystems and embedded indexing, and is validated on embedded hardware with real L4 autonomous driving traces. The prototype delivers predictable real-time ingest, fast selective retrieval, and substantial footprint reduction under modest resource budgets. The work also outlines observations and next steps toward more scalable and longer deployments to motivate storage as a first-class component in AV stacks.

A Flower-Inspired Solution for Computer Memory Wear-Leveling

2025-11-19T00:36:20Z

Lengthening a computer memory's lifespan is important for e-waste and sustainability. Uneven wear of memory is a major barrier. The problem is becoming even more urgent as emerging memory such as phase-change memory is subject to even shorter lifespan. Various solutions have been proposed, but they either require complicated hardware extensions or apply only to certain program constructs such as loops. This research proposes a new method, dual-ring wear leveling. It takes inspiration from the natural law known as the ``golden ratio" and how it helps flower petals evenly receive sun lights. By modeling memory as two rings and combines the idea with existing memory management, garbage collection, the new solution offers an effective way to reduce memory wear and hence lengthen memory lifespan. It is deterministic, able to automatically adapt to memory size, requiring no hardware changes, and adding no slowdown to program executions.

Sharpe-Driven Stock Selection and Liquidiy-Constrained Portfolio Optimization: Evidence from the Chinese Equity Market

2025-11-17T11:10:24Z

This paper develops and empirically evaluates a Sharpe-driven stock selection and liquidity-constrained portfolio optimization framework designed for the Chinese equity market. The proposed methodology integrates three sequential stages: Sharpe-ratio-based universe selection, liquidity-adjusted mean-variance optimization, and multi-layered risk management implemented within an automated trading bot. Using daily price and volume data from 2023 to 2025 across the A-share universe, the framework dynamically identifies stocks exhibiting strong risk-adjusted performance while accounting for trading frictions and liquidity asymmetries that are common in emerging markets. Empirical backtests reveal that the proposed strategy achieves an annualized return of 25 percent, a Sharpe ratio of 1.71, and a maximum drawdown of 8.2 percent. These results significantly outperform the Buy-and-Hold benchmark, which records an annualized return of 21 percent, a Sharpe ratio of 1.62, and a drawdown of 7.6 percent over the same period. The superior performance demonstrates that incorporating risk-adjusted selection and liquidity-aware constraints enhances both profitability and stability, enabling the portfolio to capture upside potential while maintaining drawdown resilience. Beyond its empirical success, this study contributes methodologically by bridging classical mean-variance theory with practical liquidity adjustments and dynamic Sharpe-based screening. The resulting system not only improves the tradability of optimized portfolios but also provides a scalable and adaptive framework for quantitative asset allocation in liquidity-sensitive markets, offering new evidence that disciplined risk-return optimization can outperform passive investment strategies in the post-2023 Chinese equity landscape.

Taiji: A DPU Memory Elasticity Solution for In-production Cloud Environments

2025-11-14T07:22:34Z

The growth of cloud computing drives data centers toward higher density and efficiency. Data processing units (DPUs) enhance server network and storage performance but face challenges such as long hardware upgrade cycles and limited resources. To address these, we propose Taiji, a resource-elasticity architecture for DPUs. Combining hybrid virtualization with parallel memory swapping, Taiji switches the DPU's operating system (OS) into a guest OS and inserts a lightweight virtualization layer, making nearly all DPU memory swappable. It achieves memory overcommitment for the switched guest OS via high-performance memory elasticity, fully transparent to upper-layer applications, and supports hot-switch and hot-upgrade to meet in-production cloud requirements. Experiments show that Taiji expands DPU memory resources by over 50%, maintains virtualization overhead around 5%, and ensures 90% of swap-ins complete within 10 microseconds. Taiji delivers an efficient, reliable, low-overhead elasticity solution for DPUs and is deployed in large-scale production systems across more than 30,000 servers.

Vmem: A Lightweight Hot-Upgradable Memory Management for In-production Cloud Environment

2025-11-13T04:43:36Z

Traditional memory management suffers from metadata overhead, architectural complexity, and stability degradation, problems intensified in cloud environments. Existing software/hardware optimizations are insufficient for cloud computing's dual demands of flexibility and low overhead. This paper presents Vmem, a memory management architecture for in-production cloud environments that enables flexible, efficient cloud server memory utilization through lightweight reserved memory management. Vmem is the first such architecture to support online upgrades, meeting cloud requirements for high stability and rapid iterative evolution. Experiments show Vmem increases sellable memory rate by about 2%, delivers extreme elasticity and performance, achieves over 3x faster boot time for VFIO-based virtual machines (VMs), and improves network performance by about 10% for DPU-accelerated VMs. Vmem has been deployed at large scale for seven years, demonstrating efficiency and stability on over 300,000 cloud servers supporting hundreds of millions of VMs.

Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments

2025-11-12T22:17:44Z

The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for different LLM downstream tasks. To handle multi-task settings with vast LLM parameter counts, methods like Low-Rank Adaptation (LoRA) enable task-specific fine-tuning while sharing most of the base LLM model across tasks. Hence, they allow concurrent task serving with minimal memory requirements. However, existing LLM serving systems face inefficiencies: they overlook workload heterogeneity, impose high link bandwidth from frequent adapter loading, and suffer from head-of-line blocking in their schedulers. To address these challenges, we present Chameleon, a novel LLM serving system optimized for many adapter environments, that relies on two core ideas: adapter caching and adapter-aware scheduling. First, Chameleon caches popular adapters in GPU memory, minimizing the adapter loading times. Importantly, it uses the otherwise idle GPU memory, avoiding extra memory costs. Second, Chameleon uses a non-preemptive multi-queue scheduling to efficiently account for workload heterogeneity. In this way, Chameleon simultaneously prevents head of line blocking and starvation. We implement Chameleon on top of a state-of-the-art LLM serving platform and evaluate it with real-world production traces and open-source LLMs. Under high loads, Chameleon reduces P99 and P50 TTFT latency by 80.7% and 48.1%, respectively, while improving throughput by 1.5x compared to state-of-the-art baselines.

Work-in-Progress: Function-as-Subtask API Replacing Publish/Subscribe for OS-Native DAG Scheduling

2025-11-11T14:31:22Z

The Directed Acyclic Graph (DAG) task model for real-time scheduling finds its primary practical target in Robot Operating System 2 (ROS 2). However, ROS 2's publish/subscribe API leaves DAG precedence constraints unenforced: a callback may publish mid-execution, and multi-input callbacks let developers choose topic-matching policies. Thus preserving DAG semantics relies on conventions; once violated, the model collapses. We propose the Function-as-Subtask (FasS) API, which expresses each subtask as a function whose arguments/return values are the subtask's incoming/outgoing edges. By minimizing description freedom, DAG semantics is guaranteed at the API rather than by programmer discipline. We implement a DAG-native scheduler using FasS on a Rust-based experimental kernel and evaluate its semantic fidelity, and we outline design guidelines for applying FasS to Linux Linux sched_ext.

Integrating Artificial Intelligence into Operating Systems: A Survey on Techniques, Applications, and Future Directions

2025-11-11T13:07:47Z

Heterogeneous hardware and dynamic workloads worsen long-standing OS bottlenecks in scalability, adaptability, and manageability. At the same time, advances in machine learning (ML), large language models (LLMs), and agent-based methods enable automation and self-optimization, but current efforts lack a unifying view. This survey reviews techniques, architectures, applications, challenges, and future directions at the AI-OS intersection. We chart the shift from heuristic- and rule-based designs to AI-enhanced systems, outlining the strengths of ML, LLMs, and agents across the OS stack. We summarize progress in AI for OS (core components and the wider ecosystem) and in OS for AI (component- and architecture-level support for short- and long-context inference, distributed training, and edge inference). For practice, we consolidate evaluation dimensions, methodological pipelines, and patterns that balance real-time constraints with predictive accuracy. We identify key challenges, such as complexity, overhead, model drift, limited explainability, and privacy and safety risks, and recommend modular, AI-ready kernel interfaces; unified toolchains and benchmarks; hybrid rules-plus-AI decisions with guardrails; and verifiable in-kernel inference. Finally, we propose a three-stage roadmap including AI-powered, AI-refactored, and AI-driven OSs, to bridge prototypes and production and to enable scalable, reliable AI deployment.

WebAssembly on Resource-Constrained IoT Devices: Performance, Efficiency, and Portability

2025-11-11T11:05:34Z

The increasing heterogeneity of hardware and software in the Internet of Things (IoT) poses a major challenge for the portability, maintainability and deployment of software on devices with limited resources. WebAssembly (WASM), originally designed for the web, is increasingly recognized as a portable, secure and efficient runtime environment that can overcome these challenges. This paper explores the feasibility of using WASM in embedded IoT systems by evaluating its performance, memory footprint and energy consumption on three representative microcontrollers: the Raspberry Pi Pico, the ESP32 C6 and the nRF5340. Two lightweight WASM runtimes, WAMR and wasm3, are compared with the native C execution. The results show that while the native execution remains superior in terms of speed and energy efficiency, WASM offers acceptable trade-offs in return for cross-platform compatibility and sandbox execution. The results highlight that WASM is a viable option for embedded IoT applications when portability and security outweigh strict performance constraints, and that further runtime optimization could extend its practicality in this area.