https://arxiv.org/api/5r1AIq9HHsHqLGoE4AGRJHJNwzw 2026-04-14T12:59:16Z 28013 600 15 http://arxiv.org/abs/2603.04377v1 Benchmarking Quantum Computers via Protocols, Comparing IBM's Heron vs IBM's Eagle 2026-03-04T18:41:19Z

As quantum computing hardware rapidly advances, objectively evaluating the capabilities and error rates of new processors remains a critical challenge for the field. A clear and realistic understanding of current quantum performance is essential to guide research priorities and drive meaningful progress. In this work, we apply and extend a protocol-based benchmarking methodology (presented in arXiv:2505.12441) that utilizes well-defined quantumness thresholds. By evaluating performance at the protocol level rather than the gate level, this approach provides a transparent and intuitive assessment of whether specific quantum processors, or isolated sub-chips within them, can demonstrate a practical quantum advantage. To illustrate the utility of this method, we compare two generations of IBM quantum computers: the older Eagle architecture and the newer Heron architecture. Our findings reveal the genuine operational strengths and limitations of these devices, demonstrating substantial performance improvements in the newer Heron generation.

2026-03-04T18:41:19Z 42 pages, 51 figures Nitay Mayo Tal Mor Yossi Weinstein http://arxiv.org/abs/2603.04323v1 PTOPOFL: Privacy-Preserving Personalised Federated Learning via Persistent Homology 2026-03-04T17:44:39Z

Federated learning (FL) faces two structural tensions: gradient sharing enables data-reconstruction attacks, while non-IID client distributions degrade aggregation quality. We introduce PTOPOFL, a framework that addresses both challenges simultaneously by replacing gradient communication with topological descriptors derived from persistent homology (PH). Clients transmit only 48-dimensional PH feature vectors -- compact shape summaries whose many-to-one structure makes inversion provably ill-posed -- rather than model gradients. The server performs topology-guided personalised aggregation: clients are clustered by Wasserstein similarity between their PH diagrams, intra-cluster models are topology-weighted, and clusters are blended with a global consensus. We prove an information-contraction theorem showing that PH descriptors leak strictly less mutual information per sample than gradients under strongly convex loss functions, and we establish linear convergence of the Wasserstein-weighted aggregation scheme with an error floor strictly smaller than FedAvg. Evaluated against FedAvg, FedProx, SCAFFOLD, and pFedMe on a non-IID healthcare scenario (8 hospitals, 2 adversarial) and a pathological benchmark (10 clients), PTOPOFL achieves AUC 0.841 and 0.910 respectively -- the highest in both settings -- while reducing reconstruction risk by a factor of 4.5 relative to gradient sharing. Code is publicly available at https://github.com/MorillaLab/TopoFederatedL and data at https://doi.org/10.5281/zenodo.18827595.

2026-03-04T17:44:39Z 22 pages, 6 Figures Kelly L Vomo-Donfack Adryel Hoszu Grégory Ginot Ian Morilla http://arxiv.org/abs/2604.09593v1 Benchmarking Compound AI Applications for Hardware-Software Co-Design 2026-03-04T15:59:46Z

Compound AI applications, composed from interactions between Large Language Models (LLMs), Machine Learning (ML) models, external tools, and data sources, are quickly becoming an integral workload in datacenters. Their diverse sub-components and use-cases present a large configuration-space across the deployment stack -- ranging from applications and serving software down to hardware -- each of which may influence the application performance, deployment cost, and/or resource consumption. Despite their rapid adoption, however, the systems community lacks a standardized benchmark for analyzing this complicated design-space and guiding system design. In this work, we present our benchmarking suite used for cross-stack analysis of Compound AI applications. Using this, we derive key takeaways and design principles spanning several layers of the stack for hardware-software co-design to unlock higher resource-efficiency.

2026-03-04T15:59:46Z Paramuth Samuthrsindh Angel Cervantes Varun Gohil Gohar Irfan Chaudhry Christina Delimitrou Adam Belay http://arxiv.org/abs/2603.04126v1 Efficient Time-Aware Partitioning of Quantum Circuits for Distributed Quantum Computing 2026-03-04T14:43:10Z

To overcome the physical limitations of scaling monolithic quantum computers, distributed quantum computing (DQC) interconnects multiple smaller-scale quantum processing units (QPUs) to form a quantum network. However, this approach introduces a critical challenge, namely the high cost of quantum communication between remote QPUs incurred by quantum state teleportation and quantum gate teleportation. To minimize this communication overhead, DQC compilers must strategically partition quantum circuits by mapping logical qubits to distributed physical QPUs. Static graph partitioning methods are fundamentally ill-equipped for this task as they ignore execution dynamics and underlying network topology, while metaheuristics require substantial computational runtime. In this work, we propose a heuristic based on beam search to solve the circuit partitioning problem. Our time-aware algorithm incrementally constructs a low-cost sequence of qubit assignments across successive time steps to minimize overall communication overhead. The time and space complexities of the proposed algorithm scale quadratically with the number of qubits and linearly with circuit depth, offering a significant computational speedup over common metaheuristics. We demonstrate that our proposed algorithm consistently achieves significantly lower communication costs than static baselines across varying circuit sizes, depths, and network topologies, providing an efficient compilation tool for near-term distributed quantum hardware.
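
The time-aware beam search described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the cost model (one unit per gate that spans two QPUs, plus `move_cost` per qubit that changes QPU between time steps), the swap-based successor rule, and all names are assumptions made for illustration.

```python
from typing import List, Tuple

Gate = Tuple[int, int]          # two-qubit gate acting on qubits (a, b)
Assignment = Tuple[int, ...]    # assignment[q] = index of the QPU holding qubit q


def step_cost(assign: Assignment, gates: List[Gate]) -> int:
    """Count gates in one time step whose qubits sit on different QPUs."""
    return sum(1 for a, b in gates if assign[a] != assign[b])


def successors(assign: Assignment, gates: List[Gate]) -> List[Assignment]:
    """Keep the assignment, or swap two qubits so that a cut gate becomes local.

    Swapping (rather than moving) keeps the number of qubits per QPU constant.
    """
    result = {assign}
    for a, b in gates:
        if assign[a] == assign[b]:
            continue
        for mover, anchor in ((a, b), (b, a)):
            for other, qpu in enumerate(assign):
                if qpu == assign[anchor] and other != anchor:
                    nxt = list(assign)
                    nxt[mover], nxt[other] = nxt[other], nxt[mover]
                    result.add(tuple(nxt))
    return list(result)


def beam_partition(steps: List[List[Gate]], initial: Assignment,
                   beam_width: int = 4, move_cost: int = 1):
    """Build one assignment per time step, keeping the cheapest partial plans."""
    beam = [(0, [initial])]  # (cumulative cost, assignment history)
    for gates in steps:
        candidates = []
        for cost, history in beam:
            prev = history[-1]
            for nxt in successors(prev, gates):
                moved = sum(1 for p, n in zip(prev, nxt) if p != n)
                total = cost + move_cost * moved + step_cost(nxt, gates)
                candidates.append((total, history + [nxt]))
        candidates.sort(key=lambda item: item[0])
        beam = candidates[:beam_width]
    best_cost, best_history = beam[0]
    return best_cost, best_history[1:]  # drop the initial assignment


# Four qubits on two QPUs; qubits 1 and 2 interact repeatedly.
steps = [[(0, 1)], [(1, 2)], [(1, 2)], [(1, 2)]]
initial = (0, 0, 1, 1)
static_cost = sum(step_cost(initial, gates) for gates in steps)
best_cost, plan = beam_partition(steps, initial)
```

In this toy circuit the static assignment pays for the cut gate at every later step, whereas the search can pay once for a swap and keep the remaining gates local.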

2026-03-04T14:43:10Z 5 pages, 3 figures, conference: accepted at QCNC 2026 Raymond P. H. Wu Chathu Ranaweera Sutharshan Rajasegarar Ria Rushin Joseph Jinho Choi Seng W. Loke http://arxiv.org/abs/2411.19058v4 Carbon-Aware Quality Adaptation for Energy-Intensive Services 2026-03-04T14:09:36Z

The energy demand of modern cloud services, particularly those related to generative AI, is increasing at an unprecedented pace. To date, carbon-aware computing strategies have primarily focused on batch process scheduling or geo-distributed load balancing. However, such approaches are not applicable to services that require constant availability at specific locations due to latency, privacy, data, or infrastructure constraints. In this paper, we explore how the carbon footprint of energy-intensive services can be reduced by adjusting the fraction of requests served by different service quality tiers. We show that adapting this quality of responses with respect to grid carbon intensity can lead to additional carbon savings beyond resource and energy efficiency. Building on this, we introduce a forecast-based multi-horizon optimization that reaches close-to-optimal carbon savings and is able to automatically adapt service quality for best-effort users to stay within an annual carbon budget. Our approach can reduce the emissions of large-scale LLM services, which we estimate at tens of thousands of tons of CO2 annually, by up to 10%.
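
The idea of shifting requests between quality tiers according to grid carbon intensity can be sketched as follows. This is a deliberately simplified greedy stand-in, not the paper's forecast-based multi-horizon optimization: it assumes two tiers, a known hourly intensity and demand, and a fixed energy per request for each tier. All names and numbers are hypothetical.

```python
from typing import List, Tuple


def plan_quality(intensity: List[float], demand: List[float],
                 e_high: float, e_low: float,
                 budget: float) -> Tuple[List[float], float]:
    """Return the per-hour fraction of requests served at high quality.

    intensity: grid carbon intensity per hour (gCO2 per kWh)
    demand:    number of requests per hour
    e_high / e_low: energy per request for each tier (kWh), assumed constant
    budget:    emissions allowed over the whole horizon (gCO2)
    """
    fraction = [1.0] * len(intensity)
    emissions = sum(ci * d * e_high for ci, d in zip(intensity, demand))
    saving_per_request = e_high - e_low

    # Downgrade the dirtiest hours first: each downgraded request saves
    # the most carbon where the grid intensity is highest.
    dirtiest_first = sorted(range(len(intensity)),
                            key=lambda h: intensity[h], reverse=True)
    for hour in dirtiest_first:
        excess = emissions - budget
        if excess <= 0:
            break
        max_saving = intensity[hour] * demand[hour] * saving_per_request
        if max_saving <= 0:
            continue
        downgrade = min(1.0, excess / max_saving)
        fraction[hour] = 1.0 - downgrade
        emissions -= downgrade * max_saving
    return fraction, emissions


fraction, emissions = plan_quality(
    intensity=[100.0, 400.0, 200.0],
    demand=[1000.0, 1000.0, 1000.0],
    e_high=0.002, e_low=0.001, budget=1100.0,
)
```

With these made-up inputs only the highest-intensity hour is downgraded, and only partially, which is the behaviour the abstract motivates: quality is reduced where it saves the most carbon.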

2024-11-28T11:17:30Z Extended version of our paper published at e-Energy'25. Compared to the published version, we (i) add a time-based vs. utilization-based power attribution perspective together with a proof that both yield equivalent provisioning decisions under mild assumptions and (ii) extend the online approach with an automatic quality adaptation to meet a fixed annual carbon budget Philipp Wiesner Dennis Grinwald Philipp Weiß Patrick Wilhelm Ramin Khalili Odej Kao 10.1145/3679240.3734614 http://arxiv.org/abs/2603.04027v1 Performance Optimization in Stream Processing Systems: Experiment-Driven Configuration Tuning for Kafka Streams 2026-03-04T13:04:03Z

Configuring stream processing systems for efficient performance, especially in cloud-native deployments, is a challenging and largely manual task. We present an experiment-driven approach for automated configuration optimization that combines three phases: Latin Hypercube Sampling for initial exploration, Simulated Annealing for guided stochastic search, and Hill Climbing for local refinement. The workflow is integrated with the cloud-native Theodolite benchmarking framework, enabling automated experiment orchestration on Kubernetes and early termination of underperforming configurations. In an experimental evaluation with Kafka Streams and a Kubernetes-based cloud testbed, our approach identifies configurations that improve throughput by up to 23% over the default. The results indicate that Latin Hypercube Sampling with early termination and Simulated Annealing are particularly effective in navigating the configuration space, whereas additional fine-tuning via Hill Climbing yields limited benefits.
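
The three-phase search named in this abstract (Latin Hypercube Sampling, then Simulated Annealing, then Hill Climbing) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' implementation: `score` is a hypothetical stand-in for a measured throughput, the parameters are continuous, and nothing here touches Theodolite, Kubernetes, or Kafka Streams. Early termination is omitted.

```python
import math
import random


def latin_hypercube(n, bounds, rng):
    """Draw n points; each dimension is cut into n strata, each used exactly once."""
    columns = []
    for low, high in bounds:
        strata = list(range(n))
        rng.shuffle(strata)
        width = (high - low) / n
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def neighbour(point, bounds, scale, rng):
    """Perturb every coordinate with Gaussian noise and clip to the bounds."""
    moved = (x + rng.gauss(0, scale * (hi - lo)) for x, (lo, hi) in zip(point, bounds))
    return tuple(min(max(x, lo), hi) for x, (lo, hi) in zip(moved, bounds))


def simulated_annealing(score, start, bounds, rng, steps=200, t0=1.0):
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for i in range(steps):
        temperature = t0 * (1 - i / steps) + 1e-9  # linear cooling schedule
        cand = neighbour(current, bounds, 0.1, rng)
        delta = score(cand) - current_score
        # Maximising: always accept improvements, sometimes accept regressions.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = cand, current_score + delta
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


def hill_climb(score, start, bounds, rng, steps=100):
    best, best_score = start, score(start)
    for _ in range(steps):
        cand = neighbour(best, bounds, 0.01, rng)
        cand_score = score(cand)
        if cand_score > best_score:  # accept strict improvements only
            best, best_score = cand, cand_score
    return best, best_score


def tune(score, bounds, seed=0, samples=10):
    rng = random.Random(seed)
    start = max(latin_hypercube(samples, bounds, rng), key=score)  # phase 1
    sa_point, _ = simulated_annealing(score, start, bounds, rng)   # phase 2
    return hill_climb(score, sa_point, bounds, rng)                # phase 3


# Hypothetical objective with its optimum at (3, 1).
def score(p):
    return -((p[0] - 3.0) ** 2 + (p[1] - 1.0) ** 2)


bounds = [(0.0, 4.0), (0.0, 4.0)]
best_point, best_score = tune(score, bounds)
```

Each phase starts from the best point of the previous one, so the final score can never be worse than the best initial sample. In a real experiment every `score` call would be a benchmark run, which is why the number of evaluations per phase matters.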

2026-03-04T13:04:03Z Accepted for the 9th Workshop on Hot Topics in Cloud Computing Performance (HotCloudPerf 2026) at ACM/SPEC ICPE 2026 David Chen Sören Henning Kassiano Matteussi Rick Rabiser 10.1145/3777911.3800636 http://arxiv.org/abs/2603.04008v1 Lambdas at the Far Edge: a Tale of Flying Lambdas and Lambdas on Wheels 2026-03-04T12:50:07Z

Aggregate Programming (AP) is a paradigm for programming the collective behaviour of sets of distributed devices, possibly situated at the network far edge, by relying on asynchronous proximity-based interactions. The eXchange Calculus (XC), a recently proposed foundational model for AP, is essentially a typed lambda calculus extended with an operator (the exchange operator) providing an implicit communication mechanism between neighbour devices. This paper provides a gentle introduction to XC and to its implementation as a C++ library, called FCPP. The FCPP library and toolchain have been mainly developed at the Department of Computer Science of the University of Turin, where Stefano Berardi spent most of his academic career conducting outstanding research about logical foundation of computer science and transmitting his passion for research to students and young researchers, often exploiting typed lambda calculi. An FCPP program is essentially a typed lambda term, and FCPP has been used to write code that has been deployed on devices at the far edge of the network, including rovers and (soon) Uncrewed Aerial Vehicles (UAVs); hence the title of the paper.

2026-03-04T12:50:07Z In Proceedings LTT 2026, arXiv:2603.02912 EPTCS 441, 2026, pp. 19-45 Giorgio Audrito Department of Computer Science Daniele Bortoluzzi Department of Computer Science Ferruccio Damiani Department of Computer Science Giordano Scarso Department of Computer Science Gianluca Torta Department of Computer Science Andrea Basso MITO Technology, Milan, Italy Monica Cochi Torino Airport Lorenzo Gusman Torino Airport Lorenzo Comba Department of Agricultural, Forest and Food Sciences Paolo Gay Department of Agricultural, Forest and Food Sciences Paola Dal Zovo Concept Engineering Reply, Turin, Italy Giada Galati Eurix, Turin, Italy Francesco Gallo Eurix, Turin, Italy Aljaž Grdadolnik Faculty of Computer and Information Science University of Ljubljana, Ljubljana, Slovenia Massimo Pescarollo Department of Economics and Statistics Cognetti de Martiis, University of Turin, Turin, Italy Paola Pisano Department of Economics and Statistics, Cognetti de Martiis, University of Turin, Turin, Italy 10.4204/EPTCS.441.2 http://arxiv.org/abs/2602.09937v2 Why Do AI Agents Systematically Fail at Cloud Root Cause Analysis? 2026-03-04T10:13:10Z

Failures in large-scale cloud systems incur substantial financial losses, making automated Root Cause Analysis (RCA) essential for operational stability. Recent efforts leverage Large Language Model (LLM) agents to automate this task, yet existing systems exhibit low detection accuracy even with capable models, and current evaluation frameworks assess only final answer correctness without revealing why the agent's reasoning failed. This paper presents a process-level failure analysis of LLM-based RCA agents. We execute the full OpenRCA benchmark across five LLM models, producing 1,675 agent runs, and classify observed failures into 12 pitfall types across intra-agent reasoning, inter-agent communication, and agent-environment interaction. Our analysis reveals that the most prevalent pitfalls, notably hallucinated data interpretation and incomplete exploration, persist across all models regardless of capability tier, indicating that these failures originate from the shared agent architecture rather than from individual model limitations. Controlled mitigation experiments further show that prompt engineering alone cannot resolve the dominant pitfalls, whereas enriching the inter-agent communication protocol reduces communication-related failures by up to 15 percentage points. The pitfall taxonomy and diagnostic methodology developed in this work provide a foundation for designing more reliable autonomous agents for cloud RCA.

2026-02-10T16:14:05Z Taeyoon Kim Woohyeok Park Hoyeong Yun Kyungyong Lee http://arxiv.org/abs/2603.03899v1 A framework to reason about consistency and atomicity guarantees in a sparsely-connected, partially-replicated peer-to-peer system 2026-03-04T09:59:25Z

For an offline-first collaborative application to operate in true peer-to-peer fashion, its collaborative features must function even in environments where internet connectivity is limited or unavailable. Each peer may only be interested in a subset of the application data relevant to its workload, and this subset can overlap in different ways with those of other peers. Limitations imposed by access control and mesh network technologies often result in peers being sparsely connected. Reasoning about consistency in these systems is hard, especially when considering transactional updates that may alter different sets of data in the same transaction. We present \textsc{IntersectionAtomicity} and \textsc{IntersectionCC} as models to reason about offline-first collaborative applications that are sparsely-connected and rely on partially replicating different subsets of a broader set of data. We then use these models to propose a set of guidelines to help developers design their application with atomicity and consistency guarantees.

2026-03-04T09:59:25Z 7 pages, 1 figure Sreeja S. Nair Nicholas E. Marino Nick Pascucci Russell Brown Arthur P. R. Silva Tim Cummings Connor M. Power http://arxiv.org/abs/2512.03685v2 Distributed Quantum Computing with Fan-Out Operations and Qudits: the Case of Distributed Global Gates 2026-03-04T07:43:07Z

Much recent work on distributed quantum computing has focused on the use of entangled pairs and distributed two-qubit gates. But there has also been work on efficient schemes for achieving multipartite entanglement between nodes in a single shot, removing the need to generate multipartite entangled states using many entangled pairs. This paper looks at how multipartite entanglement resources (e.g., GHZ states) can be useful for distributed fan-out operations; we also consider the use of qudits of dimension four for distributed quantum circuit compression. In particular, we consider how such fan-out operations and qudits can be used to implement circuits which are challenging for distributed quantum computation, involving pairwise qubit interactions, i.e., what has been called global gates (a.k.a. global Mølmer-Sørensen gates). Such gates have been explored to possibly yield more efficient computations via reduced circuit depth, and can be carried out efficiently in some types of quantum hardware (e.g., trapped-ion quantum computers); we consider this as an exploration of an "extreme" case for distribution given the global qubit-qubit interactions. We also conclude with some implications for future work on quantum circuit compilation and quantum data centre design.

2025-12-03T11:26:47Z 8 pages, 10 figures; preliminary version (if mistakes found - please contact the author); accepted at QCNC 2026 Seng W. Loke http://arxiv.org/abs/2604.09592v1 EdgeWeaver: Accelerating IoT Application Development Across Edge-Cloud Continuum 2026-03-04T05:35:53Z

The rise of complex, latency-sensitive IoT applications across the Edge-Cloud continuum exposes the limitations of current Function-as-a-Service (FaaS) platforms in seamlessly addressing the complexity, heterogeneity, and intermittent connectivity of Edge-Cloud environments. Developers are left to manage integration and Quality of Service (QoS) enforcement manually, rendering application development complicated and costly. To overcome these limitations, we introduce the EdgeWeaver platform that offers a unified "object" abstraction that is seamlessly distributed across the continuum to encapsulate application logic, state, and QoS. EdgeWeaver automates "class" deployment across edge and cloud by composing established distributed algorithms (e.g., Raft, CRDTs) -- enabling developers to declaratively express QoS (e.g., availability and consistency) desires that, in turn, guide internal resource allocation, function placement, and runtime adaptation to fulfill them. We implement a prototype of EdgeWeaver and evaluate it under diverse settings and using human subjects. Results show that EdgeWeaver boosts development productivity by 31%, while declaratively enforcing strong consistency and achieving 9 nines availability, 10,000X higher than the current standard, with negligible performance impact.
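
The "9 nines" and "10,000X" figures can be related by a short calculation. The sketch below assumes that the "current standard" means five nines (99.999%) and that "higher" refers to the ratio of unavailabilities. Both are readings inferred from the numbers, not statements made in the abstract.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year


def unavailability(nines: int) -> float:
    """Fraction of time a service is down at the given number of nines."""
    return 10.0 ** -nines


def downtime_seconds_per_year(nines: int) -> float:
    return unavailability(nines) * SECONDS_PER_YEAR


# Assumed baseline: five nines; reported result: nine nines.
ratio = unavailability(5) / unavailability(9)
print(f"unavailability ratio: {ratio:.0f}x")
print(f"five nines: {downtime_seconds_per_year(5):.2f} s of downtime per year")
print(f"nine nines: {downtime_seconds_per_year(9) * 1000:.3f} ms of downtime per year")
```

Under these assumptions the four extra nines correspond to a 10,000-fold reduction in expected downtime, from roughly five minutes per year to a few tens of milliseconds.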

2026-03-04T05:35:53Z Published in IPDPS 2026 Conference Pawissanutt Lertpongrujikorn Juahn Kwon Hai Duc Nguyen Mohsen Amini Salehi http://arxiv.org/abs/2603.03743v1 The Semantic Arrow of Time, Part II: The Semantics of Open Atomic Ethernet 2026-03-04T05:29:23Z

This is the second of five papers comprising The Semantic Arrow of Time. Part I established that computing's arrow of time is semantic rather than thermodynamic, and that the Forward-In-Time-Only (FITO) assumption constitutes a category mistake. This paper develops the constructive alternative. We present the semantics of Open Atomic Ethernet (OAE) links as a concrete realization of a non-FITO protocol architecture. The key insight is that causal order is not assumed a priori but created through transaction structure: the link state machine progresses through TENTATIVE to REFLECTING to COMMITTED, with the option to abort at any point before commitment. Delivery does not imply commitment; commitment requires reflective acknowledgment -- proof that information has round-tripped and been semantically validated by both endpoints. We formalize this through three frameworks. First, the OAE link state machine, a six-state finite automaton whose normative invariants guarantee that semantic corruption cannot occur at the link level. Second, Indefinite Logical Timestamps (ILT), a four-valued causal structure that admits a genuinely indefinite relation between concurrent events, resolving only after symmetric link-level exchange. Third, the Slowdown Theorem applied to links, which establishes that round-trip measurement is the minimum interaction required to establish causal order. We show that ILT is strictly more expressive than Definite Causal Order systems for reversible link protocols. We connect these results to the Knowledge Balance Principle from quantum information theory. The paper concludes with a comparative analysis showing that OAE achieves infinite consensus number while RDMA, NVLink, and UALink remain limited to finite consensus numbers due to their FITO semantics.
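
The transaction progression described here (TENTATIVE to REFLECTING to COMMITTED, with abort allowed before commitment) can be sketched as a small state machine. The abstract mentions a six-state automaton but names only three states, so this is not the OAE automaton: `ABORTED` and the class and method names are hypothetical, and the normative invariants are not modelled.

```python
from enum import Enum


class LinkState(Enum):
    TENTATIVE = "tentative"
    REFLECTING = "reflecting"
    COMMITTED = "committed"
    ABORTED = "aborted"  # hypothetical name for the abort outcome


# Forward progression named in the abstract.
NEXT_STATE = {
    LinkState.TENTATIVE: LinkState.REFLECTING,
    LinkState.REFLECTING: LinkState.COMMITTED,
}


class LinkTransaction:
    """One transaction on a link, starting in the TENTATIVE state."""

    def __init__(self) -> None:
        self.state = LinkState.TENTATIVE

    def advance(self) -> LinkState:
        """Move one step towards commitment; terminal states cannot advance."""
        if self.state not in NEXT_STATE:
            raise RuntimeError(f"cannot advance from {self.state.name}")
        self.state = NEXT_STATE[self.state]
        return self.state

    def abort(self) -> LinkState:
        """Abort is allowed at any point before commitment, never after."""
        if self.state not in NEXT_STATE:
            raise RuntimeError(f"cannot abort from {self.state.name}")
        self.state = LinkState.ABORTED
        return self.state


committed = LinkTransaction()
committed.advance()  # TENTATIVE -> REFLECTING: data delivered, not yet committed
committed.advance()  # REFLECTING -> COMMITTED: reflective acknowledgment received

aborted = LinkTransaction()
aborted.advance()
aborted.abort()      # still before commitment, so abort is permitted
```

The point the sketch tries to capture is that delivery (reaching REFLECTING) is a distinct state from commitment, and that only the pre-commit states admit an abort.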

2026-03-04T05:29:23Z Paul Borrill http://arxiv.org/abs/2603.03738v1 Exploring Challenges in Developing Edge-Cloud-Native Applications Across Multiple Business Domains 2026-03-04T05:15:33Z

As the convergence of cloud computing and advanced networking continues to reshape modern software development, edge-cloud-native paradigms have become essential for enabling scalable, resilient, and agile digital services that depend on high-performance, low-latency, and reliable communication. This study investigates the practical challenges of developing, deploying, and maintaining edge-cloud-native applications through in-depth interviews with professionals from diverse domains, including IT, finance, healthcare, education, and industry. Despite significant advancements in cloud technologies, practitioners, particularly those from non-technical backgrounds, continue to encounter substantial complexity stemming from fragmented toolchains, steep learning curves, and operational overhead of managing distributed networking and computing, ensuring consistent performance across hybrid environments, and navigating steep learning curves at the cloud-network boundary. Across sectors, participants consistently prioritized productivity, Quality of Service, and usability over conventional concerns such as cost or migration. These findings highlight the need for operationally simplified, SLA-aware, and developer-friendly platforms that streamline the full application lifecycle. This study contributes a practice-informed perspective to support the alignment of edge-cloud-native systems with the realities and needs of modern enterprises, offering critical insights for the advancement of seamless cloud-network convergence.

2026-03-04T05:15:33Z Pawissanutt Lertpongrujikorn Hai Duc Nguyen Juahn Kwon Mohsen Amini Salehi http://arxiv.org/abs/2603.03736v1 The Ghost in the Datacenter: Link Flapping, Topology Knowledge Failures, and the FITO Category Mistake 2026-03-04T05:12:40Z

Every link disconnection or flap in a datacenter corrupts the network's self-knowledge -- its graph. We call this corruption a ghost: a node that appears reachable but is not, a link that reports "up" but silently drops traffic, or an IP address that resolves to a partitioned machine. Ghosts arise at every scale -- chiplet-to-chiplet (PCIe, UCIe), GPU-to-GPU (NVLink, NVSwitch), node-to-node (Ethernet, Thunderbolt), and cluster-to-cluster (IP, BGP) -- because all these protocols inherit Shannon's forward-in-time-only (FITO) channel model and use Timeout And Retry (TAR) as their failure detector. TAR cannot distinguish "slow" from "dead," which is precisely the ambiguity that Fischer--Lynch--Paterson proved unresolvable in asynchronous systems. We survey the problem using production data from Meta (419 interruptions in 54 days of LLaMA 3 training), ByteDance (38,236 explicit and 5,948 implicit failures in three months), Google (TPUv4 optical circuit switching), and Alibaba (0.057% NIC--ToR link failures per month). At 2025 cluster scale (${\sim}3$ million GPUs, ${>}10$ million optical links), a link flap occurs every 48 seconds. We show that every existing mitigation -- Phi Accrual failure detectors, SWIM, BFD, OSPF/ISIS fast convergence, SmartNIC offload, lossless Ethernet (RoCE/PFC), and Kubernetes pod eviction -- still creates ghosts because each is fundamentally timeout-based. We connect ghosts to gray failures (Huang et al., HotOS 2017) and metastable failures (Bronson et al., HotOS 2021; validated across 22 failures at 11 organizations, OSDI 2022). We argue that Open Atomic Ethernet eliminates ghosts at the link layer through a Reliable Link Failure Detector, Perfect Information Feedback, triangle failover, and atomic token transfer -- making topology knowledge transactional.
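
The claim that Timeout And Retry cannot tell "slow" from "dead" can be made concrete with a toy timeout-based failure detector. The heartbeat timestamps, the timeout value, and the function names below are made up for illustration and do not come from any of the production systems cited.

```python
from typing import List


def last_heartbeat_before(heartbeats: List[float], now: float) -> float:
    """Latest heartbeat arrival time that the observer has seen by `now`."""
    return max(t for t in heartbeats if t <= now)


def suspected(heartbeats: List[float], now: float, timeout: float) -> bool:
    """Timeout-based verdict: suspect the peer if it has been silent too long."""
    return now - last_heartbeat_before(heartbeats, now) > timeout


TIMEOUT = 1.5
dead_peer = [0.0, 1.0, 2.0]       # crashed after t = 2.0
slow_peer = [0.0, 1.0, 2.0, 4.5]  # alive, but its next heartbeat is delayed to t = 4.5

# At t = 4.0 the observer has seen identical histories for both peers,
# so any rule based only on elapsed time must give the same verdict.
verdict_dead = suspected(dead_peer, now=4.0, timeout=TIMEOUT)
verdict_slow = suspected(slow_peer, now=4.0, timeout=TIMEOUT)

# Only later does the slow peer's heartbeat arrive and clear the suspicion.
slow_later = suspected(slow_peer, now=5.0, timeout=TIMEOUT)
dead_later = suspected(dead_peer, now=5.0, timeout=TIMEOUT)
```

Raising the timeout only moves the problem: a longer timeout delays detection of the dead peer, while a shorter one wrongly suspects the slow peer more often.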

2026-03-04T05:12:40Z Paul Borrill http://arxiv.org/abs/2603.03731v1 HyperParallel: A Supernode-Affinity AI Framework 2026-03-04T05:03:33Z

The emergence of large-scale, sparse, multimodal, and agentic AI models has coincided with a shift in hardware toward supernode architectures that integrate hundreds to thousands of accelerators with ultra-low-latency interconnects and unified memory pools. However, existing AI frameworks are not designed to exploit these architectures efficiently, leading to high programming complexity, load imbalance, and poor memory utilization. In this paper, we propose a supernode-affinity AI framework that treats the supernode as a single logical computer and embeds hardware-aware orchestration into the framework. Implemented in MindSpore, our HyperParallel architecture comprises HyperOffload for automated hierarchical memory management, HyperMPMD for fine-grained MPMD parallelism across heterogeneous workloads, and HyperShard for declarative parallel strategy specification. Together, these techniques significantly improve training and inference efficiency while reducing parallel programming and system tuning overhead, demonstrating the necessity of supernode affinity for next-generation AI frameworks.

2026-03-04T05:03:33Z Xin Zhang Beilei Sun Teng Su Qinghua Zhang Chong Bao Lei Chen Xuefeng Jin