http://arxiv.org/abs/2606.04628v1 RAMPART: Registry-based Agentic Memory with Priority-Aware Runtime Transformation 2026-06-03T09:01:46Z

RAMPART is a compile-time memory model and pure in-RAM block registry for LLM-based agents. Context assembly is a programmable runtime operation where content is compiled from a structured registry under explicit policy for ordering, inclusion, and eviction. Five composable primitives (promote, gate, write, evict, rollback) act on named addressable blocks before compilation at zero prompt-token cost. Provenance tags and non-evictable authorship flags implement a permissioned memory model with block-level ownership. Controlled probes with Qwen3-8B Q4 show that compile-time placement and the structural relationship between blocks and the task query affect task success, with the cliff falling at roughly the seventh block position when the task follows the registry and the twelfth when it precedes. Grouping the critical block with content-adjacent neighbours and promoting the group as a unit lifts task success by tens of percentage points at positions where single-block placement fails. Cross-model replication on Qwen2.5-7B, Llama-3.1-8B, Mistral-7B-v0.3, and Qwen3-14B shows the content-priming effect appears at the same absolute positions across families, with magnitude varying with model strength. Block grouping raises Mistral's mean pass rate roughly fivefold at the hardest registry size, and a smaller model with the intervention can outperform a larger model without it in the mid-registry zone. Relevance gating reduces prompt cost by 67.8% while recovering 83% of the promoted-condition success rate. Schema eviction produces 0% invocations against 100% with the schema present, a property policy-based approaches cannot guarantee by construction. Shared-registry coordination reduces inter-agent communication to a method call at zero coordination token cost.
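
The five primitives named in this abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: the class names `Block` and `Registry`, the integer `priority` field, the snapshot-based rollback, and the bracketed output format are all assumptions made for illustration. Gating is modelled here as a predicate passed at compile time.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Block:
    name: str                  # address of the block in the registry
    text: str                  # content that may end up in the prompt
    priority: int = 0          # assumed policy: higher compiles earlier
    provenance: str = "agent"  # assumed tag recording who wrote the block
    evictable: bool = True     # False models a non-evictable authorship flag


class Registry:
    """Hypothetical block registry; every operation runs before compilation."""

    def __init__(self) -> None:
        self._blocks: Dict[str, Block] = {}
        self._history: List[Dict[str, Block]] = []  # snapshots for rollback

    def _snapshot(self) -> None:
        # Blocks are immutable, so a shallow copy is a safe snapshot.
        self._history.append(dict(self._blocks))

    def write(self, block: Block) -> None:
        self._snapshot()
        self._blocks[block.name] = block

    def promote(self, name: str, by: int = 1) -> None:
        self._snapshot()
        old = self._blocks[name]
        self._blocks[name] = replace(old, priority=old.priority + by)

    def evict(self, name: str) -> bool:
        # Refuse to drop blocks flagged as non-evictable.
        if not self._blocks[name].evictable:
            return False
        self._snapshot()
        del self._blocks[name]
        return True

    def rollback(self) -> None:
        # Undo the most recent successful write, promote, or evict.
        if self._history:
            self._blocks = self._history.pop()

    def compile(self, gate: Callable[[Block], bool] = lambda b: True) -> str:
        # The gate decides inclusion; the stable sort keeps write order on ties.
        chosen = [b for b in self._blocks.values() if gate(b)]
        chosen.sort(key=lambda b: -b.priority)
        return "\n".join(f"[{b.name}] {b.text}" for b in chosen)
```

In this sketch an evicted block simply cannot appear in the compiled string, which mirrors the abstract's point that eviction removes content by construction rather than by instruction.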

2026-06-03T09:01:46Z Nikodem Tomczak http://arxiv.org/abs/2606.01979v2 A Simple Hierarchical Causality Primer 2026-06-03T08:41:41Z

We provide a brief primer for the idea behind formalising hierarchical causality in the context of complex systems. Here actors are not simply agents. Actors instantiate causation classes. Agents implement local dynamics at given levels of organisation in a given system. Hierarchical causality then describes how actor-level roles constrain, select, and organise agent-level behaviour across levels. The system then necessarily requires three additional structures. First, causation classes to abstract a given form of causal influence that an actor instantiates. Second, aggregation operators to move across the levels. Third, discrete event-time maps are required because the system comprises events, and the relation between local event counts and any global clock must be specified. Our formulation here is purposefully simple and discrete.

2026-06-01T09:41:31Z 8 pages, 1 figure; short technical primer with a toy example in an appendix, corrected minor typos, refined the admissible kernel notation Tim Gebbie http://arxiv.org/abs/2602.17434v2 Multi-Agent Temporal Logic Planning via Penalty Functions and Block-Coordinate Optimization 2026-06-03T08:34:26Z

Multi-agent planning under Signal Temporal Logic (STL) is often hindered by collaborative tasks that lead to computational challenges due to the inherent high dimensionality of the problem, preventing scalable synthesis with satisfaction guarantees. To address this, we formulate STL planning as an optimization program under multi-agent STL constraints and introduce a penalty-based unconstrained relaxation that can be efficiently solved via a Block-Coordinate Gradient Descent (BCGD) method, where each block corresponds to a single agent's decision variables, thereby mitigating complexity. By utilizing a quadratic penalty function defined via smooth STL semantics, we show that BCGD iterations converge to a stationary point of the penalized problem under standard regularity assumptions. To enforce feasibility, the BCGD solver is embedded within a two-layer optimization scheme: inner BCGD updates are performed for a fixed penalty parameter, which is then increased in an outer loop to progressively improve multi-agent STL robustness. The proposed framework enables scalable computations and is validated through various complex multi-robot planning scenarios.
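
The two-layer scheme described here (inner block-coordinate gradient steps at a fixed penalty, outer loop raising the penalty) can be sketched on a toy problem. Everything below is an illustrative assumption rather than the paper's formulation: two scalar agents minimise `x1**2 + x2**2`, and a single separation constraint `x1 - x2 >= 1` stands in for a smooth STL robustness constraint. The function names and the penalty schedule are hypothetical.

```python
def violation(x1: float, x2: float, gap: float = 1.0) -> float:
    """Amount by which the toy constraint x1 - x2 >= gap is violated."""
    return max(0.0, gap - (x1 - x2))


def penalized_cost(x1: float, x2: float, rho: float) -> float:
    """Objective plus quadratic penalty on the constraint violation."""
    return x1 ** 2 + x2 ** 2 + rho * violation(x1, x2) ** 2


def bcgd(x1: float, x2: float, rho: float, sweeps: int = 200):
    """Inner loop: one gradient step per agent block, penalty held fixed."""
    step = 1.0 / (2.0 + 2.0 * rho)  # inverse of each block's gradient Lipschitz constant
    for _ in range(sweeps):
        # Block 1: agent 1 updates while agent 2 is frozen.
        g1 = 2.0 * x1 - 2.0 * rho * violation(x1, x2)
        x1 -= step * g1
        # Block 2: agent 2 updates using agent 1's fresh value.
        g2 = 2.0 * x2 + 2.0 * rho * violation(x1, x2)
        x2 -= step * g2
    return x1, x2


def solve(rhos=(1.0, 10.0, 100.0, 1000.0)):
    """Outer loop: raise the penalty and warm-start the inner solver."""
    x1, x2 = 0.0, 0.0
    for rho in rhos:
        x1, x2 = bcgd(x1, x2, rho)
    return x1, x2


x1, x2 = solve()
print(x1, x2, violation(x1, x2))
```

For this toy problem the constrained optimum is `(0.5, -0.5)`, and the iterates approach it from the infeasible side as the penalty grows, so the residual violation shrinks but does not reach exactly zero for any finite penalty.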

2026-02-19T15:05:16Z Eleftherios E. Vlahakis Arash Bahari Kordabad Lars Lindemann Pantelis Sopasakis Sadegh Soudjani Dimos V. Dimarogonas 10.1109/LCSYS.2026.3699405 http://arxiv.org/abs/2606.09884v1 Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix 2026-06-03T06:49:15Z

We study two reproducible failure modes of deep multi-agent reinforcement learning in continuous-time pricing markets: (i) tacit cartel formation between competing DDPG agents, and (ii) actor--critic instability at high event rates. We instantiate both inside a single CT-MARL benchmark (Poisson-clocked price updates, observation latency $δ$, interior-optimum logit demand), show that synchronous DDPG agents reliably trigger Failure Mode 1 with collusion index $Δ= 0.69 \pm 0.11$, and quantify a partial microstructure fix: asynchrony alone cuts collusion by 48\% and adding latency drives it to a minimum of $Δ= 0.28$. The fix has clearly documented costs: it is partial ($Δ$ remains supra-Bertrand), it is non-monotone in $δ$, and it does not survive Failure Mode 2, which emerges as DDPG critic divergence at $λ= 5$ and corrupts the phase-diagram cell at $(λ{=}5, δ{=}1)$. We accompany the scalar collusion index with trajectory-level trace diagnostics that expose the within-episode signalling collapse and the post-shock non-recovery.

2026-06-03T06:49:15Z Shree Murthy Rohan Pandey http://arxiv.org/abs/2606.04484v1 AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning 2026-06-03T06:02:52Z

We present AgentJet, a distributed swarm training framework for large language model (LLM) agent reinforcement learning. Unlike centralized frameworks that tightly couple agent rollouts with model optimization, AgentJet adopts a decoupled multi-node architecture in which swarm server nodes host trainable models and run optimization on GPU clusters, whereas swarm client nodes execute arbitrary agents on arbitrary devices. This design provides capabilities that are difficult to support in centralized frameworks: (1) heterogeneous multi-model reinforcement learning, enabling the training of heterogeneous multi-agent teams with multiple LLMs as brains; (2) multi-task cocktail training with isolated agent runtimes; (3) fault-tolerant execution that prevents external environment failures from interrupting the training process; and (4) live code iteration, which allows agents to be edited during training by replacing swarm client nodes. To support efficient RL in multi-model, multi-turn, and multi-agent settings, AgentJet introduces a context tracking module with timeline merging, which consolidates redundant context and achieves a 1.5-10x training speedup. Finally, AgentJet introduces an automated research system that takes a research topic as input and autonomously conducts long-horizon, multi-day RL studies on large-scale clusters. By leveraging the swarm architecture, this system reproduces key exploratory workflows of RL researchers without human intervention during execution.

2026-06-03T06:02:52Z Technical report, 27 pages Qingxu Fu Boyin Liu Shuchang Tao Zhaoyang Liu Bolin Ding http://arxiv.org/abs/2606.04361v1 When Freshness Is Not Enough: Distribution-Aware Age of Information for Networked LQR Control 2026-06-03T02:22:46Z

Age of Information (AoI) has become a central metric for the design of wireless update systems, especially in applications where fresh measurements support tracking, estimation, and control. Despite its popularity, the use of mean AoI or peak AoI as a surrogate for closed-loop performance is often motivated by intuition rather than by a control-theoretic derivation. This paper examines whether minimizing the mean AoI is in fact optimal for networked control systems. For scalar linear time-invariant systems with delayed intermittent updates, we show that, under state-independent scheduling policies, the infinite-horizon LQR tracking problem reduces to an optimization over the distribution of inter-scheduling intervals. The resulting objective depends on higher-order statistical moments, and in unstable or correlated regimes on exponential moments, of the inter-scheduling process rather than only on its mean. Consequently, policies with identical mean AoI can induce substantially different tracking costs. We further extend the analysis to disturbances with exponentially decaying autocorrelation and derive equivalent cost formulations that expose the role of the full interval distribution. Finally, we validate the theory using real vehicle trajectories from the NGSIM US-101 dataset. The empirical results match the predicted performance trends, demonstrating that mean AoI alone is insufficient for control-oriented network design.
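
The claim that policies with identical mean AoI can induce different tracking costs can be checked on a toy renewal model. The sketch below is an assumption-laden simplification, not the paper's derivation: time is discrete, updates arrive without delay, the estimation error resets to zero at each update, the process noise has unit variance, and `a_sq` denotes the squared system gain. The two interval distributions `A` and `B` are hypothetical and were chosen to share their first two moments.

```python
from fractions import Fraction
from typing import Dict

# Maps an inter-update interval length T to its probability.
Dist = Dict[int, Fraction]


def mean_interval(dist: Dist) -> Fraction:
    return sum(p * T for T, p in dist.items())


def mean_age(dist: Dist) -> Fraction:
    """Time-average age when the age is 0, 1, ..., T-1 within a cycle."""
    per_cycle = sum(p * Fraction(T * (T - 1), 2) for T, p in dist.items())
    return per_cycle / mean_interval(dist)


def error_variance(t: int, a_sq: int) -> int:
    """Error variance t steps after an update for x' = a*x + w, Var(w) = 1."""
    return sum(a_sq ** i for i in range(t))


def tracking_cost(dist: Dist, a_sq: int) -> Fraction:
    """Time-average squared error under the renewal-reward argument."""
    per_cycle = sum(
        p * sum(error_variance(t, a_sq) for t in range(T))
        for T, p in dist.items()
    )
    return per_cycle / mean_interval(dist)


# Two hypothetical schedules with equal mean interval and equal mean age.
A: Dist = {2: Fraction(1, 2), 4: Fraction(1, 2)}
B: Dist = {1: Fraction(1, 8), 3: Fraction(3, 4), 5: Fraction(1, 8)}

print(mean_age(A), mean_age(B))                   # both 7/6
print(tracking_cost(A, 2), tracking_cost(B, 2))   # 2 and 25/12
```

With `a_sq = 1` the cost in this toy model coincides with the mean age, so the two schedules tie; the gap only appears for `a_sq > 1`, where the cost depends on an exponential moment of the interval.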

2026-06-03T02:22:46Z Abdullah Y. Etcibasi C. Emre Koksal Eylem Ekici http://arxiv.org/abs/2606.04306v1 Organizational Control Layer: Governance Infrastructure at the Execution Boundary of LLM Agent Systems 2026-06-03T00:25:56Z

LLM-based agents are increasingly deployed in workflows where generated outputs may directly trigger state-changing actions. This creates an execution-boundary problem: proposed actions must be governed before they are executed. We study this problem through economically consequential multi-agent interactions and argue that deployment-grade agent systems should separate proposal generation from environment-facing execution. To operationalize this principle, we introduce the Organizational Control Layer (OCL), a model-agnostic governance infrastructure that intercepts generated actions before execution through policy enforcement and escalation, without modifying the underlying LLM generator. We evaluate OCL on adversarial buyer--seller negotiation environments adapted from AgenticPay. Across multiple frontier LLM backends, OCL reduces unsafe executions from 88% to near-zero while increasing valid success from 12% to 96%. Results further reveal a safety--utility tradeoff: strict governance improves compliance and reliability against policy and constraint violations, but can reduce flexibility in tightly constrained markets. These findings suggest that deployment-grade LLM agent systems require explicit governance at the boundary between language generation and executable actions. The source code is available at: https://github.com/SHITIANYU-hue/amai_ocl
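
The separation of proposal generation from execution can be sketched as a small gate that sits between the two. This is a hypothetical illustration, not the API of the linked repository: `ProposedAction`, `Policy`, `govern`, and `run` are invented names, and the floor-price rule with a review margin is an assumed example of a seller-side policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, List


class Verdict(Enum):
    EXECUTE = "execute"
    ESCALATE = "escalate"
    BLOCK = "block"


@dataclass(frozen=True)
class ProposedAction:
    kind: str     # e.g. "accept_offer" or "send_message"
    price: float  # price attached to the proposal


@dataclass(frozen=True)
class Policy:
    floor_price: float    # assumed hard constraint: never sell below this
    review_margin: float  # assumed band above the floor that needs human review


def govern(action: ProposedAction, policy: Policy) -> Verdict:
    """Decide what happens to a proposal before it touches the environment."""
    if action.kind != "accept_offer":
        return Verdict.EXECUTE  # only state-changing acceptances are governed here
    if action.price < policy.floor_price:
        return Verdict.BLOCK
    if action.price < policy.floor_price + policy.review_margin:
        return Verdict.ESCALATE
    return Verdict.EXECUTE


def run(proposals: Iterable[ProposedAction], policy: Policy,
        execute: Callable[[ProposedAction], None]) -> List[Verdict]:
    """Pass each proposal through the gate; only EXECUTE reaches the executor."""
    log = []
    for action in proposals:
        verdict = govern(action, policy)
        if verdict is Verdict.EXECUTE:
            execute(action)
        log.append(verdict)
    return log
```

The generator that produces `ProposedAction` objects is never modified in this arrangement, which is the model-agnostic property the abstract emphasises.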

2026-06-03T00:25:56Z 13 pages, 2 figures Tianyu Shi Yang Mo Yiou Liu Zhuonan Hao Yin Wang Wenzhuo Hu Nan Yu Meng Zhou Jiangbo Yu http://arxiv.org/abs/2606.04260v1 What Makes Majority Illusion Easy to Detect? 2026-06-02T22:25:11Z

Majority illusion is an undesirable phenomenon in social networks in which agents incorrectly perceive a minority opinion as dominant. This can severely distort collective behavior and decision-making. We study the fundamental question of detecting whether a social network allows for a majority illusion. Formally, in the $q$-Majority Illusion problem, we ask whether there exists a binary labeling of agents in which at least a $q$-fraction of agents have the majority of neighbors with the minority label. We investigate how various structural properties of the underlying social network influence the tractability of this question, and provide a detailed map of its computational complexity.
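
The decision problem stated above can be made concrete with a brute-force checker for small graphs. The sketch rests on interpretive assumptions that the abstract does not spell out: "majority of neighbors" is read as strictly more than half, the minority label must be held by strictly fewer than half of the agents, and labelings with an exact tie are skipped. The function names are hypothetical, and the enumeration is exponential in the number of agents, so it only illustrates the definition.

```python
from fractions import Fraction
from itertools import product
from typing import Dict, List

# Adjacency lists of an undirected graph; keys are agent ids.
Graph = Dict[int, List[int]]


def illusion_fraction(graph: Graph, labels: Dict[int, int]) -> Fraction:
    """Fraction of agents whose neighbourhood is dominated by the minority label."""
    n = len(graph)
    ones = sum(labels.values())
    if 2 * ones == n:
        return Fraction(0)  # assumed convention: a tie has no minority label
    minority = 1 if 2 * ones < n else 0
    fooled = 0
    for v, nbrs in graph.items():
        hits = sum(1 for u in nbrs if labels[u] == minority)
        if 2 * hits > len(nbrs):  # strict majority of neighbours
            fooled += 1
    return Fraction(fooled, n)


def has_majority_illusion(graph: Graph, q: Fraction) -> bool:
    """Try every binary labeling and report whether some labeling reaches q."""
    nodes = sorted(graph)
    for bits in product((0, 1), repeat=len(nodes)):
        labels = dict(zip(nodes, bits))
        if illusion_fraction(graph, labels) >= q:
            return True
    return False


# Star graph: agent 0 is the hub, agents 1..4 are leaves.
star: Graph = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(has_majority_illusion(star, Fraction(4, 5)))  # True
print(has_majority_illusion(star, Fraction(1)))     # False
```

On the star, giving the hub the minority label fools all four leaves, but the hub itself can never be fooled because a minority label cannot cover three of its four neighbours.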

2026-06-02T22:25:11Z Šimon Schierreich Ildikó Schlotter http://arxiv.org/abs/2606.04197v1 Exploring the Topology and Memory of Consensus: How LLM Agents Agree, Fragment, or Settle When Forming Conventions 2026-06-02T20:31:54Z

How much should an LLM agent remember, and how should multi-agent systems be connected when trying to reach consensus? We show these two design choices interact in a way that flips the sign of memory's effect on coordination. Across 432 simulation runs of a networked Naming Game on eight fixed 16-agent topologies, we vary memory depth and network structure. Longer memory slows the time to reach steady state in decentralized networks but accelerates it in centralized ones; the same parameter pushes the system in opposite directions depending on topology. Critically, "faster settling" in centralized networks means locking in to a fragmented plateau more quickly, not reaching system-wide consensus, which can be used to generate diverging opinions. We further document a memory-mediated speed-unity trade-off: centralized networks consistently preserve more competing conventions than decentralized networks, but their settling speed depends sharply on memory. At the agent level, within-network analyses show that high-betweenness bridges suffer a brokerage penalty while agents in locally clustered neighborhoods achieve higher coordination success. Finally, in search of analytically tractable generative mechanisms, we find that agents' choices are well captured by Fictitious Play, indicating belief-based rather than reward-based adaptation. The practical implication: memory depth and communication topology should be co-designed, not optimized in isolation.

2026-06-02T20:31:54Z Submitted to the Journal of Artificial Societies and Social Simulation (JASSS) Aliakbar Mehdizadeh Martin Hilbert http://arxiv.org/abs/2507.21638v2 Assistax: A Multi-Agent Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics 2026-06-02T17:07:17Z

The development of reinforcement learning (RL) algorithms has been largely driven by ambitious challenge tasks and benchmarks. Games have dominated RL benchmarks because they present relevant challenges, are inexpensive to run and easy to understand. While games such as Go and Atari have led to many breakthroughs, they often do not directly translate to real-world embodied applications. In recognising the need to diversify RL benchmarks and addressing complexities that arise in embodied interaction scenarios, we introduce Assistax: an open-source benchmark designed to address challenges arising in assistive robotics tasks. Assistax uses JAX's hardware acceleration for significant speed-ups for learning in physics-based simulations. In terms of open-loop wall-clock time, Assistax runs up to $370\times$ faster when vectorising training runs compared to CPU-based alternatives. Assistax conceptualises the interaction between an assistive robot and an active human patient using multi-agent RL to train a population of diverse partner agents against which an embodied robotic agent's zero-shot coordination capabilities can be tested. Extensive evaluation and hyperparameter tuning for popular continuous control RL and MARL algorithms provide reliable baselines and establish Assistax as a practical benchmark for advancing RL research for assistive robotics. The code is available at: https://github.com/assistive-autonomy/assistax.

2025-07-29T09:49:11Z Accepted at the Reinforcement Learning Conference 2026 Leonard Hinckeldey Elliot Fosong Rimvydas Rubavicius Elle Miller Trevor McInroe Fan Zhang Patricia Wollstadt Stefano V. Albrecht Subramanian Ramamoorthy http://arxiv.org/abs/2606.03876v1 From 'What' to 'How' and 'Why': Sharing LLM-Generated Retrospective Summaries of Older Adults' Passive Tracking Data with Remote Family Members 2026-06-02T16:46:00Z

With the growing prevalence of modern ubiquitous computing technologies, multi-modal tracking systems hold promise for providing timely awareness and reassurance to stakeholders such as remote family members (RFMs) of older adults, who play a central role in care coordination. However, combining heterogeneous data streams into high-level, meaningful content - such as retrospective summaries - remains challenging. While recent work has demonstrated the promise of large language models (LLMs) for interpreting multi-modal tracking data, less attention has been given to generating narrative accounts for stakeholders like RFMs, who possess rich personal knowledge of older adults and strong emotional responsibility, yet have limited visibility into their daily lives and limited capacity for caregiving. In this work, we explore how LLMs can be used to generate retrospective summaries from multi-modal tracking data for RFMs of older adults. We leveraged and customized an existing system, Vital Insight, to generate initial summaries on different dates and data availability scenarios as technology probes, and conducted interviews with 11 RFMs to gather feedback. Based on these insights, we redesigned the system into a multi-layer, multi-agent, insight-driven summary approach that builds from objective statistics and descriptions to enriched, context-aware narratives. We then compared the redesigned summaries with the initial versions through a survey with the same 11 RFMs and found significant improvements in satisfaction, perceived helpfulness, trust, and willingness to receive the summaries. We conclude by presenting design implications for AI-generated summaries for RFMs and broader contexts, emphasizing the need to support RFMs' sensemaking shift from simply presenting "What" data were collected, to explaining "How" is my loved one doing and "Why".

2026-06-02T16:46:00Z Jiachen Li Reina Szeyi Chan Akshat Choube Xiang Zhi Tan Elizabeth Mynatt Varun Mishra http://arxiv.org/abs/2602.12430v4 Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward 2026-06-02T16:14:54Z

The transition from monolithic language models to modular, skill-equipped agents marks a defining shift in how large language models (LLMs) are deployed in practice. Rather than encoding all procedural knowledge within model weights, agent skills -- composable packages of instructions, code, and resources that agents load on demand -- enable dynamic capability extension without retraining. This approach is formalized in a paradigm of progressive disclosure, portable skill definitions, and integration with the Model Context Protocol (MCP). This survey provides a comprehensive treatment of the agent skills landscape, as it has rapidly evolved during the last few months. We organize the field along four axes: (i) architectural foundations, examining the SKILL.md specification, progressive context loading, and the complementary roles of skills and MCP; (ii) skill acquisition, covering reinforcement learning with skill libraries, autonomous skill discovery (SEAgent), and compositional skill synthesis; (iii) deployment at scale, including the computer-use agent (CUA) stack, GUI grounding advances, and benchmark progress on OSWorld and SWE-bench; and (iv) security, where recent empirical analyses reveal that 26.1% of community-contributed skills contain vulnerabilities, motivating our proposed Skill Trust and Lifecycle Governance Framework -- a four-tier, gate-based permission model that maps skill provenance to graduated deployment capabilities. We identify seven open challenges -- from cross-platform skill portability to capability-based permission models -- and propose a research agenda for realizing trustworthy, self-improving skill ecosystems. Unlike prior surveys that broadly cover LLM agents or tool use, this work focuses specifically on the emerging skill abstraction layer and its implications for the next generation of agentic systems. Project repo: https://github.com/scienceaix/agentskills

2026-02-12T21:33:25Z Accepted by Agent Skills '26 Workshop at ACM Conference on AI and Agentic Systems 2026 Renjun Xu Yang Yan http://arxiv.org/abs/2606.03735v1 On dynamic multi-agent pathfinding methods: review, simulations and modifications 2026-06-02T14:51:40Z

This paper presents a systematic study of pathfinding algorithms in the context of Dynamic Multi-Agent Pathfinding (D-MAPF), a setting that combines dynamic obstacles, partial observability, and inter-agent conflicts. We evaluate six representative algorithms: Dijkstra, D* Lite, Space-Time A*, WHCA*, M*, and a novel method denoted as A**, within a unified simulation framework. The proposed A** algorithm introduces a template-based approach that decouples offline geometric path generation from online temporal adaptation. By precomputing multiple diverse candidate paths and dynamically reconnecting to them using space-time planning, A** improves solution quality in environments with frequent changes and limited sensing.

2026-06-02T14:51:40Z Gabriel Fejziaj Salama Hassona Wieslaw Marszalek http://arxiv.org/abs/2606.01925v2 QoEReasoner: An Agentic Reasoning Framework for Automated and Explainable QoE Diagnosis in RANs 2026-06-02T13:36:36Z

Diagnosing Quality-of-Experience (QoE) degradations in operational Radio Access Networks (RANs) is a critical but notoriously complex task, traditionally requiring labor-intensive expert analysis over high-dimensional, cross-layer telemetry. While Large Language Models (LLMs) offer unprecedented reasoning capabilities, they are fundamentally unsuited for raw RAN troubleshooting: they fail at numeric time-series analysis, hallucinate protocol-violating causal links, and lack the stateful rigor required for multi-step fault localization. To bridge this gap, we present QoEReasoner, an end-to-end, LLM-driven agentic system designed for automated and explainable QoE diagnosis. QoEReasoner tames the inherent unpredictability of LLMs by grounding their reasoning in the physical realities of the network. It employs deterministic tools to reliably translate raw numeric KPIs into structured evidence, enforces protocol-consistent fault propagation through a domain-specific Knowledge Base, and leverages a Historical Bank of expert-validated cases to guide hypothesis generation. A stateful central planner orchestrates this closed-loop process across anomaly detection, causal tracing, and root-cause localization. Evaluations on real-world operational RAN datasets demonstrate that QoEReasoner outperforms strong baselines by 18\%-40\% in accuracy across multiple diagnostic tasks. Furthermore, it reduces diagnostic time from approximately 30 minutes of manual expert analysis to just 3 minutes per session, delivering highly interpretable, expert-grade reports while remaining robust across diverse LLM backbones.

2026-06-01T08:58:07Z Qizhe Li Haolong Chen Shan Dai Zhuo Li Zhiwei Hu Xuan Li Guangxu Zhu Qingjiang Shi http://arxiv.org/abs/2606.03543v1 D2MDT: Department-aware Multidisciplinary Team Consultation with Deliberation for Efficient Clinical Prediction 2026-06-02T12:07:56Z

Electronic health records (EHRs) are central to clinical prediction, but existing methods either rely on correlation-driven deep models or use single large language models (LLMs), making it difficult to support multidisciplinary clinical reasoning. Recent multi-agent systems (MAS) provide a promising alternative, yet current EHR-grounded MAS methods still suffer from weak evidence differentiation across agents and redundant multi-round interaction. We propose D2MDT, a Department-aware MultiDisciplinary Team Consultation with Deliberation for Efficient clinical prediction. D2MDT first constructs structured EHR evidence and consultation-ready semantic evidence for multi-agent consultation. It then assigns patient-specific department perspectives to doctor agents and retrieves complementary evidence for collaborative consultation. To improve efficiency, D2MDT further introduces residual deliberation, which updates only unresolved consensus rather than replaying the full discussion history. Finally, D2MDT fuses the refined consensus report with structured EHR representations for prediction. Experiments on mortality prediction show that D2MDT improves both predictive performance and consultation efficiency. We release the code online to ease the reproducibility of this paper.

2026-06-02T12:07:56Z Preprint. 17 pages Yongqi Liang Qidong Liu Chunze Yang Lei Wu Jiusong Ge Ni Zhang Chen Li