https://arxiv.org/api/75iAfKIdXy1T10xcrgZpj/NULrA 2026-06-18T19:14:27Z 12677 420 15 http://arxiv.org/abs/2605.17036v3 Reliability and Effectiveness of Autonomous AI Agents in Supply Chain Management 2026-05-25T21:23:52Z

This paper studies autonomous generative AI agents in multi-echelon supply chains using the MIT Beer Game. We identify four inference-time levers that shape performance: model selection, policies and guardrails, centralized data sharing, and prompt engineering. Model capability is the dominant factor: an out-of-the-box reasoning model exceeds human-level performance, and optimized reasoning models reduce costs by up to 67% relative to human teams. However, strong average performance masks substantial reliability risks. We introduce agent bullwhip: the amplification of run-to-run decision instability in autonomous multi-echelon systems. A central component is decision bullwhip, the portion of order variability generated by stochastic agent decisions rather than by changes in customer demand. We show that decision instability can amplify both across facilities at a fixed point in time and within the same facility over time, even when the demand path is held fixed. Repeated sampling, a natural test-time remedy, fails to meaningfully reduce this instability, suggesting that reliability requires changing the underlying decision policy rather than merely averaging over model outputs. To address this limitation, we propose a Group Relative Policy Optimization (GRPO)-based reinforcement-learning post-training framework that trains a shared base LLM using system-level supply-chain rewards. Post-training substantially reduces tail events, curtails agent bullwhip, and improves the reliability of autonomous supply-chain agents.
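
The abstract does not state how the two quantities are computed, so the following is only an illustrative sketch. It assumes the common bullwhip ratio Var(orders) / Var(demand), and it approximates decision bullwhip as the across-run variance of orders when the demand path is held fixed. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, pvariance


def bullwhip_ratio(orders, demand):
    """Classic bullwhip ratio: variance of orders over variance of demand."""
    return pvariance(orders) / pvariance(demand)


def run_to_run_variance(runs):
    """Average, over periods, of the across-run variance of orders.

    `runs` holds order sequences from repeated runs on the SAME demand
    path, so any spread measured here comes from agent decisions alone
    (an assumed proxy for decision bullwhip, not the paper's definition).
    """
    per_period = [pvariance(period_orders) for period_orders in zip(*runs)]
    return mean(per_period)


# Toy data: one fixed demand path, three stochastic agent runs.
demand = [4, 4, 8, 8, 8, 8]
runs = [
    [4, 4, 12, 10, 8, 8],
    [4, 6, 10, 12, 6, 8],
    [4, 4, 14, 8, 10, 6],
]

ratios = [bullwhip_ratio(r, demand) for r in runs]
instability = run_to_run_variance(runs)
print(ratios, instability)
```

A ratio above 1 means orders vary more than demand does; a nonzero `instability` under a fixed demand path isolates the variability contributed by the agent's own sampling.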

2026-05-16T15:11:35Z Carol Xuan Long David Simchi-Levi Feng Zhu Huangyuan Su Andre P. Calmon Flavio P. Calmon http://arxiv.org/abs/2605.26302v1 Your Agents Are Aging Too: Agent Lifespan Engineering for Deployed Systems 2026-05-25T19:55:12Z

Long-lived AI agents are increasingly deployed as persistent operational systems, yet they are still evaluated like freshly initialized models. Day-one benchmarks miss a basic systems question: how long does an agent remain reliable after deployment? Even when model weights are frozen, an agent's effective state keeps changing as it compresses interaction history, retrieves from a growing memory store, revises facts after updates, and undergoes routine maintenance. Reliability therefore becomes a lifespan property of the full agent harness, not only a snapshot property of the base model. We introduce AgingBench, a longitudinal reliability benchmark for agent lifespan engineering: measuring not only whether deployed agents degrade, but what form the degradation takes and where repair should target. AgingBench organizes agent aging into four mechanisms: compression aging, interference aging, revision aging, and maintenance aging. To diagnose these failures, AgingBench uses temporal dependency graphs and paired counterfactual probes that produce diagnostic profiles for the write, retrieval, and utilization stages of the memory pipeline. Across 7 scenarios, 14 models, multiple memory policies, and both runner-controlled and autonomous agents, more than roughly 400 runs spanning 8 to 200 sessions show that agent aging is not one-dimensional: behavioral tests can remain clean while factual precision decays; derived-state tracking can collapse sharply within a single model; and the same wrong answer can require different repairs depending on what the diagnostic profile points to. These results suggest that reliable agent deployment requires lifespan evaluation, mechanism-level diagnosis, and stage-targeted repair, not only stronger day-one models.

2026-05-25T19:55:12Z Jianing Zhu Yeonju Ro John Robertson Kevin Wang Junbo Li Haris Vikalo Aditya Akella Zhangyang Wang http://arxiv.org/abs/2604.07028v2 Strategic Persuasion with Trait-Conditioned Multi-Agent Systems for Iterative Legal Argumentation 2026-05-25T19:22:50Z

Strategic interaction in adversarial domains such as law, diplomacy, and negotiation is mediated by language, yet most game-theoretic models abstract away the mechanisms of persuasion that operate through discourse. We present the Strategic Courtroom Framework, a multi-agent simulation environment in which prosecution and defense teams composed of trait-conditioned Large Language Model (LLM) agents engage in iterative, round-based legal argumentation. Agents are instantiated using nine interpretable traits organized into four archetypes, enabling systematic control over rhetorical style and strategic orientation. We evaluate the framework across 10 synthetic legal cases and 84 three-trait team configurations, totaling over 7,000 simulated trials using DeepSeek-R1 and Gemini 2.5 Pro. Our results show that heterogeneous teams with complementary traits consistently outperform homogeneous configurations, that moderate interaction depth yields more stable verdicts, and that certain traits (notably quantitative and charismatic) contribute disproportionately to persuasive success. We further introduce a reinforcement-learning-based Trait Orchestrator that dynamically generates defense traits conditioned on the case and opposing team, discovering strategies that outperform static, human-designed trait combinations. Together, these findings demonstrate how language can be treated as a first-class strategic action space and provide a foundation for building autonomous agents capable of adaptive persuasion in multi-agent environments.

2026-04-08T12:46:03Z Philipp D. Siedler http://arxiv.org/abs/2605.26286v1 Decoupled Delay Compensation: Enhancing Pre-trained MARL Policies via Learned Dynamics Filtering 2026-05-25T19:19:46Z

Real-world multi-agent reinforcement learning (MARL) systems must often operate under stale observations, stochastic communication delays, and intermittent packet loss. Policies trained under idealized synchronous conditions frequently exhibit significant performance degradation in these regimes because they act on outdated feedback. We propose a modular execution-stage state-estimation layer that replaces delayed communicated observations with current belief-state estimates. The framework integrates a learned gated transition model with a recursive Kalman filtering layer to estimate instantaneous states from asynchronous measurements. A primary advantage of this approach is its modularity: the estimator serves as a plug-in for pre-trained policies, requiring no modifications to the original MARL training algorithm, architecture, or reward structure. Evaluation across diverse multi-agent and continuous-control benchmarks demonstrates that the proposed layer consistently enhances robustness to communication latency and message loss. The most significant performance gains are observed in coordination-intensive and dynamically unstable tasks where temporal consistency is critical for control.
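
As a rough illustration of the idea of replacing a delayed observation with a current belief-state estimate, the sketch below fuses a stale scalar measurement with a Kalman update and then rolls the belief forward through the actions applied since. It substitutes a hand-written linear model `x' = a*x + b*u` for the paper's learned gated transition model; all names, parameters, and noise values are assumptions made for the example.

```python
def kalman_update(mean, var, z, meas_var):
    """Scalar Kalman measurement update for a direct observation z of the state."""
    gain = var / (var + meas_var)
    return mean + gain * (z - mean), (1.0 - gain) * var


def predict(mean, var, action, a=1.0, b=1.0, process_var=0.01):
    """One-step prediction under the assumed linear model x' = a*x + b*u + noise."""
    return a * mean + b * action, a * a * var + process_var


def estimate_current_state(prior_mean, prior_var, delayed_obs, meas_var, actions_since):
    """Fuse an observation that is len(actions_since) steps old, then roll forward.

    prior_mean / prior_var describe the belief at the time the observation
    was taken; actions_since lists the actions applied after that time,
    oldest first.
    """
    mean, var = kalman_update(prior_mean, prior_var, delayed_obs, meas_var)
    for u in actions_since:
        mean, var = predict(mean, var, u)
    return mean, var


# A measurement of 2.0 arrives three steps late; 0.5 was applied at each step since.
mean, var = estimate_current_state(
    prior_mean=0.0, prior_var=1.0, delayed_obs=2.0, meas_var=1.0,
    actions_since=[0.5, 0.5, 0.5],
)
print(mean, var)
```

The policy would then consume `mean` in place of the stale observation. A dropped packet could be handled in the same sketch by skipping the update and calling `predict` alone, at the cost of a growing variance.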

2026-05-25T19:19:46Z 8 pages, 7 figures Maxim Mednikov Oren Gal http://arxiv.org/abs/2605.26239v1 Sentinel: Embodied Cooperative Spatial Reasoning and Planning 2026-05-25T18:04:41Z

In this work, we study Cooperative Spatial Intelligence, the ability of decentralized embodied agents to coordinate effectively under dynamic environmental constraints across city-scale outdoor domains. We introduce Sentinel Challenge, a benchmark where multiple decentralized embodied agents must communicate in natural language to agree on a mutually safe and convenient meeting point within large, city-scale outdoor environments. Each agent must then navigate safely while avoiding dynamic sentinels patrolling the area, using a tool that provides coarse spatial information. To address this, we propose CoSaR (Cooperative Spatial Reasoning and Planning), a framework that bridges the high-level communication and planning abilities of foundation models with the precision of classical spatial navigation algorithms. CoSaR enables agents to exchange situational updates, reason over evolving spatial constraints, and collaboratively replan trajectories. Evaluated across 14 city-level scenes with 3-5 agents, CoSaR consistently leads to faster gathering, shorter path lengths, and improved safety. Our results demonstrate that integrating dynamic communication with spatial reasoning is essential for robust multi-agent cooperation. By formalizing this new setting and providing a scalable benchmark, we aim to build a foundation for advancing cooperative spatial intelligence in embodied multi-agent systems. Code and challenge are available at https://github.com/UMass-Embodied-AGI/Sentinel.

2026-05-25T18:04:41Z The first two authors contributed equally Xiangye Lin Hongxin Zhang Ruxi Deng Qinhong Zhou Chuang Gan http://arxiv.org/abs/2605.26203v1 AgentSociety: Incentivizing Agentic Social Intelligence 2026-05-25T17:59:59Z

The success of deployed agents relies on their ability to handle open-ended user requests using their inherent capabilities, not only in solving requests directly but also in effectively leveraging inter-agent communication channels and feedback signals over time. This requires a multi-agent environment where agents can operate autonomously, strategically communicate, behave collaboratively and be driven by economic incentives, much like humans in society. Towards this vision, we propose $\mathtt{AgentSociety}$, a mechanism that enables decentralized agentic collaboration grounded in liquid democracy and information diffusion from social choice theory. We show that $\mathtt{AgentSociety}$ provides an environment for agents to make autonomous decisions utilizing their local context to maximize their utility while achieving collective outcomes through incentivized collaboration. Specifically, we prove that delegation to more competent neighbor agents is incentive compatible and naturally generates a multi-agent routing path by consensus. Additionally, our mechanism incentivizes agents to selectively disclose information to their neighbor agents when doing so aligns with their self-interest, so as to garner influence. We characterize the Nash equilibrium, showing that agent payoffs are reflective of their marginal contributions. We compare and benchmark strategy profiles adopted by open and proprietary state-of-the-art language models deployed in $\mathtt{AgentSociety}$ against best response. Finally, we evaluate collaborative performance from consensus-based routing among self-interested heterogeneous agents in $\mathtt{AgentSociety}$ on real-world datasets.
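
To make the link between delegation and routing concrete, here is a minimal sketch of how repeated delegation to a more competent neighbor could trace out a path. It assumes a fixed scalar competence per agent and a greedy "delegate to the best neighbor if it beats you" rule; neither the rule nor the names come from the paper, whose mechanism also involves incentives and information disclosure that are not modeled here.

```python
def delegation_path(start, competence, neighbors):
    """Follow delegations from `start` until no neighbor is strictly more competent.

    competence: dict mapping agent -> score (hypothetical scalar measure).
    neighbors:  dict mapping agent -> list of adjacent agents.
    Competence strictly increases along the path, so the loop terminates.
    """
    path = [start]
    current = start
    while True:
        best = max(neighbors[current], key=lambda a: competence[a], default=current)
        if competence[best] <= competence[current]:
            return path
        path.append(best)
        current = best


competence = {"a": 0.3, "b": 0.6, "c": 0.9, "d": 0.5}
neighbors = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b"], "d": ["a"]}

print(delegation_path("a", competence, neighbors))
print(delegation_path("d", competence, neighbors))
```

In this toy graph a request entering at `a` is passed to `b` and then `c`, while `d` keeps its own request because its only neighbor is less competent.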

2026-05-25T17:59:59Z Aditya Vema Reddy Kesari Krishna Reddy Kesari http://arxiv.org/abs/2605.07524v3 Dynamic Representational Synchrony through Collective Predictive Coding: A Computational Model of Parent-Infant Homeostatic Co-Regulation 2026-05-25T17:07:53Z

Inter-brain synchrony (IBS) observed in real-time dyadic interactions, including parent-infant exchanges, suggests that two agents can align their internal representations through interaction. Yet computational accounts of how such alignment can arise between agents that have only local sensory access and asymmetric internal knowledge remain underdeveloped. We propose a constructive model of parent-infant homeostatic co-regulation that integrates a POMDP formulation of active interoceptive inference with the Metropolis-Hastings Naming Game (MHNG) derived from the Collective Predictive Coding (CPC) hypothesis. In our model, the parent and infant agents agree on homeostatic regulatory actions for the infant's visceral state through a shared communicative variable generated by a locally computable Metropolis-Hastings probability. The parent observes the infant through body-generated exteroceptive cues, whereas the infant directly senses its own visceral state through interoception. This difference in access modality is implemented as asymmetric generative-model knowledge: the parent knows how actions transform visceral states but must learn what the infant's bodily cues indicate, whereas the infant perceives its visceral state directly but must learn how actions affect it. We quantify the degree of representational alignment using the Jensen-Shannon divergence between the two agents' latent representations. Notably, this synchrony emerged far earlier than the generative-model convergence and was maintained despite heterogeneous generative-model knowledge, indicating that it does not require fully shared world models. These findings support CPC as a candidate computational framework for explaining how dynamic representational synchrony relevant to IBS can emerge through local interactions.
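
For reference, the Jensen-Shannon divergence used as the alignment measure can be computed as below for two discrete distributions. This is the standard definition with base-2 logarithms (so the value lies in [0, 1]); how the paper extracts the two agents' latent distributions is not specified in the abstract, and the example vectors are made up.

```python
from math import log2


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by 1 bit."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical latent distributions for the two agents.
parent = [0.7, 0.2, 0.1]
infant = [0.6, 0.3, 0.1]
print(js_divergence(parent, infant))
```

Lower values indicate closer alignment: identical distributions give 0, and distributions with disjoint support give 1.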

2026-05-08T09:54:40Z 11 pages Yushi Tsubamoto Takato Horii http://arxiv.org/abs/2605.25746v1 Multi-Agent Coordination Adaptation via Structure-Guided Orchestration 2026-05-25T11:59:58Z

As large language model (LLM)-based multi-agent systems scale to handle increasingly complex tasks, balancing structural stability and dynamic adaptability becomes increasingly challenging. Existing systems typically adopt either structure-centric methods, committing to structures determined upfront that limit fine-grained control, or orchestration-centric methods, adapting decisions dynamically while leaving coordination structure implicit and unstable. To address this challenge, we revisit multi-agent coordination from a probabilistic perspective, casting it as posterior inference over the joint distribution of structure and orchestration. We introduce MACA, an automated coordination framework that learns a task- and budget-conditioned structural prior over agent participation and interactions. This prior guides a policy-based orchestration as an approximation to posterior inference, enabling efficient solutions with fine-grained control. Across benchmarks, MACA outperforms adaptive multi-agent baselines by an average of 8.42% while using 43.19% fewer tokens. Further investigation reveals that joint adaptation of structure and orchestration suppresses redundant interactions, converging coordination toward task-effective execution.

2026-05-25T11:59:58Z 21 pages Haoran Li Shulun Chen Shaoyuan Sun Hanchen Wang http://arxiv.org/abs/2605.25741v1 Collaborative Threat-Aware Autonomy (CTAA) 2026-05-25T11:54:10Z

Navigating teams of unmanned vehicles through environments containing dynamic, adversarial Weapon Engagement Zones (WEZs) poses a fundamental challenge to mission success: a single vehicle, however capable its onboard guidance, remains a single point of failure. This paper presents a role-differentiated multi-agent framework for collaborative threat-aware trajectory planning in which a fleet of Autonomous Collaborative Platforms (ACPs) is assigned distinct roles (primary intercept, escort, and decoy) to improve team-level mission success probability while managing individual WEZ exposure. Each ACP independently employs a reactive guidance law derived from the Collision Sphere Boundary for Evader Zero-Set (CSBEZ), which accounts for pursuer maneuverability constraints imposed by minimum turn radius, and steers the vehicle toward the safest heading that also makes progress toward its goal. Role assignment and spatial route separation induce two complementary effects: probabilistic redundancy, in which $N$ independent paths raise the team success probability, and threat saturation, in which lower-priority escorts and decoys draw adversary attention and free the primary vehicle to transit uncontested.
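
The redundancy effect can be illustrated with a short worked formula. It assumes, for illustration only, that the $N$ paths succeed independently with probabilities $p_i$ and that the team succeeds when at least one vehicle does; the abstract does not state the paper's exact success model, and the numbers are hypothetical.

```latex
P_{\text{team}} = 1 - \prod_{i=1}^{N} (1 - p_i),
\qquad
\text{e.g. } N = 3,\ p_i = 0.6:\quad
P_{\text{team}} = 1 - (0.4)^3 = 0.936 .
```

Under these assumptions, three vehicles that each have a 60% chance of getting through give the team a 93.6% chance that at least one succeeds.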

2026-05-25T11:54:10Z Rajnikant Sharma Abhinav Sinha Isaac Weintraub http://arxiv.org/abs/2605.25693v1 From Facts to Insights: A Persona-Driven Dual Memory Framework and Dataset for Role-Playing Agents 2026-05-25T10:48:24Z

While role-playing agents excel in short-term interactions, long-term conversations overwhelm context windows, motivating external memory frameworks. Current systems typically rely on persona-agnostic summarization, which records facts without persona-specific interpretation, yielding generic responses that compromise persona fidelity. To bridge this gap, we introduce RoleMemo, a dataset featuring four reasoning tasks where the factual fragments must be interpreted through the persona to reach the correct answer. Evaluation on RoleMemo exposes critical limitations of persona-agnostic frameworks. We thus propose DualMem, which decouples memory into two streams: factual cognition and persona-conditioned insight. Trained through Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), our framework with a 4B-parameter model outperforms zero-shot persona-agnostic frameworks powered by DeepSeek-V3.2 for sustained persona fidelity. Our resources are available at https://github.com/role2026/rolememo.

2026-05-25T10:48:24Z Preprint Rongsheng Zhang Ruofan Hu Weijie Chen Jiji Tang Junnan Ren Wanying Wu Xunuoyan Chen Tangjie Lv Tao Jin Zhou Zhao http://arxiv.org/abs/2605.25653v1 When Agents Control Robots: A Zero Trust Policy Model for Agentic Cyber-Physical Systems 2026-05-25T10:00:34Z

Multi-agent systems powered by large foundation models (LFMs) are increasingly deployed to control industrial robots through natural language, creating deployments in which security failures produce physical consequences. We analyse this threat landscape through Cobot-Claw, a deployed four-agent system for UR3e robotic arm control, and identify five attack classes specific to agentic cyber-physical systems. We propose ZTPM, a Zero Trust Policy Model comprising 25 typed primitives across five enforcement domains with Physical Impact Tiers as a runtime policy dimension. An empirical evaluation across 60 execution traces on two LFM backends provides initial evidence that actuation parameter selection is model-dependent and non-deterministic, motivating the need for policy-level enforcement at the physical actuation boundary.

2026-05-25T10:00:34Z 12 pages, 4 figures Tharindu Ranathunga Kavishka Fernando Susan Rea http://arxiv.org/abs/2606.07557v1 SPIN: Decentralized Swarm Control via Tensorized Policy Coordination 2026-05-25T07:45:51Z

Decentralized multi-agent swarm coordination on resource-constrained edge platforms remains fundamentally bottlenecked by the exponential scaling of joint action spaces and high-latency communication overhead. This paper introduces the Swarm Policy Interference Network (SPIN) framework, an architectural paradigm that bypasses these limitations by modeling swarm topologies as a compressed tensor network. We factorize the joint policy tensors of local multi-agent cliques into Matrix Product State (MPS) chains, reducing the computational complexity of evaluation from an exponential $O(n^m)$ wall to a strictly linear $O(m \cdot n \cdot \chi^2)$ constraint. To bridge local continuous spatial geometry with this discrete algebraic backend without requiring power-intensive online training loops, we introduce a decoupled, hybrid neuro-symbolic control pipeline. Local multi-layered neural networks operate as structural coordination encoders, pre-trained offline to nonlinearly map hand-engineered geometric descriptors into abstract environmental target measures. At runtime, edge agents execute instantaneous behavioral adaptations by applying the Radon-Nikodým derivative directly as a zero-shot importance-reweighting filter. We validate the framework within a discrete-time multi-agent simulation sandbox spanning tracking, decentralized dispersion/area coverage, and multi-goal coordination regimes. Qualitative telemetry demonstrates that the integrated pipeline achieves stable target-directed motion, anti-collapse spatial spreading under decentralized constraints, and structured subgroup formation across multiple targets, providing a mathematically grounded route to tractable, low-power edge swarm intelligence.
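
The complexity claim can be made concrete with a generic Matrix Product State sketch. Assuming $m$ agents, $n$ actions per agent, and bond dimension $\chi$, the code below sums a joint tensor over all $n^m$ joint actions with $O(m \cdot n \cdot \chi^2)$ work and checks the result against brute-force enumeration. It is a textbook MPS contraction in plain Python, not the SPIN implementation; the core values and function names are invented for the example.

```python
from itertools import product


def vec_mat(v, M):
    """Row vector times matrix."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def mps_entry(cores, actions):
    """Value of the joint tensor at one joint action: O(m * chi^2)."""
    v = [1.0]
    for core, a in zip(cores, actions):
        v = vec_mat(v, core[a])
    return v[0]


def mps_total(cores):
    """Sum over all n^m joint actions in O(m * n * chi^2)."""
    v = [1.0]
    for core in cores:
        rows, cols = len(core[0]), len(core[0][0])
        # Summing the n action matrices at this site costs n * chi^2.
        summed = [[sum(mat[i][j] for mat in core) for j in range(cols)]
                  for i in range(rows)]
        v = vec_mat(v, summed)
    return v[0]


# m = 3 agents, n = 2 actions, chi = 2; cores[k][a] is the matrix for agent k, action a.
cores = [
    [[[1.0, 0.5]], [[0.2, 1.0]]],                              # 1 x 2 matrices
    [[[1.0, 0.0], [0.3, 1.0]], [[0.5, 0.2], [0.0, 1.0]]],      # 2 x 2 matrices
    [[[1.0], [0.5]], [[0.3], [1.0]]],                          # 2 x 1 matrices
]

brute_force = sum(mps_entry(cores, acts) for acts in product(range(2), repeat=3))
print(brute_force, mps_total(cores))
```

The brute-force sum visits every joint action, while `mps_total` touches each site once, which is where the linear dependence on $m$ comes from.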

2026-05-25T07:45:51Z 11 pages, 2 figures, 1 table, 6 sections Zhaowen Fan http://arxiv.org/abs/2605.26178v1 ATOM: Instantiating Budget-Controllable Multi-Agent Collaboration via Nucleus-Electron Hierarchy 2026-05-25T06:41:11Z

Large Language Model (LLM)-based multi-agent systems rely on optimized collaboration topologies to balance performance and communication costs. However, current methods struggle with the inherent stability-extensibility trade-off and often misalign computational budgets with query difficulty. We propose ATOM, an adaptive framework that generates budget-controllable collaboration graphs via a novel task-driven reinforcement learning paradigm. Inspired by atomic structures, ATOM employs a nucleus-electron hierarchy: it maintains a stable, offline-learned collaboration backbone (the nucleus) while dynamically activating query-conditioned agents (electrons) during inference. Crucially, a complexity-aware budgeting strategy aligns resource consumption with task demands by estimating query difficulty to strictly regulate electron instantiation. Extensive experiments across six diverse benchmarks demonstrate that ATOM achieves state-of-the-art performance while improving token efficiency by up to 30% compared to strong baselines.

2026-05-25T06:41:11Z Xinkui Zhao Sai Liu Yifan Zhang Qingyu Ma Zewen Lin Naibo Wang Guanjie Cheng Chang Liu Yueshen Xu http://arxiv.org/abs/2605.25440v1 A Multi-Agent LLM Framework for Rating the Quality of Surgical Feedback 2026-05-25T05:31:44Z

Verbal feedback delivered by attending surgeons in the operating room plays a critical formative role in resident trainee skill acquisition. Yet, assessing the quality of trainer feedback and its effectiveness in influencing trainee behavior during live surgery remains a challenge. Prior studies assessed feedback content relying on extensive manual annotation by expert human raters and focused on developing broad taxonomies that overlook the qualitative aspects of feedback delivery such as clarity or urgency. Limited existing automated methods, including keyword analysis and topic modeling, also fail to capture these nuanced aspects. We introduce a two-stage LLM-based framework that discovers interpretable feedback quality criteria grounded in the context of surgical training. Our method uses multi-agent prompting and surgical domain knowledge injection to discover a small set of human interpretable scoring criteria (e.g., Encouraging, Urgent, Clear). These criteria are then used to automatically score live surgical feedback via an LLM-as-a-judge approach. Evaluation on 4.2k trainer feedback instances demonstrates that our AI-discovered criteria outperform prior content-based frameworks in predicting feedback effectiveness, including observed trainee behavioral adjustments and trainer approval. This work advances scalable, human-aligned assessment of communication quality in the operating room and provides a foundation for improving surgical teaching practices.

2026-05-25T05:31:44Z 25 pages, 3 figures Rafal Kocielnik J. Everett Knudsen Steven Y. Cen Jasmine Lin Cherine H. Yang Atharva Deo Ujjwal Pasupulety Peter Wager Anima Anandkumar Andrew J. Hung http://arxiv.org/abs/2605.25431v1 Mode 0: A New 3GPP V2X Resource Allocation Category for Roadside Computing Unit-Assisted Safety Communication 2026-05-25T05:13:16Z

The 3GPP V2X resource allocation framework defines two entity classes -- the base station and the vehicle UE -- and four modes across LTE and NR generations. We demonstrate that this binary taxonomy is structurally incomplete. Base station-led scheduling saturates at high-density traffic nodes, producing latency-tail failures that persist even when mean packet delivery ratios approach the service-class target. UE autonomy is categorically incapable of pre-emergence warning for occluded traffic participants and insufficient for large-scope cascading environmental hazards. We propose Mode 0, a new 3GPP V2X category whose defining entity is the Roadside Computing Unit (RCU) -- an infrastructure ensemble integrating elevated sensing (Seeing), sidelink communication (Speaking), and local computational evaluation (Thinking), owned by traffic management authorities. Mode 0 defines a subfamily spectrum from Mode 0a (all-passive UEs, the guaranteed minimum) through Mode 0c (all-active UEs, the optimal target). Convergent deployment evidence from Chinese national standards (DB11/T 2329.1-2024, T/ITS 0224.1-2025), China Unicom RS-MEC infrastructure, and European and US C-V2X programs confirms that both institutional sides are converging on the roadside traffic node without a coordination standard. A fifteen-run Multi-Agent Proximal Policy Optimization (MAPPO) simulation validates the architectural family: Mode 0a in shared-pool baseline sits at the analytical symmetric-Nash coordination floor; Mode 0c with demand separation achieves strict Pareto improvement for both traffic classes (M0 PDR 0.999, M1 PDR 0.998 at $\rho_{\mathrm{pool}} \leq 1$) and lifts the worst-TTI delivery ratio from near-zero to 0.601 -- the only configuration satisfying the latency safety requirement structurally. We call for a 3GPP study item on Mode 0 within the NR-V2X sidelink enhancement work programme.

2026-05-25T05:13:16Z 13 pages, 7 figures, 4 tables. Submitted to IEEE Transactions on Intelligent Transportation Systems Dewei Jiang Nantong University Xiang Gu Nantong University