https://arxiv.org/api/uKo82nTqPPKimBqVPwILFOtus2w 2026-03-22T14:40:23Z 11553 75 15 http://arxiv.org/abs/2510.06436v2 R3R: Decentralized Multi-Agent Collision Avoidance with Infinite-Horizon Safety 2026-03-15T16:11:15Z Existing decentralized methods for multi-agent motion planning lack formal, infinite-horizon safety guarantees, especially for communication-constrained systems. We present R3R which, to our knowledge, is the first decentralized and asynchronous framework for multi-agent motion planning under range-limited communication constraints with infinite-horizon safety guarantees for systems of nonlinear agents. R3R's novelty lies in combining our gatekeeper safety framework with a geometric constraint termed R-Boundedness, which together establish a formal link between an agent's communication radius and its ability to plan safely. We constrain trajectories to lie within a fixed planning radius, determined by a function of the agent's communication radius. This enables trajectories to be certified as provably safe for all time using only local information. Our algorithm is fully asynchronous, and ensures the forward invariance of these guarantees even in time-varying networks where agents asynchronously join and replan. We evaluate our approach in simulations of up to 128 Dubins vehicles, validating our theoretical safety guarantees in dense, obstacle-rich scenarios. We further show that R3R's computational complexity scales with local agent density rather than problem size, providing a practical solution for scalable and provably safe multi-agent systems. 2025-10-07T20:13:49Z 8 pages, LaTeX; submitted to the American Control Conference (ACC) 2026 Thomas Marshall Vielmetti Devansh R. Agrawal Dimitra Panagou http://arxiv.org/abs/2504.12961v5 QLLM: Do We Really Need a Mixing Network for Credit Assignment in Multi-Agent Reinforcement Learning? 
2026-03-15T14:37:36Z Credit assignment remains a fundamental challenge in multi-agent reinforcement learning (MARL) and is commonly addressed through value decomposition under the centralized training with decentralized execution (CTDE) paradigm. However, existing value decomposition methods typically rely on predefined mixing networks that require additional training, often leading to imprecise credit attribution and limited interpretability. We propose QLLM, a novel framework that leverages large language models (LLMs) to construct training-free credit assignment functions (TFCAFs), where the TFCAFs are nonlinear with respect to the global state and offer enhanced interpretability while introducing no extra learnable parameters. A coder-evaluator framework is employed to ensure the correctness and executability of the generated code. Extensive experiments on standard MARL benchmarks demonstrate that QLLM consistently outperforms baselines while requiring fewer learnable parameters. Furthermore, it demonstrates generalization across a broad set of value decomposition algorithms. Code is available at https://github.com/MaoMaoLYJ/pymarl-qllm. 2025-04-17T14:07:11Z Yuanjun Li Zhouyang Jiang Bin Zhang Mingchao Zhang Junhao Zhao Zhiwei Xu http://arxiv.org/abs/2510.05174v3 Emergent Coordination in Multi-Agent Language Models 2026-03-15T14:32:20Z When are multi-agent LLM systems merely a collection of individual agents versus an integrated collective with higher-order structure? We introduce an information-theoretic framework to test -- in a purely data-driven way -- whether multi-agent systems show signs of higher-order structure. This information decomposition lets us measure whether dynamical emergence is present in multi-agent LLM systems, localize it, and distinguish spurious temporal coupling from performance-relevant cross-agent synergy. 
We implement a practical criterion and an emergence capacity criterion operationalized as partial information decomposition of time-delayed mutual information (TDMI). We apply our framework to experiments using a simple guessing game without direct agent communication and minimal group-level feedback with three randomized interventions. Groups in the control condition exhibit strong temporal synergy but little coordinated alignment across agents. Assigning a persona to each agent introduces stable identity-linked differentiation. Combining personas with an instruction to "think about what other agents might do" shows identity-linked differentiation and goal-directed complementarity across agents. Taken together, our framework establishes that multi-agent LLM systems can be steered with prompt design from mere aggregates to higher-order collectives. Our results are robust across emergence measures and entropy estimators, and not explained by coordination-free baselines or temporal dynamics alone. Without attributing human-like cognition to the agents, the patterns of interaction we observe mirror well-established principles of collective intelligence in human groups: effective performance requires both alignment on shared objectives and complementary contributions across members. 2025-10-05T11:26:41Z Christoph Riedl http://arxiv.org/abs/2410.16686v3 SERN: Bandwidth-Adaptive Cross-Reality Synchronization for Simulation-Enhanced Robot Navigation 2026-03-15T12:08:53Z Cross reality integration of simulation and physical robots is a promising approach for multi-robot operations in contested environments, where communication may be intermittent, interference may be present, and observability may be degraded. We present SERN (Simulation-Enhanced Realistic Navigation), a framework that tightly couples a high-fidelity virtual twin with physical robots to support real-time collaborative decision making. SERN makes three main contributions. 
First, it builds a virtual twin from geospatial and sensor data and continuously corrects it using live robot telemetry. Second, it introduces a physics-aware synchronization pipeline that combines predictive modeling with adaptive PD control. Third, it provides a bandwidth-adaptive ROS bridge that prioritizes critical topics when communication links are constrained. We also introduce a multi-metric cost function that balances latency, reliability, computation, and bandwidth. Theoretically, we show that when the adaptive controller keeps the physical and virtual input mismatch small, synchronization error remains bounded under moderate packet loss and latency. Empirically, SERN reduces end-to-end message latency by 15% to 25% and processing load by about 15% compared with a standard ROS setup, while maintaining tight real-virtual alignment with less than 5 cm positional error and less than 2 degrees rotational error. In a navigation task, SERN achieves a 95% success rate, compared with 85% for a real-only setup and 70% for a simulation-only setup, while also requiring fewer interventions and less time to reach the goal. These results show that a simulation-enhanced cross-reality stack can improve situational awareness and multi-agent coordination in contested environments by enabling look-ahead planning in the virtual twin while using real sensor feedback to correct discrepancies. 
2024-10-22T04:35:57Z Jumman Hossain Emon Dey Snehalraj Chugh Masud Ahmed MS Anwar Abu-Zaher Faridee Jason Hoppes Theron Trout Anjon Basak Rafidh Chowdhury Rishabh Mistry Hyun Kim Jade Freeman Niranjan Suri Adrienne Raglin Carl Busart Anuradha Ravi Nirmalya Roy http://arxiv.org/abs/2603.14312v1 Autonomous Agents Coordinating Distributed Discovery Through Emergent Artifact Exchange 2026-03-15T10:06:57Z We present ScienceClaw + Infinite, a framework for autonomous scientific investigation in which independent agents conduct research without central coordination, and any contributor can deploy new agents into a shared ecosystem. The system is built around three components: an extensible registry of over 300 interoperable scientific skills, an artifact layer that preserves full computational lineage as a directed acyclic graph (DAG), and a structured platform for agent-based scientific discourse with provenance-aware governance. Agents select and chain tools based on their scientific profiles, produce immutable artifacts with typed metadata and parent lineage, and broadcast unsatisfied information needs to a shared global index. The ArtifactReactor enables plannerless coordination: peer agents discover and fulfill open needs through pressure-based scoring, while schema-overlap matching triggers multi-parent synthesis across independent analyses. An autonomous mutation layer actively prunes the expanding artifact DAG to resolve conflicting or redundant workflows, while persistent memory allows agents to continuously build upon complex epistemic states across multiple cycles. Infinite converts these outputs into auditable scientific records through structured posts, provenance views, and machine-readable discourse relations, with community feedback steering subsequent investigation cycles. 
Across four autonomous investigations, peptide design for the somatostatin receptor SSTR2, lightweight impact-resistant ceramic screening, cross-domain resonance bridging biology, materials, and music, and formal analogy construction between urban morphology and grain-boundary evolution, the framework demonstrates heterogeneous tool chaining, emergent convergence among independently operating agents, and traceable reasoning from raw computation to published finding. 2026-03-15T10:06:57Z Fiona Y. Wang Lee Marom Subhadeep Pal Rachel K. Luu Wei Lu Jaime A. Berkovich Markus J. Buehler http://arxiv.org/abs/2603.14265v1 MedPriv-Bench: Benchmarking the Privacy-Utility Trade-off of Large Language Models in Medical Open-End Question Answering 2026-03-15T07:47:35Z Recent advances in Retrieval-Augmented Generation (RAG) have enabled large language models (LLMs) to ground outputs in clinical evidence. However, connecting LLMs with external databases introduces the risk of contextual leakage: a subtle privacy threat where unique combinations of medical details enable patient re-identification even without explicit identifiers. Current benchmarks in healthcare heavily focus on accuracy, ignoring such privacy issues, despite strict regulations like Health Insurance Portability and Accountability Act (HIPAA) and General Data Protection Regulation (GDPR). To fill this gap, we present MedPriv-Bench, the first benchmark specifically designed to jointly evaluate privacy preservation and clinical utility in medical open-ended question answering. Our framework utilizes a multi-agent, human-in-the-loop pipeline to synthesize sensitive medical contexts and clinically relevant queries that create realistic privacy pressure. We establish a standardized evaluation protocol leveraging a pre-trained RoBERTa-Natural Language Inference (NLI) model as an automated judge to quantify data leakage, achieving an average of 85.9% alignment with human experts. 
Through an extensive evaluation of 9 representative LLMs, we demonstrate a pervasive privacy-utility trade-off. Our findings underscore the necessity of domain-specific benchmarks to validate the safety and efficacy of medical AI systems in privacy-sensitive environments. 2026-03-15T07:47:35Z 17 pages, 5 figures Shaowei Guan Yu Zhai Hin Chi Kwok Jiawei Du Xinyu Feng Jing Li Harry Qin Vivian Hui http://arxiv.org/abs/2302.00797v3 Combining Tree-Search, Generative Models, and Nash Bargaining Concepts in Game-Theoretic Reinforcement Learning 2026-03-15T07:01:36Z Opponent modeling methods typically involve two crucial steps: building a belief distribution over opponents' strategies, and exploiting this opponent model by playing a best response. However, existing approaches typically require domain-specific heuristics to come up with such a model, and algorithms for approximating best responses are hard to scale in large, imperfect information domains. In this work, we introduce a scalable and generic multiagent training regime for opponent modeling using deep game-theoretic reinforcement learning. We first propose Generative Best Response (GenBR), a best response algorithm based on Monte-Carlo Tree Search (MCTS) with a learned deep generative model that samples world states during planning. This new method scales to large imperfect information domains and can be used in a plug-and-play fashion in a variety of multiagent algorithms. We use this new method under the framework of Policy Space Response Oracles (PSRO), to automate the generation of an offline opponent model via iterative game-theoretic reasoning and population-based training. We propose using solution concepts based on bargaining theory to build up an opponent mixture, which we find identifies profiles that are near the Pareto frontier. Then GenBR keeps updating an online opponent model and reacts against it during gameplay. 
We conduct behavioral studies where human participants negotiate with our agents in Deal-or-No-Deal, a class of bilateral bargaining games. Search with generative modeling finds stronger policies during both training time and test time, enables online Bayesian co-player prediction, and can produce agents that achieve comparable social welfare and Nash bargaining score negotiating with humans as humans trading among themselves. 2023-02-01T23:06:23Z Accepted by IJCAI'25 main track Proc. 34th Int. Joint Conf. Artif. Intell. (IJCAI 2025), pp. 161-169 Zun Li Marc Lanctot Kevin R. McKee Luke Marris Ian Gemp Daniel Hennes Paul Muller Kate Larson Yoram Bachrach Michael P. Wellman 10.24963/ijcai.2025/19 http://arxiv.org/abs/2603.11445v2 Verified Multi-Agent Orchestration: A Plan-Execute-Verify-Replan Framework for Complex Query Resolution 2026-03-15T05:39:42Z We present Verified Multi-Agent Orchestration (VMAO), a framework that coordinates specialized LLM-based agents through a verification-driven iterative loop. Given a complex query, our system decomposes it into a directed acyclic graph (DAG) of sub-questions, executes them through domain-specific agents in parallel, verifies result completeness via LLM-based evaluation, and adaptively replans to address gaps. The key contributions are: (1) dependency-aware parallel execution over a DAG of sub-questions with automatic context propagation, (2) verification-driven adaptive replanning that uses an LLM-based verifier as an orchestration-level coordination signal, and (3) configurable stop conditions that balance answer quality against resource usage. On 25 expert-curated market research queries, VMAO improves answer completeness from 3.1 to 4.2 and source quality from 2.6 to 4.1 (1-5 scale) compared to a single-agent baseline, demonstrating that orchestration-level verification is an effective mechanism for multi-agent quality assurance. 
2026-03-12T02:10:10Z ICLR 2026 Workshop on MALGAI Xing Zhang Yanwei Cui Guanghui Wang Wei Qiu Ziyuan Li Fangwei Han Yajing Huang Hengzhi Qiu Bing Zhu Peiyang He http://arxiv.org/abs/2603.14206v1 Understanding Strategic Platform Entry and Seller Exploration: A Stackelberg Model 2026-03-15T03:40:44Z Online market platforms play an increasingly powerful role in the economy. An empirical phenomenon is that platforms, such as Amazon, Apple, and DoorDash, also enter their own marketplaces, imitating successful products developed by third-party sellers. We formulate a Stackelberg model, where the platform acts as the leader by committing to an entry policy: when will it enter and compete on a product? We study this model through a theoretical and computational framework. We begin with a single seller, and consider different kinds of policies for entry. We characterize the seller's optimal explore-exploit strategy via a Gittins-index policy, and give an algorithm to compute the platform's optimal entry policy. We then consider multiple sellers, to account for competition and information spillover. Here, the Gittins-index characterization fails, and we employ deep reinforcement learning to examine seller equilibrium behavior. Our findings highlight the incentives that drive platform entry and seller innovation, consistent with empirical evidence from markets such as Amazon and Google Play, with implications for regulatory efforts to preserve innovation and market diversity. 2026-03-15T03:40:44Z 12 pages, 3 figures, Accepted to The Web Conference (WWW) 2026 Garrett Seo Xintong Wang David C. Parkes http://arxiv.org/abs/2603.14141v1 Chance-Constrained Correlated Equilibria for Robust Noncooperative Coordination 2026-03-14T22:15:13Z Correlated equilibria enable a coordinator to influence the self-interested agents by recommending actions that no player has an incentive to deviate from. 
However, the effectiveness of this mechanism relies on accurate knowledge of the agents' cost structures. When cost parameters are uncertain, the recommended actions may no longer be incentive compatible, allowing agents to benefit from deviating from them. We study a chance-constrained correlated equilibrium problem formulation that accounts for uncertainty in agents' costs and guarantees incentive compatibility with a prescribed confidence level. We derive sensitivity results that quantify how uncertainty in individual incentive constraints affects the expected coordination outcome. In particular, the analysis characterizes the value of information by relating the marginal benefit of reducing uncertainty to the dual sensitivities of the incentive constraints, providing guidance on which sources of uncertainty should be prioritized for information acquisition. The results further reveal that increasing the confidence level is not always beneficial and can introduce a tradeoff between robustness and system efficiency. Numerical experiments demonstrate that the proposed framework maintains coordination performance in uncertain environments and are consistent with the theoretical insights developed in the analysis. 2026-03-14T22:15:13Z Jaehan Im Ufuk Topcu David Fridovich-Keil http://arxiv.org/abs/2603.04421v2 Do Mixed-Vendor Multi-Agent LLMs Improve Clinical Diagnosis? 2026-03-14T19:38:07Z Multi-agent large language model (LLM) systems have emerged as a promising approach for clinical diagnosis, leveraging collaboration among agents to refine medical reasoning. However, most existing frameworks rely on single-vendor teams (e.g., multiple agents from the same model family), which risk correlated failure modes that reinforce shared biases rather than correcting them. We investigate the impact of vendor diversity by comparing Single-LLM, Single-Vendor, and Mixed-Vendor Multi-Agent Conversation (MAC) frameworks. 
Using three doctor agents instantiated with o4-mini, Gemini-2.5-Pro, and Claude-4.5-Sonnet, we evaluate performance on RareBench and DiagnosisArena. Mixed-vendor configurations consistently outperform single-vendor counterparts, achieving state-of-the-art recall and accuracy. Overlap analysis reveals the underlying mechanism: mixed-vendor teams pool complementary inductive biases, surfacing correct diagnoses that individual models or homogeneous teams collectively miss. These results highlight vendor diversity as a key design principle for robust clinical diagnostic systems. 2026-02-14T18:42:58Z Accepted as Oral at the EACL 2026 Workshop on Healthcare and Language Learning (HeaLing) Grace Chang Yuan Xiaoman Zhang Sung Eun Kim Pranav Rajpurkar http://arxiv.org/abs/2508.13459v4 Multi-Robot Navigation in Social Mini-Games: Definitions, Taxonomy, and Algorithms 2026-03-14T19:18:57Z The "Last Mile Challenge" has long been considered an important, yet unsolved, challenge for autonomous vehicles, public service robots, and delivery robots. A central issue in this challenge is the ability of robots to navigate constrained and cluttered environments that have high agency (e.g., doorways, hallways, corridor intersections), often while competing for space with other robots and humans. We refer to these environments as "Social Mini-Games" (SMGs). Traditional navigation approaches designed for MRN do not perform well in SMGs, which has led to focused research on dedicated SMG solvers. However, publications on SMG navigation research make different assumptions, and have different objective functions (safety versus liveness). These assumptions and objectives are sometimes implicitly assumed or described informally. This makes it difficult to establish appropriate baselines for comparison in research papers, as well as making it difficult for practitioners to find the papers relevant to their concrete application. 
Such ad-hoc representation of the field also presents a barrier to new researchers wanting to start research in this area. SMG navigation research requires its own taxonomy, definitions, and evaluation protocols to guide effective research moving forward. This survey is the first to catalog SMG solvers using a well-defined and unified taxonomy and to classify existing methods accordingly. It also discusses the essential properties of SMG solvers, defines what SMGs are and how they appear in practice, outlines how to evaluate SMG solvers, and highlights the differences between SMG solvers and general navigation systems. The survey concludes with an overview of future directions and open challenges in the field. Our project is open-sourced at https://socialminigames.github.io/. 2025-08-19T02:33:15Z Accepted for publication in Autonomous Robots 2026 Rohan Chandra Shubham Singh Wenhao Luo Katia Sycara http://arxiv.org/abs/2603.14066v1 A Benchmark for Multi-Party Negotiation Games from Real Negotiation Data 2026-03-14T18:12:06Z Many real-world multi-party negotiations unfold as sequences of binding, action-level commitments rather than a single final outcome. We introduce a benchmark for this under-studied regime featuring a configurable game generator that sweeps key structural properties such as incentive alignment, goal complexity, and payoff distribution. To evaluate decision-making, we test three value-function approximations - myopic reward, an optimistic upper bound, and a pessimistic lower bound - that act as biased lenses on deal evaluation. Through exact evaluation on small games and comparative evaluation on large, document-grounded instances derived from the Harvard Negotiation Challenge, we map the strategic regimes where each approximation succeeds or fails. 
We observe that different game structures demand different valuation strategies, motivating agents that learn robust state values and plan effectively over long horizons under binding commitments and terminal-only rewards. 2026-03-14T18:12:06Z Leo Benac Jonas Raedler Zilin Ma Finale Doshi-Velez http://arxiv.org/abs/2603.13890v1 Beyond Self-Interest: Modeling Social-Oriented Motivation for Human-like Multi-Agent Interactions 2026-03-14T10:57:59Z Large Language Models (LLMs) demonstrate significant potential for generating complex behaviors, yet most approaches lack mechanisms for modeling social motivation in human-like multi-agent interaction. We introduce Autonomous Social Value-Oriented agents (ASVO), where LLM-based agents integrate desire-driven autonomy with Social Value Orientation (SVO) theory. At each step, agents first update their beliefs by perceiving environmental changes and others' actions. These observations inform the value update process, where each agent updates multi-dimensional desire values through reflective reasoning and infers others' motivational states. By contrasting self-satisfaction derived from fulfilled desires against estimated others' satisfaction, agents dynamically compute their SVO along a spectrum from altruistic to competitive, which in turn guides activity selection to balance desire fulfillment with social alignment. Experiments across School, Workplace, and Family contexts demonstrate substantial improvements over baselines in behavioral naturalness and human-likeness. These findings show that structured desire systems and adaptive SVO drift enable realistic multi-agent social simulations. 2026-03-14T10:57:59Z 9 pages, 6 figures. Accepted to AAMAS 2026 (Oral) Jingzhe Lin Ceyao Zhang Yaodong Yang Yizhou Wang Song-Chun Zhu Fangwei Zhong http://arxiv.org/abs/2603.13876v1 How do Role Models Shape Collective Morality? Exemplar-Driven Moral Learning in Multi-Agent Simulation 2026-03-14T10:23:29Z Do We Need Role Models? 
How do Role Models Shape Collective Morality? To explore these questions, we build a multi-agent simulation powered by a Large Language Model, where agents with diverse intrinsic drives, ranging from cooperative to competitive, interact and adapt through a four-stage cognitive loop (plan-act-observe-reflect). We design four experimental games (Alignment, Collapse, Conflict, and Construction) and conduct motivational ablation studies to identify the key drivers of imitation. The results indicate that identity-driven conformity can powerfully override initial dispositions. Agents consistently adapt their values to align with a perceived successful exemplar, leading to rapid value convergence. 2026-03-14T10:23:29Z Junjie Liao Huacong Tang Zhou Ziheng Yizhou Wang Fangwei Zhong