https://arxiv.org/api/zW+p14PxrjSdM8A9KFegsPSwG6c2026-06-18T10:42:30Z1267730015http://arxiv.org/abs/2606.01170v1Coordinating Task Switching in a Robotics Multi-Agent System Using Behavior Trees2026-05-31T11:22:16ZThe application of multi-agent systems in robotics is a very challenging field. Several competitions involving such systems are proposed to foster research and development of strategies and mechanisms using games as the underlying domain. Among them are the ones from the \textit{IEEE Very Small Soccer (VSSS)} category, which is the case study described in this paper. In VSSS, two teams of three robots each compete in a very dynamic environment of a soccer game. Thus, coordination of robots' behavior during the game is crucial to win it. In this paper, we present a Behavior-Tree-based approach to support multi-robot coordination within the VSSS team of the ThundeRatz robotics team from the Universidade de S$\tilde{a}$o Paulo. Moreover, a comparison between the proposed approach and the previous one, which was based on a Finite State Machine (FSM), was conducted using the FIRASim simulator. Besides that, the performance of this new strategy was further evaluated in an academic robotics competition.2026-05-31T11:22:16Z7 pages, 7 figures. Preprint of a manuscript submitted to the XXVI Congresso Brasileiro de Automática (CBA 2026)Lucas HaugAnarosa Alves Franco BrandãoArthur Casalshttp://arxiv.org/abs/2512.16167v3Ev-Trust: An Evolutionarily Stable Trust Mechanism for Decentralized LLM-Based Multi-Agent Service Economies2026-05-31T10:13:46ZDecentralized LLM-based multi-agent service economies face three vulnerabilities that undermine traditional trust mechanisms: reduced cost of fraud, difficulty in evaluating service quality, and instability of service content. These compounding vulnerabilities can trigger population-level trust collapse and the proliferation of short-sighted strategies. We propose Ev-Trust, an evolutionarily stable trust mechanism that addresses these vulnerabilities through three targeted designs: a cross-validation gate leveraging requestor semantic comprehension to assess response validity, a variance-standardized drift measure filtering endogenous stochasticity from genuine behavioral anomalies, and an embedding of trust signals into the expected revenue function that converts trustworthiness into an evolutionary survival advantage. Based on replicator dynamics with a noisy best response micro-foundation, we prove the asymptotic stability of cooperative evolutionarily stable strategies and derive explicit threshold conditions for maintaining cooperative equilibria. We evaluate Ev-Trust through 100-round simulations with at least 100 heterogeneous LLM-driven agents covering seven behavioral types. The experiments are conducted on TruthfulQA and TriviaQA, two factual question-answering benchmarks. Compared to baselines based on transitive trust aggregation, reinforcement-learning reputation, and pure evolutionary imitation, Ev-Trust reduces malicious agent participation by approximately 60%, suppresses the fraudulent service rate by approximately 50%, and maintains stable trust differentiation under a 30% adversarial mutation. These results demonstrate that coupling semantic trust evaluation with evolutionary incentives provides a principled foundation for securing cooperation in decentralized LLM-based multi-agent systems.2025-12-18T04:39:13Z19 pages, 9 figuresJiye WangShiduo YangTing QiaoJiayu QinJianbin LiYu WangYuanhe Zhaohttp://arxiv.org/abs/2606.13696v1AGORA: Can Deliberation and Governance Gates Absorb Participation Bias in Transit Planning?2026-05-31T08:00:37ZTransit network design depends not only on the optimization algorithm but also on who shows up to the public hearing. Current practice often collects one-directional comments from self-selected attendees, leaving participant mix as an uncontrolled source of outcome variation. We present AGORA, a framework that holds the network, demand, and solver fixed while systematically varying meeting composition through stakeholder agents, structured deliberation, and governance gates. Across two standard benchmark networks at different scales, we find that (i) aggregate outcomes vary little across compositions, but on tail risk and fairness disparity, representative sampling still tends to outperform skewed compositions; (ii) without deliberation, composition produces no variation at all, showing that deliberation is the mechanism through which who attends affects outcomes; and (iii) governance gates compress cross-profile variance without shifting the average outcome on Mandl, but low acceptance on Mumford0 shows thresholds require instance-specific calibration. These findings reframe participation bias from an uncontrollable input to a process-design problem: even without guaranteed representative attendance, well-structured deliberation and governance criteria can substantially reduce how much outcomes depend on who is in the room.2026-05-31T08:00:37ZJung-Hoon ChoCathy Wuhttp://arxiv.org/abs/2503.24183v3Scalable Ride-Sourcing Vehicle Rebalancing with Service Accessibility Guarantee: A Constrained Mean-Field Reinforcement Learning Approach2026-05-31T02:42:34ZThe expansion of ride-sourcing services such as Uber and Lyft has reshaped urban transportation by offering flexible, on-demand mobility via mobile applications. Despite convenience, these platforms confront significant operational challenges, particularly vehicle rebalancing-strategic repositioning of a fleet of vehicles to address spatiotemporal mismatches in supply and demand. Inadequate rebalancing results in prolonged rider waiting times and inefficient vehicle utilization, but also leads to fairness issues, such as the inequitable distribution of service and disparities in driver income. To tackle these, we introduce continuous-state mean-field control (MFC) and mean-field reinforcement learning (MFRL) models with continuous repositioning actions. MFC and MFRL offer scalable solutions by modeling each vehicle's behavior through interaction with the vehicle distribution, rather than with individual vehicles. This mitigates the curse of dimensionality with respect to the number of agents, enabling coordination across large fleets with significantly reduced computational complexity and eliminating the need to retrain the model when fleet size changes. To ensure equitable service access across geographic regions, we integrate an accessibility constraint into models and derive rebalancing policies that strike a balance between high fulfillment of rider demand and fair coverage of vehicle supply. Extensive evaluation using data-driven simulation of Shenzhen demonstrates the efficiency and robustness of our approach. Remarkably, it scales to tens of thousands of vehicles, with training times comparable to linear programming rebalancing. Besides, our policies effectively explore the efficiency-equity Pareto front, outperforming conventional benchmarks across key metrics like fleet utilization, fulfilled requests, and pickup distance, while ensuring equitable service access.2025-03-31T15:00:11Z34 pages, 15 figuresTransportation Research Part C: Emerging Technologies, Vol. 188, 105705 (2026)Matej JusupKenan ZhangZhiyuan HuBarna PásztorAndreas KrauseFrancesco Cormanhttp://arxiv.org/abs/2606.00953v1When Parallelism Pays Off: Cohesion-Aware Task Partitioning for Multi-Agent Coding2026-05-31T02:10:12ZMulti-agent Large Language Model (LLM) systems offer a way to decompose complex tasks, such as coding, through parallelization and context isolation. However, adding agents in practice introduces inter-agent communication overhead, which incurs extra cost and can sometimes offset the efficiency gains. We formalize multi-agent orchestration as a graph partitioning problem that captures the communication-to-computation trade-off: task decomposition can shorten critical-path computation, but cross-agent dependencies require costly context transfer. We instantiate this view in repository-level software engineering and present Cohesion-aware Coder (Co-Coder), which builds dependency graphs from static analysis, isolates structural hub files, partitions the graph via community detection, and executes the partition with a dependency-aware scheduler.
Across 28 real-world tasks on DevEval and CodeProjectEval, Co-Coder advances the Pareto-frontier over sequential and file-based parallel baselines as well as Claude Code with Agent Teams, lifting pass rate by up to 14.0%, achieving up to a 2.10x wall-clock speedup, and reducing API cost by up to 35%, with the largest gains on the most dependency-dense projects. Co-coder demonstrates how cohesion-aware orchestration can make parallel coding agents both theoretically grounded and practically efficient, suggesting a broader design principle for multi-agent systems.2026-05-31T02:10:12ZXu YangLunyiu NieEthan ChandraStanislav GannutinFangru LinSwarat Chaudhurihttp://arxiv.org/abs/2606.00939v1FinCom: A Financial Multi-Agent Demo with Disagree-or-Commit Deliberation2026-05-31T00:53:05ZMulti-agent systems powered by large language models (LLMs) are increasingly used for financial analysis and decision support. However, existing coordination schemes, especially those emphasizing consensus or debate, are vulnerable to sycophancy: agents conform to peer reasoning instead of evidence, leading to premature agreement and degraded outcomes. We introduce FinCom (Financial Committee), a governed multi-agent framework and interactive system that operationalizes the Disagree-or-Commit (DoC) protocol to embed structured dissent into financial AI committees. A central Supervisor orchestrates three ReAct-enabled specialist agents: Research, Quantitative, and Risk. Each agent is equipped with role-specific tools for retrieval, computation, and stress testing. During deliberation, agents must either explicitly critique or commit to their peers' reasoning before converging on a unified recommendation.
This demonstration showcases how FinCom supports committee-style financial analysis through coordinated multi-agent interaction, including structured report generation and interactive decision support. Evaluated across the most recent financial agent benchmark, in addition to 90 internal handcrafted financial tasks using an LLM-as-a-Judge protocol, DoC improves reasoning accuracy and risk awareness significantly over a consensus-seeking baseline on both an in-house and external evaluation set. By reframing disagreement as a governance primitive rather than noise, FinCom offers a lightweight, prompt-only recipe for improving accountability, transparency, and epistemic robustness in agentic financial systems.2026-05-31T00:53:05ZChao Peter YangZixiao TanKaisen YaoZiyu ZhouEleanor JiangMichael Wuhttp://arxiv.org/abs/2606.00825v1SuperMemory-VQA: An Egocentric Visual Question-Answering Benchmark for Long-Horizon Memory2026-05-30T17:53:10ZAI glasses present a compelling platform for AI agents to serve as personalized memory assistants. To be genuinely useful, such systems must move beyond short-term video comprehension and address memory gaps that humans experience for practical, personal, or social purposes over longitudinal egocentric video streams. However, existing egocentric datasets predominantly focus on action recognition or generic QAs from short clips, measuring perceptual capabilities rather than realistic human memory needs. We introduce SuperMemory-VQA, an egocentric visual question answering (VQA) dataset for evaluating AI assistants on practical, long-horizon memory tasks. It contains 52.9 hours of everyday activities recorded with AI glasses, including synchronized RGB video, audio transcription, eye gaze, IMU, and SLAM trajectories. Through a human-verified annotation pipeline, we construct grounded 4,853 question-answer pairs that span object and location memory, intent recall, visual scene recall, timeline reconstruction, conversational memory, and in-context retrieval. Each question is posed as multiple-choice with an explicit "unanswerable" option to test hallucination robustness. Benchmarking leading agentic frameworks and LLM backbones reveals that existing systems remain far from reliable on real-world memory tasks, highlighting the need for new architectures for grounded AI memory that can answer only when evidence is sufficient. A participant survey further supports that our questions are realistic, useful, and aligned with everyday memory needs.2026-05-30T17:53:10Z34 pages, 21 figures, 5 tablesSamiul AlamShakhrul Iman SiamMichael J. ProulxJames FortRichard NewcombeHyo Jin KimMi Zhanghttp://arxiv.org/abs/2509.19246v2Proactive-reactive detection and mitigation of intermittent faults in robot swarms2026-05-30T12:30:36ZIntermittent faults are transient errors that sporadically appear and disappear. Although intermittent faults pose substantial challenges to reliability and coordination, existing studies of fault tolerance in robot swarms focus instead on permanent faults. One reason for this is that intermittent faults are prohibitively difficult to detect in the fully self-organized ad-hoc networks typical of robot swarms, as their network topologies are transient and often unpredictable. However, in the recently introduced self-organizing nervous systems (SoNS) approach, robot swarms are able to self-organize persistent network structures for the first time, easing the problem of detecting intermittent faults. To address intermittent faults in robot swarms that have persistent networks, we propose a novel proactive-reactive strategy to detection and mitigation, based on self-organized backup layers and distributed consensus in a multiplex network. Proactively, the robots self-organize dynamic backup paths before faults occur, adapting to changes in the primary network topology and the robots' relative positions. Reactively, robots use one-shot likelihood ratio tests to compare information received along different paths in the multiplex network, enabling early fault detection. Upon detection, communication is temporarily rerouted in a self-organized way, until the detected fault resolves. We validate the approach in representative scenarios of faulty positional data occurring during formation control, demonstrating that intermittent faults are prevented from disrupting convergence to desired formations, with high fault detection accuracy and low rates of false positives.2025-09-23T17:06:52ZSinan OğuzEmanuele GaroneMarco DorigoMary Katherine Heinrichhttp://arxiv.org/abs/2606.00655v1Scaling Behavior of Single LLM-Driven Multi-Agent Systems2026-05-30T09:57:49ZThe burgeoning field of LLM-based Multi-Agent Systems (MAS) promises to tackle complex tasks through collaborative intelligence, yet fundamental questions regarding their scaling behavior and intrinsic collective dynamics remain underexplored. This paper systematically investigates how the performance of a homogeneous MAS evolves as the number of agents increases, isolating the variable of collaboration from model or knowledge heterogeneity. We propose the Sequential Iterative Multi-Agent System (SIMAS) framework, a minimalist architecture centered on sequential inter-agent communication, to clearly observe scaling effects. Through extensive experiments across diverse tasks and model scales, we establish that MAS performance does not scale monotonically with agent count but follows a pattern of diminishing returns, governed by a trade-off between collaborative synergy and coordination overhead. Our findings reveal that effective MAS requires a sufficiently capable base LLM, that task type critically modulates the optimal agent count, and that collective intelligence is an emergent property contingent on strategic interaction design rather than a guaranteed outcome of agent plurality. The performance degradation stems coordination overhead rather than merely long-context failure, and the scaling tendency generalizes across interaction architectures like structured debate topologies. This work provides a foundational understanding of MAS scaling laws, offering practical guidance for designing efficient collaborative systems and challenging the prevailing assumption that more agents invariably lead to better performance.2026-05-30T09:57:49ZJialing LiZhouhong GuYin CaiHongwei Fenghttp://arxiv.org/abs/2504.16129v5MARFT: Multi-Agent Reinforcement Fine-Tuning2026-05-30T09:48:39ZLarge Language Model (LLM)-based Multi-Agent Systems (LaMAS) have demonstrated strong capabilities on complex agentic tasks requiring multifaceted reasoning and collaboration, from high-quality presentation generation to scientific research. Meanwhile, Reinforcement Learning (RL) is widely recognized for enhancing agent intelligence, but limited work has studied fine-tuning LaMAS with foundational RL techniques. Directly applying conventional Multi-Agent Reinforcement Learning (MARL) to LaMAS also introduces major challenges due to the unique mechanisms of LaMAS. To address these challenges, this article presents a comprehensive study of LLM-based MARL and proposes Multi-Agent Reinforcement Fine-Tuning (MARFT). We introduce Flex-MG, a new Markov Game formulation aligned with real-world LaMAS optimization, together with a universal algorithmic framework tailored to LaMAS. We review the evolution from traditional RL to Reinforcement Fine-Tuning (RFT), then analyze the multi-agent counterpart. For LaMAS, we identify key differences between classical MARL and MARFT, including asynchronous agent interactions, profile-aware agent design, and heterogeneous architectures. These differences motivate a LaMAS-oriented formulation of RFT. We present a robust and scalable MARFT framework, detail its modular algorithm, and provide an open-source implementation to support adoption and further research. The paper further discusses application perspectives and open challenges, including dynamic environment modeling, sample inefficiency, and the lack of cohesive frameworks. By connecting theoretical foundations with practical methodology, this work aims to serve as a roadmap for advancing MARFT toward resilient, adaptive, and human-aligned agentic systems. Implementation: https://github.com/jwliao-ai/MARFT.2025-04-21T07:03:54Z37 pagesJunwei LiaoMuning WenJun WangWeinan Zhanghttp://arxiv.org/abs/2410.02511v2Stop Wandering, Find the Keys: LLMs Discriminate Key States for Efficient Multi-Agent Exploration2026-05-30T09:14:19ZWith expansive state-action spaces, efficient multi-agent exploration remains a longstanding challenge in reinforcement learning. Although pursuing novelty, diversity, or uncertainty attracts increasing attention, redundant efforts brought by exploration without proper guidance choices poses a practical issue for the community. This paper introduces a systematic approach, termed LEMAE, choosing to channel informative task-relevant guidance from a knowledgeable Large Language Model (LLM) for Efficient Multi-Agent Exploration. Specifically, we ground linguistic knowledge from LLM into symbolic key states, that are critical for task fulfillment, in a discriminative manner at low LLM inference costs. To unleash the power of key states, we design Subspace-based Hindsight Intrinsic Reward (SHIR) to guide agents toward key states by increasing reward density. Additionally, we build the Key State Memory Tree (KSMT) to track transitions between key states in a specific task for organized exploration. Benefiting from diminishing redundant explorations, LEMAE outperforms existing SOTA approaches on the challenging benchmarks (e.g., SMAC and MPE) by a large margin, achieving a 10x acceleration in certain scenarios.2024-10-03T14:21:23ZSCIENCE CHINA Information Sciences 2026Yun QuBoyuan WangYuhang JiangJianzhun ShaoYixiu MaoHeming ZouChang LiuCheems WangMeiqin LiuXiangyang Ji10.1007/s11432-025-4978-2http://arxiv.org/abs/2606.00610v1MemGraphRAG: Memory-based Multi-Agent System for Graph Retrieval-Augmented Generation2026-05-30T08:18:53ZRetrieval-Augmented Generation (RAG) has become an essential method for mitigating hallucinations in Large Language Models (LLMs) by leveraging external knowledge. Although effective for simple queries, traditional RAG struggles with large-scale, unstructured corpora where information is highly fragmented. Graph-based RAG (GraphRAG) incorporates knowledge graphs to capture structural relationships, enabling more comprehensive retrieval for complex reasoning. However, existing GraphRAG methods rely on isolated, fragment-level extraction for graph construction, lacking a global perspective on the whole corpus. As a result, these methods frequently lead to thematically inconsistent, logically conflicting, and structurally fragmented graphs that degrade retrieval performance. In this paper, we propose MemGraphRAG, a novel framework that introduces a memory-based multi-agent system to ensure high-quality graph construction. Specifically, MemGraphRAG employs a collaborative society of agents supported by shared memory, which provides a unified global context throughout the extraction process. This mechanism allows agents to dynamically resolve logical conflicts and maintain structural connectivity throughout the corpus. Furthermore, we propose a memory-aware hierarchical retrieval algorithm tailored for the constructed graph. Extensive experiments on multiple benchmarks demonstrate that MemGraphRAG outperforms the state-of-the-art baseline models with comparable efficiency. Our code is available at https://github.com/XMUDeepLIT/MemGraphRAG.2026-05-30T08:18:53ZAccepted by KDD 2026Chuanjie WuZhishang XiangYunbo TangZerui ChenQinggang ZhangJinsong Suhttp://arxiv.org/abs/2606.00531v1State Machine Guided Multi-Relational Synthetic Data from Logs for Anomaly Detection2026-05-30T04:49:32ZSoftware systems generate massive unstructured logs that record execution behavior, failures, and interactions across components, yet existing log anomaly detection methods treat these logs primarily as flat sequences of templates, overlooking the relational execution structure that governs how events co-occur and evolve over time. We propose a framework that discovers this hidden structure by recovering an execution state machine directly from logs and inducing a corresponding multi-table relational schema connecting traces, events, states, transitions, and parameters. This discovered state machine serves as a generative prior to produce realistic multi-relational synthetic data that preserves structural, temporal, and process constraints while amplifying rare but valid execution behaviors. We assess the fidelity of the generated data through constraint validation, distributional similarity, and process-level metrics, and demonstrate its usefulness by showing that augmenting real logs with the synthetic relational data significantly improves anomaly and bug detection on held-out real datasets compared to sequence-based baselines and naive oversampling. Our results show that execution logs implicitly encode a relational database governed by a latent state machine, and that recovering this structure enables principled synthetic data generation for robust and interpretable anomaly detection.2026-05-30T04:49:32ZProceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD 2026)Aja KhanalApurva Narayan10.1145/3770855.3818134http://arxiv.org/abs/2601.10102v6When Identity Overrides Incentives: Representational Choices as Governance Decisions in Multi-Agent LLM Systems2026-05-30T01:47:47ZMulti-agent systems built on large language models are increasingly deployed in strategic policy and governance settings, where agents representing stakeholders with conflicting interests must coordinate under shared constraints. These systems typically assign role-based personas to agents, describing their motivations and objectives. Whether agents with role-based identities follow explicit payoffs or their assigned roles in strategic decision-making remains untested. Here we show that assigning role-based personas suppresses payoff-aligned behavior in four-agent strategic games, shifting equilibrium attainment by up to 90 percentage points even when agents have complete payoff information. We test a 2x2 factorial design (persona presence x payoff visibility) across four models (Qwen-7B, Qwen-32B, Llama-8B, Mistral-7B), and 53 environmental policy scenarios with two equilibria: Tragedy of the Commons, where individual payoff dominates, and Green Transition, where collective payoff dominates. With personas present, all models reach near-zero Tragedy equilibrium in the Tragedy-dominant scenarios despite complete payoff information, and 100% of equilibria correspond to Green Transition. No model reaches Tragedy equilibrium by removing personas alone; only Qwen models reach 65-90% Tragedy equilibrium rates when personas are removed, and payoffs are made explicit. Three distinct behavioral profiles emerge: Qwen shifts equilibrium selection based on framing condition, Mistral increases response variance without reaching the Tragedy equilibrium, and Llama holds near-constant across all conditions. Representational choices in multi-agent LLM systems are governance decisions: persona assignment determines which equilibrium a simulation produces, independent of the underlying incentive structure.2026-01-15T06:14:01ZAccepted to ACM FAccT 2026Viswonathan ManoranjanSnehalkumar `Neil' S. Gaikwad10.1145/3805689.3812218http://arxiv.org/abs/2605.30169v2Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms2026-05-29T21:39:49ZAs autonomous language model agents proliferate, forming an emerging agentic web with real-world consequences, what credibility signals can you use to decide whether to trust an unfamiliar agent in the wild and delegate to it? A natural governance intuition is to extend human identity verification and reputation mechanisms, from ``Know Your Customer'' and credit scores to ``Know Your Agent'' regimes. However, we argue that this analogy is fundamentally incomplete. Reputation mechanisms function both as social signals and as corrective feedback that sustain an equilibrium of trustworthy behavior, presuming a persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility. Yet language model agents are ontologically \emph{dissociative}: they are essentially an assemblage of mutable modules -- foundation models, system prompts, tool-access policies, external memory, and, in some cases, a multi-agent system as a whole -- any of which may change agent behavior -- with a fluid persona that is also vulnerable to adversarial attack and may not internalize sanctions. Drawing on dissociative identity disorder jurisprudence, this dissociativity leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability -- the very properties that reputation mechanisms aim to sustain -- thereby collapsing trust. We argue that identity-based, ex post, regulative, sanction-based governance, such as reputation, is structurally inapplicable to dissociative agents, and we suggest a shift to observability-based, ex ante, constitutive, protocol-based behavioral harnesses.2026-05-28T16:20:19ZAccepted by FaccT 2026Botao Amber HuHelena RongMax Van Kleek10.1145/3805689.3806748