https://arxiv.org/api/RXKUTPmXLlXVxiLPj31ZdViE1ds2026-06-21T22:02:59Z1269569015http://arxiv.org/abs/2605.07637v2Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding2026-05-12T12:34:31ZMulti-agent pathfinding (MAPF) is a widely used abstraction for multi-robot trajectory planning problems, where multiple homogeneous agents move simultaneously within a shared environment. Although solving MAPF optimally is NP-hard, scalable and efficient solvers are critical for real-world applications such as logistics and search-and-rescue. To this end, the research community has proposed various decentralized suboptimal MAPF solvers that leverage machine learning. Such methods frame MAPF (from a single agent perspective) as a Dec-POMDP where at each time step an agent has to decide an action based on the local observation and typically solve the problem via reinforcement learning or imitation learning. We follow the same approach but additionally introduce a learnable communication module tailored to enhance cooperation between agents via efficient feature sharing. We present the Local Communication for Multi-agent Pathfinding (LC-MAPF), a generalizable pre-trained model that applies multi-round communication between neighboring agents to exchange information and improve their coordination. Our experiments show that the introduced method outperforms the existing learning-based MAPF solvers, including IL and RL-based approaches, across diverse metrics in a diverse range of (unseen) test scenarios. Remarkably, the introduced communication mechanism does not compromise LC-MAPF's scalability, a common bottleneck for communication-based MAPF solvers.2026-05-08T12:05:08ZValeriy VyaltsevAlsu SagirovaAnton AndreychukOleg BulichevYuri KuratovKonstantin YakovlevAleksandr PanovAlexey Skrynnikhttp://arxiv.org/abs/2605.11880v1Adaptive TD-Lambda for Cooperative Multi-agent Reinforcement Learning2026-05-12T09:56:24ZTD($λ$) in value-based MARL algorithms or the Temporal Difference critic learning in Actor-Critic-based (AC-based) algorithms synergistically integrate elements from Monte-Carlo simulation and Q function bootstrapping via dynamic programming, which effectively addresses the inherent bias-variance trade-off in value estimation. Based on that, some recent works link the adaptive $λ$ value to the policy distribution in the single-agent reinforcement learning area. However, because of the large joint action space from multiple number of agents, and the limited transition data in Multi-agent Reinforcement Learning, the policy distribution is infeasible to be calculated statistically. To solve the policy distribution calculation problem in MARL settings, we employ a parametric likelihood-free density ratio estimator with two replay buffers instead of calculating statistically. The two replay buffers of different sizes store the historical trajectories that represent the data distribution of the past and current policies correspondingly. Based on the estimator, we assign Adaptive TD($λ$), \textbf{ATD($λ$)}, values to state-action pairs based on their likelihood under the stationary distribution of the current policy. We apply the proposed method on two competitive baseline methods, QMIX for value-based algorithms, and MAPPO for AC-based algorithms, over SMAC benchmarks and Gfootball academy scenarios, and demonstrate consistently competitive or superior performance compared to other baseline approaches with static $λ$ values.2026-05-12T09:56:24ZYue DengZirui WangYin Zhanghttp://arxiv.org/abs/2605.11720v1A Research Agenda on Agents and Software Engineering: Outcomes from the Rio A2SE Seminar2026-05-12T08:06:29ZThe rise of agentic AI is reshaping software engineering in two intertwined directions: agents are increasingly applied to support software engineering tasks, and Agentic AI systems themselves are complex systems that require re-thinking currently established software engineering practices. To chart a coherent research agenda covering the two directions, we organized the A2SE seminar in Rio de Janeiro, bringing together 18 experts from academia and industry. Through structured presentations, collaborative topic clustering, and focused group discussions, participants identified six thematic areas: Governance, Software Engineering for Agents, Agents for Software Architecture, Quality and Evaluation, Sustainability, and Code, and they prioritized short-term and long-term research directions for each. This paper presents the resulting community-driven, opinionated research agenda, offering the SE community a structured foundation for coordinating efforts at this critical juncture.2026-05-12T08:06:29Z6 pages, 1 table, A2SE meeting, https://sites.google.com/view/a2se2026/homeDavide TaibiHenry MucciniKarthik VaidhyanathanMarcos KalinowskiMichele AlbanoAntonio Pedro Santos AlvesRenato CerqueiraMateus DevinoMatteo EspositoRodrigo FalcãoVinicius HenningFoutse KhomhValentina LenarduzziQinghua LuMatías MartínezHenrique MelloDaniel MendezLucas Romaohttp://arxiv.org/abs/2605.11688v1Shaping Zero-Shot Coordination via State Blocking2026-05-12T07:46:03ZZero-shot coordination (ZSC) aims to enable agents to cooperate with independently trained partners without prior interaction, a key requirement for real-world multi-agent systems and human-AI collaboration. Existing approaches have largely emphasized increasing partner diversity during training, yet such strategies often fall short of achieving reliable generalization to unseen partners. We introduce State-Blocked Coordination (SBC), a simple yet effective framework that improves ZSC by inducing diverse interaction scenarios without direct environment modification. Specifically, SBC generates a family of virtual environments through state blocking, allowing agents to experience a wide range of suboptimal partner policies. Across multiple benchmarks, SBC demonstrates superior performance in zero-shot coordination, including strong generalization to human partners.2026-05-12T07:46:03Z9 technical page followed by references and appendixMingu KangSunwoo LeeYonghyeon JoSeungyul Hanhttp://arxiv.org/abs/2605.11645v1GeomHerd: A Forward-looking Herding Quantification via Ricci Flow Geometry on Agent Interactive Simulations2026-05-12T07:07:58ZHerding -- where agents align their behaviors and act collectively -- is a central driver of market fragility and systemic risk. Existing approaches to quantify herding rely on price-correlation statistics, which inherently lag because they only detect coordination after it has already moved realised returns. We propose GeomHerd, a forward-looking geometric framework that bypasses this observability lag by quantifying coordination directly on upstream agent-interaction graphs. To generate these graphs, we treat a heterogeneous LLM-driven multi-agent simulator -- each financial trader instantiated by a persona-conditioned LLM call -- as a forecastable world, and evaluate the geometric pipeline on the Cividino--Sornette continuous-spin agent-based substrate as our headline financial testbed. By tracking the discrete Ollivier--Ricci curvature of these action graphs, GeomHerd captures the structural topology of emerging coordination. Theoretically, we establish a mean-field bridge mapping our graph-theoretic metric to CSAD, the classical macroscopic herding statistic, linking GeomHerd to downstream price-dispersion measurement. Empirically, GeomHerd anticipates herding long before aggregate market baselines: on the continuous-spin substrate, our primary detector fires a median of 272 steps before order-parameter onset; a contagion detector ($β_{-}$) recalls 65% of critical trajectories 318 steps early; and on co-firing trajectories the agent-graph signal precedes price-correlation-graph baselines by 40 steps. As a complementary indicator, the effective vocabulary of agent actions contracts during cascades. The geometric signature transfers out-of-domain to the Vicsek self-driven-particle model, and a curvature-conditioned forecasting head reduces cascade-window log-return MAE over detector-conditioned and price-only baselines.2026-05-12T07:07:58ZLake YangJunwei SuJingfeng ZengWenhao LuXingzhi QianWeitong ZhangChuan WuDunhong Jinhttp://arxiv.org/abs/2507.21159v3MAC: Masked Agent Collaboration Boosts Large Language Model Medical Decision-Making2026-05-12T06:29:43ZLarge language models (LLMs) have proven effective in artificial intelligence, where the multi-agent system (MAS) holds considerable promise for healthcare development by achieving the collaboration of LLMs. However, the absence of a systematic pipeline for agent construction and the rigidity of static collaboration patterns render current MAS-based models vulnerable to collaboration failures, resulting in substantial performance degradation in medical decision-making scenarios. To this end, we propose a novel Masked Agent Collaboration (MAC) framework that harnesses Pareto-optimal agent construction and cross-consistency maximization mechanisms to achieve adaptive progressive propagation of collaborative information, boosting the medical decision-making capacity. Specifically, we first conduct a Pareto-frontier factors analysis towards the LLMs pool to consider their key factors, including the model size, inference time, diversity score, and throughput ratio, where we calculate the similarity between pairwise outputs within an LLM to derive its diversity score. Beyond this analysis, we enable the identification of Pareto-optimal models that balance efficiency and capability, which are subsequently selected as collaborative agents to consider the fundamental trade-offs inherent in practical LLM deployment. Afterward, we measure the pairwise similarity between the outputs from collaborative agents to determine their cross-consistency values, subsequently masking out the agent with the lowest cross-consistency value to eliminate the output that is likely semantically inconsistent. Finally, we conduct collaboration of agents by achieving adaptive progressive propagation, where each agent aggregates the outputs of unmasked agents from the previous layer as its input to generate the corresponding output via prompt engineering.2025-07-25T04:21:16ZZhihao PengLiuxin BaoYixuan Yuanhttp://arxiv.org/abs/2511.14715v3FLARE: Adaptive Multi-Dimensional Reputation for Robust Client Reliability in Federated Learning2026-05-12T06:00:59ZFederated learning (FL) enables collaborative model training while preserving data privacy. However, it remains vulnerable to malicious clients who compromise model integrity through Byzantine attacks, data poisoning, or adaptive adversarial behaviors. Existing defense mechanisms rely on static thresholds and binary classification, failing to adapt to evolving client behaviors in real-world deployments. We propose FLARE, an adaptive reputation-based framework that transforms client reliability assessment from binary decisions to a continuous, multi-dimensional trust evaluation. FLARE integrates: (i) a multi-dimensional reputation score capturing performance consistency, statistical anomaly indicators, and temporal behavior, (ii) a self-calibrating adaptive threshold mechanism that adjusts security strictness based on model convergence and recent attack intensity, (iii) reputation-weighted aggregation with soft exclusion to proportionally limit suspicious contributions rather than eliminating clients outright, and (iv) a Local Differential Privacy (LDP) mechanism enabling reputation scoring on privatized client updates. We further introduce a highly evasive Statistical Mimicry (SM) attack, a benchmark adversary that blends honest gradients with synthetic perturbations and persistent drift to remain undetected by traditional filters. Extensive experiments with 100 clients on MNIST, CIFAR-10, and SVHN demonstrate that FLARE maintains high model accuracy and converges faster than state-of-the-art Byzantine-robust methods under diverse attack types, including label flipping, gradient scaling, adaptive attacks, ALIE, and SM. FLARE improves robustness by up to 16% and preserves model convergence within 30% of the non-attacked baseline, while achieving strong malicious-client detection performance with minimal computational overhead. https://github.com/Anonymous0-0paper/FLARE2025-11-18T17:57:40ZThe authors want to withdraw this manuscript for further verification and revision. We may release a substantially revised version in the futureAbolfazl YounesiLeon KissZahra Najafabadi SamaniJuan Aznar PovedaThomas Fahringerhttp://arxiv.org/abs/2605.11509v1Hierarchical LLM-Driven Control for HAPS-Assisted UAV Networks: Joint Optimization of Flight and Connectivity2026-05-12T04:27:14ZUncrewed aerial vehicles (UAVs) are increasingly deployed in complex networked environments, yet the joint optimization of multi-UAV motion control and connectivity remains a fundamental challenge. In this paper, we study a multi-UAV system operating in an integrated terrestrial and non-terrestrial network (ITNTN) comprising terrestrial base stations and high-altitude platform stations (HAPS). We consider a three-dimensional (3D) aerial highway scenario where UAVs must adapt their motion to ensure collision avoidance, efficient traffic flow, and reliable communication under dynamic and partially observable conditions. We first model the problem as a hierarchical multi-objective partially observable Markov decision process (H-MO-POMDP), capturing the coupling between control and communication objectives. Based on this formulation, we propose a large language model (LLM)-driven hierarchical multi-rate control framework. At the global level, an LLM-based controller on the HAPS performs long-term planning for load balancing and handover decisions. At the local level, each UAV employs a hybrid controller that integrates a slow-timescale LLM for high-level spatial reasoning with a reinforcement learning agent for faster UAV-to-infrastructure (U2I) communication and motion control. We further develop a high-fidelity 3D simulation platform by integrating the gym-pybullet-drones environment with 3GPP-compliant RF/THz channel models. Numerical results demonstrate that the proposed framework significantly outperforms state-of-the-art baselines, achieving a 14% increase in transportation efficiency and a 25% improvement in telecommunication throughput. Additionally, it achieves a 23% reduction in physical collision rates, demonstrating strong handover stability and zero-shot generalization in dynamic scenarios.2026-05-12T04:27:14ZSubmission for possible publicationZijiang YanHao ZhouWael JaafarJianhua PeiPing WangHalim YanikomerogluHina Tabassumhttp://arxiv.org/abs/2605.11503v1Distance-Constrained Unlabeled Multi-Agent Pathfinding2026-05-12T04:23:21ZWe study a graph pathfinding problem Distance-$r$ Independent Unlabeled Multi-Agent Pathfinding, finding a set of collision-free paths between two sets where agents must stay at pairwise distance at least $r+1$ at all times. This additional constraint, generalizing collision modeling for classical MAPF, targets aspects of real-world multi-agent coordination. This additional distance constraint makes feasibility (i.e., whether a solution exists) PSPACE-complete, in contrast to standard (unlabeled) MAPF, where it can be decided in polynomial time. We address the challenge via two complementary approaches: (i) reduction-based optimal algorithms with a feasibility-preserving compression procedure, and (ii) a configuration generator-based search. Despite the hardness, empirical results show that our algorithm can handle hundreds of agents in a practical timeframe.2026-05-12T04:23:21ZTakahiro SuzukiYuma TamuraKeisuke Okumurahttp://arxiv.org/abs/2605.11487v1Digital Identity for Agentic Systems: Toward a Portable Authorization Standard for Autonomous Agents2026-05-12T04:04:34ZEnterprise AI is shifting from copilots to autonomous agents capable of executing workflows, negotiating outcomes, and making decisions with limited human oversight. As these systems extend across organizational boundaries, identity alone is insufficient: an agent's authority must also be explicit, constrained, auditable, revocable, and consistently interpretable by independent receivers. This paper analyzes representative enterprise use cases in insurance claims processing and supply chain integrity to surface structural gaps in existing identity and access models. It proposes a portable authorization model for autonomous agents based on issuer-authored authorization payloads, typed constraint algebra, decision-consistent evaluation semantics, delegation attenuation, governed semantic resolution, fail-closed processing, and pre-flight discovery. The model separates credential containers, authorization payload semantics, and enforcement engines, allowing profiles such as JWT/JWS, Verifiable Credentials, OAuth Rich Authorization Requests, or policy-engine bindings to preserve a common authorization meaning across trust boundaries.2026-05-12T04:04:34Z46 pages, 10 figuresPartha Madhirahttp://arxiv.org/abs/2605.07069v3Social Theory Should Be a Structural Prior for Agentic AI: A Formal Framework for Multi-Agent Social Systems2026-05-12T03:22:59ZAgentic AI systems are increasingly deployed not in isolation, but inside social environments populated by other agents and humans, such as in social media platforms, multi-agent LLM pipelines or autonomous robotics fleets. In these settings, system behavior emerges not from individual agents alone, but from the multi-agent interactions over time. Emergent dynamics of individuals in a social group have been long studied by social scientists in human contexts. \textbf{This position paper argues that agentic AI systems must be modeled with social theory as a structural prior, and formalizes a Multi-Agent Social Systems (MASS) framework for how agents interact and influence to generate system-level outcomes.} We represent MASS as a class of dynamical system of information generation, local influence and interaction structure, formulated by four structural priors anchored in social theory: strategic heterogeneity, networked-constrained dependence, co-evolution and distributional instability. We demonstrate the importance of each structural prior through formal propositions, and articulate a research agenda for how MASS should be modeled, evaluated and governed.2026-05-08T00:30:02ZLynnette Hui Xian NgIain J. CruickshankAdrian Xuan Wei LimKathleen M. Carleyhttp://arxiv.org/abs/2602.15006v2Distributed Quantum Gaussian Processes for Multi-Agent Systems2026-05-12T02:57:48ZGaussian Processes (GPs) are a powerful tool for probabilistic modeling, but their performance is often constrained in complex, large-scale real-world domains due to the limited expressivity of classical kernels. Quantum computing offers the potential to overcome this limitation by embedding data into exponentially large Hilbert spaces, capturing complex correlations that remain inaccessible to classical computing approaches. In this paper, we propose a Distributed Quantum Gaussian Process (DQGP) method in a multi-agent setting to enhance modeling capabilities and scalability. To address the challenging non-Euclidean optimization problem, we develop a Distributed consensus Riemannian Alternating Direction Method of Multipliers (DR-ADMM) algorithm that aggregates local agent models into a global model. We evaluate the efficacy of our method through numerical experiments conducted on a quantum simulator in classical hardware. We use real-world, non-stationary elevation datasets of NASA's Shuttle Radar Topography Mission and synthetic datasets generated by Quantum Gaussian Processes. Beyond modeling advantages, our framework highlights potential computational speedups that quantum hardware may provide, particularly in Gaussian processes and distributed optimization.2026-02-16T18:46:23Z9 pages, 4 figures, accepted at AAMAS 2026 (International Conference on Autonomous Agents and Multiagent Systems)2026 International Conference on Autonomous Agents and Multiagent SystemsMeet GandhiGeorge P. Kontoudis10.65109/ADPL7324http://arxiv.org/abs/2509.15103v3Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning2026-05-12T02:18:50ZPartial agent failure becomes inevitable when systems scale up, making it crucial to identify the subset of agents whose failure causes worst-case system performance degradations. We study this Vulnerable Agent Identification (VAI) problem in large-scale multi-agent reinforcement learning (MARL). We frame VAI as a Hierarchical Adversarial Decentralized Mean Field Control (HAD-MFC), where the upper level selects vulnerable agents as an NP-hard task and the lower level learns their worst-case adversarial policies via mean-field MARL. The two problems are coupled together, making HAD-MFC difficult to solve. To handle this, we first decouple the hierarchical process by Fenchel-Rockafellar transform, resulting a regularized mean-field Bellman operator for upper level that enables independent learning at each level, thus reducing computational complexity. We next reformulate the upper-level NP-hard problem as an MDP with dense rewards, allowing sequential identification of vulnerable agents via greedy and RL algorithms. This decomposition provably preserves the optimal solution. Experiments show our method effectively identifies more vulnerable agents in large-scale MARL and the rule-based system, fooling system into worse failures, and reveals the vulnerability of each agent in large systems. Code available at https://github.com/Waken-dream/VAI2025-09-18T16:03:50ZAccepted by ICML 2026Simin LiZihao MaoZheng YuweiLinhao WangRuixiao XuChengdong MaZhiqian LiuXin YuYuqing MaXin WangJie LuoBo AnYaodong YangWeifeng LvXianglong Liuhttp://arxiv.org/abs/2603.28488v2Courtroom-Style Multi-Agent Debate with Progressive RAG and Role-Switching for Controversial Claim Verification2026-05-12T00:38:08ZLarge language models (LLMs) remain unreliable for high-stakes claim verification due to hallucinations and shallow reasoning. While retrieval-augmented generation (RAG) and multi-agent debate (MAD) address this, they are limited by one-pass retrieval and unstructured debate dynamics. We propose a courtroom-style multi-agent framework, PROClaim, that reformulates verification as a structured, adversarial deliberation. Our approach integrates specialized roles (e.g., Plaintiff, Defense, Judge) with Progressive RAG (P-RAG) to dynamically expand and refine the evidence pool during the debate. Furthermore, we employ evidence negotiation, self-reflection, and heterogeneous multi-judge aggregation to enforce calibration, robustness, and diversity. In zero-shot evaluations on the Check-COVID benchmark, PROClaim achieves 81.7% accuracy, outperforming standard multi-agent debate by 10.0 percentage points, with P-RAG driving the primary performance gains (+7.5 pp). We ultimately demonstrate that structural deliberation and model heterogeneity effectively mitigate systematic biases, providing a robust foundation for reliable claim verification. Our code and data are publicly available at https://github.com/mnc13/PROClaim.2026-03-30T14:23:15ZUnder review, 7 figures, 12 tablesMasnun Nuha ChowdhuryNusrat Jahan BegUmme Hunny KhanSyed Rifat RaiyanMd Kamrul HasanHasan Mahmudhttp://arxiv.org/abs/2605.11294v1Information and Contract Design for Repeated Interactions between Agents with Misaligned Incentives2026-05-11T22:26:00ZWe study the consequences of information asymmetries and misaligned incentives in settings with multiple independent agents. We model an interaction between a Sender, who holds vital private information but cannot act, and a Receiver, who must make decisions but is dependent on the Sender's information. We find that the Sender learns an optimal communication strategy that the Receiver reliably acts on. Importantly, this strategy is highly sensitive to the degree of conflict in the agents' rewards and the amount of environmental information the Receiver can already observe. We introduce a mechanism allowing the agents to form linear contracts, where a price is established for the information. We demonstrate that the Sender learns to use these payment structures to improve its rewards, though this comes at a cost of "fairness" between agents as the Sender is able to extract much of the Receiver's surplus. This raises questions about fairness, contract design, and learning in the context of multi-agent systems.2026-05-11T22:26:00ZAccepted to IJCAI 2026Nanda Kishore SreenivasKate Larson