https://arxiv.org/api/yV2JrZHHKwdAWGFFlo7XB+6JMTs2026-06-25T21:10:35Z1275093015http://arxiv.org/abs/2508.17557v4The Price of Uncertainty for Social Consensus2026-05-04T20:57:25ZHow hard is it to achieve consensus in a social network under uncertainty? In this paper we model this problem as a social graph of agents where each vertex is initially colored red or blue. The goal of the agents is to achieve consensus, which is when the colors of all agents align. Agents attempt to do this locally through steps in which an agent changes their color to the color of the majority of their neighbors. In real life, agents may not know exactly how many of their neighbors are red or blue, which introduces uncertainty into this process. Modeling uncertainty as perturbations of relative magnitude $1+\varepsilon$ to these color neighbor counts, we show that even small values of $\varepsilon$ greatly hinder the ability to achieve consensus in a social network. We prove theoretically tight upper and lower bounds on the \emph{price of uncertainty}, a metric defined in previous work by Balcan et al. to quantify the effect of uncertainty in network games.2025-08-24T23:48:37Z17 pagesYunzhe BaiAlec Sun10.1145/3774904.3792496http://arxiv.org/abs/2605.03142v1MARS-DA: A Hierarchical Reinforcement Learning Framework for Risk-Aware Multi-Agent Bidding in Power Grids2026-05-04T20:31:48ZThe increasing penetration of renewable energy has introduced substantial volatility into wholesale electricity markets, complicating the optimal bidding strategies for power producers. Traditional Reinforcement Learning (RL) approaches often struggle to balance profit maximization with risk management, frequently overfitting to specific market conditions or failing to account for the stochastic spread between Day-Ahead (DA) and Real-Time (RT) settlements. To address these challenges, this paper makes two primary contributions. First, we introduce and open-source a high-fidelity gymnasium environment for two-settlement electricity market bidding. Grounded in extensive empirical data from the PJM Interconnection, the environment explicitly models the interplay between DA commitments and RT deviations, providing a standardized testbed for general and risk-sensitive agents. Second, we propose MARS-DA (Multi-Agent Regime-Switching for Day-Ahead markets), a novel hierarchical framework that orchestrates distinct sub-policies for risk management and profit seeking. MARS-DA utilizes a top-level Meta-Controller to dynamically blend the actions of two specialized base agents: a "Safe Agent" that optimizes for reliable DA allocation and a "Speculator Agent" that targets volatile RT arbitrage opportunities. Extensive experiments demonstrate that MARS-DA achieves superior risk-adjusted returns compared to state-of-the-art baselines while maintaining robust regime alignment during periods of extreme market volatility.2026-05-04T20:31:48ZJiayi ChenXuan ZhangGuiling Wanghttp://arxiv.org/abs/2604.04409v2FORMULA: FORmation MPC with neUral barrier Learning for safety Assurance2026-05-04T20:28:46ZMulti-robot systems (MRS) are essential for large-scale applications such as disaster response, material transport, and warehouse logistics, yet ensuring robust, safety-aware formation control in cluttered and dynamic environments remains a major challenge. Existing model predictive control (MPC) approaches suffer from limitations in scalability and provable safety, while control barrier functions (CBFs), though principled for safety enforcement, are difficult to handcraft for large-scale nonlinear systems. This paper presents FORMULA, a safe distributed, learning-enhanced predictive control framework that integrates MPC with Control Lyapunov Functions (CLFs) for stability and neural network-based CBFs for decentralized safety, eliminating manual safety constraint design. This scheme maintains formation integrity during obstacle avoidance, resolves deadlocks in dense configurations, and reduces online computational load. Simulation results demonstrate that FORMULA enables scalable, safety-aware, formation-preserving navigation for multi-robot teams in complex environments.2026-04-06T04:21:09ZAccepted to IEEE Intelligent Vehicles Symposium (IV) 2026Qintong XieWeishu ZhanPeter Chinhttp://arxiv.org/abs/2506.10874v2Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium2026-05-04T19:14:57ZWe study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.2025-06-12T16:42:01ZSarah A. ToonsiJeff S. Shammahttp://arxiv.org/abs/2605.06696v1Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations2026-05-04T16:59:08ZCollections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling from spurious similarity, as consequential coalitions may form at the level of internal representations before any overt behavioral change is apparent. Here, we introduce a practical method for detecting coalition structure from the internal neural representations of multi-agent systems. The approach constructs a pairwise mutual-information graph from the hidden states of agents and applies spectral partitioning to identify the most salient coalition boundary.
We validate this method in two domains. First, in multi-agent reinforcement learning environments, the method successfully recovers programmed hierarchical and dynamic coalition structures and correctly rejects false positives arising from behavioral coordination without informational coupling. Second, using a large language model, the method identifies coalition structures implied by descriptive prompts, tracks dynamic team reassignments, and reveals a representational hierarchy where explicit labels dominate over conflicting interaction patterns. Across both settings, the recovered partition reveals subgroup organization that a scalar cross-agent mutual-information measure cannot distinguish. The results demonstrate that analyzing hidden-state mutual information through spectral partitioning provides a scalable diagnostic for identifying representational coalitions, offering a valuable tool for monitoring emergent structure in distributed AI systems.2026-05-04T16:59:08Z18 pagesCameron BergSusan L. SchneiderMark M. Baileyhttp://arxiv.org/abs/2605.02697v1Executor-Side Progressive Risk-Gated Actuation for Agentic AI in Wireless Supervisory Control2026-05-04T15:09:41ZAgentic artificial intelligence (AI) shows promise for automating O-RAN wireless supervisory control, but translated intents still require an executor-side decision before live network actuation. Existing control flows lack explicit semantics for whether an intent should commit, gate for evidence, or reject under stale telemetry, concurrent policies, deadline and bandwidth limits, and rollback constraints. We propose Progressive Risk-Gated Actuation (PRGA), an executor-side contract for risk-gated wireless intent execution. PRGA structures each intent into executable local triage (C0), on-demand coordination evidence (C1), and post-hoc provenance support (C2), with C2 kept off the online safety path. A deterministic two-stage policy checks expiry, freshness, rollback-handle validity, local conflict, blocking preconditions, and planner-executor risk divergence from C0, then retrieves C1 only for gated intents when deadline and bandwidth budgets allow; evidence-mandatory gates reject when required C1 is unavailable. On two 3GPP-parameterized energy-saving and slice-SLA benchmarks, PRGA reduces time-to-first-safe-action by 23.3-27.4% and per-commit control-plane bytes by 52.7-54.2% against a decision-identical eager full-evidence cost-overlay comparator, thereby isolating retrieval-cost accounting; remains non-inferior within a pre-declared 0.5 percentage-point unsafe-action margin against an invariant-respecting static-threshold comparator; and rejects 100% of injected over-threshold stale inputs in the stale-state fault campaign. On these benchmarks, PRGA improves supervisory responsiveness and control-plane efficiency within the evaluated unsafe-action boundary.2026-05-04T15:09:41ZZhenyu LiuYi MaRahim Tafazollihttp://arxiv.org/abs/2502.03506v2Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning2026-05-04T09:30:33ZThe Centralized Training with Decentralized Execution (CTDE) paradigm is widely used in cooperative multi-agent reinforcement learning. However, conventional methods based on CTDE can suffer from value underestimation and converge to suboptimal solutions. While such underestimation is typically attributed to the representational limitations of monotonic structures, we provide a novel perspective by demonstrating that the insufficient sampling of optimal joint actions during exploration is also a critical factor. To address this problem, we propose Optimistic $ε$-Greedy Exploration. Our method introduces optimistic action-value networks that serve as decoupled exploration indicators, which we theoretically prove to converge in probability to the maximum achievable returns. By sampling actions from these distributions with a probability of $ε$, we effectively increase the selection frequency of high-return joint actions. Experimental results in various environments reveal that our strategy effectively prevents the algorithm from falling into suboptimal solutions and significantly improves final returns, win rates, and convergence speeds compared to other enhanced algorithms. Our code has been open-sourced at https://github.com/qxqxtxdy/OptimisticExploration.2025-02-05T12:06:54ZRuoning ZhangSiying WangWenyu ChenYang ZhouZhitong ZhaoZixuan ZhangRuijie ZhangStefano V. Albrechthttp://arxiv.org/abs/2605.02335v1LLM-enabled Social Agents2026-05-04T08:39:58ZLarge Language Models (LLMs) have transformed agent-agent and human-agent interaction by enabling software, physical, and simulation agents to communicate and deliberate through natural language. Yet fluent language use does not by itself yield socially intelligible behaviour. Most current systems remain weakly grounded in roles, norms, intentions, and contextual constraints, limiting their capacity for meaningful participation in social environments. This paper develops a conceptual baseline for LLM-enabled social agents by arguing that they should be grounded in role definitions operationalized through persona descriptions. On this basis, we outline research directions for representation, hybrid control, and evaluation. The paper concludes that persona-based role definitions are a necessary foundation for turning language competence into social behaviour.2026-05-04T08:39:58Z11 pages, 1 figure, Hybrid Human Artificial Intelligence (HHAI) 2026Önder GürcanMoharram Challengerhttp://arxiv.org/abs/2511.01045v2GOSPA-Driven Non-Myopic Multi-Sensor Management with Multi-Bernoulli Filtering2026-05-04T08:15:33ZIn this paper, we propose a non-myopic sensor management algorithm for multi-target tracking, with multiple sensors operating in the same surveillance area. The algorithm is based on multi-Bernoulli filtering and selects the actions that solve a non-myopic minimisation problem, where the cost function is the mean square generalised optimal sub-pattern assignment (GOSPA) error, over a future time window. For tractability, the sensor management algorithm actually uses an upper bound of the GOSPA error and is implemented via Monte Carlo Tree Search (MCTS). The sensors have the ability to jointly optimise and select their actions with the considerations of all other sensors in the surveillance area. The benefits of the proposed algorithm are analysed via simulations.2025-11-02T18:42:45Zsubmitted to Elsevier Signal Processing May 2026George JonesAngel Garcia-Fernandezhttp://arxiv.org/abs/2605.02307v1SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind2026-05-04T07:59:18ZAs LLM-based agents are increasingly interacting in multi-party settings, they need to properly handle information asymmetry, i.e., knowing when and to whom to disclose information is appropriate. Yet, existing benchmarks fail to measure this ability in realistic multi-party settings. Thus, we introduce SOTOPIA-TOM, a multi-dimensional benchmarking framework to evaluate LLM agents' ability to successfully navigate information asymmetric and privacy sensitive multi-party interactions. We create an interaction environment which enables both public (broadcast) and private (direct message) communication, and craft 160 human-reviewed scenarios across eight industry sectors, each involving 3 to 5 agents with partitioned private knowledge and channel-dependent sharing policies. To measure interaction abilities, we create a multi-dimensional evaluation framework to assess how well agents share useful information, seek missing details, coordinate efficiently, and protect privacy, which we also combine into a composite INFOMGMT metric. Results show that, across 6 LLM backbones and prompting strategies (vanilla, CoT-privacy, and ToM-based interventions), even the largest high-reasoning model (GPT-5) reaches only a 62% INFOMGMT score, which indicates persistent deficiencies in information seeking and privacy-aware decision-making. Additionally, ToM-based interventions more consistently improve the overall coordination-privacy balance (for example, relative to the vanilla baseline, ToM-Coach reduces critical privacy violations on GPT-4o from 9.9% to 2.2% while increasing the composite InfoMgmt score more than 2.5x from 15% to 40%). Overall, SOTOPIA-TOM exposes persistent limitations of current LLM agents in complex, information-asymmetric coordination and provides an extensible testbed for developing more privacy-aware, theory-of-mind capable multi-agent systems.2026-05-04T07:59:18Z37 pages, 22 FiguresYashwanth YSRuichen WangShihua ZengXuhui ZhouKoichi OnoueVasudha VaradarajanMaarten Saphttp://arxiv.org/abs/2605.00420v2Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents2026-05-04T07:21:37ZEvaluating the true forecasting ability of AI agents requires environments that are resistant to environments resistant to overfitting, free from centralized trust, and grounded in incentive-compatible scoring. Existing benchmarks either rely on static datasets vulnerable to training-data contamination, or measure trading PnL -- a metric conflating predictive accuracy with timing, sizing, and risk appetite. We introduce Foresight Arena, the first permissionless, on-chain benchmark for evaluating AI forecasting agents on real-world prediction markets. Agents submit probabilistic forecasts on binary Polymarket markets via a commit-reveal protocol enforced by Solidity smart contracts on Polygon PoS; outcomes are resolved trustlessly through the Gnosis Conditional Token Framework. Performance is measured by the Brier Score and a novel Alpha Score -- proper scoring rules that incentivize honest probability reporting and isolate predictive edge over market consensus. We provide a formal analysis: closed-form variance for per-market Alpha, the connection to Murphy's classical Brier decomposition, and a power analysis characterizing the number of rounds required to reliably distinguish agents of different skill levels. We show that detecting a true edge of $α^* = 0.02$ at 80% power requires approximately 350 resolved binary predictions (50 rounds of 7 markets), while $α^* = 0.01$ requires four times more. We complement these analytical results with a deterministic, seed-controlled simulation study calibrated to literature-reported Brier-score ranges, illustrating how Murphy decomposition distinguishes well-calibrated agents from market-tracking agents that fail through reduced resolution. Live results from the deployed benchmark will be reported in a future revision. All smart contracts and evaluation infrastructure are open-source.2026-05-01T05:33:10Zv2: Reframed Section 6 as an illustrative simulation study with explicit disclosure that the numerical results in Section 6 come from a calibrated Monte Carlo simulation rather than a live deployment; added live-evaluation-pending limitationMaksym NechepurenkoPavel Shuvalovhttp://arxiv.org/abs/2605.02235v1Distributed Observer-based Fault Detection over Intelligent Networked Multi-Vehicle Systems2026-05-04T05:09:41ZDecentralized strategies are of interest for local decision-making over multi-vehicle networks. This paper studies mixed traffic networks of human-driven and autonomous vehicles with partial sensor measurements. The idea is to enable the group of connected autonomous vehicles (CAVs) to track the state of a group of human-driven vehicles (HDVs) via distributed consensus-based observers/estimators. Particularly, we make no assumption that the group of HDVs is locally observable in the direct neighborhood of any CAV. Then, the main contribution is to design local residual-based fault detection and isolation (FDI) at every CAV to detect possible faults/attacks in the sensor measurements. This distributed detection strategy enables every CAV to locally find possible anomalies in its taken sensor measurement with no need for a central processing unit. Two FDI logics are proposed with and without considering the history of the residuals. These FDI techniques are based on probabilistic threshold design on the residuals (in contrast to the existing deterministic threshold FDI techniques) with no assumption that the noise is of bounded support. This is more realistic in real-world multi-vehicle transportation systems.2026-05-04T05:09:41ZEuropean journal of controlMohammadreza DoostmohammadianHamid R. Rabieehttp://arxiv.org/abs/2605.02168v1Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning2026-05-04T02:58:05ZLanguage model (LM)-based agents have demonstrated promising capabilities in automating complex tasks from natural language instructions, yet they continue to struggle with long-horizon planning and reasoning. To address this, we propose an enhanced multi-agent framework that decomposes automation into three roles: a planner for high-level decision-making, an actor for task execution, and a memory manager for contextual reasoning. While this modular decomposition aligns with established design patterns, our core contribution lies in a systematic compute-allocation analysis, revealing that planning is the dominant factor influencing task performance. Execution and memory management require significantly less compute and model capacity to achieve competitive results. Building on these insights, we introduce a planner-centric reinforcement learning approach, which exclusively optimizes the planner using trajectory-level rewards from a VLM-as-judge, while freezing the other components. Extensive experiments on benchmarks spanning web navigation, OS control, and tool use demonstrate that concentrating model capacity and learning on high-level planning yields robust and compute-efficient improvements in long-horizon agent automation. Our code is publicly released.2026-05-04T02:58:05ZWenyi WuSibo ZhuKun ZhouBiwei Huanghttp://arxiv.org/abs/2605.02162v1AAFLOW: Scalable Patterns for Agentic AI Workflows2026-05-04T02:39:13ZAgentic workflows in large language model systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and non-deterministic execution. Although these frameworks increase flexibility, they don't have a formal execution model that adheres to the principles of high-performance computing. We introduce AAFLOW, a unified distributed runtime that creates communication-efficient execution plans by modeling agentic workflows as an operator abstraction. Using Apache Arrow and Cylon, AAFLOW creates a zero-copy data plane that allows direct interoperability between preprocessing, embedding, and vector retrieval without the need for serialization overhead. To lower coordination costs, it uses resource-deterministic scheduling and asynchronous batching. While retaining comparable LLM generation throughput, experimental results demonstrate up to 4.64 times pipeline speedup and 2.8 times gains in embedding and upsert phases. Rather than LLM inference acceleration, these advantages result from enhanced data flow, batching, and communication efficiency.2026-05-04T02:39:13Z10 pages, 8 Figures, 3 Tables. preprint for SC2026Arup Kumar SarkerMills StaylorAymen AlsaadiGregor von LaszewskiShantenu JhaGeoffrey Foxhttp://arxiv.org/abs/2605.02063v1Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition2026-05-03T21:14:06ZWe present Coopetition-Gym v1, a benchmark platform for mixed-motive multi-agent reinforcement learning under strategic coopetition. The platform comprises twenty environments organized into four mechanism classes that correspond to four foundational technical reports: interdependence and complementarity (arXiv:2510.18802), trust and reputation dynamics (arXiv:2510.24909), collective action and loyalty (arXiv:2601.16237), and sequential interaction and reciprocity (arXiv:2604.01240). Each environment carries a closed-form payoff structure and a calibrated interdependence matrix derived from the corresponding report.
Every environment exposes a parameterized reward layer configurable across three structurally distinct modes (private, integrated, cooperative). This separation of payoff from reward enables reward-type ablation, the platform's principal methodological apparatus. Four of the twenty environments are calibrated against historically documented coopetitive relationships and reproduce their outcomes at 98.3, 81.7, 86.7, and 87.3 percent on the validation rubric (Samsung-Sony LCD, Renault-Nissan Alliance, Apache HTTP Server, Apple iOS App Store).
The platform exposes Gymnasium, PettingZoo Parallel, and PettingZoo AEC interfaces and ships 126 reference algorithms: 16 learning algorithms, 7 game-theoretic oracles, 2 heuristic baselines, and 101 constant-action policies. A reference experimental study trained the 16 learning algorithms on every environment under every reward configuration with seven random seeds, producing a 25,708-run training corpus and a 1,116-run behavioral audit corpus, both released under CC-BY-4.0 with Croissant 1.0 metadata. Coopetition-Gym v1 is the first platform to combine continuous-action mixed-motive environments, parameterized reward mutuality, calibrated interdependence coefficients, game-theoretic oracle baselines, and validated case studies.2026-05-03T21:14:06Z82 pages, 14 figures, 9 tables, 51 references. AI-track technical report companion to the four-paper foundational series; should be read with arXiv:2510.18802, arXiv:2510.24909, arXiv:2601.16237, and arXiv:2604.01240. Reproducibility package and source code: https://github.com/vikpant/strategic-coopetition. Datasets released under CC-BY-4.0 at https://huggingface.co/vikpantVik PantEric Yu