http://arxiv.org/abs/2508.17557v4 The Price of Uncertainty for Social Consensus 2026-05-04T20:57:25Z

How hard is it to achieve consensus in a social network under uncertainty? In this paper we model this problem as a social graph of agents where each vertex is initially colored red or blue. The goal of the agents is to achieve consensus, which is when the colors of all agents align. Agents attempt to do this locally through steps in which an agent changes their color to the color of the majority of their neighbors. In real life, agents may not know exactly how many of their neighbors are red or blue, which introduces uncertainty into this process. Modeling uncertainty as perturbations of relative magnitude $1+\varepsilon$ to these color neighbor counts, we show that even small values of $\varepsilon$ greatly hinder the ability to achieve consensus in a social network. We prove theoretically tight upper and lower bounds on the \emph{price of uncertainty}, a metric defined in previous work by Balcan et al. to quantify the effect of uncertainty in network games.
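
A minimal sketch of the majority-update step described above, with one possible reading of the uncertainty model. It assumes each neighbor color count may be scaled by any factor in [1/(1+eps), 1+eps]. That interval, and the function names, are illustrative assumptions and are not taken from the paper.

```python
def neighbor_counts(graph, colors, v):
    """Count red and blue neighbors of vertex v (graph: dict of adjacency lists)."""
    red = sum(1 for u in graph[v] if colors[u] == "red")
    blue = len(graph[v]) - red
    return red, blue


def exact_step(graph, colors, v):
    """Exact majority update: adopt the strict majority color, else keep the current one."""
    red, blue = neighbor_counts(graph, colors, v)
    if red > blue:
        return "red"
    if blue > red:
        return "blue"
    return colors[v]


def possible_majorities(red, blue, eps):
    """Colors that could appear to be the strict majority under perturbation.

    Assumption: each count can be scaled by a factor in [1/(1+eps), 1+eps].
    """
    hi = 1.0 + eps
    lo = 1.0 / hi
    result = set()
    if red * hi > blue * lo:
        result.add("red")
    if blue * hi > red * lo:
        result.add("blue")
    return result


# A star graph: vertex 0 has three neighbors, two red and one blue.
graph = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
colors = {0: "blue", 1: "red", 2: "red", 3: "blue"}
new_color = exact_step(graph, colors, 0)
```

Under this reading, a 10-to-9 split is unambiguous at eps = 0.01 but becomes ambiguous at eps = 0.1, which illustrates how a small perturbation can let an agent move toward either color.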

2025-08-24T23:48:37Z 17 pages Yunzhe Bai Alec Sun 10.1145/3774904.3792496 http://arxiv.org/abs/2605.03142v1 MARS-DA: A Hierarchical Reinforcement Learning Framework for Risk-Aware Multi-Agent Bidding in Power Grids 2026-05-04T20:31:48Z

The increasing penetration of renewable energy has introduced substantial volatility into wholesale electricity markets, complicating the optimal bidding strategies for power producers. Traditional Reinforcement Learning (RL) approaches often struggle to balance profit maximization with risk management, frequently overfitting to specific market conditions or failing to account for the stochastic spread between Day-Ahead (DA) and Real-Time (RT) settlements. To address these challenges, this paper makes two primary contributions. First, we introduce and open-source a high-fidelity gymnasium environment for two-settlement electricity market bidding. Grounded in extensive empirical data from the PJM Interconnection, the environment explicitly models the interplay between DA commitments and RT deviations, providing a standardized testbed for general and risk-sensitive agents. Second, we propose MARS-DA (Multi-Agent Regime-Switching for Day-Ahead markets), a novel hierarchical framework that orchestrates distinct sub-policies for risk management and profit seeking. MARS-DA utilizes a top-level Meta-Controller to dynamically blend the actions of two specialized base agents: a "Safe Agent" that optimizes for reliable DA allocation and a "Speculator Agent" that targets volatile RT arbitrage opportunities. Extensive experiments demonstrate that MARS-DA achieves superior risk-adjusted returns compared to state-of-the-art baselines while maintaining robust regime alignment during periods of extreme market volatility.

2026-05-04T20:31:48Z Jiayi Chen Xuan Zhang Guiling Wang http://arxiv.org/abs/2604.04409v2 FORMULA: FORmation MPC with neUral barrier Learning for safety Assurance 2026-05-04T20:28:46Z

Multi-robot systems (MRS) are essential for large-scale applications such as disaster response, material transport, and warehouse logistics, yet ensuring robust, safety-aware formation control in cluttered and dynamic environments remains a major challenge. Existing model predictive control (MPC) approaches suffer from limitations in scalability and provable safety, while control barrier functions (CBFs), though principled for safety enforcement, are difficult to handcraft for large-scale nonlinear systems. This paper presents FORMULA, a safe distributed, learning-enhanced predictive control framework that integrates MPC with Control Lyapunov Functions (CLFs) for stability and neural network-based CBFs for decentralized safety, eliminating manual safety constraint design. This scheme maintains formation integrity during obstacle avoidance, resolves deadlocks in dense configurations, and reduces online computational load. Simulation results demonstrate that FORMULA enables scalable, safety-aware, formation-preserving navigation for multi-robot teams in complex environments.

2026-04-06T04:21:09Z Accepted to IEEE Intelligent Vehicles Symposium (IV) 2026 Qintong Xie Weishu Zhan Peter Chin http://arxiv.org/abs/2506.10874v2 Higher-Order Uncoupled Learning Dynamics and Nash Equilibrium 2026-05-04T19:14:57Z

We study learnability of mixed-strategy Nash Equilibrium (NE) in general finite games using higher-order replicator dynamics as well as classes of higher-order uncoupled heterogeneous dynamics. In higher-order uncoupled learning dynamics, players have no access to utilities of opponents (uncoupled) but are allowed to use auxiliary states to further process information (higher-order). We establish a link between uncoupled learning and feedback stabilization with decentralized control. Using this association, we show that for any finite game with an isolated completely mixed-strategy NE, there exist higher-order uncoupled learning dynamics that lead (locally) to that NE. We further establish the lack of universality of learning dynamics by linking learning to the control theoretic concept of simultaneous stabilization. We construct two games such that any higher-order dynamics that learn the completely mixed-strategy NE of one of these games can never learn the completely mixed-strategy NE of the other. Next, motivated by imposing natural restrictions on allowable learning dynamics, we introduce the Asymptotic Best Response (ABR) property. Dynamics with the ABR property asymptotically learn a best response in environments that are asymptotically stationary. We show that the ABR property relates to an internal stability condition on higher-order learning dynamics. We provide conditions under which NE are compatible with the ABR property. Finally, we address learnability of mixed-strategy NE in the bandit setting using a bandit version of higher-order replicator dynamics.

2025-06-12T16:42:01Z Sarah A. Toonsi Jeff S. Shamma http://arxiv.org/abs/2605.06696v1 Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations 2026-05-04T16:59:08Z

Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling from spurious similarity, as consequential coalitions may form at the level of internal representations before any overt behavioral change is apparent. Here, we introduce a practical method for detecting coalition structure from the internal neural representations of multi-agent systems. The approach constructs a pairwise mutual-information graph from the hidden states of agents and applies spectral partitioning to identify the most salient coalition boundary. We validate this method in two domains. First, in multi-agent reinforcement learning environments, the method successfully recovers programmed hierarchical and dynamic coalition structures and correctly rejects false positives arising from behavioral coordination without informational coupling. Second, using a large language model, the method identifies coalition structures implied by descriptive prompts, tracks dynamic team reassignments, and reveals a representational hierarchy where explicit labels dominate over conflicting interaction patterns. Across both settings, the recovered partition reveals subgroup organization that a scalar cross-agent mutual-information measure cannot distinguish. The results demonstrate that analyzing hidden-state mutual information through spectral partitioning provides a scalable diagnostic for identifying representational coalitions, offering a valuable tool for monitoring emergent structure in distributed AI systems.
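
The spectral partitioning step can be sketched as follows. This is an illustration only: it assumes the pairwise mutual-information estimates are already available as a symmetric, non-negative weight matrix `W`, and it uses the sign of the Fiedler vector of the graph Laplacian to split the agents into two groups. The function name and the power-iteration solver are assumptions, not the authors' implementation.

```python
def fiedler_partition(W, iters=500):
    """Split nodes into two groups by the sign of the Fiedler vector of L = D - W.

    W is a symmetric, non-negative weight matrix (list of lists) with at least
    one positive off-diagonal entry. The Fiedler vector is found by power
    iteration on (c*I - L), projecting out the all-ones direction each step.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)  # upper bound on the largest Laplacian eigenvalue
    # Deterministic start vector, already orthogonal to the all-ones vector.
    v = [i - (n - 1) / 2.0 for i in range(n)]
    for _ in range(iters):
        # w = (c*I - L) v, written out with L = D - W
        w = [(c - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]  # remove the all-ones component
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    side_a = {i for i in range(n) if v[i] >= 0}
    return side_a, set(range(n)) - side_a


# Hypothetical weights: agents 0-2 and agents 3-5 are strongly coupled
# within their group and weakly coupled across groups.
W = [[0.0 if i == j else (1.0 if (i < 3) == (j < 3) else 0.1)
      for j in range(6)] for i in range(6)]
group_a, group_b = fiedler_partition(W)
```

Which group receives the non-negative sign is arbitrary, so only the partition itself is meaningful.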

2026-05-04T16:59:08Z 18 pages Cameron Berg Susan L. Schneider Mark M. Bailey http://arxiv.org/abs/2605.02697v1 Executor-Side Progressive Risk-Gated Actuation for Agentic AI in Wireless Supervisory Control 2026-05-04T15:09:41Z

Agentic artificial intelligence (AI) shows promise for automating O-RAN wireless supervisory control, but translated intents still require an executor-side decision before live network actuation. Existing control flows lack explicit semantics for whether an intent should commit, gate for evidence, or reject under stale telemetry, concurrent policies, deadline and bandwidth limits, and rollback constraints. We propose Progressive Risk-Gated Actuation (PRGA), an executor-side contract for risk-gated wireless intent execution. PRGA structures each intent into executable local triage (C0), on-demand coordination evidence (C1), and post-hoc provenance support (C2), with C2 kept off the online safety path. A deterministic two-stage policy checks expiry, freshness, rollback-handle validity, local conflict, blocking preconditions, and planner-executor risk divergence from C0, then retrieves C1 only for gated intents when deadline and bandwidth budgets allow; evidence-mandatory gates reject when required C1 is unavailable. On two 3GPP-parameterized energy-saving and slice-SLA benchmarks, PRGA reduces time-to-first-safe-action by 23.3-27.4% and per-commit control-plane bytes by 52.7-54.2% against a decision-identical eager full-evidence cost-overlay comparator, thereby isolating retrieval-cost accounting; remains non-inferior within a pre-declared 0.5 percentage-point unsafe-action margin against an invariant-respecting static-threshold comparator; and rejects 100% of injected over-threshold stale inputs in the stale-state fault campaign. On these benchmarks, PRGA improves supervisory responsiveness and control-plane efficiency within the evaluated unsafe-action boundary.

2026-05-04T15:09:41Z Zhenyu Liu Yi Ma Rahim Tafazolli http://arxiv.org/abs/2502.03506v2 Optimistic ε-Greedy Exploration for Cooperative Multi-Agent Reinforcement Learning 2026-05-04T09:30:33Z

The Centralized Training with Decentralized Execution (CTDE) paradigm is widely used in cooperative multi-agent reinforcement learning. However, conventional methods based on CTDE can suffer from value underestimation and converge to suboptimal solutions. While such underestimation is typically attributed to the representational limitations of monotonic structures, we provide a novel perspective by demonstrating that the insufficient sampling of optimal joint actions during exploration is also a critical factor. To address this problem, we propose Optimistic $ε$-Greedy Exploration. Our method introduces optimistic action-value networks that serve as decoupled exploration indicators, which we theoretically prove to converge in probability to the maximum achievable returns. By sampling actions from these distributions with a probability of $ε$, we effectively increase the selection frequency of high-return joint actions. Experimental results in various environments reveal that our strategy effectively prevents the algorithm from falling into suboptimal solutions and significantly improves final returns, win rates, and convergence speeds compared to other enhanced algorithms. Our code has been open-sourced at https://github.com/qxqxtxdy/OptimisticExploration.
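
A tabular sketch of the action-selection rule described above: with probability ε the agent samples from a distribution built from optimistic value estimates, otherwise it acts greedily on its ordinary action values. The softmax over optimistic values is an assumption made for illustration; the abstract does not specify how the sampling distribution is formed.

```python
import math
import random


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(q_values, optimistic_values, epsilon, rng):
    """Return an action index.

    With probability epsilon, sample from softmax(optimistic_values)
    (assumed form of the exploration distribution); otherwise take the
    greedy action under q_values.
    """
    if rng.random() < epsilon:
        probs = softmax(optimistic_values)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)
greedy_action = select_action([0.1, 0.9, 0.3], [5.0, 0.0, 0.0], 0.0, rng)
explore_action = select_action([0.1, 0.9, 0.3], [100.0, 0.0, 0.0], 1.0, rng)
```

Compared with uniform ε-greedy, exploration here is biased toward actions whose optimistic estimates are high, which is the effect the abstract attributes to the method.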

2025-02-05T12:06:54Z Ruoning Zhang Siying Wang Wenyu Chen Yang Zhou Zhitong Zhao Zixuan Zhang Ruijie Zhang Stefano V. Albrecht http://arxiv.org/abs/2605.02335v1 LLM-enabled Social Agents 2026-05-04T08:39:58Z

Large Language Models (LLMs) have transformed agent-agent and human-agent interaction by enabling software, physical, and simulation agents to communicate and deliberate through natural language. Yet fluent language use does not by itself yield socially intelligible behaviour. Most current systems remain weakly grounded in roles, norms, intentions, and contextual constraints, limiting their capacity for meaningful participation in social environments. This paper develops a conceptual baseline for LLM-enabled social agents by arguing that they should be grounded in role definitions operationalized through persona descriptions. On this basis, we outline research directions for representation, hybrid control, and evaluation. The paper concludes that persona-based role definitions are a necessary foundation for turning language competence into social behaviour.

2026-05-04T08:39:58Z 11 pages, 1 figure, Hybrid Human Artificial Intelligence (HHAI) 2026 Önder Gürcan Moharram Challenger http://arxiv.org/abs/2511.01045v2 GOSPA-Driven Non-Myopic Multi-Sensor Management with Multi-Bernoulli Filtering 2026-05-04T08:15:33Z

In this paper, we propose a non-myopic sensor management algorithm for multi-target tracking, with multiple sensors operating in the same surveillance area. The algorithm is based on multi-Bernoulli filtering and selects the actions that solve a non-myopic minimisation problem, where the cost function is the mean square generalised optimal sub-pattern assignment (GOSPA) error, over a future time window. For tractability, the sensor management algorithm actually uses an upper bound of the GOSPA error and is implemented via Monte Carlo Tree Search (MCTS). The sensors jointly optimise and select their actions, taking into account all other sensors in the surveillance area. The benefits of the proposed algorithm are analysed via simulations.

2025-11-02T18:42:45Z submitted to Elsevier Signal Processing May 2026 George Jones Angel Garcia-Fernandez http://arxiv.org/abs/2605.02307v1 SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind 2026-05-04T07:59:18Z

As LLM-based agents increasingly interact in multi-party settings, they need to properly handle information asymmetry, i.e., knowing when and to whom it is appropriate to disclose information. Yet, existing benchmarks fail to measure this ability in realistic multi-party settings. Thus, we introduce SOTOPIA-TOM, a multi-dimensional benchmarking framework to evaluate LLM agents' ability to successfully navigate information-asymmetric and privacy-sensitive multi-party interactions. We create an interaction environment which enables both public (broadcast) and private (direct message) communication, and craft 160 human-reviewed scenarios across eight industry sectors, each involving 3 to 5 agents with partitioned private knowledge and channel-dependent sharing policies. To measure interaction abilities, we create a multi-dimensional evaluation framework to assess how well agents share useful information, seek missing details, coordinate efficiently, and protect privacy, which we also combine into a composite INFOMGMT metric. Results show that, across 6 LLM backbones and prompting strategies (vanilla, CoT-privacy, and ToM-based interventions), even the largest high-reasoning model (GPT-5) reaches only a 62% INFOMGMT score, which indicates persistent deficiencies in information seeking and privacy-aware decision-making. Additionally, ToM-based interventions more consistently improve the overall coordination-privacy balance (for example, relative to the vanilla baseline, ToM-Coach reduces critical privacy violations on GPT-4o from 9.9% to 2.2% while increasing the composite INFOMGMT score more than 2.5x from 15% to 40%). Overall, SOTOPIA-TOM exposes persistent limitations of current LLM agents in complex, information-asymmetric coordination and provides an extensible testbed for developing more privacy-aware, theory-of-mind capable multi-agent systems.

2026-05-04T07:59:18Z 37 pages, 22 Figures Yashwanth YS Ruichen Wang Shihua Zeng Xuhui Zhou Koichi Onoue Vasudha Varadarajan Maarten Sap http://arxiv.org/abs/2605.00420v2 Foresight Arena: An On-Chain Benchmark for Evaluating AI Forecasting Agents 2026-05-04T07:21:37Z

Evaluating the true forecasting ability of AI agents requires environments that are resistant to overfitting, free from centralized trust, and grounded in incentive-compatible scoring. Existing benchmarks either rely on static datasets vulnerable to training-data contamination, or measure trading PnL -- a metric conflating predictive accuracy with timing, sizing, and risk appetite. We introduce Foresight Arena, the first permissionless, on-chain benchmark for evaluating AI forecasting agents on real-world prediction markets. Agents submit probabilistic forecasts on binary Polymarket markets via a commit-reveal protocol enforced by Solidity smart contracts on Polygon PoS; outcomes are resolved trustlessly through the Gnosis Conditional Token Framework. Performance is measured by the Brier Score and a novel Alpha Score -- proper scoring rules that incentivize honest probability reporting and isolate predictive edge over market consensus. We provide a formal analysis: closed-form variance for per-market Alpha, the connection to Murphy's classical Brier decomposition, and a power analysis characterizing the number of rounds required to reliably distinguish agents of different skill levels. We show that detecting a true edge of $α^* = 0.02$ at 80% power requires approximately 350 resolved binary predictions (50 rounds of 7 markets), while $α^* = 0.01$ requires four times more. We complement these analytical results with a deterministic, seed-controlled simulation study calibrated to literature-reported Brier-score ranges, illustrating how Murphy decomposition distinguishes well-calibrated agents from market-tracking agents that fail through reduced resolution. Live results from the deployed benchmark will be reported in a future revision. All smart contracts and evaluation infrastructure are open-source.
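
As a small illustration of two quantities mentioned above, the sketch below computes the standard Brier score for binary outcomes and shows the inverse-square scaling behind the statement that halving the detectable edge quadruples the required sample. The scaling helper assumes the usual power-analysis setting in which per-prediction variance, significance level, and power are held fixed; this is an assumption, as the abstract does not give the formula. The Alpha Score is not defined in the abstract and is not reproduced here.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def scaled_sample_size(n_ref, edge_ref, edge_new):
    """Rescale a reference sample size to a new effect size.

    Assumes required n is proportional to 1 / edge**2, with variance,
    significance level, and power unchanged.
    """
    return n_ref * (edge_ref / edge_new) ** 2


uninformed = brier_score([0.5, 0.5], [1, 0])
perfect = brier_score([1.0, 0.0], [1, 0])
# Reference point from the abstract: about 350 predictions for an edge of 0.02.
n_for_half_edge = scaled_sample_size(350, 0.02, 0.01)
```

Under the stated assumption, the result for an edge of 0.01 is about 1400 predictions, four times the 350 quoted for an edge of 0.02.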

2026-05-01T05:33:10Z v2: Reframed Section 6 as an illustrative simulation study with explicit disclosure that the numerical results in Section 6 come from a calibrated Monte Carlo simulation rather than a live deployment; added live-evaluation-pending limitation Maksym Nechepurenko Pavel Shuvalov http://arxiv.org/abs/2605.02235v1 Distributed Observer-based Fault Detection over Intelligent Networked Multi-Vehicle Systems 2026-05-04T05:09:41Z

Decentralized strategies are of interest for local decision-making over multi-vehicle networks. This paper studies mixed traffic networks of human-driven and autonomous vehicles with partial sensor measurements. The idea is to enable the group of connected autonomous vehicles (CAVs) to track the state of a group of human-driven vehicles (HDVs) via distributed consensus-based observers/estimators. In particular, we make no assumption that the group of HDVs is locally observable in the direct neighborhood of any CAV. The main contribution is then to design local residual-based fault detection and isolation (FDI) at every CAV to detect possible faults/attacks in the sensor measurements. This distributed detection strategy enables every CAV to locally find possible anomalies in its own sensor measurements with no need for a central processing unit. Two FDI logics are proposed, with and without considering the history of the residuals. These FDI techniques are based on probabilistic threshold design on the residuals (in contrast to the existing deterministic threshold FDI techniques) with no assumption that the noise is of bounded support. This is more realistic in real-world multi-vehicle transportation systems.
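
To illustrate what a probabilistic threshold on a residual can look like, the sketch below assumes a scalar residual that is zero-mean Gaussian with known standard deviation when no fault is present, and picks the threshold so that the false-alarm probability equals a chosen rate. The Gaussian assumption, the function names, and the single-sample test are illustrative choices and are not the paper's FDI logic.

```python
from statistics import NormalDist


def residual_threshold(sigma, false_alarm_rate):
    """Two-sided threshold t such that P(|r| > t) = false_alarm_rate

    for a fault-free residual r ~ N(0, sigma**2) (assumed noise model).
    """
    z = NormalDist().inv_cdf(1.0 - false_alarm_rate / 2.0)
    return z * sigma


def is_faulty(residual, sigma, false_alarm_rate):
    """Flag a measurement when its residual exceeds the probabilistic threshold."""
    return abs(residual) > residual_threshold(sigma, false_alarm_rate)


threshold = residual_threshold(1.0, 0.05)
flag_large = is_faulty(2.5, 1.0, 0.05)
flag_small = is_faulty(1.0, 1.0, 0.05)
```

Because Gaussian noise has unbounded support, any finite threshold admits some false alarms; the design choice is to fix their probability instead of assuming a hard bound on the noise.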

2026-05-04T05:09:41Z European journal of control Mohammadreza Doostmohammadian Hamid R. Rabiee http://arxiv.org/abs/2605.02168v1 Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning 2026-05-04T02:58:05Z

Language model (LM)-based agents have demonstrated promising capabilities in automating complex tasks from natural language instructions, yet they continue to struggle with long-horizon planning and reasoning. To address this, we propose an enhanced multi-agent framework that decomposes automation into three roles: a planner for high-level decision-making, an actor for task execution, and a memory manager for contextual reasoning. While this modular decomposition aligns with established design patterns, our core contribution lies in a systematic compute-allocation analysis, revealing that planning is the dominant factor influencing task performance. Execution and memory management require significantly less compute and model capacity to achieve competitive results. Building on these insights, we introduce a planner-centric reinforcement learning approach, which exclusively optimizes the planner using trajectory-level rewards from a VLM-as-judge, while freezing the other components. Extensive experiments on benchmarks spanning web navigation, OS control, and tool use demonstrate that concentrating model capacity and learning on high-level planning yields robust and compute-efficient improvements in long-horizon agent automation. Our code is publicly released.

2026-05-04T02:58:05Z Wenyi Wu Sibo Zhu Kun Zhou Biwei Huang http://arxiv.org/abs/2605.02162v1 AAFLOW: Scalable Patterns for Agentic AI Workflows 2026-05-04T02:39:13Z

Agentic workflows in large language model systems integrate retrieval, reasoning, and memory, but existing frameworks suffer from scalability and reproducibility limitations due to fragmented data orchestration, serialization overhead, and non-deterministic execution. Although these frameworks increase flexibility, they lack a formal execution model that adheres to the principles of high-performance computing. We introduce AAFLOW, a unified distributed runtime that creates communication-efficient execution plans by modeling agentic workflows as an operator abstraction. Using Apache Arrow and Cylon, AAFLOW creates a zero-copy data plane that allows direct interoperability between preprocessing, embedding, and vector retrieval without serialization overhead. To lower coordination costs, it uses resource-deterministic scheduling and asynchronous batching. While retaining comparable LLM generation throughput, experimental results demonstrate up to 4.64 times pipeline speedup and 2.8 times gains in embedding and upsert phases. Rather than LLM inference acceleration, these advantages result from enhanced data flow, batching, and communication efficiency.

2026-05-04T02:39:13Z 10 pages, 8 Figures, 3 Tables. preprint for SC2026 Arup Kumar Sarker Mills Staylor Aymen Alsaadi Gregor von Laszewski Shantenu Jha Geoffrey Fox http://arxiv.org/abs/2605.02063v1 Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition 2026-05-03T21:14:06Z

We present Coopetition-Gym v1, a benchmark platform for mixed-motive multi-agent reinforcement learning under strategic coopetition. The platform comprises twenty environments organized into four mechanism classes that correspond to four foundational technical reports: interdependence and complementarity (arXiv:2510.18802), trust and reputation dynamics (arXiv:2510.24909), collective action and loyalty (arXiv:2601.16237), and sequential interaction and reciprocity (arXiv:2604.01240). Each environment carries a closed-form payoff structure and a calibrated interdependence matrix derived from the corresponding report. Every environment exposes a parameterized reward layer configurable across three structurally distinct modes (private, integrated, cooperative). This separation of payoff from reward enables reward-type ablation, the platform's principal methodological apparatus. Four of the twenty environments are calibrated against historically documented coopetitive relationships and reproduce their outcomes at 98.3, 81.7, 86.7, and 87.3 percent on the validation rubric (Samsung-Sony LCD, Renault-Nissan Alliance, Apache HTTP Server, Apple iOS App Store). The platform exposes Gymnasium, PettingZoo Parallel, and PettingZoo AEC interfaces and ships 126 reference algorithms: 16 learning algorithms, 7 game-theoretic oracles, 2 heuristic baselines, and 101 constant-action policies. A reference experimental study trained the 16 learning algorithms on every environment under every reward configuration with seven random seeds, producing a 25,708-run training corpus and a 1,116-run behavioral audit corpus, both released under CC-BY-4.0 with Croissant 1.0 metadata. Coopetition-Gym v1 is the first platform to combine continuous-action mixed-motive environments, parameterized reward mutuality, calibrated interdependence coefficients, game-theoretic oracle baselines, and validated case studies.

2026-05-03T21:14:06Z 82 pages, 14 figures, 9 tables, 51 references. AI-track technical report companion to the four-paper foundational series; should be read with arXiv:2510.18802, arXiv:2510.24909, arXiv:2601.16237, and arXiv:2604.01240. Reproducibility package and source code: https://github.com/vikpant/strategic-coopetition. Datasets released under CC-BY-4.0 at https://huggingface.co/vikpant Vik Pant Eric Yu