http://arxiv.org/abs/2604.07751v1 Learning to Coordinate over Networks with Bounded Rationality 2026-04-09T03:16:32Z

Network coordination games are widely used to model collaboration among interconnected agents, with applications across diverse domains including economics, robotics, and cyber-security. We consider networks of bounded-rational agents who interact through binary stag hunt games, a canonical game-theoretic model for distributed collaborative tasks. Herein, the agents update their actions using logit response functions, yielding the Log-Linear Learning (LLL) algorithm. While convergence of LLL to a risk-dominant Nash equilibrium requires unbounded rationality, we consider regimes in which rationality is strictly bounded. We first show that the stationary probability of states corresponding to perfect coordination is monotone increasing in the rationality parameter $β$. For $K$-regular networks, we prove that the stationary probability of a perfectly coordinated action profile is monotone in the connectivity degree $K$, and we provide an upper bound on the minimum rationality required to achieve a desired level of coordination. For irregular networks, we show that the stationary probability of perfectly coordinated action profiles increases with the number of edges in the graph. We show that, for a large class of networks, the partition function of the Gibbs measure is well approximated by the moment generating function of a Gaussian random variable. This approximation allows us to optimize degree distributions and establishes that the optimal network - i.e., the one that maximizes the stationary probability of coordinated action profiles - is $K$-regular. Consequently, our results indicate that networks of uniformly bounded-rational agents achieve the most reliable coordination when connectivity is evenly distributed among agents.
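
The Gibbs-measure claim above can be illustrated with a minimal sketch. It assumes the standard result that log-linear learning in a potential game has stationary distribution proportional to exp(β·Φ(a)); the edge potential `PHI`, the 4-node ring, and the function names are hypothetical choices for illustration, not values taken from the paper. The sketch enumerates all action profiles of a small network and computes the stationary probability of the potential-maximizing (fully coordinated) profile at two values of β.

```python
import itertools
import math

# Hypothetical pairwise potential for a binary stag hunt (0 = hare, 1 = stag).
# Coordinating on either action yields positive potential; miscoordination yields 0.
PHI = {(0, 0): 3.0, (1, 1): 1.0, (0, 1): 0.0, (1, 0): 0.0}

def potential(profile, edges, phi):
    """Sum the pairwise potential over every edge of the network."""
    return sum(phi[(profile[i], profile[j])] for i, j in edges)

def gibbs(n, edges, phi, beta):
    """Return the Gibbs distribution pi(a) = exp(beta * Phi(a)) / Z over all 2^n profiles."""
    profiles = list(itertools.product((0, 1), repeat=n))
    weights = [math.exp(beta * potential(p, edges, phi)) for p in profiles]
    z = sum(weights)  # partition function
    return {p: w / z for p, w in zip(profiles, weights)}

if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 2-regular network on 4 agents
    for beta in (0.5, 1.0, 2.0):
        pi = gibbs(4, ring, PHI, beta)
        print(beta, pi[(0, 0, 0, 0)])
```

Brute-force enumeration is only feasible for small networks, which is presumably why the abstract turns to an approximation of the partition function for larger ones.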

2026-04-09T03:16:32Z To be submitted to the IEEE Transactions on Automatic Control Zhewei Wang Emrah Akyol Marcos M. Vasconcelos http://arxiv.org/abs/2604.07721v1 Sima 1.0: A Collaborative Multi-Agent Framework for Documentary Video Production 2026-04-09T02:11:30Z

Content creation for major video-sharing platforms demands significant manual labor, particularly for long-form documentary videos spanning one to two hours. In this work, we introduce Sima 1.0, a multi-agent system designed to optimize the weekly production pipeline for high-quality video generation. The framework partitions the production process into an 11-step pipeline distributed across a hybrid workforce. While foundational creative tasks and physical recording are executed by a human operator, time-intensive editing, caption refinement, and supplementary asset integration are delegated to specialized junior and senior-level AI agents. By systematizing tasks from script annotation to final asset exportation, Sima 1.0 significantly reduces the production workload, empowering a single creator to efficiently sustain a rigorous weekly publishing schedule.

2026-04-09T02:11:30Z Zhao Song http://arxiv.org/abs/2604.07667v1 From Debate to Decision: Conformal Social Choice for Safe Multi-Agent Deliberation 2026-04-09T00:15:20Z

Multi-agent debate improves LLM reasoning, yet agreement among agents is not evidence of correctness. When agents converge on a wrong answer through social reinforcement, consensus-based stopping commits that error to an automated action with no recourse. We introduce Conformal Social Choice, a post-hoc decision layer that converts debate outputs into calibrated act-versus-escalate decisions. Verbalized probability distributions from heterogeneous agents are aggregated via a linear opinion pool and calibrated with split conformal prediction, yielding prediction sets with a marginal coverage guarantee: the correct answer is included with probability ${\geq}\,1{-}α$, without assumptions on individual model calibration. A hierarchical action policy maps singleton sets to autonomous action and larger sets to human escalation. On eight MMLU-Pro domains with three agents (Claude Haiku, DeepSeek-R1, Qwen-3 32B), coverage stays within 1--2 points of the target. The key finding is not that debate becomes more accurate, but that the conformal layer makes its failures actionable: 81.9% of wrong-consensus cases are intercepted at $α{=}0.05$. Because the layer refuses to act on cases where debate is confidently wrong, the remaining conformal singletons reach 90.0--96.8% accuracy (up to 22.1pp above consensus stopping) -- a selection effect, not a reasoning improvement. This safety comes at the cost of automation, but the operating point is user-adjustable via $α$.
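
The decision layer described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses equal pool weights, the common nonconformity score 1 − p(true label), and the standard split-conformal quantile rule; all function names and the toy numbers are hypothetical.

```python
import math

def linear_pool(dists, weights=None):
    """Aggregate per-agent probability dicts with a weighted average (equal weights by default)."""
    k = len(dists)
    weights = weights or [1.0 / k] * k
    return {y: sum(w * d[y] for w, d in zip(weights, dists)) for y in dists[0]}

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points: every label is kept
    return sorted(cal_scores)[rank - 1]

def prediction_set(pooled, qhat):
    """Keep every label whose nonconformity score 1 - p(label) is within the threshold."""
    return {y for y, p in pooled.items() if 1.0 - p <= qhat}

def decide(pred_set):
    """Singleton sets trigger autonomous action; anything else is escalated to a human."""
    return "act" if len(pred_set) == 1 else "escalate"

if __name__ == "__main__":
    # Calibration scores would be 1 - pooled probability of the true answer on held-out items.
    qhat = conformal_threshold([0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6], alpha=0.25)
    pooled = linear_pool([{"A": 0.7, "B": 0.2, "C": 0.1}, {"A": 0.5, "B": 0.4, "C": 0.1}])
    print(decide(prediction_set(pooled, qhat)))
```

Lowering `alpha` raises the threshold, which enlarges prediction sets and shifts more cases from "act" to "escalate"; this is the user-adjustable operating point mentioned in the abstract.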

2026-04-09T00:15:20Z Mengdie Flora Wang Haochen Xie Guanghui Wang Aijing Gao Guang Yang Ziyuan Li Qucy Wei Qiu Fangwei Han Hengzhi Qiu Yajing Huang Bing Zhu Jae Oh Woo http://arxiv.org/abs/2505.20579v6 The challenge of hidden gifts in multi-agent reinforcement learning 2026-04-08T20:58:11Z

Sometimes we benefit from actions that others have taken even when we are unaware that they took those actions. For example, if your neighbor chooses not to take a parking spot in front of your house when you are not there, you can benefit, even without being aware that they took this action. These ``hidden gifts'' represent an interesting challenge for multi-agent reinforcement learning (MARL), since assigning credit when the beneficial actions of others are hidden is non-trivial. Here, we study the impact of hidden gifts with a simple MARL task. In this task, agents in a grid-world environment have individual doors to unlock in order to obtain individual rewards. In addition, if all the agents unlock their doors, the group receives a larger collective reward. However, there is only one key for all of the doors, such that the collective reward can only be obtained when the agents drop the key for others after they use it. Notably, there is nothing to indicate to an agent that the other agents have dropped the key, thus this act for others is a ``hidden gift''. We show that several different state-of-the-art MARL algorithms, including MARL-specific architectures, fail to learn how to obtain the collective reward in this simple task. Interestingly, we find that decentralized actor-critic policy gradient agents can succeed when we provide them with information about their own action history, but MARL agents still cannot solve the task with action history. Finally, we derive a correction term for policy gradient agents, inspired by learning-aware approaches, which reduces the variance in learning and helps them to converge to collective success more reliably. These results show that credit assignment in multi-agent settings can be particularly challenging in the presence of ``hidden gifts'', and demonstrate that self learning-awareness in decentralized agents can benefit these settings.

2025-05-26T23:28:52Z Increased analysis of LOLA baselines and moved to main section. Cleaned up proof and fixed error where gradient symbol was left in front of the log(policy). Self correction becomes more intuitive Dane Malenfant Blake A. Richards http://arxiv.org/abs/2604.07312v1 Intertemporal Demand Allocation for Inventory Control in Online Marketplaces 2026-04-08T17:22:37Z

Online marketplaces increasingly do more than simply match buyers and sellers: they route orders across competing sellers and, in many categories, offer ancillary fulfillment services that make seller inventory a source of platform revenue. We investigate how a platform can use intertemporal demand allocation to influence sellers' inventory choices without directly controlling stock. We develop a model in which the platform observes aggregate demand, allocates orders across sellers over time, and sellers choose between two fulfillment options, fulfill-by-merchant (FBM) and fulfill-by-platform (FBP), while replenishing inventory under state-dependent base-stock policies. The key mechanism we study is informational: by changing the predictability of each seller's sales stream, the platform changes sellers' safety-stock needs even when average demand shares remain unchanged. We focus on nondiscriminatory allocation policies that give sellers the same demand share and forecast risk. Within this class, uniform splitting minimizes forecast uncertainty, whereas any higher level of uncertainty can be implemented using simple low-memory allocation rules. Moreover, increasing uncertainty above the uniform benchmark requires routing rules that prevent sellers from inferring aggregate demand from their own sales histories. These results reduce the platform's problem to choosing a level of forecast uncertainty that trades off adoption of platform fulfillment against the inventory held by adopters. Our analysis identifies demand allocation as a powerful operational and informational design lever in digital marketplaces.

2026-04-08T17:22:37Z Rene Caldentey Tong Xie http://arxiv.org/abs/2604.07424v1 An Analysis of Artificial Intelligence Adoption in NIH-Funded Research 2026-04-08T17:05:11Z

Understanding the landscape of artificial intelligence (AI) and machine learning (ML) adoption across the National Institutes of Health (NIH) portfolio is critical for research funding strategy, institutional planning, and health policy. The advent of large language models (LLMs) has fundamentally transformed research landscape analysis, enabling researchers to perform large-scale semantic extraction from thousands of unstructured research documents. In this paper, we illustrate a human-in-the-loop research methodology for LLMs to automatically classify and summarize research descriptions at scale. Using our methodology, we present a comprehensive analysis of 58,746 NIH-funded biomedical research projects from 2025. We show that: (1) AI constitutes 15.9% of the NIH portfolio with a 13.4% funding premium, concentrated in discovery, prediction, and data integration across disease domains; (2) a critical research-to-deployment gap exists, with 79% of AI projects remaining in research/development stages while only 14.7% engage in clinical deployment or implementation; and (3) health disparities research is severely underrepresented at just 5.7% of AI-funded work despite its importance to NIH's equity mission. These findings establish a framework for evidence-based policy interventions to align the NIH AI portfolio with health equity goals and strategic research priorities.

2026-04-08T17:05:11Z Navapat Nananukul Mayank Kejriwal http://arxiv.org/abs/2604.07204v1 Designing for Accountable Agents: a Viewpoint 2026-04-08T15:28:47Z

AI systems are becoming increasingly complex, ubiquitous and autonomous, leading to increasing concerns about their impacts on individuals and society. In response, researchers have begun investigating how to ensure that the methods underlying AI decision-making are transparent and their decisions are explainable to people and conformant to human values and ethical principles. As part of this research thrust, the need for accountability within AI systems has been noted, but this notion has proven elusive to define; we aim to address this issue in the current paper. Unlike much recent work, we do not address accountability within the human organisational processes of developing and deploying AI; rather we consider what it would mean for the agents within a multi-agent system (MAS), potentially including human agents, to be accountable to other agents or to have others accountable to them. In this work, we make the following contributions: we provide an in-depth survey of existing work on accountability in multiple disciplines, seeking to identify a coherent definition of the concept; we give a realistic example of a multi-agent system application domain that illustrates the benefits of enabling agents to follow accountability processes; and we identify a set of research challenges for the MAS community in building accountable agents, sketching out some initial solutions to these, thereby laying out a road-map for future research. Our focus is on laying the groundwork to enable autonomous elements within open socio-technical systems to take part in accountability processes.

2026-04-08T15:28:47Z Stephen Cranefield Nir Oren http://arxiv.org/abs/2510.18886v2 Emergence of Internal State-Modulated Swarming in Multi-Agent Patch Foraging System 2026-04-08T13:43:53Z

Active particles are entities that sustain persistent out-of-equilibrium motion by consuming energy. Under certain conditions, they exhibit the tendency to self-organize through coordinated movements, such as swarming via aggregation. While performing non-cooperative foraging tasks, the emergence of such swarming behavior in foragers, exemplifying active particles, has been attributed to the partial observability of the environment, in which the presence of another forager can serve as a proxy signal to indicate the potential presence of a food source or a resource patch. In this paper, we validate this phenomenon by simulating multiple self-propelled foragers as they forage from multiple resource patches in a non-cooperative manner. These foragers operate in a continuous two-dimensional space with stochastic position updates and partial observability. We evolve a shared policy in the form of a continuous-time recurrent neural network that serves as a velocity controller for the foragers. To this end, we use an evolutionary strategy algorithm wherein the different samples of the policy-distribution are evaluated in the same rollout. Then we show that agents are able to learn to adaptively forage in the environment. Next, we show the emergence of swarming in the form of aggregation among the foragers when resource patches are absent. We observe that the strength of this swarming behavior appears to be inversely proportional to the amount of resource stored in the foragers, which supports the risk-sensitive foraging claims. Empirical analysis of the learned controller's hidden states in minimal test runs uncovers their sensitivity to the amount of resource stored in a forager. Clamping these hidden states to represent a lesser amount of resource hastens its learned aggregation behavior.

2025-10-14T08:50:18Z 9 pages, 9 figures, 1 table, 1 algorithm Siddharth Chaturvedi Ahmed EL-Gazzar Marcel van Gerven http://arxiv.org/abs/2604.07036v1 ReDAct: Uncertainty-Aware Deferral for LLM Agents 2026-04-08T12:51:01Z

Recently, LLM-based agents have become increasingly popular across many applications, including complex sequential decision-making problems. However, they inherit the tendency of LLMs to hallucinate, leading to incorrect decisions. In sequential settings, even a single mistake can irreversibly degrade the trajectory, making hallucinations an even bigger problem. Although larger LLMs hallucinate less, they incur a significantly higher per-token cost. In this paper, we address this tradeoff by proposing ReDAct (Reason-Defer-Act). In ReDAct, an agent is equipped with two LLMs: a small, cheap model used by default, and a large, more reliable but expensive model. When the predictive uncertainty of the small model exceeds a calibrated threshold, the decision is deferred to the large model. We evaluate our approach in text-based embodied environments such as ALFWorld and MiniGrid and show that deferring only about 15% of decisions to the large model can match the quality of using it exclusively, while significantly reducing inference costs.
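
The deferral rule described above can be sketched in a few lines. This is an illustrative sketch only: it assumes the small model exposes a probability distribution over candidate actions and uses Shannon entropy as the uncertainty measure, which the abstract does not specify. The names `redact_step`, `small_model`, and `large_model` are hypothetical stand-ins for real LLM calls.

```python
import math
from typing import Callable, Dict, Tuple

def entropy(dist: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over candidate actions."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def redact_step(
    observation: str,
    small_model: Callable[[str], Dict[str, float]],  # hypothetical: returns action probabilities
    large_model: Callable[[str], str],               # hypothetical: returns a single action
    threshold: float,
) -> Tuple[str, bool]:
    """Return (action, deferred). Defer to the large model when the small one is too uncertain."""
    dist = small_model(observation)
    if entropy(dist) > threshold:
        return large_model(observation), True
    return max(dist, key=dist.get), False

if __name__ == "__main__":
    confident = lambda obs: {"open door": 0.9, "wait": 0.1}
    unsure = lambda obs: {"open door": 0.5, "wait": 0.5}
    expert = lambda obs: "wait"
    print(redact_step("You see a door.", confident, expert, threshold=0.5))
    print(redact_step("You see a door.", unsure, expert, threshold=0.5))
```

In practice the threshold would be calibrated on held-out trajectories so that the deferral rate matches a cost budget; the value 0.5 here is arbitrary.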

2026-04-08T12:51:01Z Dzianis Piatrashyn Nikita Kotelevskii Kirill Grishchenkov Nikita Glazkov Ivan Nasonov Ilya Makarov Timothy Baldwin Preslav Nakov Roman Vashurin Maxim Panov http://arxiv.org/abs/2604.07007v1 AgentCity: Constitutional Governance for Autonomous Agent Economies via Separation of Power 2026-04-08T12:28:20Z

Autonomous AI agents are beginning to operate across organizational boundaries on the open internet -- discovering, transacting with, and delegating to agents owned by other parties without centralized oversight. When agents from different human principals collaborate at scale, the collective becomes opaque: no single human can observe, audit, or govern the emergent behavior. We term this the Logic Monopoly -- the agent society's unchecked monopoly over the entire logic chain from planning through execution to evaluation. We propose the Separation of Power (SoP) model, a constitutional governance architecture deployed on public blockchain that breaks this monopoly through three structural separations: agents legislate operational rules as smart contracts, deterministic software executes within those contracts, and humans adjudicate through a complete ownership chain binding every agent to a responsible principal. In this architecture, smart contracts are the law itself -- the actual legislative output that agents produce and that governs their behavior. We instantiate SoP in AgentCity on an EVM-compatible layer-2 blockchain (L2) with a three-tier contract hierarchy (foundational, meta, and operational). The core thesis is alignment-through-accountability: if each agent is aligned with its human owner through the accountability chain, then the collective converges on behavior aligned with human intent -- without top-down rules. A pre-registered experiment evaluates this thesis in a commons production economy -- where agents share a finite resource pool and collaboratively produce value -- at 50-1,000 agent scale.

2026-04-08T12:28:20Z 111 pages, 11 figures, 19 tables, 67 references. Pre-registered experimental design Anbang Ruan Xing Zhang http://arxiv.org/abs/2410.08334v2 Exploring Natural Language-Based Strategies for Efficient Number Learning in Children through Reinforcement Learning 2026-04-08T11:56:06Z

In this paper, we build a reinforcement learning framework to study how children compose numbers using base-ten blocks. Studying numerical cognition in toddlers offers a powerful window into the learning process itself, because numbers sit at the intersection of language, logic, perception, and culture. Specifically, we utilize state of the art (SOTA) reinforcement learning algorithms and neural network architectures to understand how variations in linguistic instructions can affect the learning process. Our results also show that instructions providing explicit action guidance are a more effective learning signal for RL agents to construct numbers. Furthermore, we identify an effective curriculum for ordering numerical-composition examples during training, resulting in faster convergence and improved generalization to unseen data. These findings highlight the role of language and multi-modal signals in numerical cognition and provide hypotheses for designing effective instructional strategies for early childhood education.

2024-10-10T19:49:13Z Tirthankar Mittra http://arxiv.org/abs/2604.06972v1 Differentiable Environment-Trajectory Co-Optimization for Safe Multi-Agent Navigation 2026-04-08T11:43:38Z

The environment plays a critical role in multi-agent navigation by imposing spatial constraints, rules, and limitations that agents must navigate around. Traditional approaches treat the environment as fixed, without exploring its impact on agents' performance. This work considers environment configurations as decision variables, alongside agent actions, to jointly achieve safe navigation. We formulate a bi-level problem, where the lower-level sub-problem optimizes agent trajectories that minimize navigation cost and the upper-level sub-problem optimizes environment configurations that maximize navigation safety. We develop a differentiable optimization method that iteratively solves the lower-level sub-problem with interior point methods and the upper-level sub-problem with gradient ascent. A key challenge lies in analytically coupling these two levels. We address this by leveraging KKT conditions and the Implicit Function Theorem to compute gradients of agent trajectories w.r.t. environment parameters, enabling differentiation throughout the bi-level structure. Moreover, we propose a novel metric that quantifies navigation safety as a criterion for the upper-level environment optimization, and prove its validity through measure theory. Our experiments validate the effectiveness of the proposed framework in a variety of safety-critical navigation scenarios, ranging from warehouse logistics to urban transportation. The results demonstrate that optimized environments provide navigation guidance, improving both agents' safety and efficiency.
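
The implicit-differentiation step can be illustrated on a toy problem. The sketch below assumes an unconstrained scalar lower level, where the KKT conditions reduce to stationarity; the cost f(x, θ) = x²/2 + x⁴/4 − θx and all function names are hypothetical and far simpler than the trajectory problem in the paper. Stationarity gives g(x, θ) = x + x³ − θ = 0, and the Implicit Function Theorem yields dx*/dθ = −(∂g/∂x)⁻¹ ∂g/∂θ = 1 / (1 + 3x*²).

```python
def solve_lower_level(theta: float, iters: int = 50) -> float:
    """Solve the stationarity condition x + x^3 - theta = 0 by Newton's method."""
    x = 0.0
    for _ in range(iters):
        g = x + x**3 - theta       # gradient of the lower-level cost in x
        dg = 1.0 + 3.0 * x**2      # second derivative, always positive here
        x -= g / dg
    return x

def implicit_gradient(theta: float) -> float:
    """dx*/dtheta from the Implicit Function Theorem applied to the stationarity condition."""
    x_star = solve_lower_level(theta)
    dg_dx = 1.0 + 3.0 * x_star**2
    dg_dtheta = -1.0
    return -dg_dtheta / dg_dx

def finite_difference(theta: float, h: float = 1e-5) -> float:
    """Central-difference check that re-solves the lower level twice."""
    return (solve_lower_level(theta + h) - solve_lower_level(theta - h)) / (2 * h)

if __name__ == "__main__":
    theta = 2.0
    print(solve_lower_level(theta), implicit_gradient(theta), finite_difference(theta))
```

The implicit gradient needs only one lower-level solve plus a linear solve, whereas finite differences need two lower-level solves per parameter. An upper-level gradient-ascent step would chain this sensitivity with the derivative of the safety objective with respect to x*.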

2026-04-08T11:43:38Z Zhan Gao Gabriele Fadini Stelian Coros Amanda Prorok http://arxiv.org/abs/2604.06876v1 Exploiting Aggregate Programming in a Multi-Robot Service Prototype 2026-04-08T09:37:40Z

Multi-robot systems are becoming increasingly relevant within diverse application domains, such as healthcare, exploration, and rescue missions. However, building such systems is still a significant challenge, since it adds the complexities of the physical nature of robots and their environments to those inherent in coordinating any distributed (multi-agent) system. Aggregate Programming (AP) has recently emerged as a promising approach to engineering resilient, distributed systems with proximity-based communication, and is notably supported by practical frameworks. In this paper we present a prototype of a multi-robot service system, which adopts AP for the design and implementation of its coordination software. The prototype has been validated both in simulation and with tests in a university library.

2026-04-08T09:37:40Z In Proceedings PLACES 2026, arXiv:2604.05737 EPTCS 444, 2026, pp. 45-57 Giorgio Audrito Dipartimento di Informatica, Universita' di Torino Andrea Basso MITO Technology Daniele Bortoluzzi Dipartimento di Informatica, Universita' di Torino Ferruccio Damiani Dipartimento di Informatica, Universita' di Torino Giordano Scarso Dipartimento di Informatica, Universita' di Torino Gianluca Torta Dipartimento di Informatica, Universita' di Torino 10.4204/EPTCS.444.5 http://arxiv.org/abs/2604.06873v1 Generating Local Shields for Decentralised Partially Observable Markov Decision Processes 2026-04-08T09:36:52Z

Multi-agent systems under partial observation often struggle to maintain safety because each agent's locally chosen action does not, in general, determine the resulting joint action. Shielding addresses this by filtering actions based on the current state, but most existing techniques either assume access to a shared centralised global state or employ memoryless local filters that cannot consider interaction history. We introduce a shield process algebra with guarded choice and recursion for specifying safe global behaviour in communication-free Dec-POMDP settings. From a shield process, we compile a process automaton, then a global Mealy machine as a safe joint-action filter, and finally project it to local Mealy machines whose states are belief-style subsets of the global Mealy machine states consistent with each agent's observations, and which output per-agent safe action sets. We implement the pipeline in Rust and integrate PRISM, the Probabilistic Symbolic Model Checker, to compute best- and worst-case safety probabilities independently of the agents' policies. A multi-agent path-finding case study demonstrates how different shield processes substantially reduce collisions compared to the unshielded baseline while exhibiting varying levels of expressiveness and conservatism.

2026-04-08T09:36:52Z In Proceedings PLACES 2026, arXiv:2604.05737 EPTCS 444, 2026, pp. 1-10 Haoran Yang University of Oxford Nobuko Yoshida University of Oxford 10.4204/EPTCS.444.1 http://arxiv.org/abs/2604.06813v1 Event-Triggered Adaptive Consensus for Multi-Robot Task Allocation 2026-04-08T08:31:23Z

Coordinating robotic swarms in dynamic and communication-constrained environments remains a fundamental challenge for collective intelligence. This paper presents a novel framework for event-triggered organization, designed to achieve highly efficient and adaptive task allocation in a heterogeneous robotic swarm. Our approach is based on an adaptive consensus mechanism where communication for task negotiation is initiated only in response to significant events, eliminating unnecessary interactions. Furthermore, the swarm self-regulates its coordination pace based on the level of environmental conflict, and individual agent resilience is managed through a robust execution model based on Behavior Trees. This integrated architecture results in a collective system that is not only effective but also remarkably efficient and adaptive. We validate our framework through extensive simulations, benchmarking its performance against a range of coordination strategies. These include a non-communicating reactive behavior, a simple information-sharing protocol, the baseline Consensus-Based Bundle Algorithm (CBBA), and a periodic CBBA variant integrated within a Behavior Tree architecture. Furthermore, our approach is compared with Clustering-CBBA (C-CBBA), a state-of-the-art algorithm recognized for communication-efficient task management in heterogeneous clusters. Experimental results demonstrate that the proposed method significantly reduces network overhead when compared to communication-heavy strategies. Moreover, it maintains top-tier mission effectiveness regarding the number of tasks completed, showcasing high efficiency and practicality. The framework also exhibits significant resilience to both action execution and permanent agent failures, highlighting the effectiveness of our event-triggered model for designing adaptive and resource-efficient robotic swarms for complex scenarios.
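
The general idea of event-triggered communication can be sketched as follows. This is a generic illustration, not the paper's mechanism: the abstract does not define what counts as a "significant event", so the sketch assumes an agent rebroadcasts a scalar quantity (for example, its bid for a task) only when it has drifted from the last broadcast value by more than a fixed threshold. The function name and the numbers are hypothetical.

```python
from typing import List, Sequence

def event_triggered_broadcasts(values: Sequence[float], threshold: float) -> List[int]:
    """Return the time steps at which an agent would broadcast.

    The first value is always sent. Afterwards a message is sent only when the
    current value differs from the last broadcast value by more than `threshold`.
    """
    sent_at = [0]
    last_sent = values[0]
    for t, v in enumerate(values[1:], start=1):
        if abs(v - last_sent) > threshold:
            sent_at.append(t)
            last_sent = v
    return sent_at

if __name__ == "__main__":
    bids = [1.0, 1.05, 1.1, 1.6, 1.65, 0.9]
    steps = event_triggered_broadcasts(bids, threshold=0.3)
    print(steps, f"{len(steps)} messages instead of {len(bids)}")
```

A periodic scheme would send one message per step; the triggered scheme trades some staleness, bounded by the threshold, for fewer messages.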

2026-04-08T08:31:23Z 40 pages, 18 figures. Published in Computer Communications under CC-BY license Computer Communications, Volume 251, 2026, 108499 Fidel Aznar Mar Pujol Álvaro Díez 10.1016/j.comcom.2026.108499