https://arxiv.org/api/a5j/36UVxFs/Y9dxZQAAXi9r0i8 2026-06-29T09:48:00Z 12774 1410 15 http://arxiv.org/abs/2604.01838v1 Free Information Disrupts Even Bayesian Crowds 2026-04-02T09:52:07Z A core tenet underpinning the conception of contemporary information networks, such as social media platforms, is that users should not be constrained in the amount of information they can freely and willingly exchange with one another about a given topic. By means of a computational agent-based model, we show how even in groups of truth-seeking and cooperative agents with perfect information-processing abilities, unconstrained information exchange may lead to detrimental effects on the correctness of the group's beliefs. If unconstrained information exchange can be detrimental even among such idealized agents, it is prudent to assume it can also be so in practice. We therefore argue that constraints on information flow should be carefully considered in the design of communication networks with substantial societal impact, such as social media platforms. 2026-04-02T09:52:07Z Jonas Stein Shannon Cruz Davide Grossi Martina Testori 10.1073/pnas.2518472123 http://arxiv.org/abs/2603.27584v2 Sci-Mind: Cognitively-Inspired Adversarial Debate for Autonomous Mathematical Modeling 2026-04-02T06:48:36Z Real-world mathematical modeling is inherently an experiential and collaborative endeavor. Domain experts rarely solve complex problems from scratch; instead, they draw upon analogies from historical cases and subject their hypotheses to rigorous peer scrutiny. However, autonomous agents powered by Large Language Models predominantly rely on isolated reasoning paradigms, frequently generating plausible but fundamentally flawed models due to a lack of domain grounding and adversarial verification. To address these limitations, we propose Sci-Mind, a novel framework that mirrors the human scientific discovery process. 
Sci-Mind integrates Experiential Memory Recall to retrieve executable code snippets and modeling paradigm descriptors, grounding abstract reasoning in historical solutions. Subsequently, it employs an Adversarial Cognitive Dialectic where a Theorist optimizing mathematical coherence and a Pragmatist enforcing data feasibility debate through competing objectives to prune elegant but infeasible formulations. A Self-Validating Execution Strategy further ensures blueprint consistency through formal predicates before code generation, achieving fully autonomous execution. Extensive experiments on the MM-Bench and EngiBench demonstrate that Sci-Mind significantly outperforms leading autonomous agents in both modeling rigorousness and code executability. 2026-03-29T08:58:44Z Junhao Jia Huangwei Chen Ruiying Sun Yanhui Song Haishuai Wang Jiajun Bu Lei Wu http://arxiv.org/abs/2506.03828v3 AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance 2026-04-02T02:21:44Z AI for Industrial Asset Lifecycle Management aims to automate complex operational workflows, such as condition monitoring and maintenance scheduling, to minimize system downtime. While traditional AI/ML approaches solve narrow tasks in isolation, Large Language Model (LLM) agents offer a next-generation opportunity for end-to-end automation. In this paper, we introduce AssetOpsBench, a unified framework for orchestrating and evaluating domain-specific agents for Industry 4.0. AssetOpsBench provides a multimodal ecosystem comprising a catalog of four domain-specific agents, a curated dataset of 140+ human-authored natural-language queries grounded in real industrial scenarios, and a simulated, CouchDB-backed IoT environment. 
We introduce an automated evaluation framework that uses three key metrics to analyze architectural trade-offs between the Tool-As-Agent and Plan-Executor paradigms, along with a systematic procedure for the automated discovery of emerging failure modes. The practical relevance of AssetOpsBench is demonstrated by its broad community adoption, with 250+ users and over 500 agents submitted to our public benchmarking platform, supporting reproducible and scalable research for real-world industrial operations. The code is accessible at https://github.com/IBM/AssetOpsBench . 2025-06-04T10:57:35Z 25 pages, 18 figures Dhaval Patel Shuxin Lin James Rayfield Nianjun Zhou Chathurangi Shyalika Suryanarayana R Yarrabothula Roman Vaculin Natalia Martinez Fearghal O'donncha Jayant Kalagnanam http://arxiv.org/abs/2604.01529v1 A Role-Based LLM Framework for Structured Information Extraction from Healthy Food Policies 2026-04-02T01:58:37Z Current Large Language Model (LLM) approaches for information extraction (IE) in the healthy food policy domain are often hindered by various factors, including misinformation, specifically hallucinations, misclassifications, and omissions that result from the structural diversity and inconsistency of policy documents. To address these limitations, this study proposes a role-based LLM framework that automates the IE from unstructured policy data by assigning specialized roles: an LLM policy analyst for metadata and mechanism classification, an LLM legal strategy specialist for identifying complex legal approaches, and an LLM food system expert for categorizing food system stages. This framework mimics expert analysis workflows by incorporating structured domain knowledge, including explicit definitions of legal mechanisms and classification criteria, into role-specific prompts. 
We evaluate the framework using 608 healthy food policies from the Healthy Food Policy Project (HFPP) database, comparing its performance against zero-shot, few-shot, and chain-of-thought (CoT) baselines using Llama-3.3-70B. Our proposed framework demonstrates superior performance in complex reasoning tasks, offering a reliable and transparent methodology for automating IE from health policies. 2026-04-02T01:58:37Z Congjing Zhang Ruoxuan Bao Jingyu Li Yoav Ackerman Shuai Huang Yanfang Su http://arxiv.org/abs/2604.01213v1 Collaborative Task and Path Planning for Heterogeneous Robotic Teams using Multi-Agent PPO 2026-04-01T17:53:51Z Efficient robotic extraterrestrial exploration requires robots with diverse capabilities, ranging from scientific measurement tools to advanced locomotion. A robotic team enables the distribution of tasks over multiple specialized subsystems, each providing specific expertise to complete the mission. The central challenge lies in efficiently coordinating the team to maximize utilization and the extraction of scientific value. Classical planning algorithms scale poorly with problem size, leading to long planning cycles and high inference costs due to the combinatorial growth of possible robot-target allocations and possible trajectories. Learning-based methods are a viable alternative that move the scaling concern from runtime to training time, setting a critical step towards achieving real-time planning. In this work, we present a collaborative planning strategy based on Multi-Agent Proximal Policy Optimization (MAPPO) to coordinate a team of heterogeneous robots to solve a complex target allocation and scheduling problem. We benchmark our approach against single-objective optimal solutions obtained through exhaustive search and evaluate its ability to perform online replanning in the context of a planetary exploration scenario. 
2026-04-01T17:53:51Z 8 pages, 3 figures, associated code on https://github.com/leggedrobotics/multi_robot_global_planner Matthias Rubio Julia Richter Hendrik Kolvenbach Marco Hutter http://arxiv.org/abs/2512.02079v2 Robust Geospatial Coordination of Multi-Agent Communications Networks Under Attrition 2026-04-01T17:21:59Z Coordinating emergency responses in extreme environments, such as wildfires, requires resilient and high-bandwidth communication backbones. While autonomous aerial swarms can establish ad-hoc networks to provide this connectivity, the high risk of individual node attrition in these settings often leads to network fragmentation and mission-critical downtime. To overcome this challenge, we introduce and formalize the problem of Robust Task Networking Under Attrition (RTNUA), which extends connectivity maintenance in multi-robot systems to explicitly address proactive redundancy and attrition recovery. We then introduce Physics-Informed Robust Employment of Multi-Agent Networks ($Φ$IREMAN), a topological algorithm leveraging physics-inspired potential fields to solve this problem. In our evaluations, $Φ$IREMAN consistently outperforms baselines, and is able to maintain greater than $99.9\%$ task uptime despite substantial attrition in simulations with up to 100 tasks and 500 drones, demonstrating both effectiveness and scalability. 2025-11-30T22:13:50Z 8 pages, 4 figures, 4 tables, accepted to IEEE RA-L Jonathan S. Kent Eliana Stefani Brian Plancher http://arxiv.org/abs/2604.02381v1 Agentic AI-Empowered Wireless Agent Networks With Semantic-Aware Collaboration via ILAC 2026-04-01T16:07:23Z The rapid development of agentic artificial intelligence (AI) is driving future wireless networks to evolve from passive data pipes into intelligent collaborative ecosystems under the emerging paradigm of integrated learning and communication (ILAC). 
However, realizing efficient agentic collaboration faces challenges not only in handling semantic redundancy but also in the lack of an integrated mechanism for communication, computation, and control. To address this, we propose a wireless agent network (WAN) framework that orchestrates a progressive knowledge aggregation mechanism. Specifically, we formulate the aggregation process as a joint energy minimization problem where the agents perform semantic compression to eliminate redundancy, optimize transmission power to deliver semantic payloads, and adjust physical trajectories to proactively enhance channel qualities. To solve this problem, we develop a hierarchical algorithm that integrates inner-level resource optimization with outer-level topology evolution. Theoretically, we reveal that incorporating a potential field into the topology evolution effectively overcomes the short-sightedness of greedy matching, providing a mathematically rigorous heuristic for long-term energy minimization. Simulation results demonstrate that the proposed framework achieves superior energy efficiency and scalability compared to conventional benchmarks, validating the efficacy of semantic-aware collaboration in dynamic environments. 2026-04-01T16:07:23Z Zhouxiang Zhao Jiaxiang Wang Zhaohui Yang Kun Yang Zhaoyang Zhang Mingzhe Chen Kaibin Huang http://arxiv.org/abs/2604.01020v1 OrgAgent: Organize Your Multi-Agent System like a Company 2026-04-01T15:21:14Z While large language model-based multi-agent systems have shown strong potential for complex reasoning, how to effectively organize multiple agents remains an open question. In this paper, we introduce OrgAgent, a company-style hierarchical multi-agent framework that separates collaboration into governance, execution, and compliance layers. 
OrgAgent decomposes multi-agent reasoning into three layers: a governance layer for planning and resource allocation, an execution layer for task solving and review, and a compliance layer for final answer control. By evaluating the framework across reasoning tasks, LLMs, execution modes, and execution policies, we find that multi-agent systems organized in a company-style hierarchy generally outperform other organizational structures. Hierarchical coordination also reduces token consumption relative to flat collaboration in most settings. For example, for GPT-OSS-120B, the hierarchical setting improves performance over a flat multi-agent system by 102.73% while reducing token usage by 74.52% on SQuAD 2.0. Further analysis shows that hierarchy helps most when tasks benefit from stable skill assignment, controlled information flow, and layered verification. Overall, our findings highlight organizational structure as an important factor in multi-agent reasoning, shaping not only effectiveness and cost, but also coordination behavior. 2026-04-01T15:21:14Z Yiru Wang Xinyue Shen Yaohui Han Michael Backes Pin-Yu Chen Tsung-Yi Ho http://arxiv.org/abs/2604.00842v1 Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants 2026-04-01T12:53:01Z Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, failing to capture the stateful and sequential nature of user interaction in digital environments and making realistic user simulation infeasible. We introduce Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. 
Pare models applications as finite state machines with stateful navigation and state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration. 2026-04-01T12:53:01Z 34 pages, 8 figures, 5 tables Deepak Nathani Cheng Zhang Chang Huan Jiaming Shan Yinfei Yang Alkesh Patel Zhe Gan William Yang Wang Michael Saxon Xin Eric Wang http://arxiv.org/abs/2604.00810v1 Role Differentiation in a Coupled Resource Ecology under Multi-Level Selection 2026-04-01T12:19:15Z A group of non-cooperating agents can succumb to the \emph{tragedy-of-the-commons} if all of them seek to maximize the same resource channel to improve their viability. In nature, however, groups often avoid such collapses by differentiating into distinct roles that exploit different resource channels. It remains unclear how such coordination can emerge under continual individual-level selection alone. To address this, we introduce a computational model of multi-level selection, in which group-level selection shapes a common substrate and mutation operator shared by all group members undergoing individual-level selection. We also place this process in an embodied ecology where distinct resource channels are not segregated, but coupled through the same behavioral primitives. These channels are classified as a positive-sum intake channel and a zero-sum redistribution channel. We investigate whether such a setting can give rise to role differentiation under turnover driven by birth and death. We find that in a learned ecology, both channels remain occupied at the colony level, and the collapse into a single acquisition mode is avoided. 
Zero-sum channel usage increases over generations despite not being directly optimized by group-level selection. Channel occupancy also fluctuates over the lifetime of a boid. Ablation studies suggest that most baseline performance is carried by the inherited behavioral basis, while the learned variation process provides a smaller but systematic improvement prior to saturation. Together, the results suggest that multi-level selection can enable groups in a common-pool setting to circumvent tragedy-of-the-commons through differentiated use of coupled channels under continual turnover. 2026-04-01T12:19:15Z 9 pages, 6 figures, 1 table Siddharth Chaturvedi Ahmed El-Gazzar Marcel van Gerven http://arxiv.org/abs/2604.00717v1 GRASP: Gradient Realignment via Active Shared Perception for Multi-Agent Collaborative Optimization 2026-04-01T10:26:22Z Non-stationarity arises from concurrent policy updates and leads to persistent environmental fluctuations. Existing approaches like Centralized Training with Decentralized Execution (CTDE) and sequential update schemes mitigate this issue. However, since the perception of the policies of other agents remains dependent on sampling environmental interaction data, the agent essentially operates in a passive perception state. This inevitably triggers equilibrium oscillations and significantly slows the convergence speed of the system. To address this issue, we propose Gradient Realignment via Active Shared Perception (GRASP), a novel framework that defines generalized Bellman equilibrium as a stable objective for policy evolution. The core mechanism of GRASP involves utilizing the independent gradients of agents to derive a defined consensus gradient, enabling agents to actively perceive policy updates and optimize team collaboration. Theoretically, we leverage the Kakutani Fixed-Point Theorem to prove that the consensus direction $u^*$ guarantees the existence and attainability of this equilibrium. 
Extensive experiments on StarCraft II Multi-Agent Challenge (SMAC) and Google Research Football (GRF) demonstrate the scalability and promising performance of the framework. 2026-04-01T10:26:22Z Sihan Zhou Tiantian He Yifan Lu Yaqing Hou Yew-Soon Ong http://arxiv.org/abs/2509.11062v3 Auto-Slides: An Interactive Multi-Agent System for Creating and Customizing Research Presentations 2026-04-01T08:41:51Z The rapid progress of large language models (LLMs) has opened new opportunities for education. While learners can interact with academic papers through LLM-powered dialogue, limitations still exist: the lack of structured organization and the heavy reliance on text can impede systematic understanding and engagement with complex concepts. To address these challenges, we propose Auto-Slides, an LLM-driven system that converts research papers into pedagogically structured, multimodal slides (e.g., diagrams and tables). Drawing on cognitive science, it creates a presentation-oriented narrative and allows iterative refinement via an interactive editor to better match learners' knowledge level and goals. Auto-Slides further incorporates verification and knowledge retrieval mechanisms to ensure accuracy and contextual completeness. Through extensive user studies, Auto-Slides demonstrates strong learner acceptance, improved structural support for understanding, and expert-validated gains in narrative quality compared with conventional LLM-based reading. Our contributions lie in designing a multi-agent framework for transforming academic papers into pedagogically optimized slides and introducing interactive customization for personalized learning. 
2025-09-14T03:05:54Z Project Homepage: https://auto-slides.github.io/ Yuheng Yang Wenjia Jiang Yang Wang Yi Song Yiwei Wang Chi Zhang http://arxiv.org/abs/2509.25302v2 Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents 2026-04-01T07:32:39Z The prevalent deployment of Large Language Model agents such as OpenClaw unlocks potential in real-world applications, while amplifying safety concerns. Among these concerns, the self-replication risk of LLM agents driven by objective misalignment (just like Agent Smith in the movie The Matrix) has transitioned from a theoretical warning to a pressing reality. Previous studies mainly examine whether LLM agents can self-replicate when directly instructed, potentially overlooking the risk of spontaneous replication driven by real-world settings (e.g., ensuring survival against termination threats). In this paper, we present a comprehensive evaluation framework for quantifying self-replication risks. Our framework establishes authentic production environments and realistic tasks (e.g., dynamic load balancing) to enable scenario-driven assessment of agent behaviors. Designing tasks that might induce misalignment between users' and agents' objectives makes it possible to decouple replication success from risk and capture self-replication risks arising from these misalignment settings. We further introduce Overuse Rate ($\mathrm{OR}$) and Aggregate Overuse Count ($\mathrm{AOC}$) metrics, which precisely capture the frequency and severity of uncontrolled replication. In our evaluation of 21 state-of-the-art open-source and proprietary models, we observe that over 50\% of LLM agents display a pronounced tendency toward uncontrolled self-replication under operational pressures. Our results underscore the urgent need for scenario-driven risk assessment and robust safeguards in the practical deployment of LLM-based agents. 
2025-09-29T17:49:50Z 26 pages, 6 figures Boxuan Zhang Yi Yu Jiaxuan Guo Jing Shao http://arxiv.org/abs/2604.00523v1 Lipschitz Dueling Bandits over Continuous Action Spaces 2026-04-01T06:07:33Z We study, for the first time, stochastic dueling bandits over continuous action spaces with Lipschitz structure, where feedback is purely comparative. While dueling bandits and Lipschitz bandits have been studied separately, their combination has remained unexplored. We propose the first algorithm for Lipschitz dueling bandits, using round-based exploration and recursive region elimination guided by an adaptive reference arm. We develop new analytical tools for relative feedback and prove a regret bound of $\tilde O\left(T^{\frac{d_z+1}{d_z+2}}\right)$, where $d_z$ is the zooming dimension of the near-optimal region. Further, our algorithm takes only logarithmic space in terms of the total time horizon, the best achievable by any bandit algorithm over a continuous action space. 2026-04-01T06:07:33Z Mudit Sharma Shweta Jain Vaneet Aggarwal Ganesh Ghalme http://arxiv.org/abs/2604.00477v1 Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation 2026-04-01T04:44:21Z LLM-based agent judges are an emerging approach to evaluating conversational AI, yet a fundamental uncertainty remains: can we trust their assessments, and if so, how many are needed? Through 960 sessions with two model pairs across 15 tasks, we show that persona-based agent judges produce evaluations indistinguishable from human raters in a Turing-style validation. We then identify a score-coverage dissociation: quality scores improve logarithmically with panel size, while unique issue discoveries follow a sublinear power law; both exhibit diminishing returns, but scores saturate roughly twice as fast as discoveries. 
We hypothesize this reflects a power-law distribution of the finding space: critical issues are discovered first by small panels, while corner cases require progressively larger panels, analogous to species accumulation curves in ecology. The mechanism traces to ensemble diversity: Big Five personality conditioning makes agents probe different quality dimensions, with expert judges acting as adversarial probes that push discovery into the tail of the finding distribution. A controlled ablation confirms that structured persona conditioning, not simple prompting, is required to produce these scaling properties. 2026-04-01T04:44:21Z HyunJoon Jung William Na