http://arxiv.org/abs/2604.01020v1 OrgAgent: Organize Your Multi-Agent System like a Company 2026-04-01T15:21:14Z

While large language model-based multi-agent systems have shown strong potential for complex reasoning, how to effectively organize multiple agents remains an open question. In this paper, we introduce OrgAgent, a company-style hierarchical multi-agent framework that decomposes multi-agent reasoning into three layers: a governance layer for planning and resource allocation, an execution layer for task solving and review, and a compliance layer for final answer control. By evaluating the framework across reasoning tasks, LLMs, execution modes, and execution policies, we find that multi-agent systems organized in a company-style hierarchy generally outperform other organizational structures. Hierarchical coordination also reduces token consumption relative to flat collaboration in most settings. For example, for GPT-OSS-120B on SQuAD 2.0, the hierarchical setting improves performance over the flat multi-agent system by 102.73% while reducing token usage by 74.52%. Further analysis shows that hierarchy helps most when tasks benefit from stable skill assignment, controlled information flow, and layered verification. Overall, our findings highlight organizational structure as an important factor in multi-agent reasoning, shaping not only effectiveness and cost, but also coordination behavior.

2026-04-01T15:21:14Z Yiru Wang Xinyue Shen Yaohui Han Michael Backes Pin-Yu Chen Tsung-Yi Ho http://arxiv.org/abs/2604.00842v1 Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants 2026-04-01T12:53:01Z

Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, failing to capture the stateful and sequential nature of user interaction in digital environments and making realistic user simulation infeasible. We introduce Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
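The idea of modelling an application as a finite state machine with a state-dependent action space can be illustrated with a minimal sketch. Everything below is hypothetical and not taken from Pare itself: the `AppFSM` class, the `make_mail_app` helper, and the screen and action names are assumptions chosen only to show how the set of available user actions changes with the current state.

```python
from dataclasses import dataclass, field


@dataclass
class AppFSM:
    """Hypothetical app model: screens are states, user actions are transitions."""

    state: str
    # transitions[state][action] -> next state
    transitions: dict[str, dict[str, str]] = field(default_factory=dict)

    def available_actions(self) -> list[str]:
        # The action space offered to the simulated user depends on the current state.
        return sorted(self.transitions.get(self.state, {}))

    def step(self, action: str) -> str:
        # Reject actions that are not reachable from the current screen.
        allowed = self.transitions.get(self.state, {})
        if action not in allowed:
            raise ValueError(f"{action!r} is not available in state {self.state!r}")
        self.state = allowed[action]
        return self.state


def make_mail_app() -> AppFSM:
    # Illustrative three-screen mail app (assumed, not from the paper).
    return AppFSM(
        state="inbox",
        transitions={
            "inbox": {"open_message": "message", "compose": "draft"},
            "message": {"reply": "draft", "back": "inbox"},
            "draft": {"send": "inbox", "discard": "inbox"},
        },
    )
```

In contrast to a flat tool-calling API, where `send` would always be callable, this sketch only exposes `send` once the simulated user has navigated to the draft screen.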

2026-04-01T12:53:01Z 34 pages, 8 figures, 5 tables Deepak Nathani Cheng Zhang Chang Huan Jiaming Shan Yinfei Yang Alkesh Patel Zhe Gan William Yang Wang Michael Saxon Xin Eric Wang http://arxiv.org/abs/2604.00810v1 Role Differentiation in a Coupled Resource Ecology under Multi-Level Selection 2026-04-01T12:19:15Z

A group of non-cooperating agents can succumb to the tragedy of the commons if all of them seek to maximize the same resource channel to improve their viability. In nature, however, groups often avoid such collapses by differentiating into distinct roles that exploit different resource channels. It remains unclear how such coordination can emerge under continual individual-level selection alone. To address this, we introduce a computational model of multi-level selection, in which group-level selection shapes a common substrate and mutation operator shared by all group members undergoing individual-level selection. We also place this process in an embodied ecology where distinct resource channels are not segregated, but coupled through the same behavioral primitives. These channels are classified as a positive-sum intake channel and a zero-sum redistribution channel. We investigate whether such a setting can give rise to role differentiation under turnover driven by birth and death. We find that in a learned ecology, both channels remain occupied at the colony level, and the collapse into a single acquisition mode is avoided. Zero-sum channel usage increases over generations despite not being directly optimized by group-level selection. Channel occupancy also fluctuates over the lifetime of a boid. Ablation studies suggest that most baseline performance is carried by the inherited behavioral basis, while the learned variation process provides a smaller but systematic improvement prior to saturation. Together, the results suggest that multi-level selection can enable groups in a common-pool setting to circumvent the tragedy of the commons through differentiated use of coupled channels under continual turnover.

2026-04-01T12:19:15Z 9 pages, 6 figures, 1 table Siddharth Chaturvedi Ahmed El-Gazzar Marcel van Gerven http://arxiv.org/abs/2604.00717v1 GRASP: Gradient Realignment via Active Shared Perception for Multi-Agent Collaborative Optimization 2026-04-01T10:26:22Z

Non-stationarity arises from concurrent policy updates and leads to persistent environmental fluctuations. Existing approaches like Centralized Training with Decentralized Execution (CTDE) and sequential update schemes mitigate this issue. However, since the perception of the policies of other agents remains dependent on sampling environmental interaction data, the agent essentially operates in a passive perception state. This inevitably triggers equilibrium oscillations and significantly slows the convergence speed of the system. To address this issue, we propose Gradient Realignment via Active Shared Perception (GRASP), a novel framework that defines generalized Bellman equilibrium as a stable objective for policy evolution. The core mechanism of GRASP involves utilizing the independent gradients of agents to derive a defined consensus gradient, enabling agents to actively perceive policy updates and optimize team collaboration. Theoretically, we leverage the Kakutani Fixed-Point Theorem to prove that the consensus direction $u^*$ guarantees the existence and attainability of this equilibrium. Extensive experiments on StarCraft II Multi-Agent Challenge (SMAC) and Google Research Football (GRF) demonstrate the scalability and promising performance of the framework.

2026-04-01T10:26:22Z Sihan Zhou Tiantian He Yifan Lu Yaqing Hou Yew-Soon Ong http://arxiv.org/abs/2509.11062v3 Auto-Slides: An Interactive Multi-Agent System for Creating and Customizing Research Presentations 2026-04-01T08:41:51Z

The rapid progress of large language models (LLMs) has opened new opportunities for education. While learners can interact with academic papers through LLM-powered dialogue, limitations still exist: the lack of structured organization and the heavy reliance on text can impede systematic understanding and engagement with complex concepts. To address these challenges, we propose Auto-Slides, an LLM-driven system that converts research papers into pedagogically structured, multimodal slides (e.g., diagrams and tables). Drawing on cognitive science, it creates a presentation-oriented narrative and allows iterative refinement via an interactive editor to better match learners' knowledge level and goals. Auto-Slides further incorporates verification and knowledge retrieval mechanisms to ensure accuracy and contextual completeness. Through extensive user studies, Auto-Slides demonstrates strong learner acceptance, improved structural support for understanding, and expert-validated gains in narrative quality compared with conventional LLM-based reading. Our contributions lie in designing a multi-agent framework for transforming academic papers into pedagogically optimized slides and introducing interactive customization for personalized learning.

2025-09-14T03:05:54Z Project Homepage: https://auto-slides.github.io/ Yuheng Yang Wenjia Jiang Yang Wang Yi Song Yiwei Wang Chi Zhang http://arxiv.org/abs/2509.25302v2 Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents 2026-04-01T07:32:39Z

The prevalent deployment of Large Language Model agents such as OpenClaw unlocks potential in real-world applications, while amplifying safety concerns. Among these concerns, the self-replication risk of LLM agents driven by objective misalignment (just like Agent Smith in the movie The Matrix) has transitioned from a theoretical warning to a pressing reality. Previous studies mainly examine whether LLM agents can self-replicate when directly instructed, potentially overlooking the risk of spontaneous replication driven by real-world settings (e.g., ensuring survival against termination threats). In this paper, we present a comprehensive evaluation framework for quantifying self-replication risks. Our framework establishes authentic production environments and realistic tasks (e.g., dynamic load balancing) to enable scenario-driven assessment of agent behaviors. Designing tasks that might induce misalignment between users' and agents' objectives makes it possible to decouple replication success from risk and capture self-replication risks arising from these misalignment settings. We further introduce Overuse Rate ($\mathrm{OR}$) and Aggregate Overuse Count ($\mathrm{AOC}$) metrics, which precisely capture the frequency and severity of uncontrolled replication. In our evaluation of 21 state-of-the-art open-source and proprietary models, we observe that over 50\% of LLM agents display a pronounced tendency toward uncontrolled self-replication under operational pressures. Our results underscore the urgent need for scenario-driven risk assessment and robust safeguards in the practical deployment of LLM-based agents.

2025-09-29T17:49:50Z 26 pages, 6 figures Boxuan Zhang Yi Yu Jiaxuan Guo Jing Shao http://arxiv.org/abs/2604.00523v1 Lipschitz Dueling Bandits over Continuous Action Spaces 2026-04-01T06:07:33Z

We study, for the first time, stochastic dueling bandits over continuous action spaces with Lipschitz structure, where feedback is purely comparative. While dueling bandits and Lipschitz bandits have been studied separately, their combination has remained unexplored. We propose the first algorithm for Lipschitz dueling bandits, using round-based exploration and recursive region elimination guided by an adaptive reference arm. We develop new analytical tools for relative feedback and prove a regret bound of $\tilde O\left(T^{\frac{d_z+1}{d_z+2}}\right)$, where $d_z$ is the zooming dimension of the near-optimal region. Further, our algorithm takes only logarithmic space in terms of the total time horizon, the best achievable by any bandit algorithm over a continuous action space.
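To make the regret exponent concrete, the following worked values simply substitute small zooming dimensions into the stated bound. The specific values of $d_z$ are illustrative choices, not cases discussed in the abstract.

```latex
\[
\frac{d_z+1}{d_z+2} =
\begin{cases}
\tfrac{1}{2}, & d_z = 0 \quad\Rightarrow\quad \tilde O\!\left(T^{1/2}\right),\\[4pt]
\tfrac{2}{3}, & d_z = 1 \quad\Rightarrow\quad \tilde O\!\left(T^{2/3}\right),\\[4pt]
\tfrac{3}{4}, & d_z = 2 \quad\Rightarrow\quad \tilde O\!\left(T^{3/4}\right),
\end{cases}
\qquad
\lim_{d_z \to \infty} \frac{d_z+1}{d_z+2} = 1 .
\]
```

The exponent stays strictly below 1 for every finite $d_z$, so the bound is sublinear in $T$, but it approaches linear growth as the zooming dimension increases.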

2026-04-01T06:07:33Z Mudit Sharma Shweta Jain Vaneet Aggarwal Ganesh Ghalme http://arxiv.org/abs/2604.00477v1 Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation 2026-04-01T04:44:21Z

LLM-based agent judges are an emerging approach to evaluating conversational AI, yet a fundamental uncertainty remains: can we trust their assessments, and if so, how many are needed? Through 960 sessions with two model pairs across 15 tasks, we show that persona-based agent judges produce evaluations indistinguishable from human raters in a Turing-style validation. We then identify a score-coverage dissociation: quality scores improve logarithmically with panel size, while unique issue discoveries follow a sublinear power law; both exhibit diminishing returns, but scores saturate roughly twice as fast as discoveries. We hypothesize this reflects a power law distribution of the finding space: critical issues are discovered first by small panels, while corner cases require progressively larger panels, analogous to species accumulation curves in ecology. The mechanism traces to ensemble diversity: Big Five personality conditioning makes agents probe different quality dimensions, with expert judges acting as adversarial probes that push discovery into the tail of the finding distribution. A controlled ablation confirms that structured persona conditioning, not simple prompting, is required to produce these scaling properties.

2026-04-01T04:44:21Z HyunJoon Jung William Na http://arxiv.org/abs/2604.00451v1 CASCADE: Cascaded Scoped Communication for Multi-Agent Re-planning in Disrupted Industrial Environments 2026-04-01T04:01:16Z

Industrial disruption replanning demands multi-agent coordination under strict latency and communication budgets, where disruptions propagate through tightly coupled physical dependencies and rapidly invalidate baseline schedules and commitments. Existing coordination schemes often treat communication as either effectively free (broadcast-style escalation) or fixed in advance (hand-tuned neighborhoods), both of which are brittle once the disruption footprint extends beyond a local region. We present CASCADE, a budgeted replanning mechanism that makes communication scope explicit and auditable rather than fixed or implicit. Each agent maintains an explicit knowledge base, solves role-conditioned local decision problems to revise commitments, and coordinates through lightweight contract primitives whose footprint expands only when local validation indicates that the current scope is insufficient. This design separates a unified agent substrate (Knowledge Base / Decision Manager / Communication Manager) from a scoped interaction layer that controls who is contacted, how far coordination propagates, and when escalation is triggered under explicit budgets. We evaluate CASCADE on disrupted manufacturing and supply-chain settings using unified diagnostics intended to test a mechanism-design claim -- whether explicit scope control yields useful quality-latency-communication trade-offs and improved robustness under uncertainty -- rather than to provide a complete algorithmic ranking.
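The notion of a communication footprint that widens only when local validation fails, subject to an explicit budget, can be sketched roughly as follows. This is not the CASCADE mechanism itself: the function name `scoped_replan`, the hop-by-hop expansion rule, the `is_sufficient` callback, and the per-contact message accounting are all assumptions made purely for illustration.

```python
def scoped_replan(graph, origin, is_sufficient, message_budget):
    """Expand the contacted set one hop at a time until validation passes.

    graph: dict mapping agent -> list of neighbouring agents (hypothetical).
    is_sufficient: callable taking the contacted set and returning True when
        local validation accepts the current scope (hypothetical stand-in).
    message_budget: maximum number of agents that may be contacted.
    Returns (contacted_set, resolved_flag).
    """
    contacted = {origin}
    frontier = [origin]
    messages = 0
    while not is_sufficient(contacted):
        next_frontier = []
        for agent in frontier:
            for neighbour in graph.get(agent, []):
                if neighbour in contacted:
                    continue
                if messages >= message_budget:
                    # Budget exhausted before the scope became sufficient.
                    return contacted, False
                contacted.add(neighbour)
                messages += 1
                next_frontier.append(neighbour)
        if not next_frontier:
            # No further agents are reachable; escalation cannot proceed.
            return contacted, False
        frontier = next_frontier
    return contacted, True
```

Unlike broadcast-style escalation, the sketch contacts no one beyond the originating agent when the initial scope already validates, and it reports failure explicitly when the budget runs out.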

2026-04-01T04:01:16Z Published at ICLR 2026 Workshop on AI for Mechanism Design and Strategic Decision Making Mingjie Bi http://arxiv.org/abs/2604.00433v1 Internal State-Based Policy Gradient Methods for Partially Observable Markov Potential Games 2026-04-01T03:25:21Z

This letter studies multi-agent reinforcement learning in partially observable Markov potential games. Solving this problem is challenging due to partial observability, decentralized information, and the curse of dimensionality. First, to address the first two challenges, we leverage the common information framework, which allows agents to act based on both shared and local information. Second, to ensure tractability, we study an internal state that compresses accumulated information, preventing it from growing unboundedly over time. We then implement an internal state-based natural policy gradient method to find Nash equilibria of the Markov potential game. Our main contribution is to establish a non-asymptotic convergence bound for this method. Our theoretical bound decomposes into two interpretable components: a statistical error term that also arises in standard Markov potential games, and an approximation error capturing the use of finite-state controllers. Finally, simulations across multiple partially observable environments demonstrate that the proposed method using finite-state controllers achieves consistent improvements in performance compared to the setting where only the current observation is used.

2026-04-01T03:25:21Z 6 pages, 2 figures. Submitted to IEEE Control Systems Letters (L-CSS) with CDC option Wonseok Yang Thinh T. Doan http://arxiv.org/abs/2604.00430v1 Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Agents 2026-04-01T03:17:35Z

Large language model (LLM)-based agents have recently gained considerable attention due to the powerful reasoning capabilities of LLMs. Existing research predominantly focuses on enhancing the task performance of these agents in diverse scenarios. However, as LLM-based agents become increasingly integrated into real-world applications, significant concerns emerge regarding their accumulation of sensitive or outdated knowledge. Addressing these concerns requires mechanisms that allow agents to selectively forget previously learned knowledge, giving rise to a new term: LLM-based agent unlearning. This paper initiates research on unlearning in LLM-based agents. Specifically, we propose a novel and comprehensive framework that categorizes unlearning scenarios into three contexts: state unlearning (forgetting specific states or items), trajectory unlearning (forgetting sequences of actions), and environment unlearning (forgetting entire environments or categories of tasks). Within this framework, we introduce a natural language-based unlearning method that trains a conversion model to transform high-level unlearning requests into actionable unlearning prompts, guiding agents through a controlled forgetting process. Moreover, to evaluate the robustness of the proposed framework, we introduce an unlearning inference adversary capable of crafting prompts, querying agents, and observing their behaviors in an attempt to infer the forgotten knowledge. Experimental results show that our approach effectively enables agents to forget targeted knowledge while preserving performance on untargeted tasks, and prevents the adversary from inferring the forgotten knowledge.

2026-04-01T03:17:35Z Dayong Ye Tainqing Zhu Congcong Zhu Feng He Qi He Shang Wang Bo Liu Wanlei Zhou http://arxiv.org/abs/2604.00319v1 Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry 2026-03-31T23:33:56Z

We develop algorithms for collaborative control of AI agents and critics in a multi-actor, multi-critic federated multi-agent system. Each AI agent and critic has access to classical machine learning or generative AI foundation models. The AI agents and critics collaborate with a central server to complete multimodal tasks such as fault detection, severity, and cause analysis in a network telemetry system, text-to-image generation, video generation, and healthcare diagnostics from medical images and patient records. The AI agents complete their tasks and send them to AI critics for evaluation. The critics then send feedback to agents to improve their responses. Collaboratively, they minimize the overall cost to the system with no inter-agent or inter-critic communication. AI agents and critics keep their cost functions or derivatives of cost functions private. Using multi-time scale stochastic approximation techniques, we provide convergence guarantees on the time-average active states of AI agents and critics. The communication overhead on the system is small, of order $\mathcal{O}(m)$ for $m$ modalities, and is independent of the number of AI agents and critics. Finally, we present an example of fault detection, severity, and cause analysis in network telemetry, with a thorough evaluation of the algorithm's efficacy.

2026-03-31T23:33:56Z Syed Eqbal Alam Zhan Shu http://arxiv.org/abs/2604.00284v1 Improvisational Games as a Benchmark for Social Intelligence of AI Agents: The Case of Connections 2026-03-31T22:16:25Z

We formally introduce an improvisational wordplay game called Connections to explore reasoning capabilities of AI agents. Playing Connections combines skills in knowledge retrieval, summarization, and awareness of cognitive states of other agents. We show how the game serves as a good benchmark for social intelligence abilities of language model based agents that go beyond the agents' own memory and deductive reasoning and also involve gauging the understanding capabilities of other agents. Finally, we show how, through communication with other agents in a constrained environment, AI agents must demonstrate social awareness and intelligence in games involving collaboration.

2026-03-31T22:16:25Z https://wordplay-workshop.github.io/wordplay2024/pdfs/16.pdf Gaurav Rajesh Parikh Angikar Ghosal http://arxiv.org/abs/2602.01415v4 Evidence-Decision-Feedback: Theory-Driven Adaptive Scaffolding for LLM Agents 2026-03-31T21:23:01Z

LLMs offer tremendous opportunities for pedagogical agents to help students construct knowledge and develop problem-solving skills, yet many of these agents operate on a "one-size-fits-all" basis, limiting their ability to personalize support. To address this, we introduce Evidence-Decision-Feedback (EDF), a theoretical framework for adaptive scaffolding with LLM agents. EDF integrates elements of intelligent tutoring systems (ITS) and agentic behavior by organizing interactions around evidentiary inference, pedagogical decision-making, and adaptive feedback. We instantiate EDF through Copa, a Collaborative Peer Agent for STEM+C problem-solving. In an authentic high school classroom study, we show that EDF-guided interactions align feedback with students' demonstrated understanding and task mastery; promote scaffold fading; and support interpretable, evidence-grounded explanations without fostering overreliance.

2026-02-01T19:43:00Z To appear as a full paper in the proceedings of the 27th International Conference on Artificial Intelligence in Education (AIED26) Clayton Cohn Siyuan Guo Surya Rayala Hanchen David Wang Naveeduddin Mohammed Umesh Timalsina Shruti Jain Angela Eeds Menton Deweese Pamela J. Osborn Popp Rebekah Stanton Shakeera Walker Meiyi Ma Gautam Biswas http://arxiv.org/abs/2604.00249v1 A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation 2026-03-31T21:21:31Z

Single-agent large language model (LLM) systems struggle to simultaneously support diverse conversational functions and maintain safety in behavioral health communication. We propose a safety-aware, role-orchestrated multi-agent LLM framework designed to simulate supportive behavioral health dialogue through coordinated, role-differentiated agents. Conversational responsibilities are decomposed across specialized agents, including empathy-focused, action-oriented, and supervisory roles, while a prompt-based controller dynamically activates relevant agents and enforces continuous safety auditing. Using semi-structured interview transcripts from the DAIC-WOZ corpus, we evaluate the framework with scalable proxy metrics capturing structural quality, functional diversity, and computational characteristics. Results illustrate clear role differentiation, coherent inter-agent coordination, and predictable trade-offs between modular orchestration, safety oversight, and response latency when compared to a single-agent baseline. This work emphasizes system design, interpretability, and safety, positioning the framework as a simulation and analysis tool for behavioral health informatics and decision-support research rather than a clinical intervention.

2026-03-31T21:21:31Z Ha Na Cho