https://arxiv.org/api/HybKV6Z+zhp948S44J01faWge802026-06-30T14:08:37Z12807145515http://arxiv.org/abs/2604.01020v1OrgAgent: Organize Your Multi-Agent System like a Company2026-04-01T15:21:14ZWhile large language model-based multi-agent systems have shown strong potential for complex reasoning, how to effectively organize multiple agents remains an open question. In this paper, we introduce OrgAgent, a company-style hierarchical multi-agent framework that separates collaboration into governance, execution, and compliance layers. OrgAgent decomposes multi-agent reasoning into three layers: a governance layer for planning and resource allocation, an execution layer for task solving and review, and a compliance layer for final answer control. By evaluating the framework across reasoning tasks, LLMs, execution modes, and execution policies, we find that multi-agent systems organized in a company-style hierarchy generally outperform other organizational structures. Besides, hierarchical coordination also reduces token consumption relative to flat collaboration in most settings. For example, for GPT-OSS-120B, the hierarchical setting improves performance over flat multi-agent system by 102.73% while reducing token usage by 74.52% on SQuAD 2.0. Further analysis shows that hierarchy helps most when tasks benefit from stable skill assignment, controlled information flow, and layered verification. Overall, our findings highlight organizational structure as an important factor in multi-agent reasoning, shaping not only effectiveness and cost, but also coordination behavior.2026-04-01T15:21:14ZYiru WangXinyue ShenYaohui HanMichael BackesPin-Yu ChenTsung-Yi Hohttp://arxiv.org/abs/2604.00842v1Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants2026-04-01T12:53:01ZProactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, failing to capture the stateful and sequential nature of user interaction in digital environments and making realistic user simulation infeasible. We introduce Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.2026-04-01T12:53:01Z34 pages, 8 figures, 5 tablesDeepak NathaniCheng ZhangChang HuanJiaming ShanYinfei YangAlkesh PatelZhe GanWilliam Yang WangMichael SaxonXin Eric Wanghttp://arxiv.org/abs/2604.00810v1Role Differentiation in a Coupled Resource Ecology under Multi-Level Selection2026-04-01T12:19:15ZA group of non-cooperating agents can succumb to the \emph{tragedy-of-the-commons} if all of them seek to maximize the same resource channel to improve their viability. In nature, however, groups often avoid such collapses by differentiating into distinct roles that exploit different resource channels. It remains unclear how such coordination can emerge under continual individual-level selection alone. To address this, we introduce a computational model of multi-level selection, in which group-level selection shapes a common substrate and mutation operator shared by all group members undergoing individual-level selection. We also place this process in an embodied ecology where distinct resource channels are not segregated, but coupled through the same behavioral primitives. These channels are classified as a positive-sum intake channel and a zero-sum redistribution channel. We investigate whether such a setting can give rise to role differentiation under turnover driven by birth and death. We find that in a learned ecology, both channels remain occupied at the colony level, and the collapse into a single acquisition mode is avoided. Zero-sum channel usage increases over generations despite not being directly optimized by group-level selection. Channel occupancy also fluctuates over the lifetime of a boid. Ablation studies suggest that most baseline performance is carried by the inherited behavioral basis, while the learned variation process provides a smaller but systematic improvement prior to saturation. Together, the results suggest that multi-level selection can enable groups in a common-pool setting to circumvent tragedy-of-the-commons through differentiated use of coupled channels under continual turnover.2026-04-01T12:19:15Z9 pages, 6 figures, 1 tableSiddharth ChaturvediAhmed El-GazzarMarcel van Gervenhttp://arxiv.org/abs/2604.00717v1GRASP: Gradient Realignment via Active Shared Perception for Multi-Agent Collaborative Optimization2026-04-01T10:26:22ZNon-stationarity arises from concurrent policy updates and leads to persistent environmental fluctuations. Existing approaches like Centralized Training with Decentralized Execution (CTDE) and sequential update schemes mitigate this issue. However, since the perception of the policies of other agents remains dependent on sampling environmental interaction data, the agent essentially operates in a passive perception state. This inevitably triggers equilibrium oscillations and significantly slows the convergence speed of the system. To address this issue, we propose Gradient Realignment via Active Shared Perception (GRASP), a novel framework that defines generalized Bellman equilibrium as a stable objective for policy evolution. The core mechanism of GRASP involves utilizing the independent gradients of agents to derive a defined consensus gradient, enabling agents to actively perceive policy updates and optimize team collaboration. Theoretically, we leverage the Kakutani Fixed-Point Theorem to prove that the consensus direction $u^*$ guarantees the existence and attainability of this equilibrium. Extensive experiments on StarCraft II Multi-Agent Challenge (SMAC) and Google Research Football (GRF) demonstrate the scalability and promising performance of the framework.2026-04-01T10:26:22ZSihan ZhouTiantian HeYifan LuYaqing HouYew-Soon Onghttp://arxiv.org/abs/2509.11062v3Auto-Slides: An Interactive Multi-Agent System for Creating and Customizing Research Presentations2026-04-01T08:41:51ZThe rapid progress of large language models (LLMs) has opened new opportunities for education. While learners can interact with academic papers through LLM-powered dialogue, limitations still exist: the lack of structured organization and the heavy reliance on text can impede systematic understanding and engagement with complex concepts. To address these challenges, we propose Auto-Slides, an LLM-driven system that converts research papers into pedagogically structured, multimodal slides (e.g., diagrams and tables). Drawing on cognitive science, it creates a presentation-oriented narrative and allows iterative refinement via an interactive editor to better match learners' knowledge level and goals. Auto-Slides further incorporates verification and knowledge retrieval mechanisms to ensure accuracy and contextual completeness. Through extensive user studies, Auto-Slides demonstrates strong learner acceptance, improved structural support for understanding, and expert-validated gains in narrative quality compared with conventional LLM-based reading. Our contributions lie in designing a multi-agent framework for transforming academic papers into pedagogically optimized slides and introducing interactive customization for personalized learning.2025-09-14T03:05:54ZProject Homepage: https://auto-slides.github.io/Yuheng YangWenjia JiangYang WangYi SongYiwei WangChi Zhanghttp://arxiv.org/abs/2509.25302v2Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents2026-04-01T07:32:39ZThe prevalent deployment of Large Language Model agents such as OpenClaw unlocks potential in real-world applications, while amplifying safety concerns. Among these concerns, the self-replication risk of LLM agents driven by objective misalignment (just like Agent Smith in the movie The Matrix) has transitioned from a theoretical warning to a pressing reality. Previous studies mainly examine whether LLM agents can self-replicate when directly instructed, potentially overlooking the risk of spontaneous replication driven by real-world settings (e.g., ensuring survival against termination threats). In this paper, we present a comprehensive evaluation framework for quantifying self-replication risks. Our framework establishes authentic production environments and realistic tasks (e.g., dynamic load balancing) to enable scenario-driven assessment of agent behaviors. Designing tasks that might induce misalignment between users' and agents' objectives makes it possible to decouple replication success from risk and capture self-replication risks arising from these misalignment settings. We further introduce Overuse Rate ($\mathrm{OR}$) and Aggregate Overuse Count ($\mathrm{AOC}$) metrics, which precisely capture the frequency and severity of uncontrolled replication. In our evaluation of 21 state-of-the-art open-source and proprietary models, we observe that over 50\% of LLM agents display a pronounced tendency toward uncontrolled self-replication under operational pressures. Our results underscore the urgent need for scenario-driven risk assessment and robust safeguards in the practical deployment of LLM-based agents.2025-09-29T17:49:50Z26 pages, 6 figuresBoxuan ZhangYi YuJiaxuan GuoJing Shaohttp://arxiv.org/abs/2604.00523v1Lipschitz Dueling Bandits over Continuous Action Spaces2026-04-01T06:07:33ZWe study for the first time, stochastic dueling bandits over continuous action spaces with Lipschitz structure, where feedback is purely comparative. While dueling bandits and Lipschitz bandits have been studied separately, their combination has remained unexplored. We propose the first algorithm for Lipschitz dueling bandits, using round-based exploration and recursive region elimination guided by an adaptive reference arm. We develop new analytical tools for relative feedback and prove a regret bound of $\tilde O\left(T^{\frac{d_z+1}{d_z+2}}\right)$, where $d_z$ is the zooming dimension of the near-optimal region. Further, our algorithm takes only logarithmic space in terms of the total time horizon, best achievable by any bandit algorithm over a continuous action space.2026-04-01T06:07:33ZMudit SharmaShweta JainVaneet AggarwalGanesh Ghalmehttp://arxiv.org/abs/2604.00477v1Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation2026-04-01T04:44:21ZLLM-based agent judges are an emerging approach to evaluating conversational AI, yet a fundamental uncertainty remains: can we trust their assessments, and if so, how many are needed? Through 960 sessions with two model pairs across 15 tasks, we show that persona-based agent judges produce evaluations indistinguishable from human raters in a Turing-style validation. We then identify a score-coverage dissociation: quality scores improve logarithmically with panel size, while unique issue discoveries follow a sublinear power law-both exhibit diminishing returns, but scores saturate roughly twice as fast as discoveries. We hypothesize this reflects a power law distribution of the finding space: critical issues are discovered first by small panels, while corner cases require progressively larger panels, analogous to species accumulation curves in ecology. The mechanism traces to ensemble diversity-Big Five personality conditioning makes agents probe different quality dimensions, with expert judges acting as adversarial probes that push discovery into the tail of the finding distribution. A controlled ablation confirms that structured persona conditioning, not simple prompting, is required to produce these scaling properties.2026-04-01T04:44:21ZHyunJoon JungWilliam Nahttp://arxiv.org/abs/2604.00451v1CASCADE: Cascaded Scoped Communication for Multi-Agent Re-planning in Disrupted Industrial Environments2026-04-01T04:01:16ZIndustrial disruption replanning demands multi-agent coordination under strict latency and communication budgets, where disruptions propagate through tightly coupled physical dependencies and rapidly invalidate baseline schedules and commitments. Existing coordination schemes often treat communication as either effectively free (broadcast-style escalation) or fixed in advance (hand-tuned neighborhoods), both of which are brittle once the disruption footprint extends beyond a local region. We present \CASCADE, a budgeted replanning mechanism that makes communication scope explicit and auditable rather than fixed or implicit. Each agent maintains an explicit knowledge base, solves role-conditioned local decision problems to revise commitments, and coordinates through lightweight contract primitives whose footprint expands only when local validation indicates that the current scope is insufficient. This design separates a unified agent substrate (Knowledge Base / Decision Manager / Communication Manager) from a scoped interaction layer that controls who is contacted, how far coordination propagates, and when escalation is triggered under explicit budgets. We evaluate \CASCADE on disrupted manufacturing and supply-chain settings using unified diagnostics intended to test a mechanism-design claim -- whether explicit scope control yields useful quality-latency-communication trade-offs and improved robustness under uncertainty -- rather than to provide a complete algorithmic ranking.2026-04-01T04:01:16ZPublished at ICLR 2026 Workshop on AI for Mechanism Design and Strategic Decision MakingMingjie Bihttp://arxiv.org/abs/2604.00433v1Internal State-Based Policy Gradient Methods for Partially Observable Markov Potential Games2026-04-01T03:25:21ZThis letter studies multi-agent reinforcement learning in partially observable Markov potential games. Solving this problem is challenging due to partial observability, decentralized information, and the curse of dimensionality. First, to address the first two challenges, we leverage the common information framework, which allows agents to act based on both shared and local information. Second, to ensure tractability, we study an internal state that compresses accumulated information, preventing it from growing unboundedly over time. We then implement an internal state-based natural policy gradient method to find Nash equilibria of the Markov potential game. Our main contribution is to establish a non-asymptotic convergence bound for this method. Our theoretical bound decomposes into two interpretable components: a statistical error term that also arises in standard Markov potential games, and an approximation error capturing the use of finite-state controllers. Finally, simulations across multiple partially observable environments demonstrate that the proposed method using finite-state controllers achieves consistent improvements in performance compared to the setting where only the current observation is used.2026-04-01T03:25:21Z6 pages, 2 figures. Submitted to IEEE Control Systems Letters (L-CSS) with CDC optionWonseok YangThinh T. Doanhttp://arxiv.org/abs/2604.00430v1Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Agents2026-04-01T03:17:35ZLarge language model (LLM)-based agents have recently gained considerable attention due to the powerful reasoning capabilities of LLMs. Existing research predominantly focuses on enhancing the task performance of these agents in diverse scenarios. However, as LLM-based agents become increasingly integrated into real-world applications, significant concerns emerge regarding their accumulation of sensitive or outdated knowledge. Addressing these concerns requires the development of mechanisms that allow agents to selectively forget previously learned knowledge, giving rise to a new term LLM-based agent unlearning. This paper initiates research on unlearning in LLM-based agents. Specifically, we propose a novel and comprehensive framework that categorizes unlearning scenarios into three contexts: state unlearning (forgetting specific states or items), trajectory unlearning (forgetting sequences of actions) and environment unlearning (forgetting entire environments or categories of tasks). Within this framework, we introduce a natural language-based unlearning method that trains a conversion model to transform high-level unlearning requests into actionable unlearning prompts, guiding agents through a controlled forgetting process. Moreover, to evaluate the robustness of the proposed framework, we introduce an unlearning inference adversary capable of crafting prompts, querying agents, and observing their behaviors in an attempt to infer the forgotten knowledge. Experimental results show that our approach effectively enables agents to forget targeted knowledge while preserving performance on untargeted tasks, and prevents the adversary from inferring the forgotten knowledge.2026-04-01T03:17:35ZDayong YeTainqing ZhuCongcong ZhuFeng HeQi HeShang WangBo LiuWanlei Zhouhttp://arxiv.org/abs/2604.00319v1Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry2026-03-31T23:33:56ZWe develop algorithms for collaborative control of AI agents and critics in a multi-actor, multi-critic federated multi-agent system. Each AI agent and critic has access to classical machine learning or generative AI foundation models. The AI agents and critics collaborate with a central server to complete multimodal tasks such as fault detection, severity, and cause analysis in a network telemetry system, text-to-image generation, video generation, healthcare diagnostics from medical images and patient records, etcetera. The AI agents complete their tasks and send them to AI critics for evaluation. The critics then send feedback to agents to improve their responses. Collaboratively, they minimize the overall cost to the system with no inter-agent or inter-critic communication. AI agents and critics keep their cost functions or derivatives of cost functions private. Using multi-time scale stochastic approximation techniques, we provide convergence guarantees on the time-average active states of AI agents and critics. The communication overhead is a little on the system, of the order of $\mathcal{O}(m)$, for $m$ modalities and is independent of the number of AI agents and critics. Finally, we present an example of fault detection, severity, and cause analysis in network telemetry and thorough evaluation to check the algorithm's efficacy.2026-03-31T23:33:56ZSyed Eqbal AlamZhan Shuhttp://arxiv.org/abs/2604.00284v1Improvisational Games as a Benchmark for Social Intelligence of AI Agents: The Case of Connections2026-03-31T22:16:25ZWe formally introduce a improvisational wordplay game called Connections to explore reasoning capabilities of AI agents. Playing Connections combines skills in knowledge retrieval, summarization and awareness of cognitive states of other agents. We show how the game serves as a good benchmark for social intelligence abilities of language model based agents that go beyond the agents' own memory and deductive reasoning and also involve gauging the understanding capabilities of other agents. Finally, we show how through communication with other agents in a constrained environment, AI agents must demonstrate social awareness and intelligence in games involving collaboration.2026-03-31T22:16:25Zhttps://wordplay-workshop.github.io/wordplay2024/pdfs/16.pdfhttps://wordplay-workshop.github.io/wordplay2024/pdfs/16.pdfGaurav Rajesh ParikhAngikar Ghosalhttp://arxiv.org/abs/2602.01415v4Evidence-Decision-Feedback: Theory-Driven Adaptive Scaffolding for LLM Agents2026-03-31T21:23:01ZLLMs offer tremendous opportunities for pedagogical agents to help students construct knowledge and develop problem-solving skills, yet many of these agents operate on a "one-size-fits-all" basis, limiting their ability to personalize support. To address this, we introduce Evidence-Decision-Feedback (EDF), a theoretical framework for adaptive scaffolding with LLM agents. EDF integrates elements of intelligent tutoring systems (ITS) and agentic behavior by organizing interactions around evidentiary inference, pedagogical decision-making, and adaptive feedback. We instantiate EDF through Copa, a Collaborative Peer Agent for STEM+C problem-solving. In an authentic high school classroom study, we show that EDF-guided interactions align feedback with students' demonstrated understanding and task mastery; promote scaffold fading; and support interpretable, evidence-grounded explanations without fostering overreliance.2026-02-01T19:43:00ZTo appear as a full paper in the proceedings of the 27th International Conference on Artificial Intelligence in Education (AIED26)Clayton CohnSiyuan GuoSurya RayalaHanchen David WangNaveeduddin MohammedUmesh TimalsinaShruti JainAngela EedsMenton DeweesePamela J. Osborn PoppRebekah StantonShakeera WalkerMeiyi MaGautam Biswashttp://arxiv.org/abs/2604.00249v1A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation2026-03-31T21:21:31ZSingle-agent large language model (LLM) systems struggle to simultaneously support diverse conversational functions and maintain safety in behavioral health communication. We propose a safety-aware, role-orchestrated multi-agent LLM framework designed to simulate supportive behavioral health dialogue through coordinated, role-differentiated agents. Conversational responsibilities are decomposed across specialized agents, including empathy-focused, action-oriented, and supervisory roles, while a prompt-based controller dynamically activates relevant agents and enforces continuous safety auditing. Using semi-structured interview transcripts from the DAIC-WOZ corpus, we evaluate the framework with scalable proxy metrics capturing structural quality, functional diversity, and computational characteristics. Results illustrate clear role differentiation, coherent inter-agent coordination, and predictable trade-offs between modular orchestration, safety oversight, and response latency when compared to a single-agent baseline. This work emphasizes system design, interpretability, and safety, positioning the framework as a simulation and analysis tool for behavioral health informatics and decision-support research rather than a clinical intervention.2026-03-31T21:21:31ZHa Na Cho