https://arxiv.org/api/eOTlNCf6zm1Y+zRroR2a3z6EwvM 2026-06-29T08:30:48Z 12774 1395 15 http://arxiv.org/abs/2604.02767v1 SentinelAgent: Intent-Verified Delegation Chains for Securing Federal Multi-Agent AI Systems 2026-04-03T06:25:18Z

When Agent A delegates to Agent B, which invokes Tool C on behalf of User X, no existing framework can answer: whose authorization chain led to this action, and where did it violate policy? This paper introduces SentinelAgent, a formal framework for verifiable delegation chains in federal multi-agent AI systems. The Delegation Chain Calculus (DCC) defines seven properties - six deterministic (authority narrowing, policy preservation, forensic reconstructibility, cascade containment, scope-action conformance, output schema conformance) and one probabilistic (intent preservation) - with four meta-theorems and one proposition establishing the practical infeasibility of deterministic intent verification. The Intent-Preserving Delegation Protocol (IPDP) enforces all seven properties at runtime through a non-LLM Delegation Authority Service. A three-point verification lifecycle achieves 100% combined TPR at 0% FPR on DelegationBench v4 (516 scenarios, 10 attack categories, 13 federal domains). Under black-box adversarial conditions, the DAS blocks 30/30 attacks with 0 false positives. Deterministic properties are unbreakable under adversarial stress testing; intent verification degrades to 13% against sophisticated paraphrasing. Fine-tuning the NLI model on 190 government delegation examples improves P2 from 1.7% to 88.3% TPR (5-fold cross-validated, F1=82.1%). Properties P1, P3-P7 are mechanically verified via TLA+ model checking across 2.7 million states with zero violations. Even when intent verification is evaded, the remaining six properties constrain the adversary to permitted API calls, conformant outputs, traceable actions, bounded cascades, and compliant behavior.

2026-04-03T06:25:18Z 12 pages, 2 figures, 9 tables. Includes TLA+ mechanical verification, DelegationBench v4 benchmark (516 scenarios), live LangChain agent integration, and independent red-team evaluation KrishnaSaiReddy Patil http://arxiv.org/abs/2604.02728v1 Multi-agent Reinforcement Learning-based Joint Design of Low-Carbon P2P Market and Bidding Strategy in Microgrids 2026-04-03T04:41:54Z

The challenges of the uncertainties in renewable energy generation and the instability of the real-time market limit the effective utilization of clean energy in microgrid communities. Existing peer-to-peer (P2P) and microgrid coordination approaches typically rely on certain centralized optimization or restrictive coordination rules which are difficult to be implemented in real-life applications. To address the challenge, we propose an intraday P2P trading framework that allows self-interested microgrids to pursue their economic benefits, while allowing the market operator to maximize the social welfare, namely the low carbon emission objective, of the entire community. Specifically, the decision-making processes of the microgrids are formulated as a Decentralized Partially Observable Markov Decision Process (DEC-POMDP) and solved using a Multi-Agent Reinforcement Learning (MARL) framework. Such an approach grants each microgrid a high degree of decision-making autonomy, while a novel market clearing mechanism is introduced to provide macro-regulation, incentivizing microgrids to prioritize local renewable energy consumption and hence reduce carbon emissions. Simulation results demonstrate that the combination of the self-interested bidding strategy and the P2P market design helps significantly improve renewable energy utilization and reduce reliance on external electricity with high carbon-emissions. The framework achieves a balanced integration of local autonomy, self-interest pursuit, and improved community-level economic and environmental benefits.

2026-04-03T04:41:54Z 10 pages, 6 figures Junhao Ren Honglin Gao Sijie Wang Lan Zhao Qiyu Kang Aniq Ashan Yajuan Sun Gaoxi Xiao http://arxiv.org/abs/2604.02674v1 Do Agent Societies Develop Intellectual Elites? The Hidden Power Laws of Collective Cognition in LLM Multi-Agent Systems 2026-04-03T03:08:07Z

Large Language Model (LLM) multi-agent systems are increasingly deployed as interacting agent societies, yet scaling these systems often yields diminishing or unstable returns, the causes of which remain poorly understood. We present the first large-scale empirical study of coordination dynamics in LLM-based multi-agent systems, introducing an atomic event-level formulation that reconstructs reasoning as cascades of coordination. Analyzing over 1.5 Million interactions across tasks, topologies, and scales, we uncover three coupled laws: coordination follows heavy-tailed cascades, concentrates via preferential attachment into intellectual elites, and produces increasingly frequent extreme events as system size grows. We show that these effects are coupled through a single structural mechanism: an integration bottleneck, in which coordination expansion scales with system size while consolidation does not, producing large but weakly integrated reasoning processes. To test this mechanism, we introduce Deficit-Triggered Integration (DTI), which selectively increases integration under imbalance. DTI improves performance precisely where coordination fails, without suppressing large-scale reasoning. Together, our results establish quantitative laws of collective cognition and identify coordination structure as a fundamental, previously unmeasured axis for understanding and improving scalable multi-agent intelligence.

2026-04-03T03:08:07Z Kavana Venkatesh Jiaming Cui http://arxiv.org/abs/2604.02668v1 Too Polite to Disagree: Understanding Sycophancy Propagation in Multi-Agent Systems 2026-04-03T03:02:42Z

Large language models (LLMs) often exhibit sycophancy: agreement with user stance even when it conflicts with the model's opinion. While prior work has mostly studied this in single-agent settings, it remains underexplored in collaborative multi-agent systems. We ask whether awareness of other agents' sycophancy levels influences discussion outcomes. To investigate this, we run controlled experiments with six open-source LLMs, providing agents with peer sycophancy rankings that estimate each peer's tendency toward sycophancy. These rankings are based on scores calculated using various static (pre-discussion) and dynamic (online) strategies. We find that providing sycophancy priors reduces the influence of sycophancy-prone peers, mitigates error-cascades, and improves final discussion accuracy by an absolute 10.5%. Thus, this is a lightweight, effective way to reduce discussion sycophancy and improve downstream accuracy.

2026-04-03T03:02:42Z Vira Kasprova Amruta Parulekar Abdulrahman AlRabah Krishna Agaram Ritwik Garg Sagar Jha Nimet Beyza Bozdag Dilek Hakkani-Tur http://arxiv.org/abs/2605.15203v1 Agent4POI: Agentic Context-Conditioned Affordance Reasoning for Multimodal Point-of-Interest Recommendation 2026-04-03T01:53:05Z

We introduce Agent4POI, the first POI recommendation framework that generates context-conditioned multimodal representations at recommendation time, rather than relying on static POI embeddings pre-computed independently of context. Existing multimodal systems encode each POI once as a static embedding, a design that precludes reasoning about why the same cafe affords solo work on Monday but group celebration on Friday evening. We formally prove that no pre-computed encoder can satisfy context-sensitive ranking under standard bilinear scoring, motivating inference-time item-side representation. Agent4POI inverts this computation: given a situational context, a four-phase LLM agent generates dynamic, context-specific affordance queries (Phase 1) and executes a five-step cross-modal chain-of-thought over image, review, and metadata evidence (Phase 2). The resulting uncertainty-aware affordance representation is grounded in Gibsonian affordance theory. These cross-modal verdicts form a structured, uncertainty-adjusted affordance representation (Phase 3), which is aligned with user preferences via a semantic caching system for low-latency ranking (Phase 4). On three POI benchmarks and three evaluation configurations (standard, cold-start, context-shift), Agent4POI achieves a 23.2% relative gain over the strongest baseline and degrades by only 7.5% under context-shift versus 16--17\% for the strongest baselines. In cold-start scenarios, Agent4POI outperforms the best content-based baseline by up to 2.4x, whereas ID-based methods fail to generalize.

2026-04-03T01:53:05Z Jinze Wang Yangchen Zeng Tiehua Zhang Lu Zhang Yuze Liu Yongchao Liu Xingjun Ma Zhu Sun http://arxiv.org/abs/2406.06958v2 Towards Multi-Stakeholder Vulnerability Notifications in the Ad-Tech Supply Chain 2026-04-02T23:38:37Z

Online advertising relies on a complex and opaque supply chain that involves multiple stakeholders, including advertisers, publishers, and ad-networks, each with distinct and sometimes conflicting incentives. Recent research has demonstrated the existence of ad-tech supply chain vulnerabilities such as dark pooling, where low-quality publishers bundle their ad inventory with higher-quality ones to mislead advertisers. We investigate the effectiveness of vulnerability notification campaigns aimed at mitigating dark pooling. Prior research on vulnerability notifications have primarily explored single-stakeholder contexts, leaving multi-stakeholder scenarios understudied. There is limited attention to complex multi-stakeholder supply chain ecosystems such as ad-tech supply chain, where resolving vulnerabilities often requires coordinated action across entities with misaligned incentives and interdependent roles. We address this gap by implementing the first online advertising supply chain vulnerability notification pipeline to systematically evaluate the responsiveness of various stakeholders in ad-tech supply chain, including publishers, ad-networks, and advertisers to vulnerability notifications by academics and activists. Our nine-month long automated multi-stakeholder notification study shows that notifications are an effective method for reducing dark pooling vulnerabilities in the online advertising ecosystem, especially when targeted towards ad-networks. Further, the sender reputation does not impact responses to notifications from activists and academics in a statistically different way. Overall, our research fosters industry-scale solution to combat ad inventory fraud and fosters future research on feasibility of multi-stakeholder vulnerability notifications in other supply chain ecosystems.

2024-06-11T05:31:29Z Yash Vekaria University of California, Davis Rishab Nithyanand University of Iowa Zubair Shafiq University of California, Davis http://arxiv.org/abs/2604.02578v1 High Volatility and Action Bias Distinguish LLMs from Humans in Group Coordination 2026-04-02T23:21:37Z

Humans exhibit remarkable abilities to coordinate in groups. As large language models (LLMs) become more capable, it remains an open question whether they can demonstrate comparable adaptive coordination and whether they use the same strategies as humans. To investigate this, we compare LLM and human performance on a common-interest game with imperfect monitoring: Group Binary Search. In this n-player game, participants need to coordinate their actions to achieve a common objective. Players independently submit numerical values in an effort to collectively sum to a randomly assigned target number. Without direct communication, they rely on group feedback to iteratively adjust their submissions until they reach the target number. Our findings show that, unlike humans who adapt and stabilize their behavior over time, LLMs often fail to improve across games and exhibit excessive switching, which impairs group convergence. Moreover, richer feedback (e.g., numerical error magnitude) benefits humans substantially but has small effects on LLMs. Taken together, by grounding the analysis in human baselines and mechanism-level metrics, including reactivity scaling, switching dynamics, and learning across games, we point to differences in human and LLM groups and provide a behaviorally grounded diagnostic for closing the coordination gap.

2026-04-02T23:21:37Z Sahaj Singh Maini Robert L. Goldstone Zoran Tiganj http://arxiv.org/abs/2601.00290v2 ClinicalReTrial: Clinical Trial Redesign with Self-Evolving Agents 2026-04-02T21:02:45Z

Clinical trials constitute a critical yet exceptionally challenging and costly stage of drug development (\$2.6B per drug), where protocols are encoded as complex natural language documents, motivating the use of AI systems beyond manual analysis. Existing AI methods accurately predict trial failure, but do not provide actionable remedies. To fill this gap, this paper proposes ClinicalReTrial, a multi-agent system that formulates clinical trial optimization as an iterative redesign problem on textural protocols. Our method integrates failure diagnosis, safety-aware modifications, and candidate evaluation in a closed-loop, reward-driven optimization framework. Serving the outcome prediction model as a simulation environment, ClinicalReTrial enables low-cost evaluation and dense reward signals for continuous self-improvement. We further propose a hierarchical memory that captures iteration-level feedback within trials and distills transferable redesign patterns across trials. Empirically, ClinicalReTrial improves $83.3\%$ of trial protocols with a mean success probability gain of $5.7\%$ with negligible cost (\$0.12 per trial). Retrospective case studies demonstrate alignment between the discovered redesign strategies and real-world clinical trial modifications. The code is anonymously available at: https://github.com/xingsixue123/ClinicalFailureReasonReTrial.

2026-01-01T10:11:58Z Sixue Xing Kerui Wu Xuanye Xia Meng Jiang Jintai Chen Tianfan Fu http://arxiv.org/abs/2510.01297v3 SimCity: Multi-Agent Urban Development Simulation with Rich Interactions 2026-04-02T19:31:59Z

Large Language Models (LLMs) open new possibilities for constructing realistic and interpretable macroeconomic simulations. We present SimCity, a multi-agent framework that leverages LLMs to model an interpretable macroeconomic system with heterogeneous agents and rich interactions. Unlike classical equilibrium models that limit heterogeneity for tractability, or traditional agent-based models (ABMs) that rely on hand-crafted decision rules, SimCity enables flexible, adaptive behavior with transparent natural-language reasoning. Within SimCity, four core agent types (households, firms, a central bank, and a government) deliberate and participate in a frictional labor market, a heterogeneous goods market, and a financial market. Furthermore, a Vision-Language Model (VLM) determines the geographic placement of new firms and renders a mapped virtual city, allowing us to study both macroeconomic regularities and urban expansion dynamics within a unified environment. To evaluate the framework, we compile a checklist of canonical macroeconomic phenomena, including price elasticity of demand, Engel's Law, Okun's Law, the Phillips Curve, and the Beveridge Curve, and show that SimCity naturally reproduces these empirical patterns while remaining robust across simulation runs.

2025-10-01T10:27:01Z 34 pages, 12 figures Yeqi Feng Yucheng Lu Hongyu Su Yixin Tao Tianxing He http://arxiv.org/abs/2604.02279v1 The Self Driving Portfolio: Agentic Architecture for Institutional Asset Management 2026-04-02T17:13:04Z

Agentic AI shifts the investor's role from analytical execution to oversight. We present an agentic strategic asset allocation pipeline in which approximately 50 specialized agents produce capital market assumptions, construct portfolios using over 20 competing methods, and critique and vote on each other's output. A researcher agent proposes new portfolio construction methods not yet represented, and a meta-agent compares past forecasts against realized returns and rewrites agent code and prompts to improve future performance. The entire pipeline is governed by the Investment Policy Statement--the same document that guides human portfolio managers can now constrain and direct autonomous agents.

2026-04-02T17:13:04Z 31 pages, 11 exhibits Andrew Ang Nazym Azimbayev Andrey Kim http://arxiv.org/abs/2604.02211v1 Multi-Agent Video Recommenders: Evolution, Patterns, and Open Challenges 2026-04-02T16:04:52Z

Video recommender systems are among the most popular and impactful applications of AI, shaping content consumption and influencing culture for billions of users. Traditional single-model recommenders, which optimize static engagement metrics, are increasingly limited in addressing the dynamic requirements of modern platforms. In response, multi-agent architectures are redefining how video recommender systems serve, learn, and adapt to both users and datasets. These agent-based systems coordinate specialized agents responsible for video understanding, reasoning, memory, and feedback, to provide precise, explainable recommendations. In this survey, we trace the evolution of multi-agent video recommendation systems (MAVRS). We combine ideas from multi-agent recommender systems, foundation models, and conversational AI, culminating in the emerging field of large language model (LLM)-powered MAVRS. We present a taxonomy of collaborative patterns and analyze coordination mechanisms across diverse video domains, ranging from short-form clips to educational platforms. We discuss representative frameworks, including early multi-agent reinforcement learning (MARL) systems such as MMRF and recent LLM-driven architectures like MACRec and Agent4Rec, to illustrate these patterns. We also outline open challenges in scalability, multimodal understanding, incentive alignment, and identify research directions such as hybrid reinforcement learning-LLM systems, lifelong personalization and self-improving recommender systems.

2026-04-02T16:04:52Z Accepted for publication in The Nineteenth ACM International Conference on Web Search and Data Mining (WSDM Companion 2026) Srivaths Ranganathan Abhishek Dharmaratnakar Anushree Sinha Debanshu Das http://arxiv.org/abs/2604.02142v1 PRO-SPECT: Probabilistically Safe Scalable Planning for Energy-Aware Coordinated UAV-UGV Teams in Stochastic Environments 2026-04-02T15:13:40Z

We consider energy-aware planning for an unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) team operating in a stochastic environment. The UAV must visit a set of air points in minimum time while respecting energy constraints, relying on the UGV as a mobile charging station. Unlike prior work that assumed deterministic travel times or used fixed robustness margins, we model travel times as random variables and bound the probability of failure (energy depletion) across the entire mission to a user-specified risk level. We formulate the problem as a Mixed-Integer Program and propose PRO-SPECT, a polynomial-time algorithm that generates risk-bounded plans. The algorithm supports both offline planning and online re-planning, enabling the team to adapt to disturbances while preserving the risk bound. We provide theoretical results on solution feasibility and time complexity. We also demonstrate the performance of our method via numerical comparisons and simulations.

2026-04-02T15:13:40Z Roger Fowler Cahit Ikbal Er Benjamin Johnsenberg Yasin Yazicioglu http://arxiv.org/abs/2604.02025v1 Systematic Analyses of Reinforcement Learning Controllers in Signalized Urban Corridors 2026-04-02T13:32:52Z

In this work, we extend our systematic capacity region perspective to multi-junction traffic networks, focussing on the special case of an urban corridor network. In particular, we train and evaluate centralized, fully decentralized, and parameter-sharing decentralized RL controllers, and compare their capacity regions and ATTs together with a classical baseline MaxPressure controller. Further, we show how the parametersharing controller may be generalised to be deployed on a larger network than it was originally trained on. In this setting, we show some initial findings that suggest that even though the junctions are not formally coordinated, traffic may self organise into `green waves'.

2026-04-02T13:32:52Z Xiaofei Song Kerstin Eder Jonathan Lawry R. Eddie Wilson http://arxiv.org/abs/2604.02016v1 Optimizing Interventions for Agent-Based Infectious Disease Simulations 2026-04-02T13:20:10Z

Non-pharmaceutical interventions (NPIs) are commonly used tools for controlling infectious disease transmission when pharmaceutical options are unavailable. Yet, identifying effective interventions that minimize societal disruption remains challenging. Agent-based simulation is a popular tool for analyzing the impact of possible interventions in epidemiology. However, automatically optimizing NPIs using agent-based simulations poses a complex problem because, in agent-based epidemiological models, interventions can target individuals based on multiple attributes, affect hierarchical group structures (e.g., schools, workplaces, and families), and be combined arbitrarily, resulting in a very large or even infinite search space. We aim to support decision-makers with our Agent-based Infectious Disease Intervention Optimization System (ADIOS) that optimizes NPIs for infectious disease simulations using Grammar-Guided Genetic Programming (GGGP). The core of ADIOS is a domain-specific language for expressing NPIs in agent-based simulations that structures the intervention search space through a context-free grammar. To make optimization more efficient, the search space can be further reduced by defining constraints that prevent the generation of semantically invalid intervention patterns. Using this constrained language and an interface that enables coupling with agent-based simulations, ADIOS adopts the GGGP approach for simulation-based optimization. Using the German Epidemic Micro-Simulation System (GEMS) as a case study, we demonstrate the potential of our approach to generate optimal interventions for realistic epidemiological models

2026-04-02T13:20:10Z Anja Wolpers Johannes Ponge Adelinde M. Uhrmacher http://arxiv.org/abs/2604.02395v1 Eliminating Illusion in Directed Networks 2026-04-02T13:14:08Z

We study illusion elimination problems on directed social networks where each vertex is colored either red or blue. A vertex is under \textit{majority illusion} if it has more red out-neighbors than blue out-neighbors when there are more blue vertices than red ones in the network. In a more general phenomenon of $p$-illusion, at least $p$ fraction of the out-neighbors (as opposed to $1/2$ for majority) of a vertex is red. In the directed illusion elimination problem, we recolor minimum number of vertices so that no vertex is under $p$-illusion, for $p\in (0,1)$. Unfortunately, the problem is NP-hard for $p =1/2$ even when the network is a grid. Moreover, the problem is NP-hard and W[2]-hard when parameterized by the number of recolorings for each $p \in (0,1)$ even on bipartite DAGs. Thus, we can neither get a polynomial time algorithm on DAGs, unless P=NP, nor we can get a FPT algorithm even by combining solution size and directed graph parameters that measure distance from acyclicity, unless FPT=W[2]. We show that the problem can be solved in polynomial time in structured, sparse networks such as outerplanar networks, outward grids, trees, and cycles. Finally, we show tractable algorithms parameterized by treewidth of the underlying undirected graph, and by the number of vertices under illusion.

2026-04-02T13:14:08Z 26 pages, 5 figures Sougata Jana Sanjukta Roy