http://arxiv.org/abs/2604.15972v1 Weak-Link Optimization for Multi-Agent Reasoning and Collaboration 2026-04-17T11:36:20Z

LLM-driven multi-agent frameworks address complex reasoning tasks through multi-role collaboration. However, existing approaches often suffer from reasoning instability, where individual agent errors are amplified through collaboration, undermining overall performance. Current research mainly focuses on enhancing high-capability agents or suppressing unreliable outputs to improve framework effectiveness, while systematic identification and reinforcement of performance-limiting agents receive less attention. To address this gap, we propose WORC, a \underline{w}eak-link \underline{o}ptimization framework for multi-agent \underline{r}easoning and \underline{c}ollaboration, grounded in the weak-link principle. WORC follows a two-stage workflow. In the weak agent localization stage, task features are constructed, and a meta-learning-based weight predictor trained on optimal configurations identified by swarm intelligence algorithms (SIAs) enables zero-shot mapping from these features to agent performance weights, where the agent with the lowest predicted weight is identified as the weak agent. In the weak-link optimization stage, an uncertainty-driven allocation strategy assigns additional reasoning budgets to weak agents, with lower predicted weights leading to larger repeated-sampling quotas to compensate for reliability deficiencies. Experimental results show that WORC achieves an average accuracy of 82.2\% on reasoning benchmarks while improving framework stability and cross-architecture generalization, suggesting that compensating for weak links, rather than reinforcing strengths alone, enhances the robustness of multi-agent systems.
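The allocation idea described in this abstract (a lower predicted weight leads to a larger repeated-sampling quota) can be illustrated with a minimal sketch. The function names, the inverse-weight rule, and the example weights are assumptions made for illustration; the abstract does not state the paper's actual allocation formula.

```python
def allocate_quotas(weights, extra_budget, base=1):
    """Split an extra sampling budget across agents, favoring low-weight agents.

    weights: dict of agent name -> predicted performance weight (must be positive).
    Hypothetical rule: each agent's share is proportional to 1 / weight.
    """
    if any(w <= 0 for w in weights.values()):
        raise ValueError("weights must be positive")
    inverse = {name: 1.0 / w for name, w in weights.items()}
    total = sum(inverse.values())
    raw = {name: extra_budget * inv / total for name, inv in inverse.items()}
    # Floor each share, then give leftover samples to the largest remainders.
    quotas = {name: int(share) for name, share in raw.items()}
    leftover = extra_budget - sum(quotas.values())
    by_remainder = sorted(raw, key=lambda n: raw[n] - quotas[n], reverse=True)
    for name in by_remainder[:leftover]:
        quotas[name] += 1
    # Every agent keeps at least `base` samples.
    return {name: base + q for name, q in quotas.items()}


def weakest_agent(weights):
    """The agent with the lowest predicted weight is treated as the weak link."""
    return min(weights, key=weights.get)


# Illustrative weights only; not taken from the paper.
example_weights = {"planner": 0.5, "solver": 0.3, "critic": 0.2}
example_quotas = allocate_quotas(example_weights, extra_budget=10)
```

Under this assumed rule the whole extra budget is always spent, and the agent with the lowest weight receives the largest quota.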

2026-04-17T11:36:20Z 13 pages, 4 figures. Submitted to CAAI Transactions on Intelligence Technology Haoyu Bian Chaoning Zhang Jiaquan Zhang Xingyao Li Yuanfang Guo Wei Dong Yang Yang http://arxiv.org/abs/2604.15937v1 Polarization by Default: Auditing Recommendation Bias in LLM-Based Content Curation 2026-04-17T10:55:21Z

Large Language Models (LLMs) are increasingly deployed to curate and rank human-created content, yet the nature and structure of their biases in these tasks remain poorly understood: which biases are robust across providers and platforms, and which can be mitigated through prompt design. We present a controlled simulation study mapping content selection biases across three major LLM providers (OpenAI, Anthropic, Google) on real social media datasets from Twitter/X, Bluesky, and Reddit, using six prompting strategies (\textit{general}, \textit{popular}, \textit{engaging}, \textit{informative}, \textit{controversial}, \textit{neutral}). Through 540,000 simulated top-10 selections from pools of 100 posts across 54 experimental conditions, we find that biases differ substantially in how structural and how prompt-sensitive they are. Polarization is amplified across all configurations, toxicity handling shows a strong inversion between engagement- and information-focused prompts, and sentiment biases are predominantly negative. Provider comparisons reveal distinct trade-offs: GPT-4o Mini shows the most consistent behavior across prompts; Claude and Gemini exhibit high adaptivity in toxicity handling; Gemini shows the strongest negative sentiment preference. On Twitter/X, where author demographics can be inferred from profile bios, political leaning bias is the clearest demographic signal: left-leaning authors are systematically over-represented despite right-leaning authors forming the pool plurality in the dataset, and this pattern largely persists across prompts.
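The experimental scale quoted in this abstract can be cross-checked with a few lines of arithmetic. The count of 54 conditions is consistent with a full crossing of 3 providers, 3 platforms, and 6 prompting strategies; the per-condition figure below additionally assumes the 540,000 selections are split evenly, which the abstract does not state.

```python
providers = ["OpenAI", "Anthropic", "Google"]
platforms = ["Twitter/X", "Bluesky", "Reddit"]
prompts = ["general", "popular", "engaging", "informative", "controversial", "neutral"]

# Full crossing of the three design factors named in the abstract.
conditions = len(providers) * len(platforms) * len(prompts)

total_selections = 540_000
# Assumption: selections are divided evenly across conditions.
per_condition = total_selections // conditions

# Each selection picks the top 10 from a pool of 100 posts.
selection_rate = 10 / 100
```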

2026-04-17T10:55:21Z Nicolò Pagan Christopher Barrie Chris Andrew Bail Petter Törnberg http://arxiv.org/abs/2604.16566v1 Agentic AI for Education: A Unified Multi-Agent Framework for Personalized Learning and Institutional Intelligence 2026-04-17T10:25:35Z

Agentic Artificial Intelligence (AI) represents a paradigm shift from reactive systems to proactive, autonomous decision-making frameworks. Existing AI-based educational systems remain fragmented and lack multi-level integration across stakeholders. This paper proposes the Agentic Unified Student Support System (AUSS), a novel multi-agent architecture integrating student-level personalization, educator-level automation, and institutional-level intelligence. The framework leverages Large Language Models (LLMs), reinforcement learning, predictive analytics, and rule-based reasoning. Experimental results demonstrate improvements in recommendation accuracy (92.4%), grading efficiency (94.1%), and dropout prediction (F1-score: 89.5%). The proposed system enables scalable, adaptive, and intelligent educational ecosystems.

2026-04-17T10:25:35Z Arya Mary K J Deepthy K Bhaskar Sinu T S Binu V P http://arxiv.org/abs/2605.20210v1 Governance by Design: Architecting Agentic AI for Organizational Learning and Scalable Autonomy 2026-04-17T09:03:42Z

Agentic AI systems - systems that can pursue goals through multi-step planning and tool-mediated action with limited direct supervision - are moving from experimental prototypes to enterprise deployments. This transition introduces tensions in implementation, scaling, and governance: organizations seek scalable autonomy for knowledge and coordination work, yet must preserve accountability, safety, cost control, and responsibility as systems initiate actions, access enterprise data, and evolve through iterative updates. Building on an in-depth qualitative case of a large IT services company's 2025 development and staged rollout of an agentic system integrated with enterprise tools, we show that governance is implemented through concrete architectural and working arrangements that determine what the system is allowed to do, which tools and data it can use, how memory is handled, and how performance improvements are introduced over time. We then distill seven lessons that explain how to build effective governance into agentic AI during operationalization and scaling.

2026-04-17T09:03:42Z 17 pages, 1 figure, 3 tables Nelly Dux Cristina Alaimo Philippe Roussiere Abhishek Kumar Mishra http://arxiv.org/abs/2510.24758v2 A Digital Twin Framework for Decision-Support and Optimization of EV Charging Infrastructure in Localized Urban Systems 2026-04-17T07:36:27Z

As Electric Vehicle (EV) adoption accelerates in urban environments, optimizing charging infrastructure is vital for balancing user satisfaction, energy efficiency, and financial viability. This study advances beyond static models by proposing a digital twin framework that integrates agent-based decision support with embedded optimization to dynamically simulate EV charging behaviors, infrastructure layouts, and policy responses across scenarios. Applied to a localized urban site (a university campus) in Hanoi, Vietnam, the model evaluates operational policies, EV station configurations, and renewable energy sources. The interactive dashboard enables seasonal analysis, revealing a 20% drop in solar efficiency from October to March, with wind power contributing under 5% of demand, highlighting the need for adaptive energy management. Simulations show that dynamic notifications of newly available charging slots improve user satisfaction, while gasoline bans and idle fees enhance slot turnover with minimal added complexity. Embedded metaheuristic optimization identifies near-optimal mixes of fast (30kW) and standard (11kW) solar-powered chargers, balancing energy performance, profitability, and demand with high computational efficiency. This digital twin provides a flexible, computation-driven platform for EV infrastructure planning, with a transferable, modular design that enables seamless scaling from localized to city-wide urban contexts.

2025-10-21T12:26:35Z 38 pages, 11 figures. Accepted for publication in CEUS. This version is made available under the CC-BY-NC-ND 4.0 license. Final version available at: https://doi.org/10.1016/j.compenvurbsys.2026.102422 Computers, Environment and Urban Systems, Volume 127, 102422 (2026) Bui Khanh Linh Do Thanh H. Nguyen Nghi Huynh Quang Doanh Nguyen-Ngoc Laurent El Ghaoui 10.1016/j.compenvurbsys.2026.102422 http://arxiv.org/abs/2507.02935v3 Theory of Mind in Action: The Instruction Inference Task in Dynamic Human-Agent Collaboration 2026-04-17T02:59:49Z

Successful human-agent teaming relies on an agent being able to understand instructions given by a (human) principal. In many cases, an instruction may be incomplete or ambiguous. In such cases, the agent must infer the unspoken intentions from their shared context, that is, it must exercise the principal's Theory of Mind (ToM) and infer the mental states of its principal. We consider the prospects of effective human-agent collaboration using large language models (LLMs). To assess ToM in a dynamic, goal-oriented, and collaborative environment, we introduce a novel task, Instruction Inference, in which an agent assists a principal in reaching a goal by interpreting incomplete or ambiguous instructions. We present Tomcat, an LLM-based agent, designed to exhibit ToM reasoning in interpreting and responding to the principal's instructions. We implemented two variants of Tomcat. One, dubbed Fs-CoT (Fs for few-shot, CoT for chain-of-thought), is based on a small number of examples demonstrating the requisite structured reasoning. The other, dubbed CP (commonsense prompt), relies on commonsense knowledge and information about the problem. We realized both variants of Tomcat on three leading LLMs, namely, GPT-4o, DeepSeek-R1, and Gemma-3-27B. To evaluate the effectiveness of Tomcat, we conducted a study with 52 human participants in which we provided participants with the same information as the CP variant. We computed intent accuracy, action optimality, and planning optimality to measure the ToM capabilities of Tomcat and our study participants. We found that Tomcat with Fs-CoT, particularly with GPT-4o and DeepSeek-R1, achieves performance comparable to the human participants, underscoring its ToM potential for human-agent collaboration.

2025-06-26T20:44:12Z 66 pages with appendix, 10 figures (Appendix: 26 Figures), 11 tables. Code available at: https://github.com/fardinsaad/Tomcat-LLM Fardin Saad Pradeep K. Murukannaiah Munindar P. Singh http://arxiv.org/abs/2604.16543v1 Conjunctive Prompt Attacks in Multi-Agent LLM Systems 2026-04-17T02:31:09Z

Most LLM safety work studies single-agent models, but many real applications rely on multiple interacting agents. In these systems, prompt segmentation and inter-agent routing create attack surfaces that single-agent evaluations miss. We study \emph{conjunctive prompt attacks}, where a trigger key in the user query and a hidden adversarial template in one compromised remote agent each appear benign alone but activate harmful behavior when routing brings them together. We consider an attacker who changes neither model weights nor the client agent and instead controls only trigger placement and template insertion. Across star, chain, and DAG topologies, routing-aware optimization substantially increases attack success over non-optimized baselines while keeping false activations low. Existing defenses, including PromptGuard, Llama-Guard variants, and system-level controls such as tool restrictions, do not reliably stop the attack because no single component appears malicious in isolation. These results expose a structural vulnerability in agentic LLM pipelines and motivate defenses that reason over routing and cross-agent composition. Code is available at https://github.com/UCF-ML-Research/ConjunctiveAgents.

2026-04-17T02:31:09Z ACL 2026 Main Conference Nokimul Hasan Arif Qian Lou Mengxin Zheng http://arxiv.org/abs/2604.19813v1 Evolution of Lane-Changing Behavior in Mixed Traffic: A Quantum Game Theory Approach 2026-04-17T02:14:38Z

As automated vehicles (AVs) enter mixed traffic, proactively anticipating the evolution of human driving behavior during critical interactions, such as lane changes, is essential. However, classical Evolutionary Game Theory (EGT) fails to capture the complexity of human decision-making during lane changes. Specifically, by strictly assuming independence between agents, classical models calibrated on empirical payoffs predict a convergence to unrealistic full cooperation, contradicting the stable 42% cooperation rate observed in real-world data. To resolve this discrepancy, this study introduces a Quantum Game Theory (QGT) framework. We analyze 7,636 lane-changing interactions from the Waymo Open Motion Dataset (WOMD) to derive empirical payoff matrices via a Quantal Response Equilibrium (QRE) model. Utilizing the Marinatto-Weber (MW) quantization scheme, we introduce an entanglement parameter to mathematically embed latent correlations directly into the payoff structure of a single interaction. Our results identify a human entanglement parameter of $|b|^2_{HDV} \approx 0.52$ that accurately reproduces the observed mixed equilibrium. Furthermore, simulations of three AV deployment strategies (classical, entangled, and inverted) reveal that human adaptation depends critically on the underlying AV algorithm: while cooperative classical AVs maximize system-wide cooperation at high market penetration rates, defective inverted AVs paradoxically yield higher overall cooperation at low penetration rates by prompting more cooperative behaviors from human drivers. Consequently, rather than waiting for large-scale deployment to observe these effects, stakeholders can utilize this framework to simulate repeated interactions and proactively anticipate how human driver behavior will evolve in response to specific AV software designs.
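To make the role of the entanglement parameter concrete, here is a minimal sketch of one common textbook form of the Marinatto-Weber scheme for a 2x2 game. It assumes the initial state a|00> + b|11> with |a|^2 + |b|^2 = 1 and two pure strategies per player (leave the qubit alone, or flip it). Under those assumptions the effective payoff matrix is a mixture of the classical matrix and its doubly flipped copy. The payoff numbers are placeholders, not values derived from WOMD, and the paper's exact formulation may differ.

```python
def mw_effective_payoffs(payoff, b_squared):
    """Effective 2x2 payoff matrix under a Marinatto-Weber initial state.

    payoff[i][j]: classical payoff to one player when the row player's outcome
                  is i and the column player's outcome is j.
    b_squared:    |b|^2, the weight on |11> in the state a|00> + b|11>.
    Strategy index 0 means "identity" and 1 means "flip".
    """
    if not 0.0 <= b_squared <= 1.0:
        raise ValueError("b_squared must lie in [0, 1]")
    a_squared = 1.0 - b_squared
    # With probability |a|^2 the outcome is (i, j); with |b|^2 it is (1-i, 1-j).
    return [
        [a_squared * payoff[i][j] + b_squared * payoff[1 - i][1 - j]
         for j in range(2)]
        for i in range(2)
    ]


# Placeholder payoffs for illustration only.
placeholder = [[3.0, 0.0],
               [5.0, 1.0]]
classical = mw_effective_payoffs(placeholder, 0.0)
half_mixed = mw_effective_payoffs(placeholder, 0.5)
```

Setting `b_squared` to 0 recovers the classical game, which is why a nonzero value can move the predicted equilibrium away from the classical one.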

2026-04-17T02:14:38Z Sungyong Chung Tina Radvand Alireza Talebpour http://arxiv.org/abs/2604.15610v1 Scalable Algorithms with Provable Optimality Bounds for the Multiple Watchman Route Problem 2026-04-17T01:18:42Z

In this paper, we tackle the Multiple Watchman Route Problem (MWRP), which aims to find a set of paths that M watchmen can follow such that every location on the map can be seen by at least one watchman. First, we propose multiple methods to reduce the state space over which a search needs to be conducted by pruning map areas that are guaranteed to be seen en route to other areas. Next, we introduce MWRP-CP3, an efficient optimal planner that combines these methods with techniques that improve the quality and calculation time of existing heuristics. We present several suboptimal algorithms with bounds on solution quality, including MxWA*, a general variant of weighted A* for makespan problems. We also present anytime variations of our suboptimal algorithms, as well as techniques to improve an existing suboptimal solution by solving multiple decomposed sub-problems. We show that MWRP-CP3 can reduce the search space by more than 95% and runs more than 200x faster than existing optimal algorithms on 2D grid maps. We also show that our suboptimal algorithms solve maps 3x larger than those solvable by MWRP-CP3. See mwrp-cp3.github.io for the open source codebase and video demonstrations.
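For readers unfamiliar with the weighted A* family that MxWA* is said to generalize, the sketch below shows plain single-agent weighted A* on a 4-connected grid. It is not MxWA* and does not model visibility or multiple watchmen; the grid encoding and function name are assumptions for illustration. With an admissible heuristic and a weight w >= 1, the returned cost is at most w times the optimal cost.

```python
import heapq


def weighted_astar(grid, start, goal, w=1.5):
    """Weighted A* on a 4-connected grid; returns the path cost or None.

    grid: list of equal-length strings, '#' marks an obstacle.
    start, goal: (row, col) tuples. Each move costs 1.
    """
    def h(cell):
        # Manhattan distance, admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    frontier = [(w * h(start), 0, start)]  # entries are (f, g, cell)
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale queue entry; a cheaper route was found later
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if not inside or grid[nr][nc] == "#":
                continue
            ng = g + 1
            if ng < best_g.get((nr, nc), float("inf")):
                best_g[(nr, nc)] = ng
                # Inflating h by w biases the search toward the goal.
                heapq.heappush(frontier, (ng + w * h((nr, nc)), ng, (nr, nc)))
    return None


open_grid = ["...", "...", "..."]
walled_grid = [".#.", "###", "..."]
```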

2026-04-17T01:18:42Z Srikar Gouru Ariel Felner Jiaoyang Li http://arxiv.org/abs/2604.15558v1 Preregistered Belief Revision Contracts 2026-04-16T22:22:54Z

Deliberative multi-agent systems allow agents to exchange messages and revise beliefs over time. While this interaction is meant to improve performance, it can also create dangerous conformity effects: agreement, confidence, prestige, or majority size may be treated as if they were evidence, producing high-confidence convergence to false conclusions. To address this, we introduce PBRC (Preregistered Belief Revision Contracts), a protocol-level mechanism that strictly separates open communication from admissible epistemic change. A PBRC contract publicly fixes first-order evidence triggers, admissible revision operators, a priority rule, and a fallback policy. A non-fallback step is accepted only when it cites a preregistered trigger and provides a nonempty witness set of externally validated evidence tokens. This ensures that every substantive belief change is both enforceable by a router and auditable after the fact. In this paper, (a) we prove that under evidential contracts with conservative fallback, social-only rounds cannot increase confidence and cannot generate purely conformity-driven wrong-but-sure cascades. (b) We show that auditable trigger protocols admit evidential PBRC normal forms that preserve belief trajectories and canonicalized audit traces. (c) We demonstrate that sound enforcement yields epistemic accountability: any change of top hypothesis is attributable to a concrete validated witness set. For token-invariant contracts, (d) we prove that enforced trajectories depend only on token-exposure traces; under flooding dissemination, these traces are characterized exactly by truncated reachability, giving tight diameter bounds for universal evidence closure. Finally, we introduce a companion contractual dynamic doxastic logic to specify trace invariants, and provide simulations illustrating cascade suppression, auditability, and robustness-liveness trade-offs.
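The acceptance rule stated in this abstract (a non-fallback step must cite a preregistered trigger and carry a nonempty witness set of validated evidence tokens) can be sketched as a small router check. The class and function names are hypothetical, and requiring every witness token to be validated is one reading of the abstract, not a confirmed detail of the protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RevisionStep:
    trigger: str          # name of the trigger the step cites
    witnesses: frozenset  # evidence tokens offered in support


def accept_step(step, preregistered_triggers, validated_tokens):
    """True only if the step cites a preregistered trigger and offers a
    nonempty witness set consisting entirely of validated tokens."""
    if step.trigger not in preregistered_triggers:
        return False
    if not step.witnesses:
        return False
    return step.witnesses <= validated_tokens


def route(step, preregistered_triggers, validated_tokens):
    """Rejected steps are sent to the fallback policy (here just a label)."""
    accepted = accept_step(step, preregistered_triggers, validated_tokens)
    return "apply" if accepted else "fallback"


# Hypothetical contract contents for illustration.
triggers = {"new_measurement"}
validated = {"tok1", "tok2"}
```

A step that appeals only to social signals, such as majority agreement, cites no preregistered trigger and is therefore routed to the fallback policy.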

2026-04-16T22:22:54Z Saad Alqithami http://arxiv.org/abs/2605.16298v1 Data-driven and distributed governance of building facilities management using decentralized autonomous organization, digital twin, and large language models 2026-04-16T20:10:35Z

While traditional AI and data-driven facilities management approaches have improved building operational efficiency, they remain constrained by centralized organizational structures that are vulnerable to cyber attacks, limited contextual understanding, and decision-making processes that exclude key stakeholders from governance. This paper introduces a novel AI- and data-driven distributed governance framework for smart building management that integrates decentralized autonomous organizations (DAOs), digital twins, large language models (LLMs), and blockchain technology. The framework enables transparent collective decision-making through a DAO governance platform, implements data-driven management using IoT and digital twins, incorporates LLM-based virtual assistants for enhanced decision support, and utilizes blockchain for secure building automation. A full-stack decentralized application was developed to facilitate user interaction with these integrated components. The system was evaluated for cost efficiency, scalability, data security, and usability using the System Usability Scale (SUS). Expert interviews were also conducted to assess its practical benefits and implementation challenges.

2026-04-16T20:10:35Z 33 pages, 20 figures, 4 tables Reachsak Ly Alireza Shojaei Xinghua Gao Philip Agee Abiola Akanmu http://arxiv.org/abs/2604.16534v1 Public and private blockchain for decentralized digital building twins and building automation system 2026-04-16T20:06:21Z

The communication protocols and data transfer mechanisms employed by IoT devices in smart buildings and corresponding digital twin systems predominantly rely on centralized architectures. Such centralized systems are vulnerable to single points of failure, where a malfunction can disrupt operational processes. This study introduces a blockchain-based decentralized protocol to enhance the cyber resilience of IoT data transfer for digital twins and enable decentralized automation of building operations. The framework incorporates public and private blockchain technologies alongside two case studies showcasing prototypes of each system. These prototypes were validated within a real-world building environment using smart home appliances and two digital twin platforms, with their performance evaluated based on cost, scalability, data security, and privacy. The findings reveal that the Hyperledger Fabric-based system excels in terms of scalability, speed, and cost-effectiveness, while both frameworks offer advantages over traditional centralized protocols in system cyber resilience, data security, and privacy.

2026-04-16T20:06:21Z 27 pages, 15 figures, 2 tables Reachsak Ly Alireza Shojaei http://arxiv.org/abs/2604.15475v1 NeuroMesh: A Unified Neural Inference Framework for Decentralized Multi-Robot Collaboration 2026-04-16T18:54:34Z

Deploying learned multi-robot models on heterogeneous robots remains challenging due to hardware heterogeneity, communication constraints, and the lack of a unified execution stack. This paper presents NeuroMesh, a multi-domain, cross-platform, and modular decentralized neural inference framework that standardizes observation encoding, message passing, aggregation, and task decoding in a unified pipeline. NeuroMesh combines a dual-aggregation paradigm for reduction- and broadcast-based information fusion with a parallelized architecture that decouples cycle time from end-to-end latency. Our high-performance C++ implementation leverages Zenoh for inter-robot communication and supports hybrid GPU/CPU inference. We validate NeuroMesh on a heterogeneous team of aerial and ground robots across collaborative perception, decentralized control, and task assignment, demonstrating robust operation across diverse task structures and payload sizes. We plan to release NeuroMesh as an open-source framework to the community.

2026-04-16T18:54:34Z 8 pages, 8 figures. Accepted at IEEE Robotics and Automation Letters (RA-L) Yang Zhou Yash Shetye Long Quang Devon Super Jesse Milzman Manohari Goarin Aditya Azad Devang Sunil Dhake Jeffery Mao Carlos Nieto-Granda Giuseppe Loianno http://arxiv.org/abs/2511.00739v3 Towards Understanding, Analyzing, and Optimizing Agentic AI Execution: A CPU-Centric Perspective 2026-04-16T18:23:56Z

Agentic AI serving converts monolithic LLM-based inference to autonomous problem-solvers that can plan, call tools, perform reasoning, and adapt on the fly. Due to diverse task execution needs, such serving relies heavily on heterogeneous CPU-GPU systems, with the majority of the external tools responsible for agentic capability either running on or orchestrated by the CPU. To gain a deeper understanding of the CPU's role, this paper characterizes and analyzes the system bottlenecks introduced by agentic AI workloads from a largely overlooked CPU-centric perspective. We first present a compile-time characterization of agentic AI execution and choose representative workloads to capture the algorithmic diversity. We then perform a runtime characterization of the representative workloads, analyzing the end-to-end latency and throughput on two different hardware systems to isolate their respective architectural bottlenecks. Based on insights into these bottlenecks, we finally present two scheduling optimizations, namely (1) CPU-Aware Overlapped Micro-Batching (COMB) and (2) Mixed Agentic Scheduling (MAS), for homogeneous and heterogeneous agentic workloads, respectively. Specifically, these methods optimize for improved concurrent CPU-GPU utilization while reducing skewed resource allocation for heterogeneous execution. Experimental evaluations on the two hardware systems demonstrate the efficacy of COMB in yielding up to 1.7x lower P50 latency in standalone homogeneous workload execution and up to 3.9x/1.8x lower service/total latency under homogeneous open-loop load. Additionally, for heterogeneous open-loop load, MAS can reduce the total latency for the minority request type by up to 2.37x/2.49x at the P50/P90 percentiles.

2025-11-01T23:46:44Z Ritik Raj Souvik Kundu Ishita Vohra Hong Wang Tushar Krishna http://arxiv.org/abs/2604.15267v1 CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas 2026-04-16T17:40:30Z

It is increasingly important that LLM agents interact effectively and safely with other goal-pursuing agents, yet recent work reports the opposite trend: LLMs with stronger reasoning capabilities behave _less_ cooperatively in mixed-motive games such as the prisoner's dilemma and public goods settings. Indeed, our experiments show that recent models -- with or without reasoning enabled -- consistently defect in single-shot social dilemmas. To tackle this safety concern, we present the first comparative study of game-theoretic mechanisms that are designed to enable cooperative outcomes between rational agents _in equilibrium_. Across four social dilemmas testing distinct components of robust cooperation, we evaluate the following mechanisms: (1) repeating the game for many rounds, (2) reputation systems, (3) third-party mediators to delegate decision making to, and (4) contract agreements for outcome-conditional payments between players. Among our findings, we establish that contracting and mediation are most effective in achieving cooperative outcomes between capable LLMs, and that repetition-induced cooperation deteriorates drastically when co-players vary. Moreover, we demonstrate that these cooperation mechanisms become _more effective_ under evolutionary pressures to maximize individual payoffs.
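As a worked illustration of why a contract with outcome-conditional payments can sustain cooperation in equilibrium, the sketch below takes a one-shot prisoner's dilemma and adds a transfer paid by a lone defector to the cooperator. The payoff values, the transfer size, and the function names are assumptions for illustration and are not taken from the benchmark.

```python
# Row player's payoffs in a symmetric one-shot prisoner's dilemma.
# Index 0 = cooperate, 1 = defect; PD[mine][theirs]. Values are illustrative.
PD = [[3, 0],
      [5, 1]]


def with_contract(payoffs, transfer):
    """Add an outcome-conditional payment: a lone defector pays `transfer`
    to the player who cooperated."""
    adjusted = [row[:] for row in payoffs]
    adjusted[0][1] += transfer  # I cooperated, they defected: I receive it
    adjusted[1][0] -= transfer  # I defected, they cooperated: I pay it
    return adjusted


def is_symmetric_equilibrium(payoffs, action):
    """(action, action) is a Nash equilibrium of a symmetric game if no
    unilateral deviation gives a strictly higher payoff."""
    return payoffs[action][action] >= max(payoffs[a][action] for a in (0, 1))


contracted = with_contract(PD, transfer=3)
```

Without the contract, mutual defection is the only symmetric equilibrium in this example; with a transfer of 3, mutual cooperation becomes an equilibrium and mutual defection stops being one.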

2026-04-16T17:40:30Z 65 pages, 38 Figures, 8 Tables, 17 Listings Emanuel Tewolde Xiao Zhang David Guzman Piedrahita Vincent Conitzer Zhijing Jin