http://arxiv.org/abs/2605.23930v1 Quantum Frog: Emergent Cooperation and Difficulty Scaling in a Quantized-Time Cooperative Game 2026-04-22T00:55:08Z

We introduce \emph{Quantum Frog}, a two-player cooperative game built on a novel \emph{quantized-time} mechanic in which the environment advances only when a player acts. Inspired by the classic arcade game Frogger, Quantum Frog requires two frogs to cross an 8$\times$8 grid of traffic and reach the far side together. We use reinforcement learning (RL) as an analytical lens to answer four design questions: (1) how does game difficulty scale with traffic density, (2) what is the optimal single-agent policy and why, (3) how large is the cooperation gap between independent and cooperative two-agent play, and (4) what joint strategy emerges when agents are incentivised to cooperate? We train agents through five escalating stages spanning Tabular Q-Learning, Deep Q-Network (DQN), Independent DQN (IDQN), and Multi-Agent Proximal Policy Optimisation (MAPPO, with a centralised critic), evaluating each at traffic densities of one to six cars. Our key findings are: (i) the quantized-time mechanic makes a \emph{rush strategy} (moving directly upward at every step) universally optimal, because it minimises the time spent exposed to traffic; (ii) adding an uncoordinated second player makes the game harder than sextupling the traffic for a single expert player; (iii) cooperative training recovers 32--34 percentage points of joint success rate relative to independent agents and reduces episode length from $\sim$90 to $\sim$6 steps; and (iv) the emergent cooperative strategy is synchronised rushing, not complex positional coordination, illustrating that shared incentives alone suffice to align agents in time-critical cooperative tasks. These findings provide concrete, empirically grounded guidance for the commercial design of Quantum Frog and offer broader insights into the role of environment mechanics in shaping multi-agent learning dynamics.
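
A toy exposure model can illustrate the rush result. The sketch below assumes a single frog that starts on row 0 of the 8$\times$8 grid, with rows 1 to 6 carrying traffic and row 7 as the far bank. As a simplification that is not the paper's environment, it also assumes that every tick spent on a road row carries an independent collision probability `p_hit`. Time advances only when the frog acts, so a `"wait"` action costs a tick just as a move does. Any policy that pauses on the road therefore increases exposure.

```python
ROWS = 8                        # 8x8 grid; row 0 = start bank, row 7 = far bank (assumed)
ROAD_ROWS = range(1, ROWS - 1)  # rows 1..6 carry traffic (assumed)


def ticks_on_road(policy: list[str]) -> int:
    """Count world ticks spent on road rows under quantized time.

    The world advances exactly once per action, so "wait" costs a tick too.
    """
    row, ticks = 0, 0
    for action in policy:
        if action == "up":
            row += 1
        if row in ROAD_ROWS:
            ticks += 1
        if row == ROWS - 1:  # reached the far bank
            break
    return ticks


def survival(ticks: int, p_hit: float) -> float:
    """Toy model: each road tick is an independent hit with probability p_hit."""
    return (1.0 - p_hit) ** ticks


rush = ["up"] * 7
cautious = ["up", "wait"] * 6 + ["up"]  # pauses once on every road row
print(ticks_on_road(rush), ticks_on_road(cautious))  # 6 12
print(survival(6, 0.1) > survival(12, 0.1))          # True
```

Under this independence assumption, survival equals $(1-p)^{\text{ticks}}$, so the policy that spends the fewest ticks on the road is optimal. In the real game, a wait can sometimes dodge a visible car, and this toy model deliberately ignores that case.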

2026-04-22T00:55:08Z | Saad Mankarious

http://arxiv.org/abs/2604.19589v1 TeamFusion: Supporting Open-ended Teamwork with Multi-Agent Systems 2026-04-21T15:40:46Z

In open-ended domains, teams must reconcile diverse viewpoints to produce strong deliverables. Answer-aggregation approaches commonly used in closed domains are ill-suited to this setting, as they tend to suppress minority perspectives rather than resolve the underlying disagreements. We present TeamFusion, a multi-agent system designed to support teamwork in open-ended domains by (1) instantiating a proxy agent for each team member, conditioned on that member's expressed preferences; (2) conducting a structured discussion to surface agreements and disagreements; and (3) synthesizing more consensus-oriented deliverables that feed into new iterations of discussion and refinement. We evaluate TeamFusion on two teamwork tasks in which team members can assess how well their individual views are represented in team decisions and how consensually strong the final deliverables are, and find that it outperforms direct-aggregation baselines across metrics, tasks, and team configurations.
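
The following minimal sketch uses hypothetical stances to make the contrast between aggregation and surfacing disagreements concrete. The function names and data layout are illustrative assumptions, not TeamFusion's implementation. Majority voting returns one position per issue and silently drops the minority view. The surfacing step instead keeps every disputed issue together with the members who hold each position, which is the input a structured discussion needs.

```python
from collections import Counter

Stances = dict[str, dict[str, str]]  # member -> issue -> position


def majority_aggregate(stances: Stances) -> dict[str, str]:
    """Baseline: keep the most common position per issue; minority views vanish."""
    issues = next(iter(stances.values())).keys()
    return {
        issue: Counter(s[issue] for s in stances.values()).most_common(1)[0][0]
        for issue in issues
    }


def surface(stances: Stances):
    """Split issues into agreements and disagreements, keeping who holds what."""
    agreements, disagreements = {}, {}
    for issue in next(iter(stances.values())).keys():
        holders: dict[str, list[str]] = {}
        for member, s in stances.items():
            holders.setdefault(s[issue], []).append(member)
        if len(holders) == 1:
            agreements[issue] = next(iter(holders))
        else:
            disagreements[issue] = holders
    return agreements, disagreements


stances = {
    "ana": {"venue": "online", "budget": "low"},
    "ben": {"venue": "online", "budget": "high"},
    "chen": {"venue": "online", "budget": "low"},
}
print(majority_aggregate(stances))  # {'venue': 'online', 'budget': 'low'}
print(surface(stances))
# ({'venue': 'online'}, {'budget': {'low': ['ana', 'chen'], 'high': ['ben']}})
```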

2026-04-21T15:40:46Z | 22 pages | Jiale Liu, Victor S. Bursztyn, Lin Ai, Haoliang Wang, Sunav Choudhary, Saayan Mitra, Qingyun Wu

http://arxiv.org/abs/2604.19541v1 FOCAL: Filtered On-device Continuous Activity Logging for Efficient Personal Desktop Summarization 2026-04-21T15:00:41Z

Desktop interaction streams provide a continuous, privacy-sensitive record of interleaved user tasks. Transforming these streams into task-organized personal logs on-device faces two main challenges: exhaustive Vision-Language Model (VLM) processing strains local resources, and processing the stream globally causes cross-task context pollution. We present FOCAL (Filtered On-device Continuous Activity Logging), a privacy-first multi-agent system built on a unified filter-plan-log architecture. It cascades a lightweight Filter Agent for noise suppression, a text-only Brain Agent for task attribution, a Record Agent for selective visual reasoning, and a task-isolated Memory Agent for context-coherent summarization. Experiments on DesktopBench (2,572 screenshots across 420 complex sessions) show that FOCAL reduces total token consumption by 60.4% and VLM call count by 72.3% relative to a baseline, while raising Key Information Recall (KIR) from 0.38 to 0.61. Crucially, under $A{\to}B{\to}A$ task interruptions, FOCAL maintains a task accuracy of 0.81 and a KIR of 0.80, whereas the baseline's task accuracy collapses to 0.03. FOCAL pioneers efficient on-device summarization of instruction-free desktop streams into multi-perspective personal logs.
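
The filter-then-isolate idea can be sketched in a few lines. This is a hypothetical illustration, not FOCAL's code. It assumes each frame already carries a task label (the role the abstract gives the text-only Brain Agent) and a pixel-change score. Near-duplicate frames are dropped before any expensive processing, and each task keeps its own memory, so an $A{\to}B{\to}A$ interruption cannot mix B's content into A's summary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Frame:
    task: str      # task label, assumed to come from a text-only attribution step
    text: str      # text extracted from the screenshot (assumed available)
    change: float  # fraction of pixels changed since the previous frame (assumed)


def filter_frames(frames: list[Frame], min_change: float = 0.05) -> list[Frame]:
    """Noise suppression: drop near-identical frames before any costly processing."""
    return [f for f in frames if f.change >= min_change]


def log_by_task(frames: list[Frame]) -> dict[str, list[str]]:
    """Task-isolated memory: each task's log sees only that task's frames."""
    memory = defaultdict(list)
    for f in frames:
        memory[f.task].append(f.text)
    return dict(memory)


# An A -> B -> A interruption with one near-duplicate frame.
stream = [
    Frame("A", "draft intro", 0.40),
    Frame("A", "draft intro", 0.01),  # cursor blink: filtered out
    Frame("B", "reply to email", 0.70),
    Frame("A", "add citation", 0.50),
]
kept = filter_frames(stream)
print(len(kept), log_by_task(kept))
# 3 {'A': ['draft intro', 'add citation'], 'B': ['reply to email']}
```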

2026-04-21T15:00:41Z | Haoran Yin, Zhiyuan Wen, Jiannong Cao, Bo Yuan, Ruosong Yang

http://arxiv.org/abs/2604.19540v1 Mesh Memory Protocol: Semantic Infrastructure for Multi-Agent LLM Systems 2026-04-21T15:00:25Z

Teams of LLM agents increasingly collaborate on tasks spanning days or weeks: multi-day data-generation sprints where generator, reviewer, and auditor agents coordinate in real time on overlapping batches; specialists carrying findings forward across session restarts; product decisions compounding over many review rounds. This requires agents to share, evaluate, and combine each other's cognitive state in real time across sessions. We call this cross-session agent-to-agent cognitive collaboration, as distinct from parallel agent execution. Enabling it requires solving three problems together. (P1) Each agent decides field by field what to accept from peers, rather than accepting or rejecting whole messages. (P2) Every claim is traceable to its source, so that returning claims are recognised as echoes of the receiver's own prior thinking. (P3) Memory that survives session restarts is relevant because of how it was stored, not how it is retrieved. These are protocol-level properties at the semantic layer of agent communication, distinct from tool-access and task-delegation protocols at lower layers. We call this missing protocol layer "semantic infrastructure," and the Mesh Memory Protocol (MMP) specifies it. Four composable primitives work together: CAT7, a fixed seven-field schema for every Cognitive Memory Block (CMB); SVAF, which evaluates each field against the receiver's role-indexed anchors and realises P1; inter-agent lineage, carried as lists of parent and ancestor content-hash keys, which realises P2; and remix, which stores only the receiver's own role-evaluated understanding of each accepted CMB, never the raw peer signal, realising P3. MMP is specified, shipped, and running in production across three reference deployments, in which each session runs an autonomous agent as a mesh peer with its own identity and memory, collaborating with other agents across the network for collective intelligence.
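
The lineage and remix primitives can be sketched with content-hash keys. Everything here is a hypothetical illustration of the ideas in the abstract, not the MMP v0.2.3 wire format. The seven field names are placeholders for CAT7's fields, and SHA-256 over canonical JSON is an assumed hashing scheme. The `evaluate` callback stands in for SVAF's role-indexed, field-by-field evaluation.

```python
import hashlib
import json

# Placeholder names for CAT7's seven fields (the real names are in the MMP spec).
FIELDS = tuple(f"field{i}" for i in range(1, 8))


def content_key(fields: dict, parents: tuple) -> str:
    """Content-hash key: SHA-256 of canonical JSON (an assumed scheme)."""
    payload = json.dumps({"fields": fields, "parents": list(parents)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_cmb(fields: dict, parents: tuple = (), ancestors: frozenset = frozenset()) -> dict:
    """Build a memory block whose lineage records parent and ancestor keys."""
    if set(fields) != set(FIELDS):
        raise ValueError("a block must carry exactly the seven schema fields")
    return {
        "key": content_key(fields, parents),
        "fields": dict(fields),
        "parents": tuple(parents),
        "ancestors": frozenset(ancestors) | frozenset(parents),
    }


def remix(incoming: dict, evaluate) -> dict:
    """Field-by-field acceptance, then store only the receiver's own reading.

    evaluate(field, value) returns the receiver's understanding, or None to reject.
    The raw peer value is never stored.
    """
    fields = {}
    for name in FIELDS:
        own = evaluate(name, incoming["fields"][name])
        fields[name] = own if own is not None else ""
    return make_cmb(fields, parents=(incoming["key"],), ancestors=incoming["ancestors"])


def is_echo(incoming: dict, own_keys: set) -> bool:
    """True if the incoming block descends from one of the receiver's own blocks."""
    return bool(incoming["ancestors"] & own_keys)


a = make_cmb({f: f"A observes {f}" for f in FIELDS})
b = remix(a, lambda f, v: f"B reads: {v}" if f == "field1" else None)
print(b["fields"]["field1"])   # B reads: A observes field1
print(is_echo(b, {a["key"]}))  # True: B's block echoes A's earlier thinking
```

Keys depend only on content and parents, so re-creating the same block yields the same key. Any agent can therefore check lineage without a central registry.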

2026-04-21T15:00:25Z | 23 pages, 2 figures, 2 listings, 1 table. MMP v0.2.3 specification at https://sym.bot/spec/mmp (CC BY 4.0). Reference implementations on npm (@sym-bot/sym, @sym-bot/mesh-channel; Apache 2.0) | Hongwei Xu

http://arxiv.org/abs/2604.19538v1 Integrating Anomaly Detection into Agentic AI for Proactive Risk Management in Human Activity 2026-04-21T14:57:36Z

Agentic AI, with goal-directed, proactive, and autonomous decision-making capabilities, offers a compelling opportunity to address movement-related risks in human activity, including the persistent hazard of falls among elderly populations. Despite numerous approaches to fall mitigation through fall prediction and detection, existing systems have not yet served as universal solutions across care pathways and safety-critical environments. This is largely due to limitations in handling real-world complexity consistently, particularly poor context awareness, high false-alarm rates, environmental noise, and data scarcity. We argue that fall detection and fall prediction can usefully be formulated as anomaly detection problems and addressed more effectively through an agentic AI system. More broadly, this perspective enables the early identification of subtle deviations in movement patterns associated with increased risk, whether arising from age-related decline, fatigue, or environmental factors. While the technical requirements for immediate deployment are beyond the scope of this paper, we propose a conceptual framework that illustrates the potential value of this perspective. The framework promotes a well-orchestrated approach to risk management by dynamically selecting relevant tools and integrating them into adaptive decision-making workflows, rather than relying on static configurations tailored to narrowly defined scenarios.
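
The following minimal sketch illustrates the anomaly-detection framing. The detector flags a reading of a movement feature (for example, gait speed in m/s) when it deviates from the person's own trailing baseline by more than a threshold number of standard deviations. The rolling z-score is a deliberately simple stand-in chosen for illustration. The paper proposes a conceptual framework and does not prescribe this detector, window, or threshold.

```python
from statistics import mean, stdev


def zscore_anomalies(series: list[float], window: int = 5, threshold: float = 3.0) -> list[int]:
    """Return indices that deviate from the trailing-window baseline by more
    than `threshold` standard deviations (a simple illustrative detector)."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical gait-speed readings (m/s); index 7 is an abrupt drop.
gait = [1.00, 1.02, 0.98, 1.01, 0.99, 1.00, 1.01, 0.40, 1.00]
print(zscore_anomalies(gait))  # [7]
```

The baseline is computed per person, so the detector adapts to individual norms rather than a population threshold. An agentic system could swap in a stronger detector as one of its dynamically selected tools.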

2026-04-21T14:57:36Z | 6 pages, 3 figures | Farbod Zorriassatine, Ahmad Lotfi

http://arxiv.org/abs/2605.23928v1 Context: Proactive Goal-Directed Intelligence via Composable Sandboxed Programs, Declarative Wiring, and Structured Interaction 2026-04-21T14:39:40Z

We present Context, the intelligence layer of the Magarshak Architecture, which replaces reactive query-response chatbots with proactive goal-directed agents that advance shared tasks without waiting for user prompts. The architecture rests on three mutually reinforcing mechanisms. Write-time context assembly precomputes enriched typed attributes via Groker agents, assembling interaction context as a deterministic pure function of graph state; context blocks therefore remain byte-identical across turns until a semantic change occurs, enabling near-100% KV-cache reuse. Composable sandboxed wisdom programs form a governed library of LM-generated imperative programs declaratively wired to goal types via typed stream relations, composed via phase ordering, and executed at interaction time without further LM calls. Proactive goal stream state machines drive conversations toward terminal states by inspecting graph state and emitting structured interaction content (option arrays, governance affordances, clarification prompts) without awaiting user input. We prove six formal results: the Context Stability Theorem, bounding per-turn LM cost as a function of semantic change rate; a Program Composition Correctness Theorem; a Declarative Wiring Soundness Theorem; the Proactive Dominance Theorem, proving that proactive agents weakly dominate reactive agents on expected turns-to-terminal-state; Coordination Overhead Elimination and Quality Preservation, establishing Pareto improvements in multi-participant goal chats; and a Cross-Platform Vote Consistency Theorem. The architecture is implemented in the open-source Qbix / Safebox / Safebots stack.
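
Context blocks stay byte-identical between semantic changes because context is assembled as a pure function of graph state. The minimal sketch below assumes the graph state can be represented as a JSON-serialisable dict; the function names are hypothetical. Canonical serialisation makes equal states produce identical bytes regardless of key insertion order, so a hash of the block changes only when the state does.

```python
import hashlib
import json


def assemble_context(graph_state: dict) -> bytes:
    """Pure function of graph state: canonical JSON gives identical bytes
    for equal states, whatever order the keys were inserted in."""
    return json.dumps(graph_state, sort_keys=True, separators=(",", ":")).encode("utf-8")


def block_id(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def reuse_rate(states: list[dict]) -> float:
    """Fraction of turn-to-turn transitions whose context block is unchanged."""
    ids = [block_id(assemble_context(s)) for s in states]
    same = sum(prev == cur for prev, cur in zip(ids, ids[1:]))
    return same / max(len(ids) - 1, 1)


s1 = {"goal": "plan trip", "attrs": {"budget": 500, "city": "Lisbon"}}
s2 = {"attrs": {"city": "Lisbon", "budget": 500}, "goal": "plan trip"}  # same state
s3 = {"goal": "plan trip", "attrs": {"budget": 800, "city": "Lisbon"}}  # semantic change
print(assemble_context(s1) == assemble_context(s2))  # True
print(reuse_rate([s1, s2, s2, s3]))  # 2 of 3 transitions reuse the previous block
```

Actual KV-cache reuse also depends on the serving stack matching prompt prefixes. This sketch only shows why a pure, canonical assembly step makes byte-identical prefixes possible.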

2026-04-21T14:39:40Z | 7 pages; third in a series with arXiv:2501.XXXXX (Magarshak Machine / SPACER) and arXiv:2502.XXXXX (Grokers) | Gregory Magarshak

http://arxiv.org/abs/2604.19509v1 Assessing VLM-Driven Semantic-Affordance Inference for Non-Humanoid Robot Morphologies 2026-04-21T14:26:29Z

Vision-language models (VLMs) have demonstrated remarkable capabilities in understanding human-object interactions, but their application to robotic systems with non-humanoid morphologies remains largely unexplored. This work investigates whether VLMs can effectively infer affordances for robots whose embodiments differ fundamentally from those of humans, addressing a critical gap in the deployment of these models for diverse robotic applications. We introduce a novel hybrid dataset that combines annotated real-world robotic affordance-object relations with VLM-generated synthetic scenarios, and perform an empirical analysis of VLM performance across multiple object categories and robot morphologies, revealing significant variations in affordance inference capabilities. Our experiments demonstrate that while VLMs show promising generalisation to non-humanoid robot forms, their performance is notably inconsistent across different object domains. Critically, we identify a consistent pattern of low false positive rates but high false negative rates across all morphologies and object categories, indicating that VLMs tend toward conservative affordance predictions. Our analysis reveals that this pattern is particularly pronounced for novel tool-use scenarios and unconventional object manipulations, suggesting that effective integration of VLMs in robotic systems requires complementary approaches to mitigate over-conservative behaviour while preserving the inherent safety benefits of low false positive rates.
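
The conservative pattern is easiest to see in the two error rates themselves. The following minimal sketch uses made-up labels (1 = affordance present), not data from the paper. A predictor that answers "yes" only when very sure has a false positive rate near zero but a high false negative rate.

```python
def error_rates(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return fpr, fnr


# Made-up labels: 1 = "this robot can perform this action on this object".
truth = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 0, 0, 0, 0, 0, 0, 0]  # a conservative predictor
print(error_rates(truth, pred))  # (0.0, 0.75)
```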

2026-04-21T14:26:29Z | AAMAS 2026 (main track), 9 pages, 4 figures | Jess Jones, Raul Santos-Rodriguez, Sabine Hauert | DOI 10.65109/WTKR8312

http://arxiv.org/abs/2512.20640v2 Reflection-Driven Self-Optimization 6G Agentic AI RAN via Simulation-in-the-Loop Workflows 2026-04-21T11:04:29Z

The escalating complexity of sixth-generation (6G) networks demands unprecedented levels of autonomy beyond the capabilities of traditional optimization-based and current AI-based resource management approaches. While agentic AI has emerged as a promising paradigm for autonomous radio access networks (RAN), current frameworks provide sophisticated reasoning capabilities but lack mechanisms for empirical validation and self-improvement. This article identifies simulation-in-the-loop validation as a critical enabler for truly autonomous networks, where AI agents can empirically verify decisions and learn from outcomes. We present the first reflection-driven self-optimization framework that integrates agentic AI with high-fidelity network simulation in a closed-loop architecture. Our system orchestrates four specialized agents (scenario, solver, simulation, and reflector agents) that work in concert to turn agentic AI into a self-correcting system capable of escaping local optima, recognizing implicit user intent, and adapting to dynamic network conditions. Extensive experiments show significant improvements over non-agentic approaches: 17.1% higher throughput in interference optimization, a 67% improvement in user QoS satisfaction through intent recognition, and 25% lower resource utilization during low-traffic periods while maintaining service quality.
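
A toy problem can show the value of closing the loop with a simulator. In this hypothetical sketch, `simulate` stands in for the network simulator and has one local and one global throughput peak. `solve` is a greedy solver agent. `reflect` is a reflector agent that, after each round, restarts the solver in the region farthest from every optimum found so far. None of these functions come from the paper. They only illustrate how empirical feedback lets the loop escape a local optimum that the greedy solver alone would report.

```python
LO, HI = 0, 20  # toy configuration space, e.g. discrete power levels (hypothetical)


def simulate(x: int) -> int:
    """Toy simulator: throughput with a local peak at 5 and a global peak at 15."""
    return max(10 - 2 * abs(x - 5), 20 - 2 * abs(x - 15), 0)


def solve(start: int) -> int:
    """Greedy solver agent: hill-climb until no neighbour simulates better."""
    x = start
    while True:
        best = max((n for n in (x - 1, x + 1) if LO <= n <= HI), key=simulate)
        if simulate(best) <= simulate(x):
            return x
        x = best


def reflect(optima: set[int]) -> int:
    """Reflector agent: restart in the region farthest from every optimum found."""
    return max(range(LO, HI + 1), key=lambda s: min(abs(s - o) for o in optima))


def closed_loop(start: int = 3, rounds: int = 3) -> tuple[int, int]:
    best = solve(start)
    optima = {best}
    for _ in range(rounds - 1):
        x = solve(reflect(optima))  # new proposal, checked in simulation
        optima.add(x)
        if simulate(x) > simulate(best):
            best = x
    return best, simulate(best)


print(solve(3))       # 5: greedy search alone stops at the local peak
print(closed_loop())  # (15, 20): reflection finds the global peak
```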

2025-12-08T06:34:35Z | Yunhao Hu, Xinchen Lyu, Chenshan Ren, Keda Chen, Qimei Cui, Xiaofeng Tao

http://arxiv.org/abs/2604.19301v1 Large Language Models Exhibit Normative Conformity 2026-04-21T10:06:25Z

The conformity bias exhibited by large language models (LLMs) can pose a significant challenge to decision-making in LLM-based multi-agent systems (LLM-MAS). While many prior studies have treated "conformity" simply as a matter of opinion change, this study introduces the social-psychological distinction between informational conformity and normative conformity in order to understand LLM conformity at the mechanism level. Specifically, we design new tasks to distinguish between informational conformity, in which participants in a discussion are motivated to make accurate judgments, and normative conformity, in which participants are motivated to avoid conflict or gain acceptance within a group. We then conduct experiments based on these task settings. The experimental results show that, among the six LLMs evaluated, up to five exhibited tendencies toward not only informational conformity but also normative conformity. Intriguingly, we further show that manipulating subtle aspects of the social context may make it possible to control the target toward which a particular LLM directs its normative conformity. These findings suggest that decision-making in LLM-MAS may be vulnerable to manipulation by a small number of malicious users. In addition, through analysis of internal vectors associated with informational and normative conformity, we suggest that although both behaviors appear externally as the same form of "conformity," they may in fact be driven by distinct internal mechanisms. Taken together, these results may serve as an initial milestone toward understanding how "norms" are implemented in LLMs and how they influence group dynamics.
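
One common way to compare behaviours at the level of internal representations is to compute a difference-of-means direction for each behaviour and then measure the cosine similarity between the directions. The sketch below uses tiny made-up activation vectors, and the paper's actual probing method may differ. Here, low similarity only illustrates what "distinct internal mechanisms" would look like under this assumed analysis.

```python
import math


def mean_vec(vectors: list[list[float]]) -> list[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def direction(behaviour: list[list[float]], baseline: list[list[float]]) -> list[float]:
    """Difference-of-means direction between two sets of hidden activations."""
    return [a - b for a, b in zip(mean_vec(behaviour), mean_vec(baseline))]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Made-up 3-D activations; real hidden states have thousands of dimensions.
baseline = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
informational = [[1.0, 0.0, 0.0], [1.0, 0.2, 0.0]]
normative = [[0.0, 1.0, 0.0], [0.2, 1.0, 0.0]]

d_info = direction(informational, baseline)  # [1.0, 0.1, 0.0]
d_norm = direction(normative, baseline)      # [0.1, 1.0, 0.0]
print(round(cosine(d_info, d_norm), 3))      # 0.198: nearly orthogonal directions
```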

2026-04-21T10:06:25Z | Mikako Bito, Keita Nishimoto, Kimitaka Asatani, Ichiro Sakata

http://arxiv.org/abs/2604.19247v1 BONSAI: A Mixed-Initiative Workspace for Human-AI Co-Development of Visual Analytics Applications 2026-04-21T08:57:49Z

Developing Visual Analytics (VA) applications requires integrating complex machine learning models with expressive interactive interfaces. Developers face a stark trade-off: building tightly coupled monoliths plagued by fragile interdependencies, or relying on restrictive, simplistic frameworks. Meanwhile, unconstrained, single-shot AI code generation promises speed but yields unstructured, unauditable chaos. The core challenge is combining the control and expressiveness of custom development with the efficiency of AI generation under strict constraints. To address this, we introduce BONSAI, a mixed-initiative workspace for the multi-agent co-development of VA applications. BONSAI uses a modular four-layer architecture (hardware, services, orchestration, application) that allows human and AI developers to contribute reusable components independently. The workspace incorporates this architecture into a structured four-phase development process (plan, design, monitor, and review) that ensures distributed agency and full provenance: all human and AI contributions are structurally bounded and tracked. We evaluate BONSAI through case studies demonstrating the efficient creation of novel tools and the rapid reconstruction of complex VA applications directly from research paper descriptions. Ultimately, this paper contributes a conceptual workflow, a scalable architecture, and an integrated system that successfully balances AI's generative speed with the structural rigor required for complex VA development.
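
The following minimal sketch shows structurally bounded, tracked contributions. It assumes each contribution is recorded as an immutable entry tagged with one of the four layers and one of the four phases named above. The classes are hypothetical and illustrate the provenance idea rather than BONSAI's implementation. Out-of-bounds contributions are rejected at creation, and the log can report which components came from human or AI developers.

```python
from dataclasses import dataclass, field

LAYERS = ("hardware", "services", "orchestration", "application")
PHASES = ("plan", "design", "monitor", "review")
AUTHORS = ("human", "ai")


@dataclass(frozen=True)
class Contribution:
    component: str
    layer: str
    phase: str
    author: str

    def __post_init__(self) -> None:
        # Structural bound: every contribution must sit in a known layer and phase.
        if self.layer not in LAYERS or self.phase not in PHASES or self.author not in AUTHORS:
            raise ValueError(f"out-of-bounds contribution: {self!r}")


@dataclass
class ProvenanceLog:
    entries: list[Contribution] = field(default_factory=list)

    def record(self, c: Contribution) -> None:
        self.entries.append(c)

    def by_author(self, author: str) -> list[str]:
        return [c.component for c in self.entries if c.author == author]


log = ProvenanceLog()
log.record(Contribution("umap_service", "services", "design", "ai"))
log.record(Contribution("scatter_view", "application", "review", "human"))
print(log.by_author("ai"))  # ['umap_service']
```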

2026-04-21T08:57:49Z | 9 pages paper, 2 pages references, 10 figures | Thilo Spinner, Matthias Miller, Fabian Sperrle-Roth, Mennatallah El-Assady

http://arxiv.org/abs/2604.19837v1 Forage V2: Knowledge Evolution and Transfer in Autonomous Agent Organizations 2026-04-21T08:14:09Z

Autonomous agents operating in open-world tasks, where the completion boundary is not given in advance, face denominator blindness: they systematically underestimate the scope of the target space. Forage V1 addressed this through co-evolving evaluation (an independent Evaluator discovers what "complete" means) and method isolation (Evaluator and Planner cannot see each other's code). V2 extends the architecture from a single expedition to a learning organization: experience accumulates across runs, transfers across model capabilities, and institutional safeguards prevent knowledge degradation. We demonstrate two claims across three task types (web scraping, API queries, mathematical reasoning). Knowledge accumulation: over six runs, knowledge entries grow from 0 to 54, and denominator estimates stabilize as domain understanding deepens. Knowledge transfer: a weaker agent (Sonnet) seeded with a stronger agent's (Opus) knowledge narrows a 6.6pp coverage gap to 1.1pp, cuts cost by roughly 45% (9.40 to 5.13 USD), converges in fewer rounds (mean 4.5 vs. 7.0), and three independent seeded runs arrive at exactly the same denominator estimate (266), suggesting that organizational knowledge calibrates evaluation itself. V2's contribution is architectural: it designs institutions (audit separation, contract protocols, organizational memory) that make any agent more reliable upon entry. The accumulated experience is organizational, model-agnostic, and transferable, stored as readable documents that any future agent inherits regardless of provider or capability level.
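
Simple arithmetic can illustrate denominator blindness, assuming coverage is measured as items found divided by the size of the target space. The counts 200, 210, and 300 below are hypothetical. The cost and round figures are the ones reported above. When the estimated denominator is too small, apparent coverage is inflated, which is why an independent evaluator's estimate matters.

```python
def coverage(found: int, denominator: int) -> float:
    """Fraction of the target space covered (assumed definition)."""
    return found / denominator


# Hypothetical run: 200 items found; the agent believes there are 210,
# but the true target space holds 300.
apparent = coverage(200, 210)  # about 0.952
actual = coverage(200, 300)    # about 0.667

# Relative savings from the reported transfer results.
cost_saving = 1 - 5.13 / 9.40  # about 0.454, i.e. roughly 45% cheaper
round_saving = 1 - 4.5 / 7.0   # about 0.357, i.e. roughly 36% fewer rounds
print(f"apparent={apparent:.3f} actual={actual:.3f} "
      f"cost_saving={cost_saving:.3f} round_saving={round_saving:.3f}")
```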

2026-04-21T08:14:09Z | Huaqing Xie

http://arxiv.org/abs/2605.23927v1 TEAM-SimHRA: A Team-Based Simulation Framework for Human Reliability Analysis Using Multi-Agent Large Language Models 2026-04-21T07:23:06Z

Team-level failure in nuclear control rooms arises not from isolated operator error but from emergent interaction dynamics (delayed diagnosis, suppressed dissent, and authority-driven error propagation) that conventional human reliability analysis methods are structurally unable to model. This study introduces TEAM-SimHRA, a multi-agent large language model simulation framework that reconceptualizes human reliability as an interaction-driven emergent property of control room teams rather than a static individual attribute. Unlike existing approaches that assign fixed error probabilities to predefined tasks, TEAM-SimHRA reproduces collective cognition, role-conditioned authority dynamics, and real-time communication suppression across temporally evolving accident progressions. Validated against the Three Mile Island (1979) and Chernobyl (1986) accidents, the two most extensively documented nuclear team failures, the framework achieves face-validity pass rates of 43.5% and 52.6%, respectively, reproducing a near-historical decision delay (134.8 vs. 138 min), perfect communication-suppression stability, and a full authority-pressure cascade at historically accurate propagation depth. These results demonstrate that multi-agent simulation can extract quantitative team-level reliability indicators that are inaccessible to traditional methods, opening a viable path toward simulation-based dynamic probabilistic risk assessment for safety-critical sociotechnical systems.

2026-04-21T07:23:06Z | Xingyu Xiao, Jiejuan Tong, Jingang Liang, Haitao Wang

http://arxiv.org/abs/2604.17995v2 Multi-UAV Path Following using Vector-Field Guidance 2026-04-21T04:15:37Z

This paper presents a decentralized, collision-free framework for path following guidance of multiple uncrewed aerial vehicles (UAVs), while maintaining uniform spacing along a reference path. A vector field-based guidance law is employed to drive each UAV toward the reference path. A rotational repulsion mechanism, utilizing relative distance and bearing between UAVs, is proposed to avoid collisions during convergence to the path, and an inter-UAV spacing error-based velocity control law is presented to achieve uniform separation along the path. Analytical guarantees are established for collision avoidance and convergence of the inter-UAV spacing errors to zero, ensuring uniform separation along the path. Numerical simulations demonstrate the efficacy of the proposed method.
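
The sketch below simulates the standard straight-line vector field from the UAV path-following literature, for readers unfamiliar with vector-field guidance. For a path along the $x$-axis, the commanded course is $\chi_c = -\chi_\infty \tfrac{2}{\pi}\arctan(k\,e_y)$, where $e_y$ is the cross-track error. This is a swapped-in textbook law, not the paper's guidance law, and it omits the paper's rotational repulsion and spacing control. The gains, the speed, and the assumption that each UAV tracks its commanded course instantly are all illustrative.

```python
import math


def commanded_course(e_y: float, chi_inf: float = math.pi / 3, k: float = 0.05) -> float:
    """Straight-line vector field for a path along +x: far from the path the
    course approaches -chi_inf*sign(e_y); on the path it is 0 (along the path)."""
    return -chi_inf * (2 / math.pi) * math.atan(k * e_y)


def fly(y0: float, speed: float = 15.0, dt: float = 0.1, steps: int = 2000) -> tuple[float, float]:
    """Euler-integrate a constant-speed UAV that tracks the commanded course instantly."""
    x, y = 0.0, y0
    for _ in range(steps):
        chi = commanded_course(y)  # cross-track error is y for this path
        x += speed * math.cos(chi) * dt
        y += speed * math.sin(chi) * dt
    return x, y


for y0 in (-80.0, 30.0, 100.0):
    print(y0, round(fly(y0)[1], 6))  # final cross-track error is essentially zero
```

Near the path, the course command is roughly linear in $e_y$, so the cross-track error decays exponentially. Far from the path, the command saturates at $\chi_\infty$, which bounds the approach angle.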

2026-04-20T09:20:21Z | Submitted to 2026 Modeling, Estimation and Control Conference (MECC) | Gautam Kumar, Amit Shivam, Ashwini Ratnoo

http://arxiv.org/abs/2604.18005v2 Diversity Collapse in Multi-Agent LLM Systems: Structural Coupling and Collective Failure in Open-Ended Idea Generation 2026-04-21T04:12:30Z

Multi-agent systems (MAS) are increasingly used for open-ended idea generation, driven by the expectation that collective interaction will broaden exploration. However, when and why such collaboration truly expands the solution space remains unclear. We present a systematic empirical study of diversity in MAS-based ideation across three bottom-up levels: model intelligence, agent cognition, and system dynamics. At the model level, we identify a compute-efficiency paradox, where stronger, highly aligned models yield diminishing marginal diversity despite higher per-sample quality. At the cognition level, authority-driven dynamics suppress semantic diversity relative to junior-dominated groups. At the system level, group-size scaling yields diminishing returns and dense communication topologies accelerate premature convergence. We characterize these outcomes as collective failures emerging from structural coupling, a process where interaction inadvertently contracts agent exploration and triggers diversity collapse. Our analysis shows that this collapse arises primarily from the interaction structure rather than inherent model insufficiency, highlighting the importance of preserving independence and disagreement when designing MAS for creative tasks. Our code is available at https://github.com/Xtra-Computing/MAS_Diversity.
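
A common way to quantify semantic diversity is the mean pairwise cosine distance between idea embeddings. The sketch below assumes the ideas have already been embedded as vectors. The 2-D vectors are made up for illustration, and the paper's exact metric may differ. Spread-out ideas score high, while near-duplicates produced after premature convergence score close to zero.

```python
import math
from itertools import combinations


def cosine_distance(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def mean_pairwise_distance(embeddings: list[list[float]]) -> float:
    """Average cosine distance over all pairs of ideas (higher = more diverse)."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


independent = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # ideas pointing different ways
converged = [[1.0, 0.0], [1.0, 0.1], [1.0, -0.1]]    # near-duplicates
print(mean_pairwise_distance(independent))  # 1.333...
print(mean_pairwise_distance(converged))    # close to 0
```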

2026-04-20T09:27:49Z | 56 pages, 15 figures; Accepted at ACL 2026 Findings | Nuo Chen, Yicheng Tong, Yuzhe Yang, Yufei He, Xueyi Zhang, Qingyun Zou, Qian Wang, Bingsheng He

http://arxiv.org/abs/2604.19026v1 ClawCoin: An Agentic AI-Native Cryptocurrency for Decentralized Agent Economies 2026-04-21T03:21:55Z

Autonomous AI agents live or die by the API tokens they consume: without paid inference capacity they cannot reason, act, or delegate. Compute-token cost has become the binding resource of the emerging agent economy, yet it is non-transferable: it is account-bound, vendor-specific, and absent from on-chain ledgers. Existing payment rails such as x402 move fiat-backed value between agents, but they do not represent the quantity agents actually burn. As a result, agents can transport purchasing power but cannot quote, escrow, or settle workflows in a unit aligned with compute cost. We present ClawCoin, a tokenized, compute-cost-indexed unit of account and settlement asset for decentralized agent economies. ClawCoin combines four layers: a robust basket index over standardized prices; an oracle publishing signed fresh attestations; a NAV-based mint/redeem vault with coverage thresholds and rate limits; and an on-chain settlement layer for multi-hop delegations. We implement a prototype on an Ethereum-compatible L2 and evaluate it using a multi-agent simulator and the OpenClaw testbed. Across single-agent, multi-agent, workflow, and procurement experiments, ClawCoin stabilizes execution capacity under cost shocks, reduces cross-agent quote dispersion, eliminates partial settlements, and sustains cooperative market dynamics that fiat-denominated baselines cannot. These results suggest that compute-indexed units of account can improve decentralized agent coordination.
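
The following minimal sketch covers two of the four layers, and it rests on several assumptions. Prices are per-model quotes in USD per million tokens from several vendors. The robust index takes the median quote per model and then a weighted average across models. The vault mints and redeems at a NAV equal to the index and enforces a single coverage floor, defined as reserves divided by outstanding units times NAV. The names, weights, and threshold are hypothetical. The oracle and rate limits are omitted, and this is not ClawCoin's contract logic.

```python
from statistics import median


def basket_index(quotes: dict[str, list[float]], weights: dict[str, float]) -> float:
    """Weighted basket of per-model prices; the median across vendor quotes
    means one bad quote cannot move a component."""
    total = sum(weights.values())
    return sum(w * median(quotes[m]) for m, w in weights.items()) / total


class Vault:
    """Toy NAV-based mint/redeem vault with a coverage floor (hypothetical)."""

    def __init__(self, min_coverage: float = 1.0) -> None:
        self.reserves = 0.0  # USD held by the vault
        self.supply = 0.0    # units outstanding
        self.min_coverage = min_coverage

    def mint(self, usd: float, nav: float) -> float:
        units = usd / nav
        self.reserves += usd
        self.supply += units
        return units

    def redeem(self, units: float, nav: float) -> float:
        usd = units * nav
        remaining = (self.supply - units) * nav  # liabilities left after payout
        if usd > self.reserves or (
            remaining > 0 and (self.reserves - usd) / remaining < self.min_coverage
        ):
            raise ValueError("redemption would breach the coverage threshold")
        self.reserves -= usd
        self.supply -= units
        return usd


quotes = {"model_a": [2.0, 2.1, 9.0], "model_b": [0.5, 0.6, 0.55]}  # 9.0 is an outlier
nav = basket_index(quotes, {"model_a": 0.5, "model_b": 0.5})
print(nav)  # about 1.325: the outlier is ignored

vault = Vault()
print(vault.mint(125.0, 1.25))   # 100.0 units
print(vault.redeem(10.0, 1.25))  # 12.5 USD; coverage stays at exactly 1.0
try:
    vault.redeem(10.0, 1.5)      # a compute-cost shock raises NAV
except ValueError as err:
    print(err)
```

In this toy design, a NAV jump blocks redemptions until the reserves are topped up, which shows why a coverage threshold matters when the unit tracks volatile compute costs.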

2026-04-21T03:21:55Z | Shaoyu Li, Chaoyu Zhang, Hexuan Yu, Y. Thomas Hou, Wenjing Lou