Feed: https://arxiv.org/api/q7vVgnIEYzQxBOYotkF6XUBpzOA
Feed updated: 2026-06-28T01:43:00Z

ID: http://arxiv.org/abs/2604.03486v2
Title: VisionClaw: Always-On AI Agents through Smart Glasses
Updated: 2026-04-08T04:29:20Z
Abstract: We present VisionClaw, an always-on wearable AI agent that integrates live egocentric perception with agentic task execution. Running on Meta Ray-Ban smart glasses, VisionClaw continuously perceives real-world context and enables in-situ, speech-driven action initiation and delegation via OpenClaw AI agents. Therefore, users can directly execute tasks through the smart glasses, such as adding real-world objects to an Amazon cart, generating notes from physical documents, receiving meeting briefings on the go, creating events from posters, or controlling IoT devices. We evaluate VisionClaw through a controlled laboratory study (N=12) and a longitudinal deployment study (N=5). Results show that integrating perception and execution enables faster task completion and reduces interaction overhead compared to non-always-on and non-agent baselines. Beyond performance gains, deployment findings reveal a shift in interaction: tasks are initiated opportunistically during ongoing activities, and execution is increasingly delegated rather than manually controlled. These results suggest a new paradigm for wearable AI agents, where perception and action are continuously coupled to support situated, hands-free interaction.
Published: 2026-04-03T22:17:10Z
Comment: 17 pages, 11 figures, plus appendix
Authors: Xiaoan Liu, DaeHo Lee, Eric J Gonzalez, Mar Gonzalez-Franco, Ryo Suzuki

ID: http://arxiv.org/abs/2604.06629v1
Title: Logical Robots: Declarative Multi-Agent Programming in Logica
Updated: 2026-04-08T03:11:29Z
Abstract: We present Logical Robots, an interactive multi-agent simulation platform where autonomous robot behavior is specified declaratively in the logic programming language Logica. Robot behavior is defined by logical predicates that map observations from simulated radar arrays and shared memory to desired motor outputs.
This approach allows low-level reactive control and high-level planning to coexist within a single programming environment, providing a coherent framework for exploring multi-agent robot behavior.
Published: 2026-04-08T03:11:29Z
Comment: International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 25-29, 2026. Paphos, Cyprus
Authors: Evgeny Skvortsov, Yilin Xia, Ojaswa Garg, Shawn Bowers, Bertram Ludäscher

ID: http://arxiv.org/abs/2604.06430v1
Title: Asynchronous Distributed Bandit Submodular Maximization under Heterogeneous Communication Delays
Updated: 2026-04-07T20:13:04Z
Abstract: We study asynchronous distributed decision-making for scalable multi-agent bandit submodular maximization. We are motivated by distributed information-gathering tasks in unknown environments and under heterogeneous inter-agent communication delays. To enable scalability despite limited communication delays, existing approaches restrict each agent to coordinate only with its one-hop neighbors. But these approaches assume homogeneous communication delays among the agents and a synchronous global clock. In practice, however, delays are heterogeneous, and agents operate with mismatched local clocks. That is, each agent does not receive information from all neighbors at the same time, compromising decision-making. In this paper, we provide an asynchronous coordination algorithm to overcome the challenges. We establish a provable approximation guarantee against the optimal synchronized centralized solution, where the suboptimality gap explicitly depends on communication delays and clock mismatches. The bounds also depend on the topology of each neighborhood, capturing the effect of distributed decision-making via one-hop-neighborhood messages only.
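For context on the coordination pattern described in the abstract, here is a minimal sketch of the classic synchronous sequential greedy on a toy multi-camera coverage instance. Each camera, in a fixed order, picks the orientation with the largest marginal coverage gain, given the choices of earlier agents in its one-hop neighborhood. Coverage is monotone submodular. When every earlier agent is visible, this greedy is the classic 1/2-approximation for one action per agent. Restricting visibility to one-hop neighbors loses that guarantee in general, and the abstract's bounds likewise depend on neighborhood topology.

This is not the paper's asynchronous bandit algorithm. It assumes known coverage sets, no delays and a shared order. The camera names, cells, neighbor map and function names are all hypothetical.

```python
from typing import Dict, Optional, Set

Options = Dict[str, Dict[str, Set[int]]]  # agent -> action -> covered cells


def sequential_greedy(order, options: Options,
                      neighbors: Optional[Dict[str, Set[str]]] = None) -> Dict[str, str]:
    """Agents choose in `order`; each maximizes its marginal coverage gain given
    the actions already chosen by earlier agents it can hear from."""
    chosen: Dict[str, str] = {}
    for agent in order:
        # neighbors=None means every earlier agent is visible (full information)
        visible = [a for a in chosen if neighbors is None or a in neighbors[agent]]
        covered: Set[int] = set().union(*(options[a][chosen[a]] for a in visible))
        # Marginal gain of a coverage objective = number of newly covered cells;
        # max() keeps the first action on ties.
        chosen[agent] = max(options[agent],
                            key=lambda act: len(options[agent][act] - covered))
    return chosen


def coverage(options: Options, chosen: Dict[str, str]) -> int:
    """Joint objective: number of distinct cells covered by all chosen actions."""
    return len(set().union(*(options[a][act] for a, act in chosen.items())))


cams = {
    "A": {"left": {1, 2}, "right": {2, 3}},
    "B": {"left": {2, 3}, "right": {3, 4}},
    "C": {"a": {4, 5}, "b": {1}},
}
plan = sequential_greedy(["A", "B", "C"], cams)
print(plan, coverage(cams, plan))  # {'A': 'left', 'B': 'right', 'C': 'a'} 5
```

Passing a neighbor map such as `{"A": set(), "B": set(), "C": {"A"}}` hides A's choice from B. B then picks "left", which duplicates cell 2.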
We validate the approach through numerical simulations on multi-camera area monitoring.
Published: 2026-04-07T20:13:04Z
Authors: Pranjal Sharma, Zirui Xu, Vasileios Tzoumas

ID: http://arxiv.org/abs/2508.04691v2
Title: Before Humans Join the Team: Diagnosing Coordination Failures in Healthcare Robot Team Simulation
Updated: 2026-04-07T20:00:38Z
Abstract: As humans move toward collaborating with coordinated robot teams, understanding how these teams coordinate and fail is essential for building trust and ensuring safety. However, exposing human collaborators to coordination failures during early-stage development is costly and risky, particularly in high-stakes domains such as healthcare. We adopt an agent-simulation approach in which all team roles, including the supervisory manager, are instantiated as LLM agents, allowing us to diagnose coordination failures before humans join the team. Using a controllable healthcare scenario, we conduct two studies with different hierarchical configurations to analyze coordination behaviors and failure patterns. Our findings reveal that team structure, rather than contextual knowledge or model capability, constitutes the primary bottleneck for coordination, and expose a tension between reasoning autonomy and system stability. By surfacing these failures in simulation, we prepare the groundwork for safe human integration. These findings inform the design of resilient robot teams with implications for process-level evaluation, transparent coordination protocols, and structured human integration.
Supplementary materials, including code, task agent setup, trace outputs, and annotated examples of coordination failures and reasoning behaviors, are available at: https://byc-sophie.github.io/mas-to-mars/.
Published: 2025-08-06T17:54:10Z
Comment: Revised version incorporating new analysis and restructuring
Authors: Yuanchen Bai, Zijian Ding, Shaoyue Wen, Xiang Chang, Angelique Taylor

ID: http://arxiv.org/abs/2604.06392v1
Title: Qualixar OS: A Universal Operating System for AI Agent Orchestration
Updated: 2026-04-07T19:22:20Z
Abstract: We present Qualixar OS, the first application-layer operating system for universal AI agent orchestration. Unlike kernel-level approaches (AIOS) or single-framework tools (AutoGen, CrewAI), Qualixar OS provides a complete runtime for heterogeneous multi-agent systems spanning 10 LLM providers, 8+ agent frameworks, and 7 transports. We contribute: (1) execution semantics for 12 multi-agent topologies including grid, forest, mesh, and maker patterns; (2) Forge, an LLM-driven team design engine with historical strategy memory; (3) three-layer model routing combining Q-learning, five strategies, and Bayesian POMDP with dynamic multi-provider discovery; (4) a consensus-based judge pipeline with Goodhart detection, JSD drift monitoring, and alignment trilemma navigation; (5) four-layer content attribution with HMAC signing and steganographic watermarks; (6) universal compatibility via the Claw Bridge supporting MCP and A2A protocols with a 25-command Universal Command Protocol; (7) a 24-tab production dashboard with visual workflow builder and skill marketplace. Qualixar OS is validated by 2,821 test cases across 217 event types and 8 quality modules. On a custom 20-task evaluation suite, the system achieves 100% accuracy at a mean cost of $0.000039 per task. Source-available under the Elastic License 2.0.
Published: 2026-04-07T19:22:20Z
Comment: 20 pages, 7 figures, 8 tables.
Zenodo DOI: 10.5281/zenodo.19454219
Authors: Varun Pratap Bhardwaj
DOI: 10.5281/zenodo.19454219

ID: http://arxiv.org/abs/2604.06148v1
Title: Who Governs the Machine? A Machine Identity Governance Taxonomy (MIGT) for AI Systems Operating Across Enterprise and Geopolitical Boundaries
Updated: 2026-04-07T17:51:56Z
Abstract: The governance of artificial intelligence has a blind spot: the machine identities that AI systems use to act. AI agents, service accounts, API tokens, and automated workflows now outnumber human identities in enterprise environments by ratios exceeding 80 to 1, yet no integrated framework exists to govern them. A single ungoverned automated agent produced $5.4-10 billion in losses in the 2024 CrowdStrike outage; nation-state actors including Silk Typhoon and Salt Typhoon have operationalized ungoverned machine credentials as primary espionage vectors against critical infrastructure. This paper makes four original contributions. First, the AI-Identity Risk Taxonomy (AIRT): a comprehensive enumeration of 37 risk sub-categories across eight domains, each grounded in documented incidents, regulatory recognition, practitioner prevalence data, and threat intelligence. Second, the Machine Identity Governance Taxonomy (MIGT): an integrated six-domain governance framework simultaneously addressing the technical governance gap, the regulatory compliance gap, and the cross-jurisdictional coordination gap that existing frameworks address only in isolation. Third, a foreign state actor threat model for enterprise identity governance, establishing that Silk Typhoon, Salt Typhoon, Volt Typhoon, and North Korean AI-enhanced identity fraud operations have already operationalized AI identity vulnerabilities as active attack vectors.
Fourth, a cross-jurisdictional regulatory alignment structure mapping enterprise AI identity governance obligations under EU, US, and Chinese frameworks simultaneously, identifying irreconcilable conflicts and providing a governance mechanism for managing them. A four-phase implementation roadmap translates the MIGT into actionable enterprise programs.
Published: 2026-04-07T17:51:56Z
Comment: 75 pages (excl. references), 2 tables. Addresses policy makers, regulators, and practitioners at the intersection of AI governance, cybersecurity, and geopolitical risk
Authors: Andrew Kurtz, Klaudia Krawiecka

ID: http://arxiv.org/abs/2604.05977v1
Title: Adaptive Incentive Design with Regret Minimization
Updated: 2026-04-07T15:08:06Z
Abstract: Incentive design constitutes a foundational paradigm for influencing the behavior of strategic agents, wherein a system planner (principal) publicly commits to an incentive mechanism designed to align individual objectives with collective social welfare. This paper introduces the Regret-Minimizing Adaptive Incentive Design (RAID) problem, which aims to synthesize incentive laws under information asymmetry and achieve asymptotically minimal regret compared to an oracle with full information. To this end, we develop the RAID algorithm, which employs a switching policy alternating between probing (exploration) and estimate-based incentivization (exploitation). The associated type estimator relies only on a weaker excitation condition required for strong consistency in least squares estimation, substantially relaxing the persistence-of-excitation assumptions previously used in adaptive incentive design. In addition, we establish the strong consistency of the proposed type estimator and prove that the incentive obtained asymptotically minimizes the planner's average regret almost surely.
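To make the probe/exploit switching concrete, here is a toy sketch built entirely on hypothetical assumptions:

- A single agent has a scalar type theta, and its response to an incentive u is linear: a = theta * u + noise.
- The planner wants the response to hit a fixed target.
- Probing happens at perfect-square times, so probes become rare (about sqrt(T) of them).
- The type estimate is an ordinary least-squares fit on all data.

None of these modeling choices, names or schedules come from the paper. The sketch does not reproduce RAID's excitation condition or its regret analysis.

```python
import math
import random


def probe_exploit(theta: float, target: float, horizon: int,
                  noise: float = 0.1, seed: int = 0):
    """Hypothetical scalar model: agent responds a = theta * u + noise to incentive u."""
    rng = random.Random(seed)
    s_uu = s_ua = 0.0      # running sums for least squares without intercept
    theta_hat = None
    probes = 0
    for t in range(1, horizon + 1):
        # Probe at perfect-square times (assumed schedule) or while no usable estimate exists
        if math.isqrt(t) ** 2 == t or theta_hat is None or abs(theta_hat) < 1e-6:
            u = rng.uniform(0.5, 1.5)          # exploration: randomized incentive
            probes += 1
        else:
            u = target / theta_hat             # exploitation: certainty-equivalent incentive
        a = theta * u + rng.gauss(0.0, noise)  # observed (noisy) response
        s_uu += u * u
        s_ua += u * a
        theta_hat = s_ua / s_uu                # least-squares estimate of theta
    return theta_hat, probes


est, n_probes = probe_exploit(theta=2.0, target=1.0, horizon=2500)
print(f"theta_hat={est:.3f}, probes={n_probes}")
```

Exploitation rounds also feed the estimator here, because their incentives are nonzero. Whether that suffices for consistency in general is exactly the kind of excitation question the paper addresses.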
Numerical experiments illustrate the convergence rate of the proposed methodology.
Published: 2026-04-07T15:08:06Z
Comment: 8 pages, 3 figures
Authors: Georgios Vasileiou, Lantian Zhang, Silun Zhang

ID: http://arxiv.org/abs/2604.05953v1
Title: Polynomial-Time Algorithm for Thiele Voting Rules with Voter Interval Preferences
Updated: 2026-04-07T14:46:37Z
Abstract: We present a polynomial-time algorithm for computing an optimal committee of size $k$ under any given Thiele voting rule for elections on the Voter Interval domain (i.e., when voters can be ordered so that each candidate is approved by a consecutive interval of voters). Our result extends to the Generalized Thiele rule, in which each voter has an individual weight (scoring) sequence. This resolves a 10-year-old open problem that was originally posed for Proportional Approval Voting and later extended to every Thiele rule (Elkind and Lackner, IJCAI 2015; Peters, AAAI 2018).
Our main technical ingredient is a new structural result: a concavity theorem for families of intervals. It shows that, given two solutions of different sizes, one can construct a solution of any intermediate size whose score is at least the corresponding linear interpolation of the two scores. As a consequence, on Voter Interval profiles, the optimal total Thiele score is a concave function of the committee size. We exploit this concavity within an optimization framework based on a Lagrangian relaxation of a natural integer linear program formulation, obtained by moving the cardinality constraint into the objective. On Voter Interval profiles, the resulting constraint matrix is totally unimodular, so the relaxed program can be solved in polynomial time.
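The definitions and the concavity claim can be checked by brute force on a toy profile. The sketch does three things:

- It scores committees under PAV, one Thiele rule, where the j-th approved member a voter gets adds 1/j.
- It tests the Voter Interval property, meaning each candidate's approvers are consecutive in the voter order.
- It checks that the optimal score is concave in the committee size k.

It enumerates all committees, so it is exponential and only illustrates the statement. It is not the paper's polynomial-time Lagrangian algorithm, and the function names and the profile are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Callable, List, Sequence, Set

Weight = Callable[[int], Fraction]


def pav(j: int) -> Fraction:
    """PAV weight: the j-th approved committee member a voter gets adds 1/j."""
    return Fraction(1, j)


def thiele_score(ballots: Sequence[Set[str]], committee: Set[str], w: Weight = pav) -> Fraction:
    total = Fraction(0)
    for ballot in ballots:
        hits = len(ballot & committee)
        for j in range(1, hits + 1):
            total += w(j)
    return total


def is_voter_interval(ballots: Sequence[Set[str]], candidates: Sequence[str]) -> bool:
    """True if every candidate's approvers are consecutive in the given voter order."""
    for c in candidates:
        idx = [i for i, ballot in enumerate(ballots) if c in ballot]
        if idx and idx[-1] - idx[0] + 1 != len(idx):
            return False
    return True


def optimal_scores(ballots, candidates, w: Weight = pav) -> List[Fraction]:
    """Best Thiele score for each committee size k = 0..m, by exhaustive enumeration."""
    return [max(thiele_score(ballots, set(c), w) for c in combinations(candidates, k))
            for k in range(len(candidates) + 1)]


def is_concave(s: Sequence[Fraction]) -> bool:
    return all(s[k - 1] + s[k + 1] <= 2 * s[k] for k in range(1, len(s) - 1))


ballots = [{"a"}, {"a", "b"}, {"a", "b"}, {"b", "c"}]  # voters 0..3 in interval order
cands = ["a", "b", "c"]
scores = optimal_scores(ballots, cands)
print(is_voter_interval(ballots, cands), [str(s) for s in scores], is_concave(scores))
# True ['0', '3', '5', '11/2'] True
```

Exact `Fraction` arithmetic avoids floating-point noise when comparing s[k-1] + s[k+1] with 2 s[k]. The marginal gains here are 3, 2 and 1/2, which is the decreasing pattern concavity predicts.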
Our main algorithm and its proof were obtained via human-AI collaboration. In particular, a slightly simplified version of the main structural theorem used by the algorithm was obtained in a single call to Gemini Deep Think.
Published: 2026-04-07T14:46:37Z
Comment: 30 pages
Authors: Pasin Manurangsi, Krzysztof Sornat

ID: http://arxiv.org/abs/2604.09703v1
Title: Cayley Graph Optimization for Scalable Multi-Agent Communication Topologies
Updated: 2026-04-07T13:43:09Z
Abstract: Large-scale multi-agent communication has long faced a scalability bottleneck: fully connected networks require quadratic complexity, yet existing sparse topologies rely on hand-crafted rules. This paper treats the communication graph itself as a design variable and proposes CayleyTopo, a family of circulant Cayley graphs whose generator sets are optimized to minimize diameter, directly targeting worst-case information propagation speed. To navigate the enormous search space of possible generator sets, we develop a lightweight reinforcement learning framework that injects a number-theoretic prior to favor structurally rich generators, alongside a message-propagation score that provides dense connectivity feedback during construction. The resulting CayleyTopo consistently outperforms existing hand-crafted topologies, achieving faster information dissemination, greater resilience to link failures, and lower communication load, all while approaching the theoretical Moore bound.
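The quantity CayleyTopo minimizes can be computed directly. Consider a circulant Cayley graph on Z_n with generator set S, where each g in S connects v to v ± g (mod n). Such a graph is vertex-transitive, so a single breadth-first search from vertex 0 gives its diameter.

The sketch below pairs that computation with an exhaustive search over generator sets. The exhaustive search is a stand-in for the paper's reinforcement-learning search and is only feasible for small n. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations
from typing import Iterable, Tuple


def circulant_diameter(n: int, gens: Iterable[int]) -> float:
    """Diameter of the circulant Cayley graph on Z_n where v ~ v +/- g (mod n).
    Cayley graphs are vertex-transitive, so one BFS from vertex 0 suffices."""
    gens = list(gens)
    steps = {g % n for g in gens} | {(-g) % n for g in gens}
    dist = {0: 0}
    queue = deque([0])
    while queue:
        v = queue.popleft()
        for s in steps:
            u = (v + s) % n
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    # An unreachable vertex means the generators do not generate Z_n
    return max(dist.values()) if len(dist) == n else float("inf")


def best_generators(n: int, k: int) -> Tuple[int, ...]:
    """Exhaustive search over k-element generator sets from 1..n//2 (small n only)."""
    return min(combinations(range(1, n // 2 + 1), k),
               key=lambda s: circulant_diameter(n, s))


print(circulant_diameter(8, [1]))      # 4: a plain ring
print(circulant_diameter(13, [1, 5]))  # 2
best = best_generators(13, 2)
print(best, circulant_diameter(13, best))
```

The 13-vertex example has degree 4 and diameter 2. For comparison, the Moore bound allows at most 1 + 4 + 4·3 = 17 vertices for that degree and diameter.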
Our study opens the door to scalable, robust, and efficient communication foundations for future multi-agent systems, where the graph itself becomes optimizable rather than a fixed constraint.
Published: 2026-04-07T13:43:09Z
Comment: Keywords: Multi-agent communication, scalable topology, Cayley graph, diameter minimization
Authors: Jingkai Luo, Yulin Shao

ID: http://arxiv.org/abs/2604.26959v1
Title: CareGuardAI: Context-Aware Multi-Agent Guardrails for Clinical Safety & Hallucination Mitigation in Patient-Facing LLMs
Updated: 2026-04-07T10:54:57Z
Abstract: Integrating large language models (LLMs) into patient-facing healthcare systems offers significant potential to improve access to medical information. However, ensuring clinical safety and factual reliability remains a critical challenge. In practice, AI-generated responses may be conditionally correct yet medically inappropriate, as models often fail to interpret patient context and tend to produce agreeable responses rather than challenge unsafe assumptions. Unlike clinicians, who infer risk from incomplete information, LLMs frequently lack contextual awareness. Moreover, real-world patient interactions are open-ended and underspecified, unlike structured benchmark settings. We present CareGuardAI, a risk-aware safety framework for patient-facing medical question answering that addresses two key failure modes: clinical safety risk and hallucination risk. The framework introduces Clinical Safety Risk Assessment (SRA), inspired by ISO 14971, and Hallucination Risk Assessment (HRA) to evaluate medical risk and factual reliability. At inference time, CareGuardAI employs a multi-stage pipeline consisting of a controller agent, safety-constrained generation, and dual risk evaluation, followed by iterative refinement when necessary. Responses are released only when both SRA and HRA are less than or equal to 2, ensuring clinically acceptable outputs with bounded latency.
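The release rule can be sketched as a bounded refine-and-recheck loop. In the sketch, `generate`, `assess` and `refine` are hypothetical callables. They stand in for the controller agent, safety-constrained generation and the dual SRA/HRA evaluation.

The following details are assumptions, not taken from the abstract:

- the integer risk scale,
- the cap on refinement rounds,
- withholding the answer (returning `None`) when no draft passes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Risk:
    sra: int  # clinical safety risk score (assumed integer scale)
    hra: int  # hallucination risk score (assumed integer scale)


def guarded_answer(question: str,
                   generate: Callable[[str], str],
                   assess: Callable[[str, str], Risk],
                   refine: Callable[[str, str, Risk], str],
                   max_refinements: int = 3,
                   threshold: int = 2) -> Optional[str]:
    """Release a draft only when both SRA and HRA are <= threshold.
    The refinement cap bounds latency; None means nothing was released."""
    draft = generate(question)
    for attempt in range(max_refinements + 1):
        risk = assess(question, draft)
        if risk.sra <= threshold and risk.hra <= threshold:
            return draft
        if attempt < max_refinements:
            draft = refine(question, draft, risk)
    return None


# Stub components: the first draft fails the safety check, the refined one passes
scores = {"draft": Risk(sra=3, hra=1), "draft+caveat": Risk(sra=2, hra=2)}
answer = guarded_answer(
    "Can I double my dose?",
    generate=lambda q: "draft",
    assess=lambda q, d: scores[d],
    refine=lambda q, d, r: d + "+caveat",
)
print(answer)  # draft+caveat
```

The loop runs at most `max_refinements + 1` assessments, which is one simple way to make "bounded latency" concrete.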
We evaluate CareGuardAI on PatientSafeBench, MedSafetyBench, and MedHallu, covering both safety and hallucination detection. Across these benchmarks, the framework consistently outperforms strong baseline models, including GPT-4o-mini, demonstrating the importance of context-aware, risk-based, inference-time safety mechanisms for reliable deployment in healthcare.
Published: 2026-04-07T10:54:57Z
Authors: Elham Nasarian, Abhilash Neog, Kwok-Leung Tsui, Niyousha HosseiniChimeh

ID: http://arxiv.org/abs/2604.05681v1
Title: LUDOBENCH: Evaluating LLM Behavioural Decision-Making Through Spot-Based Board Game Scenarios in Ludo
Updated: 2026-04-07T10:34:13Z
Abstract: We introduce LudoBench, a benchmark for evaluating LLM strategic reasoning in Ludo, a stochastic multi-agent board game whose dice mechanics, piece capture, safe-square navigation, and home-path progression introduce meaningful planning complexity. LudoBench comprises 480 handcrafted spot scenarios across 12 behaviorally distinct decision categories, each isolating a specific strategic choice. We additionally contribute a fully functional 4-player Ludo simulator supporting Random, Heuristic, Game-Theory, and LLM agents. The game-theory agent uses Expectiminimax search with depth-limited lookahead to provide a principled strategic ceiling beyond greedy heuristics. Evaluating six models spanning four model families, we find that all models agree with the game-theory baseline only 40-46% of the time. Models split into distinct behavioral archetypes: finishers that complete pieces but neglect development, and builders that develop but never finish. Each archetype captures only half of the game-theory strategy. Models also display measurable behavioral shifts under history-conditioned grudge framing on identical board states, revealing prompt-sensitivity as a key vulnerability. LudoBench provides a lightweight and interpretable framework for benchmarking LLM strategic reasoning under uncertainty.
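Expectiminimax extends minimax with chance nodes. A chance node's value is the probability-weighted average of its children's values; in Ludo, the chance event is the die roll. A depth-limited version stops at a cutoff depth and falls back to a heuristic evaluation there.

The sketch below works on an explicit toy tree. The node encoding, the heuristic and the example values are hypothetical; they do not reflect LudoBench's simulator or its actual search depth.

```python
from typing import Callable, Tuple

# Node encoding (hypothetical): ("leaf", value), ("max", [children]),
# ("min", [children]) or ("chance", [(probability, child), ...]).
Node = Tuple[str, object]


def expectiminimax(node: Node, depth: int, heuristic: Callable[[Node], float]) -> float:
    kind, payload = node
    if kind == "leaf":
        return payload
    if depth == 0:
        return heuristic(node)  # depth-limited lookahead: estimate instead of searching on
    if kind == "max":
        return max(expectiminimax(c, depth - 1, heuristic) for c in payload)
    if kind == "min":
        return min(expectiminimax(c, depth - 1, heuristic) for c in payload)
    if kind == "chance":
        # Expected value over dice outcomes
        return sum(p * expectiminimax(c, depth - 1, heuristic) for p, c in payload)
    raise ValueError(f"unknown node kind: {kind}")


def best_move(node: Node, depth: int, heuristic: Callable[[Node], float]) -> int:
    """Index of the root (max) child with the highest expectiminimax value."""
    _, children = node
    return max(range(len(children)),
               key=lambda i: expectiminimax(children[i], depth - 1, heuristic))


# Toy decision: a risky move (50/50 between 3 and 5) versus a safe move worth 3.5
tree = ("max", [
    ("chance", [(0.5, ("leaf", 3.0)), (0.5, ("leaf", 5.0))]),
    ("leaf", 3.5),
])
print(expectiminimax(tree, depth=3, heuristic=lambda n: 0.0))  # 4.0
print(best_move(tree, depth=3, heuristic=lambda n: 0.0))       # 0
```

With `depth=1`, the chance node is cut off and scored by the (here pessimistic) heuristic, so the safe move wins instead. This shows how the cutoff depth and heuristic shape the strategic ceiling.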
All code, the spot dataset (480 entries) and model outputs are available at https://anonymous.4open.science/r/LudoBench-5CBF/
Published: 2026-04-07T10:34:13Z
Comment: Under Review
Authors: Ojas Jain, Dhruv Kumar

ID: http://arxiv.org/abs/2601.21861v2
Title: Spatiotemporal Continual Learning for Mobile Edge UAV Networks: Mitigating Catastrophic Forgetting
Updated: 2026-04-07T10:17:12Z
Abstract: This paper addresses catastrophic forgetting in mobile edge UAV networks within dynamic spatiotemporal environments. Conventional deep reinforcement learning often fails during task transitions, necessitating costly retraining to adapt to new user distributions. We propose the spatiotemporal continual learning (STCL) framework, realized through the group-decoupled multi-agent proximal policy optimization (G-MAPPO) algorithm. The core innovation lies in the integration of a group-decoupled policy optimization (GDPO) mechanism with a gradient orthogonalization layer to balance heterogeneous objectives including energy efficiency, user fairness, and coverage. This combination employs dynamic z-score normalization and gradient projection to mitigate conflicts without offline resets. Furthermore, 3D UAV mobility serves as a spatial compensation layer to manage extreme density shifts. Simulations demonstrate that the STCL framework ensures resilience, with service reliability recovering to over 0.9 for moderate loads of up to 100 users. Even under extreme saturation with 140 users, G-MAPPO maintains a significant performance lead over the multi-agent deep deterministic policy gradient (MADDPG) baseline by preventing policy stagnation.
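The two conflict-mitigation ingredients named in the abstract can be illustrated in isolation:

- **Z-score normalization.** The sketch normalizes each objective's batch of rewards so that, for example, energy and coverage terms share a comparable scale.
- **Gradient projection.** It applies a PCGrad-style projection (Yu et al., 2020, "Gradient Surgery for Multi-Task Learning"). This removes from each objective's gradient its component along any other objective's gradient that conflicts with it, meaning their dot product is negative.

This well-known projection rule is swapped in as a stand-in. The abstract does not specify G-MAPPO's normalization window or exact projection, and the function and variable names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def zscore(values: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize one objective's batch of rewards to zero mean and unit variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / (std + eps) for v in values]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project_conflicts(grads: List[Vector]) -> List[Vector]:
    """For each objective gradient, subtract its projection onto every other
    objective gradient it conflicts with (fixed order; PCGrad shuffles it)."""
    out = []
    for i, g in enumerate(grads):
        g = list(g)
        for j, h in enumerate(grads):
            if i == j:
                continue
            d = dot(g, h)
            if d < 0:  # conflicting directions
                scale = d / dot(h, h)
                g = [x - scale * y for x, y in zip(g, h)]
        out.append(g)
    return out


energy_grad, coverage_grad = [1.0, 0.0], [-1.0, 1.0]
print(project_conflicts([energy_grad, coverage_grad]))  # [[0.5, 0.5], [0.0, 1.0]]
print(zscore([10.0, 20.0, 30.0]))
```

After projection, the first gradient is orthogonal to the original second one, which is the "orthogonalization" effect. Gradients that do not conflict pass through unchanged.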
The algorithm delivers an effective capacity gain of 20 percent under high traffic loads, validating its potential for scalable aerial edge swarms.
Published: 2026-01-29T15:31:04Z
Comment: 13 pages, 4 figures, 2 tables, manuscript submitted to IEEE journal for possible publication
Authors: Chuan-Chi Lai

ID: http://arxiv.org/abs/2603.26684v2
Title: Decoupling Geometric Planning and Execution in Scalable Multi-Agent Path Finding
Updated: 2026-04-07T07:20:56Z
Abstract: Multi-Agent Path Finding (MAPF) requires collision-free trajectories for multiple agents on a shared graph, often with the objective of minimizing the sum-of-costs (SOC). Many optimal and bounded-suboptimal solvers rely on time-expanded models and centralized conflict resolution, which limits scalability in large or dense instances. We propose a hybrid prioritized framework that separates geometric planning from execution-time conflict resolution. In the first stage, Geometric Conflict Preemption (GCP) plans agents sequentially with A* on the original graph while inflating costs for transitions entering vertices used by higher-priority paths, encouraging spatial detours without explicit time reasoning. In the second stage, a Decentralized Local Controller (DLC) executes the geometric paths using per-vertex FIFO authorization queues and inserts wait actions to avoid vertex and edge-swap conflicts. Experiments on standard benchmark maps with up to 1000 agents show that the method scales with a near-linear runtime trend and attains a 100% success rate on instances satisfying the geometric feasibility assumption. Project page: https://sites.google.com/unizar.es/multi-agent-path-finding/home
Published: 2026-03-11T11:04:54Z
Comment: 6 pages, 3 figures, WODES conference paper
Authors: Fernando Salanova, Eduardo Montijano, Cristian Mahulea

ID: http://arxiv.org/abs/2601.05905v2
Title: Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency
Updated: 2026-04-07T05:02:53Z
Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextual perturbations. Existing evaluations largely rely on point-wise confidence like Self-Consistency, which can mask brittle belief. We show that even facts answered with perfect self-consistency can rapidly collapse under mild contextual interference. To address this gap, we propose Neighbor-Consistency Belief (NCB), a structural measure of belief robustness that evaluates response coherence across a conceptual neighborhood. To validate the effectiveness of NCB, we introduce a new cognitive stress-testing protocol that probes output stability under contextual interference. Experiments across multiple LLMs show that the performance of high-NCB data is relatively more resistant to interference. Finally, we present Structure-Aware Training (SAT), which optimizes context-invariant belief structure and reduces long-tail knowledge brittleness by approximately 30%. Code is available at https://github.com/zjunlp/belief.
Published: 2026-01-09T16:23:21Z
Comment: ACL 2026
Authors: Haoming Xu, Ningyuan Zhao, Yunzhi Yao, Weihong Xu, Hongru Wang, Xinle Deng, Shumin Deng, Jeff Z. Pan, Huajun Chen, Ningyu Zhang

ID: http://arxiv.org/abs/2601.05930v2
Title: Can We Predict Before Executing Machine Learning Agents?
Updated: 2026-04-07T04:58:28Z
Abstract: Autonomous machine learning agents have revolutionized scientific discovery, yet they remain constrained by a Generate-Execute-Feedback paradigm. Previous approaches suffer from a severe Execution Bottleneck, as hypothesis evaluation relies strictly on expensive physical execution. To bypass these physical constraints, we internalize execution priors to replace costly runtime checks with instantaneous predictive reasoning, drawing inspiration from World Models.
In this work, we formalize the task of Data-centric Solution Preference and construct a comprehensive corpus of 18,438 pairwise comparisons. We demonstrate that LLMs exhibit significant predictive capabilities when primed with a Verified Data Analysis Report, achieving 61.5% accuracy and robust confidence calibration. Finally, we instantiate this framework in FOREAGENT, an agent that employs a Predict-then-Verify loop, achieving a 6x acceleration in convergence while surpassing execution-based baselines by +6%. Our code and dataset are publicly available at https://github.com/zjunlp/predict-before-execute.
Published: 2026-01-09T16:44:17Z
Comment: ACL 2026
Authors: Jingsheng Zheng, Jintian Zhang, Yujie Luo, Yuren Mao, Yunjun Gao, Lun Du, Huajun Chen, Ningyu Zhang
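A minimal sketch of one Predict-then-Verify round, as described for FOREAGENT. It assumes two components:

- a cheap pairwise predictor `prefers(a, b)`, standing in for an LLM primed with a data analysis report;
- an expensive `execute(c)` that returns a measured score.

Candidates are ranked using predicted preferences only, and just the top `budget` of them are executed. That is where savings over execute-everything baselines would come from. The function names, the `budget` parameter and the toy scores are hypothetical, not FOREAGENT's interface.

```python
from functools import cmp_to_key
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")


def rank_by_prediction(candidates: Sequence[C], prefers: Callable[[C, C], bool]) -> List[C]:
    """Order candidates using only predicted pairwise preferences (no execution)."""
    def cmp(a: C, b: C) -> int:
        if prefers(a, b):
            return -1
        if prefers(b, a):
            return 1
        return 0
    return sorted(candidates, key=cmp_to_key(cmp))


def predict_then_verify(candidates: Sequence[C], prefers: Callable[[C, C], bool],
                        execute: Callable[[C], float], budget: int) -> Tuple[C, float]:
    """Execute only the `budget` most promising candidates; return the best verified one."""
    shortlist = rank_by_prediction(candidates, prefers)[:budget]
    results = [(execute(c), c) for c in shortlist]  # the expensive step
    score, best = max(results, key=lambda r: r[0])
    return best, score


predicted = {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7}  # hypothetical predictor confidence
measured = {"a": 0.3, "b": 0.6, "c": 0.4, "d": 0.8}   # hypothetical true validation scores
runs: List[str] = []


def run(c: str) -> float:
    runs.append(c)  # record which candidates were actually executed
    return measured[c]


print(predict_then_verify(["a", "b", "c", "d"],
                          lambda x, y: predicted[x] > predicted[y], run, budget=2))  # ('d', 0.8)
print(runs)  # ['b', 'd']
```

Verification still decides the winner: the predictor ranked "b" first, but executing the shortlist reveals that "d" is better. Only two of the four candidates were run.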