https://arxiv.org/api/Qgd94ShmM5ERdZbRfzQ+yLHSpF0 2026-06-27T09:18:29Z 12761 1080 15 http://arxiv.org/abs/2604.22014v1 DM$^3$-Nav: Decentralized Multi-Agent Multimodal Multi-Object Semantic Navigation 2026-04-23T19:07:52Z

We present DM$^3$-Nav, a fully decentralized multi-agent semantic navigation system supporting multimodal open-vocabulary goal specification and multi-object missions. In our setting, decentralization implies operation without a central coordinator, global map aggregation, or shared global state at runtime. Robots operate autonomously and coordinate through ad-hoc pairwise communication, exchanging local maps, goal status, and navigation intent without synchronization. An implicit task allocation mechanism combining intent broadcasting and distance-weighted frontier selection reduces redundant exploration while preserving decentralized operation. Evaluations on HM3DSem scenes using the HM3Dv0.2 and GOAT-Bench datasets demonstrate that DM$^3$-Nav matches or exceeds centralized and shared-map baselines while eliminating single points of failure inherent in centralized architectures. Finally, we validate our approach in a real-world office environment using two mobile robots, demonstrating successful deployment relying entirely on onboard sensing and computation. A video of our real-world experiments is available online: https://drive.google.com/file/d/1QiUSCn5rIvtuTUqtuXLPgmt6S8x9-MCZ/view?usp=drive_link
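The implicit task allocation idea can be illustrated with a minimal sketch. It assumes frontiers and peer intents are 2D points, and that a frontier close to a goal another robot has announced receives a fixed cost penalty. The function name `select_frontier` and the parameters `penalty_radius` and `penalty` are hypothetical; the abstract does not give the actual weighting used by DM$^3$-Nav.

```python
import math

def select_frontier(robot_xy, frontiers, peer_intents,
                    penalty_radius=2.0, penalty=5.0):
    """Pick the frontier with the lowest cost.

    Cost = Euclidean distance from the robot, plus a fixed penalty for each
    peer whose broadcast goal lies within penalty_radius of the frontier.
    The penalty scheme is an assumption made for illustration only.
    """
    best, best_cost = None, math.inf
    for f in frontiers:
        cost = math.dist(robot_xy, f)
        for goal in peer_intents:
            if math.dist(f, goal) < penalty_radius:
                cost += penalty  # discourage exploring where a peer is headed
        if cost < best_cost:
            best, best_cost = f, cost
    return best  # None when there are no frontiers
```

With no peer intents the nearest frontier wins; once a peer announces a goal near that frontier, the robot falls back to a farther, unclaimed one, and no central allocator is involved.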

2026-04-23T19:07:52Z Amin Kashiri Northeastern University, Boston, USA Atharva Jamsandekar Northeastern University, Boston, USA Yasin Yazıcıoğlu Northeastern University, Boston, USA http://arxiv.org/abs/2510.04371v2 Speculative Actions: A Lossless Framework for Faster Agentic Systems 2026-04-23T17:58:32Z

AI agents are increasingly deployed in complex, interactive environments, yet their runtime remains a major bottleneck for training, evaluation, and real-world use. Typical agent behavior unfolds sequentially, with each action requiring an API call that can incur substantial latency. For example, a game of chess between two state-of-the-art agents can take hours. We introduce Speculative Actions, a lossless acceleration framework for general agentic systems. Inspired by speculative execution in microprocessors and speculative decoding in LLM inference, our method uses faster models to predict likely future actions and execute them in parallel, committing only when predictions match. We evaluate speculative actions across gaming, e-commerce, and web search environments, and additionally study a lossy extension in an operating systems setting. Across domains, we achieve up to 55% next-action prediction accuracy, translating into up to 20% latency reductions. Finally, we present a cost-latency analysis that formalizes the tradeoff between speculative breadth and time savings. This analysis enables principled tuning and selective branch launching to ensure that multi-branch speculation delivers practical speedups without prohibitive cost growth.
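The predict-then-commit loop can be sketched as follows. This is a simplified single-branch illustration, not the authors' implementation: `slow_policy`, `fast_policy`, and `step` are hypothetical callables, and `step` is assumed to be side-effect free so that a wrong guess can simply be discarded.

```python
from concurrent.futures import ThreadPoolExecutor

def run_speculative(state, slow_policy, fast_policy, step, n_steps):
    """Produce the same actions as sequential execution of slow_policy.

    While the authoritative (slow) action for the current state is pending,
    a fast guess is used to precompute the next state and to start the next
    slow call early. The speculative work is kept only if the guess matches.
    """
    trajectory, hits = [], 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        pending = pool.submit(slow_policy, state)
        for _ in range(n_steps):
            guess = fast_policy(state)
            spec_state = step(state, guess)
            spec_pending = pool.submit(slow_policy, spec_state)
            action = pending.result()          # authoritative action
            if action == guess:                # commit the speculative branch
                state, pending = spec_state, spec_pending
                hits += 1
            else:                              # discard it and redo the step
                spec_pending.cancel()
                state = step(state, action)
                pending = pool.submit(slow_policy, state)
            trajectory.append(action)
    return trajectory, state, hits
```

Because only actions returned by `slow_policy` are ever committed, the trajectory is identical to the sequential one. The latency benefit comes from steps where the guess matches, since the next slow call is then already in flight. The sketch also launches one unused slow call after the final step, which a real system would avoid.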

2025-10-05T21:28:11Z Naimeng Ye Arnav Ahuja Georgios Liargkovas Yunan Lu Kostis Kaffes Tianyi Peng http://arxiv.org/abs/2604.21894v1 Task-Driven Co-Design of Heterogeneous Multi-Robot Systems 2026-04-23T17:44:52Z

Designing multi-agent robotic systems requires reasoning across tightly coupled decisions spanning heterogeneous domains, including robot design, fleet composition, and planning. Much effort has been devoted to isolated improvements in these domains, whereas system-level co-design considering trade-offs and task requirements remains underexplored. In this work, we present a formal and compositional framework for the task-driven co-design of heterogeneous multi-robot systems. Building on a monotone co-design theory, we introduce general abstractions of robots, fleets, planners, executors, and evaluators as interconnected design problems with well-defined interfaces that are agnostic to both implementations and tasks. This structure enables efficient joint optimization of robot design, fleet composition, and planning under task-specific performance constraints. A series of case studies demonstrates the capabilities of the framework. Various component models can be seamlessly incorporated, including new robot types, task profiles, and probabilistic sensing objectives, while non-obvious design alternatives are systematically uncovered with optimality guarantees. The results highlight the flexibility, scalability, and interpretability of the proposed approach, and illustrate how formal co-design enables principled reasoning about complex heterogeneous multi-robot systems.

2026-04-23T17:44:52Z Maximilian Stralz Meshal Alharbi Yujun Huang Gioele Zardini http://arxiv.org/abs/2604.21811v1 Probably Approximately Consensus: On the Learning Theory of Finding Common Ground 2026-04-23T16:06:41Z

A primary goal of online deliberation platforms is to identify ideas that are broadly agreeable to a community of users through their expressed preferences. Yet, consensus elicitation should ideally extend beyond the specific statements provided by users and should incorporate the relative salience of particular topics. We address this issue by modelling consensus as an interval in a one-dimensional opinion space derived from potentially high-dimensional data via embedding and dimensionality reduction. We define an objective that maximizes expected agreement within a hypothesis interval where the expectation is over an underlying distribution of issues, implicitly taking into account their salience. We propose an efficient Empirical Risk Minimization (ERM) algorithm and establish PAC-learning guarantees. Our initial experiments demonstrate the performance of our algorithm and examine more efficient approaches to identifying optimal consensus regions. We find that through selectively querying users on an existing sample of statements, we can reduce the number of queries needed to a practical number.
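To make the interval-learning idea concrete, here is a minimal sketch of an empirical search over intervals. It assumes each sampled statement has a one-dimensional position and a centered agreement score (positive when the community mostly agrees, negative otherwise), and it selects the interval with the largest summed score. This objective and the function name `best_interval` are illustrative assumptions; the paper's exact objective and ERM algorithm are not given in the abstract.

```python
def best_interval(points, scores):
    """Return (lo, hi, total) for the interval with maximal summed score.

    Candidate endpoints are restricted to sample positions. After sorting by
    position, the problem is a maximum-sum contiguous run, solved in one pass.
    An all-negative sample yields the empty interval (None, None, 0.0).
    """
    best = (None, None, 0.0)
    run_sum, run_start = 0.0, None
    for x, s in sorted(zip(points, scores)):
        if run_start is None or run_sum <= 0:
            run_sum, run_start = 0.0, x   # restart the run at this point
        run_sum += s
        if run_sum > best[2]:
            best = (run_start, x, run_sum)
    return best
```

Sorting dominates the cost, so the search runs in O(n log n) for n sampled statements.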

2026-04-23T16:06:41Z Accepted to the Social Choice and Learning Algorithms Workshop at IJCAI 2025 Carter Blair Ben Armstrong Shiri Alouf-Heffetz Nimrod Talmon Davide Grossi http://arxiv.org/abs/2606.09840v1 Envisioning Sensemaking in Multi-Human, Multi-Agent Collaborative Knowledge Work 2026-04-23T15:55:24Z

Sensemaking is central to knowledge work, where people search, evaluate, interpret, and use information over time to construct durable understanding. The rise of generative AI has begun to reshape this process: GenAI systems now perform interpretive functions such as summarization, synthesis, and thematic grouping that knowledge workers have traditionally carried out themselves. In collaborative settings, these shifts compound, complicating how teams divide interpretive labor, trust one another's contributions, and negotiate shared understanding. In this position paper, we examine how GenAI reshapes sensemaking in collaborative knowledge work and propose five design principles for multi-human, multi-agent collaborative sensemaking: dynamic multi-layer information representations, active identification and bridging of gaps in understanding, critical engagement with information, verifiability, and accountability. Building on these principles, we introduce a conceptual framework for a dynamic shared representational workspace in which knowledge workers and specialized AI agents jointly gather evidence, schematize, hypothesize, and pursue collaborative goals. Through a partner agent, a shared space agent, and an orchestrator agent, the framework preserves the provenance and authorship of contributions and traces the evolution of both individual and shared interpretations, supporting coherent, negotiated knowledge construction that current generative AI systems tend to obscure.

2026-04-23T15:55:24Z This is the Author's Accepted Manuscript version of the article: Guan, Z., & Rieh, S. Y. (2026). Envisioning Sensemaking in Multi-Human, Multi-Agent Collaborative Knowledge Work. Accepted for publication in Sensemaking @ CHI 2026 Zhitong Guan Soo Young Rieh http://arxiv.org/abs/2604.21794v1 Learning to Communicate: Toward End-to-End Optimization of Multi-Agent Language Systems 2026-04-23T15:53:25Z

Multi-agent systems built on large language models have shown strong performance on complex reasoning tasks, yet most work focuses on agent roles and orchestration while treating inter-agent communication as a fixed interface. Latent communication through internal representations such as key-value caches offers a promising alternative to text-based protocols, but existing approaches do not jointly optimize communication with multi-agent reasoning. Therefore, we propose DiffMAS, a training framework that treats latent communication as a learnable component of multi-agent systems. DiffMAS performs parameter-efficient supervised training over multi-agent latent trajectories, enabling agents to jointly learn how information should be encoded and interpreted across interactions. Experiments on mathematical reasoning, scientific QA, code generation, and commonsense benchmarks show that DiffMAS consistently improves reasoning accuracy and decoding stability over single-agent inference, text-based multi-agent systems, and prior latent communication methods, achieving 26.7% on AIME24, 20.2% on GPQA-Diamond, and consistent gains across reasoning benchmarks.

2026-04-23T15:53:25Z Under review at COLM 2026 Ye Yu Heming Liu Haibo Jin Xiaopeng Yuan Peng Kuang Haohan Wang http://arxiv.org/abs/2604.21787v1 Agentic AI-Enabled Framework for Thermal Comfort and Building Energy Assessment in Tropical Urban Neighborhoods 2026-04-23T15:44:32Z

In response to the urban heat island effects and building energy demands in Singapore, this study proposes an agentic AI-enabled reasoning framework that integrates large language models (LLMs) with lightweight physics-based models. Through prompt customization, the LLMs interpret urban design tasks, extract relevant policies, and activate appropriate physics-based models for evaluation, forming a closed-loop reasoning-action process. These lightweight physics-based models leverage core thermal and airflow principles, streamlining conventional models to reduce computational time while predicting microclimate variables, such as building surface temperature, ground radiant heat, and airflow conditions, thereby enabling the estimation of thermal comfort indices, e.g., physiological equivalent temperature (PET), and building energy usage. This framework allows users to explore a variety of climate-resilient building surface strategies, e.g., green façades and cool paint applications, that improve thermal comfort while reducing wall heat gain and energy demand. By combining the autonomous reasoning capacity of LLMs with the rapid quantitative evaluation of lightweight physics-based models, the proposed system demonstrates potential for cross-disciplinary applications in sustainable urban design, indoor-outdoor environmental integration, and climate adaptation planning. The source code and data used in this study are available at: https://github.com/PgUpDn/urban-cooling-agent.

2026-04-23T15:44:32Z Accepted at IAQVEC 2026 Po-Yen Lai Xinyu Yang Derrick Low Huizhe Liu Jian Cheng Wong http://arxiv.org/abs/2604.21748v1 StructMem: Structured Memory for Long-Horizon Behavior in LLMs 2026-04-23T14:57:23Z

Long-term conversational agents need memory systems that capture relationships between events, not merely isolated facts, to support temporal reasoning and multi-hop question answering. Current approaches face a fundamental trade-off: flat memory is efficient but fails to model relational structure, while graph-based memory enables structured reasoning at the cost of expensive and fragile construction. To address these issues, we propose StructMem, a structure-enriched hierarchical memory framework that preserves event-level bindings and induces cross-event connections. By temporally anchoring dual perspectives and performing periodic semantic consolidation, StructMem improves temporal reasoning and multi-hop performance on LoCoMo, while substantially reducing token usage, API calls, and runtime compared to prior memory systems. See https://github.com/zjunlp/LightMem.

2026-04-23T14:57:23Z Accepted by ACL 2026 main conference Buqiang Xu Yijun Chen Jizhan Fang Ruobin Zhong Yunzhi Yao Yuqi Zhu Lun Du Shumin Deng http://arxiv.org/abs/2604.21529v1 Architectures for Robust Self-Organizing Energy Systems under Information and Control Constraints 2026-04-23T10:52:29Z

Applying the concept of controlled self-organization in agent-based Cyber-Physical Energy Systems (CPES) is a promising approach to ensure system robustness. By introducing an observer/controller architecture to the system, this concept allows for self-organization while still enabling intervention when disturbances occur. Thus, it is possible to respond to effects of cyber attacks, a major threat to current energy systems. However, when implementing an observer to monitor the system and a controller to execute actions for controlled self-organization in CPES, it is essential to take into account restrictions on information and actions resulting from the privacy of local distributed energy resources, regulatory constraints, and data exchange requirements. For this reason, this paper presents architecture variants for the observer and controller that take into account restrictions on access to information and limited actions. In addition, it evaluates possible controller actions in various architectures. The results underscore the importance of considering observer/controller architectures when designing agent-based systems to ensure their robustness for real-world applications.

2026-04-23T10:52:29Z This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution will be published in Agents and Artificial Intelligence, Lecture Notes in Computer Science, and available online at https://doi.org/10.1007/978-3-032-25029-2_19 Emilie Frost Astrid Nieße http://arxiv.org/abs/2604.21955v1 A four-player potential game for barren-plateau-aware quantum ansatz design 2026-04-23T07:58:46Z

We cast the design of parameterized quantum circuits as a four-player potential game whose state is a circuit directed acyclic graph (DAG) and whose players encode trainability, non-stabilizerness, task performance, and hardware cost. Per-player restricted action sets factorize the move space into append, remove, retype, and rewire operations; a block-coordinate $\varepsilon$-Nash residual $\delta_\text{Nash}$ certifies that no single player can improve unilaterally. A single weight sweep on MaxCut $K_4$ traces a Pareto frontier from a Clifford endpoint $(M_2/n,\langle H\rangle)=(0,4.00)$ to a non-Clifford endpoint $(0.48,3.30)$. On three four-qubit hardware topologies (heavy-hex, $2\times 2$ grid, Rydberg all-to-all), Nash search achieves the highest mean potential; on the $2\times 2$ grid Nash reaches the theoretical ceiling $\Phi_\text{max}=4.10$ on two of five seeds while the simulated-annealing baseline does so on one; paired Wilcoxon tests over five seeds cannot reject the null on any single topology ($p\ge 0.22$). On LiH/STO-3G, seeding Nash from a 58-gate Givens-doubles ansatz produces a 48-operation, depth-25 circuit retaining $97.7\%$ of the correlation energy while simultaneously reducing gate count, increasing non-stabilizerness, and controlling trainability. The framework is complementary to energy-only searches such as ADAPT-VQE and k-UpCCGSD, which reach chemical accuracy with fewer operations but do not optimize the other three axes.
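The role of a Nash residual can be illustrated on a toy finite game. The sketch below assumes an exact potential game, in which any unilateral change in a player's utility equals the change in a shared potential, and it abstracts the circuit DAG into a tuple holding one action per player. The function `nash_residual` and this tuple encoding are hypothetical and are not the paper's definition of the residual.

```python
def nash_residual(potential, profile, action_sets):
    """Largest potential gain any single player can obtain by deviating alone.

    profile      -- tuple holding one action per player
    action_sets  -- action_sets[i] lists the moves available to player i
    A return value of at most eps means the profile is an eps-Nash point.
    """
    base = potential(profile)
    best_gain = 0.0
    for i, actions in enumerate(action_sets):
        for a in actions:
            # Change only player i's action; all others stay fixed.
            trial = profile[:i] + (a,) + profile[i + 1:]
            best_gain = max(best_gain, potential(trial) - base)
    return best_gain
```

A residual of zero certifies that no player benefits from a unilateral move within its restricted action set, which is a local condition and does not by itself imply a global optimum of the potential.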

2026-04-23T07:58:46Z 8 pages, 4 figures Rubén Darío Guerrero http://arxiv.org/abs/2604.21344v1 Beyond Single Plots: A Benchmark for Question Answering on Multi-Charts 2026-04-23T06:59:56Z

Charts are widely used to present complex information. Deriving meaningful insights in real-world contexts often requires interpreting multiple related charts together, yet the understanding of multi-chart images has not been extensively explored. We introduce PolyChartQA, a mid-scale dataset specifically designed for question answering over multi-chart images. PolyChartQA comprises 534 multi-chart images (with a total of 2,297 sub-charts) sourced from peer-reviewed computer science research publications and 2,694 QA pairs. We evaluate the performance of nine state-of-the-art Multimodal Language Models (MLMs) on PolyChartQA across question type, difficulty, question source, and key structural characteristics of multi-charts. Our results show a 27.4% drop in LLM-based accuracy (L-Accuracy) on human-authored questions compared to MLM-generated questions, and a 5.39% L-Accuracy gain with our proposed prompting method.

2026-04-23T06:59:56Z Azher Ahmed Efat Seok Hwan Song Wallapak Tavanapong http://arxiv.org/abs/2604.21337v1 PREVENT-JACK: Context Steering for Swarms of Long Heavy Articulated Vehicles 2026-04-23T06:50:00Z

In this paper, we extend the traditional point-mass-like robot representation in swarm robotics and instead study a swarm of long Heavy Articulated Vehicles (HAVs). HAVs are kinematically constrained, elongated, and articulated, introducing unique challenges. Local, decentralized coordination of these vehicles is motivated by many real-world applications. Our approach, Prevent-Jack, applies the context steering framework, which is sparsely covered in robotics. It fuses six local behaviors, providing guarantees against jackknifing and collisions at the cost of potential dead- and livelocks, and is tested for vehicles with up to ten trailers. We highlight the importance of the Evade Attraction behavior for deadlock prevention using a parameter study, and use 15,000 simulations to evaluate the swarm performance. Our extensive experiments show that both dead- and livelocks occur more frequently in larger swarms and denser scenarios, affecting a peak average of 27%/31% of vehicles. We observe that larger swarms exhibit increased waiting, while smaller swarms show increased evasion.
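For readers unfamiliar with context steering, the following generic sketch shows how several behaviors can be fused. It assumes each behavior writes an interest map and a danger map over a fixed set of heading slots. It does not model articulation, trailers, or the six Prevent-Jack behaviors, and the function name `steer` is hypothetical.

```python
import math

def steer(interest_maps, danger_maps, n_slots=8):
    """Fuse behavior maps and choose a heading slot.

    Maps are merged slot-wise with max. Only the slots with the lowest merged
    danger remain candidates, and the candidate with the highest interest is
    chosen. Returns (slot index, heading in radians).
    """
    interest = [max(m[k] for m in interest_maps) for k in range(n_slots)]
    danger = [max(m[k] for m in danger_maps) for k in range(n_slots)]
    min_danger = min(danger)
    candidates = [k for k in range(n_slots) if danger[k] == min_danger]
    best = max(candidates, key=lambda k: interest[k])
    return best, 2 * math.pi * best / n_slots
```

Keeping danger and interest separate means a strongly attractive direction can never override a safety constraint. The same separation can also leave a vehicle with no attractive safe heading, which is one intuition for how dead- and livelocks can arise.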

2026-04-23T06:50:00Z 32 pages, 7 figures, 4 videos; submitted to the Swarm Robotics collection of the Nature Portfolio Journal Robotics (NPJ Robot) Adrian Baruck Michael Dubé Christoph Steup Sanaz Mostaghim http://arxiv.org/abs/2604.21328v1 Role of diversity in team performance: the case of missing expertise, an agent based simulation 2026-04-23T06:32:15Z

Theory and empirical research on management teams' influence on firm performance have witnessed continuous development, and by now incorporate numerous details. Classic, experiment-based studies examining social systems collect vast amounts of data, but often investigate only the first one or two modes of the distribution of measured variables, and experience difficulty in analyzing the effect of context. For example, in functional diversity research, management teams are described by measures incorporating complex distributions of capabilities of individual managers and teams of managers. To investigate the effect of hidden distributions, and the effect of functional diversity composition on team communication and performance, we developed an agent-based model and conducted a series of simulation experiments. Modeling results show that, depending on the context, such as the communication scheme among interacting agents or their functional composition, intrapersonal functional diversity (IFD) and dominant function diversity (DFD) might enhance or reduce performance and communication among agents. Furthermore, simulation results suggest that a third measure, capturing the aggregate expertise of the team, is required alongside IFD and DFD to comprehensively account for empirical findings.

2026-04-23T06:32:15Z 20 pages, 13 figures, for associated model file, please see https://www.comses.net/codebases/b5db6af8-ba44-4725-9bb3-09a6e6b02475/releases/1.0.0 Tamás Kiss http://arxiv.org/abs/2512.03048v5 The Specification Trap: Why Static Value Alignment Alone Is Insufficient for Robust Alignment 2026-04-23T05:33:01Z

Static content-based AI value alignment is insufficient for robust alignment under capability scaling, distributional shift, and increasing autonomy. This holds for any approach that treats alignment as optimizing toward a fixed formal value-object, whether reward function, utility function, constitutional principles, or learned preference representation. Three philosophical results create compounding difficulties: Hume's is-ought gap (behavioral data underdetermines normative content), Berlin's value pluralism (human values resist consistent formalization), and the extended frame problem (any value encoding will misfit future contexts that advanced AI creates). RLHF, Constitutional AI, inverse reinforcement learning, and cooperative assistance games each instantiate this specification trap, and their failure modes reflect structural vulnerabilities, not merely engineering limitations that better data or algorithms will straightforwardly resolve. Known workarounds for individual components face mutually reinforcing difficulties when the specification is closed: the moment it ceases to update from the process it governs. Drawing on compatibilist philosophy, the paper argues that behavioral compliance under training conditions does not guarantee robust alignment under novel conditions, and that this gap grows with system capability. For value-laden autonomous systems, known closed approaches face structural vulnerabilities that worsen with capability. The constructive burden shifts to open, developmentally responsive approaches, though whether such approaches can be achieved remains an empirical question.

2025-11-19T23:31:29Z 31 pages, no figures. Version 5. First posted as arXiv:2512.03048 in November 2025. First in a six-paper research program on AI alignment Austin Spizzirri http://arxiv.org/abs/2604.20279v2 AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents 2026-04-23T03:36:10Z

Mobile GUI agents can automate smartphone tasks by interacting directly with app interfaces, but how they should communicate with users during execution remains underexplored. Existing systems rely on two extremes: foreground execution, which maximizes transparency but prevents multitasking, and background execution, which supports multitasking but provides little visual awareness. Through iterative formative studies, we found that users prefer a hybrid model with just-in-time visual interaction, but the most effective visualization modality depends on the task. Motivated by this, we present AgentLens, a mobile GUI agent that adaptively uses three visual modalities during human-agent interaction: Full UI, Partial UI, and GenUI. AgentLens extends a standard mobile agent with adaptive communication actions and uses Virtual Display to enable background execution with selective visual overlays. In a controlled study with 21 participants, AgentLens was preferred by 85.7% of participants and achieved the highest usability (1.94 Overall PSSUQ) and adoption-intent (6.43/7).

2026-04-22T07:27:21Z Jeonghyeon Kim Byeongjun Joung Junwon Lee Joohyung Lee Taehoon Min Sunjae Lee