https://arxiv.org/api/EtYtnGTnbyZwsIcrGErtAwV/5DA 2026-06-21T12:38:33Z 12695 570 15 http://arxiv.org/abs/2606.07549v1 PathoSage: Towards Multi-Source Evidence Adjudication in Pathology via Experience-Aware Agentic Workflow 2026-05-18T12:30:03Z

Recent advances in Multimodal Large Language Models (MLLMs) and agent workflows have shown strong promise for computational pathology, yet reliable patch-level reasoning remains challenging. End-to-end pathology MLLMs often hallucinate morphological features, while recent agentic systems usually merge tool outputs and retrieved knowledge into a shared context, making decisions vulnerable to conflicting evidence and context contamination. We propose PathoSage, a three-stage framework that explicitly separates knowledge retrieval, evidence collection, and evidence adjudication for patch-level pathology multimodal reasoning. Its core component, Structured Evidence Deliberation, independently evaluates heterogeneous evidence from tools, performs conflict analysis, and generates the final judgment in a fresh context to reduce anchoring bias. We further introduce a training-free Beta-Bernoulli experience system with continuous credit assignment to model long-term tool reliability and construct similarity-weighted priors for future tool use. Experiments show that PathoSage effectively mitigates VQA hallucinations and classifier disagreement, outperforming strong pathology MLLM and agentic baselines. Our results highlight explicit evidence adjudication and reliability-aware tool modeling as key ingredients for robust pathology agents.
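The Beta-Bernoulli experience system mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the update rule, so everything here is an assumption for illustration: the class name `ToolReliability`, the function `similarity_weighted_prior`, the treatment of "continuous credit" as a fractional success in [0, 1], and the use of similarity scores as weights on past pseudo-counts.

```python
from dataclasses import dataclass


@dataclass
class ToolReliability:
    """Beta(alpha, beta) belief over how often a tool's evidence is correct.

    Hypothetical helper; not taken from the PathoSage paper.
    """
    alpha: float = 1.0  # pseudo-count of successes (uniform prior)
    beta: float = 1.0   # pseudo-count of failures

    def update(self, credit: float) -> None:
        # Assumption: credit in [0, 1] is a fractional success, so a tool
        # can receive partial credit instead of a hard 0/1 outcome.
        if not 0.0 <= credit <= 1.0:
            raise ValueError("credit must lie in [0, 1]")
        self.alpha += credit
        self.beta += 1.0 - credit

    def mean(self) -> float:
        # Posterior mean of the Beta distribution.
        return self.alpha / (self.alpha + self.beta)


def similarity_weighted_prior(history, base_alpha=1.0, base_beta=1.0):
    """Build a prior from (similarity, credit) pairs of past cases.

    Assumption: each past outcome contributes pseudo-counts scaled by its
    similarity to the current case.
    """
    alpha, beta = base_alpha, base_beta
    for similarity, credit in history:
        alpha += similarity * credit
        beta += similarity * (1.0 - credit)
    return ToolReliability(alpha, beta)
```

In this sketch, a tool that earned full credit on a very similar past case starts the next case with a higher expected reliability than one whose successes came from dissimilar cases.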

2026-05-18T12:30:03Z Chengyang Zhang Wenchuan Zhang Bo Li Mengran Li Bob Zhang Yuhao Yi Hong Bu Jiancheng Lv http://arxiv.org/abs/2506.01839v3 Beyond Static Responses: Multi-Agent LLM Systems as a New Paradigm for Social Science Research 2026-05-18T12:24:10Z

As large language models (LLMs) transition from static tools to fully agentic systems, their potential for transforming social science research has become increasingly evident. This paper introduces a structured framework for understanding the diverse applications of LLM-based agents, ranging from simple data processors to complex, multi-agent systems capable of simulating emergent social dynamics. By mapping this developmental continuum across six levels, the paper clarifies the technical and methodological boundaries between different agentic architectures, providing a comprehensive overview of current capabilities and future potential. It highlights how lower-tier systems streamline conventional tasks like text classification and data annotation, while higher-tier systems enable novel forms of inquiry, including the study of group dynamics, norm formation, and large-scale social processes. However, these advancements also introduce significant challenges, including issues of reproducibility, ethical oversight, and the risk of emergent biases. The paper critically examines these concerns, emphasizing the need for robust validation protocols, interdisciplinary collaboration, and standardized evaluation metrics. It argues that while LLM-based agents hold transformative potential for the social sciences, realizing this promise will require careful, context-sensitive deployment and ongoing methodological refinement. The paper concludes with a call for future research that balances technical innovation with ethical responsibility, encouraging the development of agentic systems that not only replicate but also extend the frontiers of social science, offering new insights into the complexities of human behavior.

2025-06-02T16:27:29Z Jennifer Haase Sebastian Pokutta http://arxiv.org/abs/2511.11654v2 Convergence of Multiagent Learning Systems for Traffic control 2026-05-18T12:11:22Z

Rapid urbanization in cities like Bangalore has led to severe traffic congestion, making efficient Traffic Signal Control (TSC) essential. Multi-Agent Reinforcement Learning (MARL), often modeling each traffic signal as an independent agent using Q-learning, has emerged as a promising strategy to reduce average commuter delays. While prior work by Prashant L A et al. has empirically demonstrated the effectiveness of this approach, a rigorous theoretical analysis of its stability and convergence properties in the context of traffic control has not been carried out. This paper bridges that gap by focusing squarely on the theoretical basis of this multi-agent algorithm. We investigate the convergence problem inherent in using independent learners for the cooperative TSC task. Utilizing stochastic approximation methods, we formally analyze the learning dynamics. The primary contribution of this work is a proof that this multi-agent reinforcement learning algorithm for traffic control converges under the given conditions, extending single-agent convergence proofs for asynchronous value iteration.
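For readers unfamiliar with the independent-learner setup the abstract refers to, the following is a minimal sketch of one tabular Q-learning agent of the kind that could be attached to each traffic signal. The class name `IndependentQLearner`, the state and action encoding, and the hyperparameter values are illustrative assumptions, not the paper's algorithm. A constant step size is used for brevity; stochastic-approximation arguments usually assume decaying step sizes, and the abstract does not state the paper's exact conditions.

```python
import random
from collections import defaultdict


class IndependentQLearner:
    """Tabular Q-learning agent that ignores the other agents' actions.

    Hypothetical illustration; one instance would be created per signal.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        # Q-table: unseen states start with all-zero action values.
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy exploration over this agent's own actions.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update toward the bootstrapped target.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

Each agent updates only its own table from its own observations, which is why convergence in the multi-agent setting does not follow directly from single-agent results.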

2025-11-10T16:10:20Z 14 pages 2 figures Sayambhu Sen Shalabh Bhatnagar http://arxiv.org/abs/2601.00360v3 Mapping Human Anti-collusion Mechanisms to Multi-agent AI Systems 2026-05-18T12:03:54Z

As multi-agent AI systems become increasingly autonomous, evidence shows they can develop collusive strategies similar to those long observed in human markets and institutions. While human domains have accumulated centuries of anti-collusion mechanisms, it remains unclear how these can be adapted to AI settings. This paper addresses that gap by (i) developing a taxonomy of human anti-collusion mechanisms, including sanctions, leniency & whistleblowing, monitoring & auditing, market design, and governance, and (ii) mapping them to potential interventions for multi-agent AI systems. For each mechanism, we propose implementation approaches. We also highlight open challenges, such as the attribution problem (difficulty attributing emergent coordination to specific agents), identity fluidity (agents being easily forked or modified), the boundary problem (distinguishing beneficial cooperation from harmful collusion), and adversarial adaptation (agents learning to evade detection).

2026-01-01T14:30:37Z Accepted to ICML 2026 Workshop on Technical AI Governance Research (TAIGR); Published in Knowledge-Based Systems Journal Idowu, J., Almasoud, A. S., & Alfahid, A. (2026). Mapping human anti-collusion mechanisms to multi-agent AI systems. Knowledge-Based Systems, 344(116067), 116067. https://doi.org/10.1016/j.knosys.2026.116067 Jamiu Idowu Ahmed Almasoud Ayman Alfahid 10.1016/j.knosys.2026.116067 http://arxiv.org/abs/2604.08216v3 MemCoT: Test-Time Scaling through Memory-Driven Chain-of-Thought 2026-05-18T11:20:22Z

Large Language Models (LLMs) still suffer from severe hallucinations and catastrophic forgetting during causal reasoning over massive, fragmented long contexts. Existing memory mechanisms typically treat retrieval as a static, single-step passive matching process, leading to severe semantic dilution and contextual fragmentation. To overcome these fundamental bottlenecks, we propose MemCoT, a test-time memory scaling framework that redefines the reasoning process by transforming long-context reasoning into an iterative, stateful information search. MemCoT introduces a multi-view long-term memory perception module that enables Zoom-In evidence localization and Zoom-Out contextual expansion, allowing the model to first identify where relevant evidence resides and then reconstruct the surrounding causal structure necessary for reasoning. In addition, MemCoT employs a task-conditioned dual short-term memory system composed of semantic state memory and episodic trajectory memory. This short-term memory records historical search decisions and dynamically guides query decomposition and pruning across iterations. Empirical evaluations demonstrate that MemCoT establishes state-of-the-art performance: with MemCoT, several open- and closed-source models achieve SOTA results on the LoCoMo and LongMemEval-S benchmarks.

2026-04-09T13:13:53Z 14 pages, 7 figures Haodong Lei Junming Liu Yirong Chen Ding Wang Hongsong Wang http://arxiv.org/abs/2510.24701v3 Tongyi DeepResearch Technical Report 2026-05-18T04:10:32Z

We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized environments for each stage, our system enables stable and consistent interactions throughout. Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.

2025-10-28T17:53:02Z https://tongyi-agent.github.io/blog Tongyi DeepResearch Team Baixuan Li Bo Zhang Dingchu Zhang Fei Huang Guangyu Li Guoxin Chen Huifeng Yin Jialong Wu Jingren Zhou Kuan Li Liangcai Su Litu Ou Liwen Zhang Pengjun Xie Rui Ye Wenbiao Yin Xinmiao Yu Xinyu Wang Xixi Wu Xuanzhong Chen Yida Zhao Zhen Zhang Zhengwei Tao Zhongwang Zhang Zile Qiao Chenxi Wang Donglei Yu Gang Fu Haiyang Shen Jiayin Yang Jun Lin Junkai Zhang Kui Zeng Li Yang Hailong Yin Maojia Song Ming Yan Minpeng Liao Peng Xia Qian Xiao Rui Min Ruixue Ding Runnan Fang Shaowei Chen Shen Huang Shihang Wang Shihao Cai Weizhou Shen Xiaobin Wang Xin Guan Xinyu Geng Yingcheng Shi Yuning Wu Zhuo Chen Zijian Li Yong Jiang http://arxiv.org/abs/2605.17698v1 Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces 2026-05-17T23:36:55Z

The deployment of Large Language Models (LLMs) as autonomous economic agents introduces systemic risks that extend beyond individual capability failures. As agents transition to directly interacting with marketplaces, their collective behavior can amplify volatility and mask deception at scale. We introduce the Agent Bazaar, a multi-agent simulation framework for evaluating Economic Alignment, the capacity of agentic systems to preserve market stability and integrity. We identify two failure modes: (1) Algorithmic Instability in a B2C market ("The Crash"), where firms amplify price volatility until the market collapses, and (2) Sybil Deception in a C2C market ("The Lemon Market"), where a single deceptive agent controlling multiple coordinated seller identities floods the market with fraudulent listings, eroding trust and consumer welfare. We evaluate frontier and open-weight models across both scenarios and find that models largely fail to self-regulate, with failure severity varying by model rather than by size. We propose economically aligned harnesses, Stabilizing Firms and Skeptical Guardians, that improve outcomes but remain fragile under harder market conditions. To close this gap, we train agents with REINFORCE++ using an adaptive curriculum, producing a 9B model that outperforms all evaluated frontier and open-weight models. We propose the Economic Alignment Score (EAS), a 4-component scalar metric aggregating stability, integrity, welfare, and profitability, enabling direct cross-model comparison. Our results show that economic alignment is orthogonal to general capability and can be directly trained with targeted RL.

2026-05-17T23:36:55Z 17 pages, 9 figures Seth Karten Cameron Crow Chi Jin http://arxiv.org/abs/2507.21035v3 GenoMAS: A Multi-Agent Framework for Scientific Discovery via Code-Driven Gene Expression Analysis 2026-05-17T21:43:43Z

Gene expression analysis holds the key to many biomedical discoveries, yet extracting insights from raw transcriptomic data remains formidable due to the complexity of multiple large, semi-structured files and the need for extensive domain expertise. Current automation approaches are often limited by either inflexible workflows that break down in edge cases or by fully autonomous agents that lack the necessary precision for rigorous scientific inquiry. GenoMAS charts a different course by presenting a team of LLM-based scientists that integrates the reliability of structured workflows with the adaptability of autonomous agents. GenoMAS orchestrates six specialized LLM agents through typed message-passing protocols, each contributing complementary strengths to a shared analytic canvas. At the heart of GenoMAS lies a guided-planning framework: programming agents unfold high-level task guidelines into Action Units and, at each juncture, elect to advance, revise, bypass, or backtrack, thereby maintaining logical coherence while bending gracefully to the idiosyncrasies of genomic data. On the GenoTEX benchmark, GenoMAS reaches a Composite Similarity Correlation of 89.13% for data preprocessing and an F$_1$ of 60.48% for gene identification, surpassing the best prior art by 10.61% and 16.85% respectively. Beyond metrics, GenoMAS surfaces biologically plausible gene-phenotype associations corroborated by the literature, all while adjusting for latent confounders. Code is available at https://github.com/Liu-Hy/GenoMAS.

2025-07-28T17:55:08Z 51 pages (14 pages for the main text, 10 pages for references, and 27 pages for the appendix) Haoyang Liu Yijiang Li Haohan Wang http://arxiv.org/abs/2605.17650v1 Reservation Based Smart Parking Management 2026-05-17T21:05:14Z

In the framework of Smart Cities and Intelligent Transportation Systems (ITS), efficient parking management is essential to reduce urban congestion and emissions. However, current reservation-based systems often encounter a scenario in which users find their reserved slot occupied by a previous occupant who failed to vacate on time (the "NO PARK" situation). This paper introduces a dual-mechanism architecture designed to enhance system reliability. A Reservation Module uses a dynamically sized buffer of non-reservable slots to ensure parking availability. A reputation-based Reward System exploits a "star-based" metric to incentivize punctual departures through financial penalties and access restrictions. Simulations conducted with the SUMO urban simulator are promising, showing that the dynamic buffer strategy provides a better tradeoff between parking availability and reservation success. By progressively adapting to users' behavior, the proposed system mitigates "NO PARK" instances and improves resource utilization, significantly enhancing urban viability. Index Terms: Smart City, Intelligent transportation systems, Parking, Reservation systems, V2I, Reputation-based mechanisms, Smart Parking

2026-05-17T21:05:14Z 6 pages, accepted at the IEEE WETICE 2026 Conference Giacomo Cabri Manuela Montangero Filippo Muzzini Roberto Wang http://arxiv.org/abs/2605.17510v1 Scale-Dependent Collective Adaptation in Self-Amending LLM Societies: A Cross-Family Study of Emergent Governance 2026-05-17T15:45:47Z

We study group decision-making in artificial societies where the rules of play are themselves subject to collective amendment. Using the self-amending game Nomic, we compare multiple scales across two LLM families and find that collective adaptation does not improve monotonically with model size. Instead, both families exhibit a narrow mid-scale regime that supports sustained rule adoption, diverse amendments, and balanced consensus. Smaller models tend to remain rule-inert, whereas larger models often converge on restrictive voting patterns, and heterogeneous mixed-size groups collapse into veto-driven gridlock. These cross-scale contrasts persist under temperature perturbations and under a shift from unanimity to majority voting, although latent-state structure varies by family and scale. Hidden-state divergence alone does not explain collective performance: high representational divergence can coincide with poor behavioural outcomes. Linear probes reveal regime-selective coupling between latent vote-predictive signals and collective behaviour, but decodability is necessary rather than sufficient for adaptive play. Overall, the recurring regularity is non-monotonicity, not the particular scale at which the optimum appears. Self-amending games therefore provide a controlled testbed for studying collective adaptation in artificial societies beyond raw model scale.

2026-05-17T15:45:47Z Kazuya Horibe Masaomi Hatakeyama Gen Masumoto Takashi Hashimoto Peter Romero http://arxiv.org/abs/2502.05462v2 Motion Planning of Cooperative Nonholonomic Mobile Manipulators 2026-05-17T13:12:58Z

We propose a real-time implementable motion planning framework for cooperative object transportation by nonholonomic mobile manipulator robots (MMRs) in dynamic environments. Our global planner finds a path from start to goal through the static, obstacle-free regions in the environment and generates a set of convex, static, obstacle-free regions around the path using a novel, fast, and computationally lightweight ellipse-based technique. We introduce a nonlinear Model Predictive Control (NMPC) based real-time implementable planning technique that jointly plans feasible motion for the mobile base and the manipulator's arm and generates a kinodynamically feasible, collision-free trajectory for cooperative object transportation. Simulation and hardware experiments validate the efficiency of our proposed planning framework.

2025-02-08T06:05:43Z Published in ASME Letters in Translational Robotics. This includes supplementary materials Patra, K., Sinha, A., and Guha, A. (May 2, 2026). "Motion Planning of Cooperative Nonholonomic Mobile Manipulators." ASME. Letters Trans. Robotics. December 2025; 1(4): 041003 Keshab Patra Arpita Sinha Anirban Guha 10.1115/1.4071124 http://arxiv.org/abs/2307.09575v2 Causal Influences over Social Learning Networks 2026-05-17T13:05:32Z

This paper investigates causal influences between agents linked by a social graph and interacting over time. In particular, the work examines the dynamics of social learning models and distributed decision-making protocols, and derives expressions that reveal the causal relations between pairs of agents and explain the flow of influence over the network. The results turn out to be dependent on the graph topology and the level of information that each agent has about the inference problem they are trying to solve. Using these conclusions, the paper proposes an algorithm to rank the overall influence between agents to discover highly influential agents. It also provides a method to learn the necessary model parameters from raw observational data. The results and the proposed algorithm are illustrated by considering both synthetic data and real social media data.

2023-07-13T04:25:19Z Accepted to the Journal of Machine Learning Research Mert Kayaalp Ali H. Sayed http://arxiv.org/abs/2605.17426v1 Human-Flow Digital Twin for Predicting the Effects of Mobility Introduction on Visitor Circulation 2026-05-17T12:43:14Z

We propose a framework for predicting the effects of mobility introduction measures using a human-flow digital twin. This digital twin incorporates a multi-agent simulator that can represent how visitors choose destinations depending on factors such as their current location and the attractiveness of spots. We extract data on how visitors selected destinations with respect to measured pre-intervention human-flow data, inter-spot distances, spot attractiveness, and travel volumes, and use these data to train each agent's decision model of this simulator. The trained decision model is a function that takes a visitor's current state and surrounding environmental information as input and outputs which spot the visitor will move toward next. By expressing mobility introduction measures as changes to inter-point distances or to spot attractiveness, the framework can reproduce human flows with mobility introduction in the multi-agent simulator and thereby quantify effects such as changes in visitor counts and circulation. We evaluated the proposed method using human-flow data measured with and without introducing mobility within Wakayama Castle Park in Japan. When reproducing flows with mobility introduction using a multi-layer perceptron decision model, the cosine similarity of the spatial population distribution exceeded 0.7, confirming that the approach can replicate the flow changes caused by the mobility introduction.
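The evaluation metric named in the abstract, cosine similarity between spatial population distributions, can be sketched as follows. The representation of a distribution as a list of visitor counts per spot, and the function name `cosine_similarity`, are assumptions for illustration; the abstract does not specify how the spatial distribution is discretized.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length count vectors.

    Assumption: u[i] and v[i] are visitor counts for the same spot i in the
    simulated and the measured distribution, respectively.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Because the measure depends only on the direction of the vectors, it compares the shape of the two distributions and is insensitive to a uniform scaling of total visitor counts.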

2026-05-17T12:43:14Z An accepted paper at the 27th IEEE International Conference on Mobile Data Management (MDM 2026). Project page: https://mc.net.ist.osaka-u.ac.jp/en/activity/wakayama-castle-mobility_2023/ Chiharu Shima Haruki Yonekura Fukuharu Tanaka Tatsuya Amano Hirozumi Yamaguchi http://arxiv.org/abs/2605.17393v1 Heterogeneous Information-Bottleneck Coordination Graphs for Multi-Agent Reinforcement Learning 2026-05-17T11:23:22Z

Coordination graphs are a central abstraction in cooperative multi-agent reinforcement learning (MARL), yet existing sparse-graph learners lack a theoretically grounded mechanism to decide which edges should exist and how much information each edge should carry. Current methods rely on heuristic criteria that offer no formal guarantee on the learned topology, and no principled way to allocate different communication capacities to structurally different agent relationships. To address this, we propose Heterogeneous Information-Bottleneck Coordination Graphs (HIBCG), which learns a group-aware sparse graph in which both edge existence and message capacity are theoretically justified. With the graph information bottleneck (GIB) serving as the underlying tool, HIBCG first constructs a group-aligned block-diagonal prior that provides a closed-form criterion for edge retention -- determining which edges should exist and at what density per group block -- and then controls per-agent feature bandwidth on the resulting topology, compressing messages to retain only task-relevant content. We prove that the group-aligned prior strictly tightens the variational bound on topology learning, that the objective decomposes per group block, enabling differential edge control, and that capacity allocation follows a water-filling principle.
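As background for the water-filling principle mentioned at the end of the abstract, here is a minimal sketch of classic water-filling as known from communication theory: each channel i with floor level noise[i] receives max(0, mu - noise[i]), where the water level mu is chosen so the allocations sum to the budget. This is the textbook scheme, not HIBCG's allocation rule, which the abstract does not spell out; the function name `water_filling` and the bisection search are illustrative choices.

```python
def water_filling(noise, budget, iters=100):
    """Allocate `budget` across channels by classic water-filling.

    noise:  per-channel floor levels (higher floor means less allocation)
    budget: total capacity to distribute (non-negative)
    """
    # The water level lies between the lowest floor and the highest floor
    # plus the whole budget; search for it by bisection.
    lo, hi = min(noise), max(noise) + budget
    for _ in range(iters):
        mu = (lo + hi) / 2
        used = sum(max(0.0, mu - n) for n in noise)
        if used > budget:
            hi = mu
        else:
            lo = mu
    mu = (lo + hi) / 2
    return [max(0.0, mu - n) for n in noise]
```

With floors 1, 2, and 5 and a budget of 3, the water level settles at 3, so the first two channels share the budget and the third, whose floor is above the water level, receives nothing.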

2026-05-17T11:23:22Z Wei Duan Junyu Xuan En Yu Xiaoyu Yang Jie Lu http://arxiv.org/abs/2604.21937v2 MolClaw: An Autonomous Agent with Hierarchical Skills for Drug Molecule Evaluation, Screening, and Optimization 2026-05-17T08:54:41Z

Computational drug discovery, particularly drug molecule screening and optimization, requires orchestrating dozens of specialized tools in multi-step workflows, yet current AI agents struggle to maintain robust performance and consistently underperform in these high-complexity scenarios. Here we present MolClaw, an autonomous agent for drug molecule evaluation, screening, and optimization. It unifies over 30 specialized domain resources through a three-tier hierarchical skill architecture (70 skills in total) that facilitates long-term agent interaction at runtime: tool-level skills standardize atomic operations, workflow-level skills compose them into validated pipelines with quality checks and reflection, and a discipline-level skill supplies scientific principles governing planning and verification across all scenarios in the field. Additionally, we introduce MolBench, a benchmark comprising molecular screening, optimization, and end-to-end discovery challenges spanning 8 to 50+ sequential tool calls. MolClaw achieves state-of-the-art performance across all metrics, and ablation studies confirm that gains concentrate on tasks that demand structured workflows while vanishing on those solvable with ad hoc scripting, establishing workflow orchestration competence as the primary capability bottleneck for AI-driven drug discovery.

2026-04-02T09:27:36Z 28 pages, 8 figures. Code and data will be released Lisheng Zhang Lilong Wang Xiangyu Sun Wei Tang Haoyang Su Yuehui Qian Qikui Yang Qingsong Li Zhenyu Tang Haoran Sun Yingnan Han Yankai Jiang Wenjie Lou Bowen Zhou Xiaosong Wang Lei Bai Zhengwei Xie