arXiv API feed: https://arxiv.org/api/TPSTE94TVk/AZV1oSfO5xT9145Y
Feed updated: 2026-06-25T23:59:55Z

http://arxiv.org/abs/2604.27900v1
Can We Volunteer Out of the Peer Review Crisis?
Updated: 2026-04-30T14:10:33Z

The volume of scientific manuscripts is growing faster than the capacity to evaluate them, yet the institutions that govern peer review have remained largely unchanged. The result is a widening mismatch: reviewer scarcity, noisier assessments, and declining confidence in editorial decisions. Every scientist wants better reviews, but review quality depends on the total reviewing burden, which no single author can shift. To isolate this tension, we provide a game-theoretic thought experiment: a voluntary lottery in which authors accept a chance of random pre-review rejection, reducing reviewer burden and improving the quality of the surviving evaluations. We show that a Nash equilibrium emerges in which authors voluntarily enter the lottery. Scientists who care about the literature they read, not just the papers they publish, will opt in, raising the quality of published science for all.

Published: 2026-04-30T14:10:33Z
Comment: Main text: 13 pages, 4 figures. Supplementary Information: 18 pages
Authors: Theo Tang, Toby Handfield, Julian Garcia

http://arxiv.org/abs/2604.27820v1
ObjectGraph: From Document Injection to Knowledge Traversal -- A Native File Format for the Agentic Era
Updated: 2026-04-30T13:03:54Z

Every document format in existence was designed for a human reader moving linearly through text. Autonomous LLM agents do not read; they retrieve. This fundamental mismatch forces agents to inject entire documents into their context window, wasting tokens on irrelevant content, compounding state across multi-turn loops, and broadcasting information indiscriminately across agent roles. We argue this is not a prompt engineering problem, not a retrieval problem, and not a compression problem: it is a format problem.
We introduce OBJECTGRAPH (.og), a file format that reconceives the document as a typed, directed knowledge graph to be traversed rather than a string to be injected. OBJECTGRAPH is a strict superset of Markdown (every .md file is a valid .og file). It requires no infrastructure beyond a two-primitive query protocol, and both humans and agents can read it without tooling.
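The abstract says agents traverse the document instead of injecting it. It also says a two-primitive query protocol is enough, but it names neither the primitives nor the .og syntax. The sketch below is a minimal illustration under stated assumptions and is not the ObjectGraph format. It treats plain Markdown headings as a tree and exposes two hypothetical primitives, `list_children` (outline only) and `read` (one node's text). With these, an agent can inspect the outline and fetch just the section it needs.

```python
import re
from dataclasses import dataclass, field

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")  # ATX-style Markdown heading


@dataclass
class Node:
    node_id: str
    title: str
    level: int                                    # 0 for the document root
    body: list = field(default_factory=list)      # non-heading lines under this heading
    children: list = field(default_factory=list)  # ids of direct sub-sections


def parse_markdown(text: str) -> dict:
    """Turn a Markdown heading hierarchy into a tree, returned as {node_id: Node}."""
    root = Node("root", "", 0)
    nodes = {"root": root}
    stack = [root]  # path from the root to the section currently being filled
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            # Close sections at the same or a deeper level before opening this one.
            while stack[-1].level >= level:
                stack.pop()
            node = Node(f"n{len(nodes)}", match.group(2).strip(), level)
            stack[-1].children.append(node.node_id)
            nodes[node.node_id] = node
            stack.append(node)
        elif line.strip():
            stack[-1].body.append(line)
    return nodes


# Hypothetical primitive 1: reveal only the outline below a node (ids and titles).
def list_children(nodes: dict, node_id: str = "root") -> list:
    return [(child, nodes[child].title) for child in nodes[node_id].children]


# Hypothetical primitive 2: fetch the text of a single node and nothing else.
def read(nodes: dict, node_id: str) -> str:
    return "\n".join(nodes[node_id].body)


if __name__ == "__main__":
    doc = "# Intro\nHello.\n## Scope\nOnly agents.\n# Results\nFewer tokens."
    tree = parse_markdown(doc)
    print(list_children(tree))  # outline first...
    print(read(tree, "n2"))     # ...then only the section the agent needs
```

Token savings in this toy come only from not sending unread sections. The paper's typed edges, role scoping and assertion nodes are not modeled.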
We formalize the Document Consumption Problem, characterise six structural properties that no existing format satisfies simultaneously, and prove that OBJECTGRAPH satisfies all six. We further introduce the Progressive Disclosure Model, the Role-Scoped Access Protocol, and Executable Assertion Nodes as native format primitives. Empirical evaluation across five document classes and eight agent task types demonstrates up to 95.3 percent token reduction with no statistically significant degradation in task accuracy (p > 0.05). Transpiler fidelity reaches 98.7 percent content preservation on a held-out document benchmark.

Published: 2026-04-30T13:03:54Z
Comment: 12 pages, 4 figures, 4 tables
Authors: Mohit Dubey, Open Gigantic

http://arxiv.org/abs/2606.20573v1
AONA: A Comprehensive Architecture and Workflow Design for Global Agentic Collaboration
Updated: 2026-04-30T12:58:10Z

The rapid advancement of Large Language Models (LLMs) has established autonomous agents as the core vehicles for artificial intelligence applications. However, existing Internet infrastructures rely primarily on TCP/IP and DNS. They were designed for human-centric, host-to-host data transmission and inherently lack the semantic awareness, dynamic capability discovery, and decentralized trust mechanisms that autonomous agent interactions require. To address these limitations and break the closed ecosystems of single vendors, this paper proposes AONA (Agentic Overlay Network Architecture), a novel overlay network architecture for the Internet of Agents (IoA). We first provide a multidisciplinary scientific defense of multi-agent collaboration, demonstrating its theoretical necessity over a single super-intelligence through the lenses of organizational economics, scaling principles, and the Price of Anarchy.
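The abstract invokes the Price of Anarchy without defining it. As background, it is the ratio between the social cost at the worst equilibrium and the optimal social cost. The sketch below computes it for Pigou's two-link routing example, a standard textbook case where the ratio is 4/3. The example is not taken from the AONA paper and says nothing about how the authors apply the concept.

```python
from fractions import Fraction

# Pigou's network: one unit of traffic split over two parallel links.
#   link 1: constant cost 1 per user
#   link 2: cost equal to the fraction x of traffic that uses it


def social_cost(x: Fraction) -> Fraction:
    """Total travel cost when a fraction x of the traffic uses link 2."""
    return x * x + (1 - x) * 1


def equilibrium_flow() -> Fraction:
    # A selfish user prefers link 2 whenever its cost x is below link 1's cost 1,
    # so the only (Wardrop) equilibrium is x = 1, where both links cost 1.
    return Fraction(1)


def optimal_flow(steps: int = 1000) -> Fraction:
    # Exact grid search over x in [0, 1]; the analytic optimum is x = 1/2.
    grid = (Fraction(i, steps) for i in range(steps + 1))
    return min(grid, key=social_cost)


if __name__ == "__main__":
    eq, opt = equilibrium_flow(), optimal_flow()
    price_of_anarchy = social_cost(eq) / social_cost(opt)
    print(eq, opt, price_of_anarchy)  # 1 1/2 4/3
```

Exact fractions are used so the ratio comes out as 4/3 rather than a rounded float.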
AONA is then structured as a four-layer logical blueprint comprising the Base, Interconnection, Collaboration, and Application layers, which enables cross-protocol and cross-platform interoperability without disrupting the underlying physical network. To physically instantiate this blueprint, we design a distributed node infrastructure anchored by Management Root Nodes, Registry Service Nodes, Discovery Service Nodes, and Enterprise Intelligent Service Hubs for private-domain integration. Finally, we detail the dynamic operational workflows that drive the network. These include zero-trust identity issuance, globally coordinated semantic taxonomy synchronization, intent-driven semantic discovery, and trusted metering for commercial settlement. This comprehensive architecture provides a robust, scalable, and secure foundation for the future of global agentic collaboration.

Published: 2026-04-30T12:58:10Z
Comment: 28 pages, 8 figures
Authors: Jinliang Xu, Runkai Zhu, Bingqi Li, Fanjie Nie, Jin Li, Jiagui Xie

http://arxiv.org/abs/2604.27753v1
Autonomous Traffic Signal Optimization Using Digital Twin and Agentic AI for Real-Time Decision-Making
Updated: 2026-04-30T11:43:06Z

This article outlines a new framework for traffic light optimization. It builds a digital twin of the transport infrastructure, managed by agentic AI that makes real-time autonomous decisions. The framework relies on physical sensors and edge computing to measure real-time traffic information and to simulate traffic flow in a continuously updated digital twin. Traffic lights are controlled automatically through the digital twin according to traffic congestion, travel delay, and traffic patterns. The approach is implemented as a three-layer system of perception, conceptualization, and action:
- the perception layer receives data from physical systems;
- the conceptualization layer uses LangChain to process the data;
- the action layer connects to the Model Context Protocol (MCP) and traffic management APIs to apply optimized traffic signal control algorithms.
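The three-layer split described above can be pictured as a plain pipeline of functions. This is a minimal sketch with several assumptions. It does not use LangChain or MCP. The detector counts, approach names, 90 s cycle and 10 s minimum green are hypothetical. The decision rule, which shares the spare green time in proportion to queue length, is a simple heuristic standing in for the paper's unspecified optimization algorithms.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    approach: str      # e.g. "north"; names are illustrative
    queue_length: int  # vehicles waiting, as reported by a detector


def perceive(raw_counts: dict) -> list:
    """Perception layer: turn raw detector counts into clean observations."""
    return [Observation(a, max(0, int(q))) for a, q in raw_counts.items()]


def conceptualize(observations: list, cycle_s: float = 90.0, min_green_s: float = 10.0) -> dict:
    """Decision step: give every approach a minimum green, then split the
    remaining cycle time in proportion to queue length (a simple heuristic)."""
    total = sum(o.queue_length for o in observations)
    spare = cycle_s - min_green_s * len(observations)
    plan = {}
    for o in observations:
        share = o.queue_length / total if total else 1 / len(observations)
        plan[o.approach] = min_green_s + spare * share
    return plan


def act(plan: dict) -> dict:
    """Action layer stand-in: round to whole seconds. A deployment would push
    the plan to a signal controller instead of returning it."""
    return {approach: round(green) for approach, green in plan.items()}


if __name__ == "__main__":
    print(act(conceptualize(perceive({"north": 20, "east": 10}))))
```

With 20 and 10 queued vehicles, the 70 s of spare green splits 2:1. That gives about 57 s and 33 s per approach after rounding.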
The results show that the framework reduces waiting time at traffic lights and improves the efficiency of overall traffic flow, outperforming fixed-time and reinforcement learning-based baselines.

Published: 2026-04-30T11:43:06Z
Comment: Submitted to the MECON2026 conference
Authors: Salman Jan, Toqeer Ali Syed, Shahid Kamal, Qamar Wali, Ali Akarma

http://arxiv.org/abs/2605.00071v1
Compliance-Aware Agentic Payments on Stablecoin Rails
Updated: 2026-04-30T11:15:04Z

Agentic payment systems extend delegated action to financial transfers. Scaling them on stablecoin rails in regulated settings requires safeguards that remain effective when humans are not continuously in the loop. We present a compliance-aware architecture that combines two elements:
- x402-style, signature-based payment authorisation and relayed execution;
- programmable compliance, embedded as an on-chain guardrail through a policy wrapper and a policy manager that coordinates modular checks.

By enforcing compliance at the point of execution rather than as a separate off-chain workflow, the approach preserves low-friction settlement when conditions are satisfied. It also records transaction-linked on-chain attestations and supports structured resolution when requirements are pending.

Published: 2026-04-30T11:15:04Z
Comment: Demo Paper Track
Authors: Kenneth See, Xue Wen Tan

http://arxiv.org/abs/2602.10140v2
Can Large Language Models Implement Agent-Based Models? An ODD-based Replication Study
Updated: 2026-04-30T09:54:00Z

Large language models (LLMs) can now synthesize non-trivial executable code from textual descriptions. This raises an important question: can LLMs reliably implement agent-based models from standardized specifications in a way that supports replication, verification, and validation? We address this question by evaluating 17 contemporary LLMs on a controlled ODD-to-code translation task, using the PPHPC predator-prey model as a fully specified reference.
Generated Python implementations are assessed through staged executability checks, model-independent statistical comparison against a validated NetLogo baseline, and quantitative measures of runtime efficiency and maintainability. Results show that behaviorally faithful implementations are achievable but not guaranteed, and that executability alone is insufficient for scientific use. GPT-4.1 consistently produces statistically valid and efficient implementations, with Claude 3.7 Sonnet performing well but less reliably. Overall, the findings clarify both the promise and the current limitations of LLMs as model engineering tools, with implications for reproducible agent-based and ecological modeling.

Published: 2026-02-08T19:56:20Z
Comment: The peer-reviewed version of this paper is published in Ecological Modelling at https://doi.org/10.1016/j.ecolmodel.2026.111624. This version is typeset by the author and differs only in pagination and typographical detail
Journal reference: Ecological Modelling, 517, 111624, 2026
Authors: Nuno Fachada, Daniel Fernandes, Carlos M. Fernandes, João P. Matos-Carvalho
DOI: 10.1016/j.ecolmodel.2026.111624

http://arxiv.org/abs/2604.27616v1
RoadMapper: A Multi-Agent System for Roadmap Generation of Solving Complex Research Problems
Updated: 2026-04-30T09:08:14Z

People commonly leverage structured content to accelerate knowledge acquisition and research problem solving. Among such content, roadmaps guide researchers through hierarchical subtasks to solve complex research problems step by step. Despite progress in structured content generation, roadmap generation has remained unexplored. To bridge this gap, we introduce RoadMap, a novel benchmark designed to evaluate the ability of large language models (LLMs) to construct high-quality roadmaps for solving complex research problems. Using this benchmark, we identify three limitations of LLMs:
(1) lack of professional knowledge;
(2) unreasonable task decomposition;
(3) disordered logical relationships.
To address these challenges, we propose RoadMapper, an LLM-based multi-agent system that decomposes research roadmap generation into three key stages: initial generation, knowledge augmentation, and iterative "critique-revise-evaluate". Extensive experiments show that RoadMapper improves LLMs' roadmap generation, raising average performance by more than 8%. It also saves 84% of the time required by human experts, which highlights its effectiveness and application potential.

Published: 2026-04-30T09:08:14Z
Comment: Accepted to Findings of ACL 2026
Authors: Jiacheng Liu, Zichen Tang, Zhongjun Yang, Xinyi Hu, Xueyuan Lin, Linwei Jia, Ruofei Bai, Rongjin Li, Shiyao Peng, Haocheng Gao, Haihong E

http://arxiv.org/abs/2504.12612v2
Chronology of Multi-Agent Interactions for Provenance of Evolving Information
Updated: 2026-04-30T09:00:48Z

Provenance is the chronological history of things, resonating with the fundamental pursuit to uncover origins, trace connections, and situate entities within the flow of space and time. As artificial intelligence advances towards autonomous agents capable of interactive collaboration on complex tasks, the provenance of generated content becomes entangled in the interplay of collective creation, where contributions are continuously revised, extended or overwritten. In a multi-agent generative chain, content undergoes successive transformations, often leaving little, if any, trace of prior contributions. In this study, we investigate the problem of tracking multi-agent provenance across the temporal dimension of generation. We propose a chronological system for post hoc attribution of generative history from content alone, without reliance on internal memory states or external meta-information. At its core lies the notion of symbolic chronicles: signed, time-stamped records analogous to the chain of custody in forensic science.
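The "signed, time-stamped records" analogy to a chain of custody can be made concrete with a generic hash chain. This is a minimal sketch under stated assumptions, not the authors' symbolic chronicles. HMAC with per-agent keys stands in for real digital signatures. The chronicle is kept as a separate list, whereas the paper synchronises it with the generated content itself. Agent names, keys and timestamps are hypothetical. Each record commits to the previous record's digest, so altering any earlier entry breaks verification of the whole chain.

```python
import hashlib
import hmac
import json
import time

GENESIS = "0" * 64  # placeholder "previous digest" for the first record


def _payload(record: dict) -> bytes:
    fields = ("agent", "content_hash", "timestamp", "prev")
    return json.dumps({k: record[k] for k in fields}, sort_keys=True).encode()


def append_record(chronicle: list, agent: str, content: str, key: bytes, timestamp=None) -> list:
    """Append a time-stamped, keyed record that links to the previous record."""
    record = {
        "agent": agent,
        "content_hash": hashlib.sha256(content.encode()).hexdigest(),
        "timestamp": time.time() if timestamp is None else timestamp,
        "prev": chronicle[-1]["digest"] if chronicle else GENESIS,
    }
    payload = _payload(record)
    record["digest"] = hashlib.sha256(payload).hexdigest()
    # HMAC stands in for a real digital signature (e.g. an asymmetric scheme).
    record["tag"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    chronicle.append(record)
    return chronicle


def verify(chronicle: list, keys: dict) -> bool:
    """Check every link and every tag; editing an earlier record breaks the chain."""
    prev = GENESIS
    for record in chronicle:
        payload = _payload(record)
        if record["prev"] != prev or hashlib.sha256(payload).hexdigest() != record["digest"]:
            return False
        expected = hmac.new(keys[record["agent"]], payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, record["tag"]):
            return False
        prev = record["digest"]
    return True
```

A shared-key MAC only proves that someone holding that agent's key wrote the record. Public verifiability would need an asymmetric signature scheme instead.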
The system operates through a feedback loop in which each generative timestep updates the chronicle of prior interactions and synchronises it with the synthetic content in the very act of generation. This research seeks to develop an accountable form of collaborative artificial intelligence within evolving cyber ecosystems.

Published: 2025-04-17T03:23:17Z
Journal reference: Royal Society Open Science (2026)
Authors: Ching-Chun Chang, Isao Echizen
DOI: 10.1098/rsos.251988

http://arxiv.org/abs/2602.07408v2
Progressive Multi-Agent Reasoning for Biological Perturbation Prediction
Updated: 2026-04-30T04:55:11Z

Predicting gene regulation responses to biological perturbations requires reasoning about underlying biological causalities. While large language models (LLMs) show promise for such tasks, they are often overwhelmed by the entangled nature of high-dimensional perturbation results. Moreover, recent works have focused primarily on genetic perturbations in single-cell experiments. Bulk-cell chemical perturbations, which are central to drug discovery, remain largely unexplored. Motivated by this, we present LINCSQA, a novel benchmark for predicting target gene regulation under complex chemical perturbations in bulk-cell environments. We further propose PBio-Agent, a multi-agent framework that integrates difficulty-aware task sequencing with iterative knowledge refinement. Our key insight is that genes affected by the same perturbation share causal structure, so confidently predicted genes can provide context for more challenging cases. The framework employs specialized agents enriched with biological knowledge graphs. A synthesis agent integrates their outputs, and specialized judges ensure logical coherence.
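The idea of easy genes first, with their answers as context for the hard ones, reduces to an ordering plus an accumulating context. This is a minimal sketch, not PBio-Agent. There are no LLM agents here, and the gene names, pathway map, confidence scores and `toy_predict` rule are all hypothetical stand-ins. They serve only to show how a confident early call can resolve a later, harder case.

```python
def progressive_predict(genes, confidence, predict):
    """Predict genes from most to least confident, passing earlier answers as context."""
    context = []  # (gene, label) pairs resolved so far
    for gene in sorted(genes, key=confidence, reverse=True):
        context.append((gene, predict(gene, tuple(context))))
    return context


# --- toy stand-ins (all hypothetical) ---
PATHWAY = {"GENE_A": "p1", "GENE_B": "p1", "GENE_C": "p2"}
CONFIDENT_CALLS = {"GENE_A": "up"}  # what a first pass could settle on its own
CONFIDENCE = {"GENE_A": 0.9, "GENE_B": 0.2, "GENE_C": 0.6}


def toy_predict(gene, context):
    if gene in CONFIDENT_CALLS:
        return CONFIDENT_CALLS[gene]
    # Harder case: borrow the label of an already-resolved gene in the same pathway.
    for other, label in context:
        if PATHWAY[other] == PATHWAY[gene] and label != "uncertain":
            return label
    return "uncertain"


if __name__ == "__main__":
    print(progressive_predict(list(PATHWAY), CONFIDENCE.get, toy_predict))
```

GENE_B is the least confident, yet it inherits "up" from GENE_A because both share a pathway. GENE_C has no resolved neighbour and stays uncertain.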
PBio-Agent outperforms existing baselines on both LINCSQA and PerturbQA, enabling even smaller models to predict and explain complex biological processes without additional training.

Published: 2026-02-07T06:59:44Z
Comment: 17 pages, 4 figures, 9 tables
Authors: Hyomin Kim, Sang-Yeon Hwang, Jaechang Lim, Yinhua Piao, Yunhak Oh, Woo Youn Kim, Chanyoung Park, Sungsoo Ahn, Junhyeok Jeon

http://arxiv.org/abs/2604.27378v1
Continuous-time q-learning for mean-field control with common noise, part-II: q-learning algorithms
Updated: 2026-04-30T03:41:00Z

This paper continues Ren et al. (2026), further devising q-learning algorithms for mean-field control (MFC) with controlled common noise. Based on the relaxed control formulation, we first establish the martingale condition of the value function and the Iq-function. We do so by evaluating them along the conditional state distributions generated by all test policies. The data in the relaxed control formulation are not observable in practice. We therefore quantify the error incurred when they are replaced by the observable data of the exploratory formulation under discretely sampled actions. This result, together with a two-layer fixed-point characterization of an optimal policy in Ren et al. (2026), allows us to propose several algorithms, including the Actor-Critic q-learning algorithm:
- in the Actor step, the policy is updated by the iteration rule induced by the improved Iq-function;
- in the Critic step, the value function and the Iq-function are updated by the martingale orthogonality condition, using data from the exploratory formulation.

We also establish the convergence of the inner iterations of the Actor step in an infinite-horizon linear-quadratic (LQ) framework.
In two examples, one within and one beyond the LQ framework, our q-learning algorithms are implemented with satisfactory performance.

Published: 2026-04-30T03:41:00Z
Comment: Keywords: Mean-field control, common noise, martingale characterization, optimal q-learning algorithm, Actor-Critic q-learning algorithm
Authors: Zhenjie Ren, Xiaoli Wei, Xiang Yu, Xun Yu Zhou

http://arxiv.org/abs/2604.19606v2
AblateCell: A Reproduce-then-Ablate Agent for Virtual Cell Repositories
Updated: 2026-04-30T03:40:02Z

Systematic ablations are essential for attributing performance gains in AI Virtual Cells. Yet they are rarely performed, because biological repositories are under-standardized and tightly coupled to domain-specific data and formats. While recent coding agents can translate ideas into implementations, they typically stop at producing code. They lack a verifier that can reproduce strong baselines and rigorously test which components truly matter. We introduce AblateCell, a reproduce-then-ablate agent for virtual cell repositories that closes this verification gap. AblateCell works in two phases:
- it first reproduces reported baselines end to end, auto-configuring environments, resolving dependency and data issues, and rerunning official evaluations while emitting verifiable artifacts;
- it then conducts closed-loop ablation, generating a graph of isolated repository mutations and adaptively selecting experiments under a reward that trades off performance impact against execution cost.

Evaluated on three single-cell perturbation prediction repositories (CPA, GEARS, BioLORD), AblateCell achieves 88.9% end-to-end workflow success (+29.9% vs. a human expert). It also achieves 93.3% accuracy in recovering ground-truth critical components (+53.3% vs. a heuristic baseline).
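The abstract describes choosing experiments under a reward that trades performance impact against execution cost, but it does not give the reward's form. This is a minimal sketch under that gap. It assumes a linear reward `est_impact - lam * cost_hours` and a fixed compute budget. Every candidate name and number is hypothetical. The selection is a static greedy pass, whereas AblateCell's selection is adaptive (closed-loop).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ablation:
    name: str
    est_impact: float  # predicted |change in the evaluation metric| (assumed available)
    cost_hours: float  # predicted compute cost of running this ablation


def plan_ablations(candidates, budget_hours, lam=0.1):
    """Greedy selection by reward = est_impact - lam * cost_hours under a compute budget."""
    def reward(a):
        return a.est_impact - lam * a.cost_hours

    chosen, spent = [], 0.0
    for a in sorted(candidates, key=reward, reverse=True):
        if reward(a) <= 0:
            break  # remaining runs are expected to cost more than they reveal
        if spent + a.cost_hours <= budget_hours:
            chosen.append(a.name)
            spent += a.cost_hours
    return chosen, spent


if __name__ == "__main__":
    pool = [
        Ablation("remove_module_a", 0.30, 2.0),
        Ablation("remove_module_b", 0.25, 0.5),
        Ablation("remove_module_c", 0.05, 1.0),
        Ablation("remove_module_d", 0.40, 5.0),
    ]
    print(plan_ablations(pool, budget_hours=3.0))
```

An adaptive variant would re-estimate `est_impact` after each completed run and re-rank the remaining candidates.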
These results enable scalable, repository-grounded verification and attribution directly on biological codebases.

Published: 2026-04-21T15:55:33Z
Comment: 25 pages, 5 figures
Authors: Xue Xia, Chengkai Yao, Mingyu Tsoi, Xinjie Mao, Wenxuan Huang, Jiaqi Wei, Hao Wu, Cheng Tan, Lang Yu, Yuejin Yang, Mengdi Liu, Siqi Sun, Zhangyang Gao

http://arxiv.org/abs/2604.27372v1
Continuous-time q-learning for mean-field control with common noise, part-I: Theoretical foundations
Updated: 2026-04-30T03:37:55Z

This paper investigates the continuous-time counterpart of the Q-function for entropy-regularized mean-field control (MFC) with controlled common noise. In the single-agent model, Jia and Zhou (2023) termed this object the q-function. We first show that, under discretely sampled actions, the value function in the exploratory formulation converges to that in the relaxed control formulation as the time grid is refined. Leveraging the relaxed control formulation, we derive the exploratory Hamilton-Jacobi-Bellman (HJB) equation. In it, the controlled common noise gives rise to an additional nonlinear functional of the policy, which makes policy iteration intricate. Under a certain concavity condition, we establish the existence and uniqueness of the optimal one-step policy iteration. The proof uses a first-order condition based on the partial linear functional derivative with respect to the policy. The policy improvement at each iteration is verified by relating it to an entropy-regularized optimization problem over the space of policies. In the mean-field setting, we introduce the integrated q-function (Iq-function), defined on the state distribution and the policy. We show that an optimal policy is identified as a two-layer fixed point of the argmax operator of the Iq-function.
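For intuition about the entropy-regularized policy improvement mentioned above, here is the simplest version of that argmax. It is the single-agent case at one fixed state, with a temperature $\gamma > 0$ introduced for illustration. Maximizing expected $q$ plus $\gamma$ times the entropy over densities on the action space $A$ gives a Gibbs density. This is the form the single-agent policy update takes in Jia and Zhou (2023). A quadratic $q$ turns that density into a Gaussian, consistent with the Gaussian optimal policy reported for the LQ setting. The mean-field problem with common noise adds a nonlinear functional of the policy, so it does not reduce to this display.

```latex
% Single-agent, fixed-state illustration; \gamma > 0 is the entropy temperature.
\pi^{*} = \arg\max_{\pi \in \mathcal{P}(A)}
  \left\{ \int_{A} q(a)\,\pi(a)\,da \;-\; \gamma \int_{A} \pi(a)\ln \pi(a)\,da \right\}
\quad\Longrightarrow\quad
\pi^{*}(a) = \frac{\exp\{q(a)/\gamma\}}{\int_{A} \exp\{q(a')/\gamma\}\,da'} .

% Quadratic q-function on A = \mathbb{R}, with k > 0:
q(a) = -\tfrac{1}{2}k\,a^{2} + b\,a + c
\;\Longrightarrow\;
\pi^{*}(a) \propto \exp\!\Big\{-\tfrac{k}{2\gamma}\big(a - \tfrac{b}{k}\big)^{2}\Big\},
\quad\text{i.e.}\quad
\pi^{*} = \mathcal{N}\!\big(b/k,\; \gamma/k\big).
```

The second line follows by completing the square. The variance $\gamma/k$ grows with the temperature, so more entropy regularization means more exploration.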
Finally, we provide an explicit characterization of an optimal policy as a Gaussian distribution in the general linear-quadratic (LQ) setting.

Published: 2026-04-30T03:37:55Z
Comment: Keywords: Continuous-time reinforcement learning, mean-field control, common noise, policy improvement, integrated q-function, two-layer fixed point
Authors: Zhenjie Ren, Xiaoli Wei, Xiang Yu, Xun Yu Zhou

http://arxiv.org/abs/2505.24265v4
R3DM: Enabling Role Discovery and Diversity Through Dynamics Models in Multi-agent Reinforcement Learning
Updated: 2026-04-30T01:20:58Z

Multi-agent reinforcement learning (MARL) has achieved significant progress in large-scale traffic control, autonomous vehicles, and robotics. Drawing inspiration from biological systems, where roles emerge naturally to enable coordination, role-based MARL methods have been proposed to enhance cooperation learning for complex tasks. However, existing methods derive roles exclusively from an agent's past experience during training and neglect their influence on its future trajectories. This paper introduces a key insight: an agent's role should shape its future behavior to enable effective coordination. We therefore propose Role Discovery and Diversity through Dynamics Models (R3DM), a novel role-based MARL framework. R3DM learns emergent roles by maximizing the mutual information between agents' roles, observed trajectories, and expected future behaviors. It optimizes this objective in two steps:
1. contrastive learning on past trajectories derives intermediate roles;
2. these roles shape intrinsic rewards that promote diverse future behaviors across roles through a learned dynamics model.

Benchmarks on the SMAC and SMACv2 environments show that R3DM outperforms state-of-the-art MARL approaches, improving multi-agent coordination and increasing win rates by up to 20%.
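Contrastive learning is a standard way to maximize mutual information. The usual estimator is the InfoNCE loss of van den Oord et al. (2018). With N matched pairs, log N minus the loss lower-bounds the mutual information between the paired variables. The abstract does not say that R3DM uses exactly this form. The sketch below is a generic, dependency-free InfoNCE in which row i of `anchors` (say, hypothetical role embeddings) should match row i of `positives` (say, hypothetical trajectory embeddings).

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; every other row of `positives` serves as a negative."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss -= logits[i] - log_z  # negative log-softmax of the matching pair
    return loss / len(a)


if __name__ == "__main__":
    eye = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    aligned = info_nce(eye, eye)
    shifted = info_nce(eye, eye[1:] + eye[:1])  # every row paired with the wrong partner
    print(aligned, shifted, math.log(4) - aligned)  # last value: MI lower-bound estimate
```

Matched pairs drive the loss toward zero, so the bound approaches log 4. Mismatched pairs give a loss near 1/temperature.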
The code is available at https://github.com/UTAustin-SwarmLab/R3DM.

Published: 2025-05-30T06:40:19Z
Comment: 21 pages, To appear in the International Conference on Machine Learning (ICML 2025)
Authors: Harsh Goel, Mohammad Omama, Behdad Chalaki, Vaishnav Tadiparthi, Ehsan Moradi Pari, Sandeep Chinchali

http://arxiv.org/abs/2604.27233v1
Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents
Updated: 2026-04-29T22:09:47Z

Tool-calling agents are evaluated on tool selection, parameter accuracy, and scope recognition, yet LLM trajectory assessments remain inherently post hoc. Because these assessments are disconnected from the active execution loop, the errors they find are usually addressed through prompt tuning or retraining. They fundamentally cannot course-correct the agent in real time. To close this gap, we move evaluation into the execution loop at inference time: a specialized reviewer agent evaluates provisional tool calls before execution. This shifts the paradigm from post-hoc recovery to proactive evaluation and error mitigation.
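The review-before-execute idea above fits in a short control loop. This is a minimal sketch with assumed interfaces, not the authors' system. `propose`, `review` and `execute` are hypothetical callables that would wrap the execution agent, the reviewer agent and the tool runtime. `get_weather` is an invented tool used only in the usage example. As a design choice, the reviewer is advisory and bounded by `max_reviews`: if it still objects after that many rounds, the latest revision runs without another review.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def run_with_review(propose, review, execute, max_reviews=2):
    """propose(feedback) -> ToolCall; review(call) -> None (approve) or a feedback string;
    execute(call) -> result. Only one call, the final one, is ever executed."""
    call = propose(None)
    for _ in range(max_reviews):
        feedback = review(call)
        if feedback is None:      # reviewer approves the provisional call
            break
        call = propose(feedback)  # execution agent revises using the feedback
    return execute(call)


if __name__ == "__main__":
    def propose(feedback):
        # First draft has a typo in a parameter; the revision fixes it.
        return ToolCall("get_weather", {"city": "Pariss" if feedback is None else "Paris"})

    def review(call):
        return None if call.args["city"] == "Paris" else "unknown city; check spelling"

    print(run_with_review(propose, review, lambda c: (c.name, c.args["city"])))
```

Because the tool only runs after review, the typo is fixed before any side effect happens. A post-hoc check could only report it afterwards.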
In practice, this architecture establishes a clear separation of concerns between the primary execution agent and a secondary review agent. As in any multi-agent system, the reviewer can introduce new errors while correcting others, yet to our knowledge no prior work has systematically measured this tradeoff. To quantify it, we introduce Helpfulness-Harmfulness metrics:
- helpfulness measures the percentage of base-agent errors that feedback corrects;
- harmfulness measures the percentage of correct responses that feedback degrades.

These metrics directly inform reviewer design by revealing whether a given model or prompt provides net positive value.
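The two metrics defined above can be computed from per-example correctness before and after feedback. This is a minimal sketch that follows the abstract's definitions. The boolean input format is an assumption, and the example data are made up. How the paper forms its "benefit-to-risk ratio" from these numbers is not stated, so that is left out.

```python
def helpfulness_harmfulness(base_correct, reviewed_correct):
    """Per-example booleans before/after reviewer feedback -> (helpfulness %, harmfulness %).

    helpfulness: share of base-agent errors that become correct after feedback.
    harmfulness: share of base-agent correct answers that become wrong after feedback.
    """
    if len(base_correct) != len(reviewed_correct):
        raise ValueError("both lists must cover the same examples")
    pairs = list(zip(base_correct, reviewed_correct))
    after_on_errors = [after for before, after in pairs if not before]
    after_on_correct = [after for before, after in pairs if before]
    helpfulness = 100 * sum(after_on_errors) / len(after_on_errors) if after_on_errors else 0.0
    harmfulness = (
        100 * sum(not after for after in after_on_correct) / len(after_on_correct)
        if after_on_correct else 0.0
    )
    return helpfulness, harmfulness


if __name__ == "__main__":
    base = [True] * 4 + [False] * 4 + [True] * 2
    reviewed = [True, True, True, False, True, True, False, False, True, True]
    print(helpfulness_harmfulness(base, reviewed))
```

In this toy data, feedback fixes 2 of 4 errors and breaks 1 of 6 correct answers. That is roughly 50% helpfulness and 16.7% harmfulness, a net gain of one example.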
We evaluate our approach on BFCL (single-turn) and Tau2-Bench (multi-turn stateful scenarios), achieving +5.5% on irrelevance detection and +7.1% on multi-turn tasks. Our metrics reveal that reviewer model choice is critical: the reasoning model o3-mini achieves a 3:1 benefit-to-risk ratio versus 2.1:1 for GPT-4o. Automated prompt optimization via GEPA provides an additional +1.5% to +2.8%. Together, these results demonstrate a core advantage of separating execution from review. The reviewer can be improved systematically through model selection and prompt optimization, without retraining the base agent.

Published: 2026-04-29T22:09:47Z
Authors: Anh Ta, Junjie Zhu, Shahin Shayandeh

http://arxiv.org/abs/2604.27228v1
When Roles Fail: Epistemic Constraints on Advocate Role Fidelity in LLM-Based Political Statement Analysis
Updated: 2026-04-29T21:50:28Z

Democratic discourse analysis systems increasingly rely on multi-agent LLM pipelines in which distinct evaluator models are assigned adversarial roles to generate structured, multi-perspective assessments of political statements. A core assumption is that models will reliably maintain their assigned roles. This paper provides the first systematic empirical test of that assumption using the TRUST pipeline. We develop an epistemic stance classifier that identifies advocate roles from reasoning text without relying on surface vocabulary. We measure role fidelity across 60 political statements (30 English, 30 German) using four metrics: Role Drift Index (RDI), Expected Drift Distance (EDD), Directional Drift Index (DDI), and Entropy-based Role Stability (ERS). We identify two failure modes as manifestations of a single mechanism, Epistemic Role Override (ERO):
- the Epistemic Floor Effect: fact-check results create an absolute lower bound below which the legitimizing role cannot be maintained;
- Role-Prior Conflict: training-time knowledge overrides role instructions for factually unambiguous statements.
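The abstract names an entropy-based stability metric and drift indices but does not give their formulas. This is therefore only a hypothetical illustration, not the paper's RDI or ERS. It computes two quantities over repeated classified stances for one assigned role. The first is a drift fraction, the share of outputs whose stance differs from the assigned role. The second is a stability score, one minus the Shannon entropy of the stance distribution normalized by its maximum. The stance labels are made up.

```python
import math
from collections import Counter


def role_drift_fraction(assigned: str, classified: list) -> float:
    """Share of outputs whose classified stance differs from the assigned role."""
    return sum(stance != assigned for stance in classified) / len(classified)


def entropy_stability(classified: list, n_stances: int) -> float:
    """1 - normalized Shannon entropy of the stance distribution (1 = perfectly stable).

    n_stances is the number of possible stance labels and must be at least 2.
    """
    total = len(classified)
    entropy = -sum((c / total) * math.log(c / total) for c in Counter(classified).values())
    return 1 - entropy / math.log(n_stances)


if __name__ == "__main__":
    runs = ["legitimizing", "neutral", "legitimizing", "delegitimizing"]
    print(role_drift_fraction("legitimizing", runs), entropy_stability(runs, 3))
```

The entropy score measures only consistency, not correctness. A model that always takes the opposite stance is perfectly "stable" by this measure, which is one reason to pair a stability score with drift measures that look at direction.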
Model choice significantly affects role fidelity: Mistral Large outperforms Claude Sonnet by 28 pp (67% vs. 39%). Its failure mode is also qualitatively different: role abandonment without polarity reversal, whereas Claude actively switches to the opposing stance. Role fidelity is robust across languages. Fact-check provider choice is not universally neutral: Perplexity significantly reduces Claude's role fidelity on German statements (Δ = −15 pp, p = 0.007) while leaving Mistral unaffected. These findings have direct implications for multi-agent LLM validation. A system validated without measuring role fidelity may systematically misrepresent the epistemic diversity it was designed to provide.

Published: 2026-04-29T21:50:28Z
Comment: 22 pages
Authors: Juergen Dietrich