https://arxiv.org/api/IPXHFBtZA4JDpFFji/8oqNbnEyY2026-06-27T12:14:49Z12761112515http://arxiv.org/abs/2604.17072v2CogGen: A Cognitively Inspired Recursive Framework for Deep Research Report Generation2026-04-21T02:48:52ZThe autonomous synthesis of deep research reports represents a critical frontier for Large Language Models (LLMs), demanding sophisticated information orchestration and non-linear narrative logic. Current approaches rely on rigid predefined linear workflows, which cause error accumulation, preclude global restructuring from subsequent insights, and ultimately limit in-depth multimodal fusion and report quality. We propose CogGen, a Cognitively inspired recursive framework for deep research report Generation. Leveraging a Hierarchical Recursive Architecture to simulate cognitive writing, CogGen enables flexible planning and global restructuring. To extend this recursivity to multimodal content, we introduce Abstract Visual Representation (AVR): a concise intent-driven language that iteratively refines visual-text layouts without pixel-level regeneration overhead. We further present CLEF, a Cognitive Load Evaluation Framework, and curate a new benchmark from Our World in Data (OWID). Extensive experiments show CogGen achieves state-of-the-art results among open-source systems, generating reports comparable to professional analysts' outputs and surpassing Gemini Deep Research. Our code and dataset are available at https://github.com/NJUNLP/CogGen.2026-04-18T17:21:04Z28 pages, 3 figures, Accepted to ACL 2026 FindingsKuo TianPengfei SunZhen WuJunran DingXinyu Daihttp://arxiv.org/abs/2604.18975v1Gated Coordination for Efficient Multi-Agent Collaboration in Minecraft Game2026-04-21T01:58:02ZIn long-horizon open-world multi-agent systems, existing methods often treat local anomalies as automatic triggers for communication. This default design introduces coordination noise, interrupts local execution, and overuses public interaction in cases that could be resolved locally. To address this issue, we propose a partitioned information architecture for MLLM agents that explicitly separates private execution states from public coordination states. Building on this design, we introduce two key mechanisms. First, we develop an event-triggered working memory based on system-verified outcomes to maintain compact and low-noise local state representations. Second, we propose a cost-sensitive gated escalation mechanism that determines whether cross-region communication should be initiated by jointly considering node criticality, local recovery cost, and downstream task impact. In this way, communication is transformed from a default reaction into a selective decision. Experiments conducted on long-term construction tasks in open environments demonstrate that, compared to baseline models based on strong communication and planned structures, the introduction of gated communication and a partitioned information architecture results in superior performance in terms of blueprint completion quality and execution chain length. It also improves local self-recovery, reduces ineffective escalations, and increases the utility of public communication.2026-04-21T01:58:02ZHuaDong JianChenghao LiHaoyu WangJiajia ShuaiJinyu GuoYang YangChaoning Zhanghttp://arxiv.org/abs/2509.16343v2Visual Reasoning Agent: Robust Vision Systems in Remote Sensing via Inference-Time Scaling2026-04-20T22:59:13ZBuilding robust vision systems for high-stakes domains such as remote sensing requires stronger visual reasoning than what single-pass inference typically provides; yet, retraining large models is often computationally expensive and data intensive. We present Visual Reasoning Agent (VRA), a training-free agentic visual reasoning framework that orchestrates off-the-shelf large vision-language models (LVLMs) with a large reasoning model (LRM) through an iterative Think-Critique-Act loop for cross-model verification, self-critique, and recursive refinement. On the remote sensing benchmark VRSBench VQA dataset, VRA consistently outperforms multiple standalone LVLM baselines and achieves up to 40.67\% improvement on challenging question types spanning both perception and reasoning tasks. In addition, integrating three LVLMs with VRA improves the overall accuracy of the standalone LVLMs from 52.8% to 78.8%, demonstrating the effectiveness of agentic reasoning with increased inference-time compute.2025-09-19T18:34:08ZAccepted to MORS 2026 Artificial Intelligence Workshop ProceedingsChung-En Johnny YuBrian JalaianNathaniel D. Bastianhttp://arxiv.org/abs/2512.21794v4Multi-agent Adaptive Mechanism Design2026-04-20T21:23:04ZWe study a sequential mechanism design problem in which a principal seeks to elicit truthful reports from multiple rational agents while starting with no prior knowledge of agents' beliefs. We introduce Distributionally Robust Adaptive Mechanism (DRAM), a general framework combining insights from both mechanism design and online learning to jointly address truthfulness and cost-optimality. Throughout the sequential game, the mechanism estimates agents' beliefs and iteratively updates a distributionally robust linear program with shrinking ambiguity sets to reduce payments while preserving truthfulness. Our mechanism guarantees truthful reporting with high probability while achieving $\tilde{O}(\sqrt{T})$ cumulative regret, and we establish a matching lower bound showing that no feasible adaptive mechanism can asymptotically do better. The framework generalizes to plug-in estimators, supporting structured priors and delayed feedback. To our knowledge, this is the first adaptive mechanism under general settings that maintains truthfulness and achieves optimal regret when incentive constraints are unknown and must be learned.2025-12-25T21:59:51ZQiushi HanDavid Simchi-LeviRenfei TanZishuo Zhaohttp://arxiv.org/abs/2504.09662v3AgentDynEx: Nudging the Mechanics and Dynamics of Multi-Agent Simulations2026-04-20T19:31:48ZMulti-agent large language model simulations have the potential to model complex human behaviors and interactions. If the mechanics are set up properly, unanticipated and valuable social dynamics can surface. However, it is challenging to consistently enforce simulation mechanics while still allowing for notable and emergent dynamics. We present AgentDynEx, an AI system that helps set up simulations from user-specified mechanics and dynamics. AgentDynEx uses LLMs to guide users through a Configuration Matrix to identify core mechanics and define milestones to track dynamics. It also introduces a method called \textit{nudging}, where the system dynamically reflects on simulation progress and gently intervenes if it begins to deviate from intended outcomes. A technical evaluation found that nudging enables simulations to have more complex mechanics and maintain its notable dynamics compared to simulations without nudging. We discuss the importance of nudging as a technique for balancing mechanics and dynamics of multi-agent simulations.2025-04-13T17:26:35Z40 pages, 9 figuresJenny MaRiya SahniKarthik SreedharLydia B. Chiltonhttp://arxiv.org/abs/2604.18755v1Opinion polarization from compression-based decision making where agents optimize local complexity and global simplicity2026-04-20T19:00:00ZUnderstanding social polarization requires integrating insights from psychology, sociology, and complex systems science. Agent-based modeling provides a natural framework to combine perspectives from different fields and explore how individual cognition shapes collective outcomes. This study introduces a novel agent-based model that integrates two cognitive and social mechanisms: the desire to be unique within a group (optimal distinctiveness theory) and the tendency to simplify complex information (cognitive compression). In the model, virtual agents interact in pairs and decide whether to adopt each other's opinions by balancing two opposing drives: maximizing opinion diversity within their local social group while simplifying the overall opinion landscape, with both evaluated using Shannon entropy. We show that the combination of these mechanisms can reproduce real-world patterns, such as the emergence of distinct heterogeneous opinion clusters. Moreover, unlike many existing models where opinions become fixed once opinion groups form, individuals in our model continue to adjust their opinions after clusters emerge, leading to ongoing variation within and between opinion groups. Computational experiments reveal that polarization emerges when local group sizes are moderate (consistent with Dunbar's number), while smaller groups cause fragmentation and larger ones hinder distinct cluster formation. Higher cognitive compression increases unpredictability, while lower compression produces more consistent group structures. These results demonstrate how simple psychological rules can generate complex, realistic social behavior and advance understanding of polarization in human societies.2026-04-20T19:00:00ZAlina DubovskayaDavid J. P. O'SullivanMichael Quaylehttp://arxiv.org/abs/2604.18500v1QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance2026-04-20T16:52:22ZWe introduce a multi-agent framework intended to emulate parts of a quantitative research team and support equity factor research on large financial panel datasets. QRAFTI integrates a research toolkit for panel data with MCP servers that expose data access, factor construction, and custom coding operations as callable tools. It can help replicate established factors, formulate and test new signals, and generate standardized research reports accompanied by narrative analysis and computational traces. On multi-step empirical tasks, using chained tool calls and reflection-based planning may offer better performance and explainability than dynamic code generation alone.2026-04-20T16:52:22ZTerence LimKumar MuthuramanMichael Suryhttp://arxiv.org/abs/2601.14662v2Query-Efficient Agentic Graph Extraction Attacks on GraphRAG Systems2026-04-20T16:00:16ZGraph-based retrieval-augmented generation (GraphRAG) systems construct knowledge graphs over document collections to support multi-hop reasoning. While prior work shows that GraphRAG responses may leak retrieved subgraphs, the feasibility of query-efficient reconstruction of the hidden graph structure remains unexplored under realistic query budgets. We study a budget-constrained black-box setting where an adversary adaptively queries the system to steal its latent entity-relation graph. We propose AGEA (Agentic Graph Extraction Attack), a framework that leverages a novelty-guided exploration-exploitation strategy, external graph memory modules, and a two-stage graph extraction pipeline combining lightweight discovery with LLM-based filtering. We evaluate AGEA on medical, agriculture, and literary datasets across Microsoft-GraphRAG and LightRAG systems. Under identical query budgets, AGEA significantly outperforms prior attack baselines, recovering up to 90% of entities and relationships while maintaining high precision. These results demonstrate that modern GraphRAG systems are highly vulnerable to structured, agentic extraction attacks, even under strict query limits. The code is available at https://github.com/shuashua0608/AGEA.2026-01-21T05:20:54ZTo be published in ACL Main 2026Shuhua YangJiahao ZhangYilong WangDongwon LeeSuhang Wanghttp://arxiv.org/abs/2604.18364v1Training and Agentic Inference Strategies for LLM-based Manim Animation Generation2026-04-20T14:54:06ZGenerating programmatic animation using libraries such as Manim presents unique challenges for Large Language Models (LLMs), requiring spatial reasoning, temporal sequencing, and familiarity with domain-specific APIs that are underrepresented in general pre-training data. A systematic study of how training and inference strategies interact in this setting is lacking in current research. This study introduces ManimTrainer, a training pipeline that combines Supervised Fine-tuning (SFT) with Reinforcement Learning (RL) based Group Relative Policy Optimisation (GRPO) using a unified reward signal that fuses code and visual assessment signals, and ManimAgent, an inference pipeline featuring Renderer-in-the-loop (RITL) and API documentation-augmented RITL (RITL-DOC) strategies. Using these techniques, this study presents the first unified training and inference study for text-to-code-to-video transformation with Manim. It evaluates 17 open-source sub-30B LLMs across nine combinations of training and inference strategies using ManimBench. Results show that SFT generally improves code quality, while GRPO enhances visual outputs and increases the models' responsiveness to extrinsic signals during self-correction at inference time. The Qwen 3 Coder 30B model with GRPO and RITL-DOC achieved the highest overall performance, with a 94% Render Success Rate (RSR) and 85.7% Visual Similarity (VS) to reference videos, surpassing the baseline GPT-4.1 model by +3 percentage points in VS. Additionally, the analysis shows that the correlation between code and visual metrics strengthens with SFT and GRPO but weakens with inference-time enhancements, highlighting the complementary roles of training and agentic inference strategies in Manim animation generation.2026-04-20T14:54:06ZRavidu Suien Rammuni SilvaAhmad LotfiIsibor Kennedy IhianleGolnaz ShahtahmassebiJordan J. Birdhttp://arxiv.org/abs/2604.05489v4SCMAPR: Self-Correcting Multi-Agent Prompt Refinement for Complex-Scenario Text-to-Video Generation2026-04-20T13:23:27ZText-to-Video (T2V) generation has benefited from recent advances in diffusion models, yet current systems still struggle under complex scenarios, which are generally exacerbated by the ambiguity and underspecification of text prompts. In this work, we formulate complex-scenario prompt refinement as a stage-wise multi-agent refinement process and propose SCMAPR, i.e., a scenario-aware and Self-Correcting Multi-Agent Prompt Refinement framework for T2V prompting. SCMAPR coordinates specialized agents to (i) route each prompt to a taxonomy-grounded scenario for strategy selection, (ii) synthesize scenario-aware rewriting policies and perform policy-conditioned refinement, and (iii) conduct structured semantic verification that triggers conditional revision when violations are detected. To clarify what constitutes complex scenarios in T2V prompting, provide representative examples, and enable rigorous evaluation under such challenging conditions, we further introduce T2V-Complexity, which is a complex-scenario T2V benchmark consisting exclusively of complex-scenario prompts. Extensive experiments on 3 existing benchmarks and our T2V-Complexity benchmark demonstrate that SCMAPR consistently improves text-video alignment and overall generation quality under complex scenarios, achieving up to 2.67% and 3.28 gains in average score on VBench and EvalCrafter, and up to 0.028 improvement on T2V-CompBench over 3 State-Of-The-Art baselines. The codes of SCMAPR are publicly available at https://github.com/HiThink-Research/SCMAPR.2026-04-07T06:33:01ZChengyi YangPengzhen LiJiayin QiAimin ZhouJi WuJi Liuhttp://arxiv.org/abs/2604.18233v1Aether: Network Validation Using Agentic AI and Digital Twin2026-04-20T13:18:58ZNetwork change validation remains a critical yet predominantly manual, time-consuming, and error-prone process in modern network operations. While formal network verification has made substantial progress in proving correctness properties, it is typically applied in offline, pre-deployment settings and faces challenges in accommodating continuous changes and validating live production behavior. Current operational approaches typically involve scattered testing tools, resulting in partial coverage and errors that surface only after deployment. In this paper, we present Aether, a novel approach that integrates Generative Agentic AI with a multi-functional Network Digital Twin to automate and streamline network change validation workflows. It features an agentic architecture with five specialized Network Operations AI agents that collaboratively handle the change validation lifecycle from intent analysis to network verification and testing. Aether agents use a unified Network Digital Twin integrating modeling, simulation, and emulation to maintain a consistent, up-to-date network view for verification and testing. By orchestrating agent collaboration atop this digital twin, Aether enables automated, rapid network change validation while reducing manual effort, minimizing errors, and improving operational agility and cost-effectiveness. We evaluate Aether over synthetic network change scenarios covering main classes of network changes and on past incidents from a major ISP operational network, demonstrating promising results in error detection (100%), diagnostic coverage (92-96%), and speed (6-7 minutes) over traditional methods.2026-04-20T13:18:58Z12 pages, 6 figuresJordan AugeCisco SystemsSam BettsCisco SystemsGiovanna CarofiglioCisco SystemsGiulio GrassiCisco SystemsMartin GysiSwisscomJohn Kenneth d'SouzaSwisscomhttp://arxiv.org/abs/2604.18210v1TacticGen: Grounding Adaptable and Scalable Generation of Football Tactics2026-04-20T12:57:11ZSuccess in association football relies on both individual skill and coordinated tactics. While recent advancements in spatio-temporal data and deep learning have enabled predictive analyses like trajectory forecasting, the development of tactical design remains limited. Bridging this gap is essential, as prediction reveals what is likely to occur, whereas tactic generation determines what should occur to achieve strategic objectives. In this work, we present TacticGen, a generative model for adaptable and scalable tactic generation. TacticGen formulates tactics as sequences of multi-agent movements and interactions conditioned on the game context. It employs a multi-agent diffusion transformer with agent-wise self-attention and context-aware cross-attention to capture cooperative and competitive dynamics among players and the ball. Trained with over 3.3 million events and 100 million tracking frames from top-tier leagues, TacticGen achieves state-of-the-art precision in predicting player trajectories. Building on it, TacticGen enables adaptable tactic generation tailored to diverse inference-time objectives through classifier guidance mechanism, specified via rules, natural language, or neural models. Its modeling performance is also inherently scalable. A case study with football experts confirms that TacticGen generates realistic, strategically valuable tactics, demonstrating its practical utility for tactical planning in professional football. The project page is available at: https://shengxu.net/TacticGen/.2026-04-20T12:57:11Z23 pagesSheng XuGuiliang LiuTarak KharratYudong LuoMohamed AloulouJavier López PeñaKonstantin SofeikovAdam ReidPaul RobertsSteven SpencerJoe CarnallIan McHaleOliver SchulteHongyuan ZhaWei-Shi Zhenghttp://arxiv.org/abs/2604.18123v1ConventionPlay: Capability-Limited Training for Robust Ad-Hoc Collaboration2026-04-20T11:42:58ZAd-hoc collaboration often relies on identifying and adhering to shared conventions. However, when partners can follow multiple conventions, agents must do more than simply adapt; they must actively steer the team toward the most effective joint strategy. We present ConventionPlay, a reinforcement learning-based approach that extends cognitive hierarchies to include a diverse population of adaptive followers. By training against partners with varied capability limits, our agent learns to probe its partner's repertoire, leading the team when possible and following when necessary. Our results in canonical coordination tasks show that ConventionPlay achieves superior coordination efficiency, particularly in settings where conventions have differentiated payoffs.2026-04-20T11:42:58ZAbhishek SriramanEleni VasilakiRobert Loftinhttp://arxiv.org/abs/2601.17017v2Self-Organizing Railway Traffic Management2026-04-20T11:36:08ZImproving traffic management in case of perturbation is one of the main challenges in today's railway research. The great majority of the existing literature proposes approaches to make centralized decisions to minimize delay propagation. In this paper, we propose a new paradigm to the same aim: we design and implement a modular process to allow trains to self-organize. This process consists in having trains identifying their neighbors, formulating traffic management hypotheses, checking their compatibility and selecting the best ones through a consensus mechanism. Finally, these hypotheses are merged into a directly applicable traffic plan. In a thorough experimental analysis on a portion of the Italian network, we compare the results of self-organization with those of a state-of-the-art centralized approach. In particular, we make this comparison mimicking a realistic deployment thanks to a closed-loop framework including a microscopic railway simulator. The results indicate that self-organization achieves better results than the centralized algorithm, specifically thanks to the definition and exploitation of the instance decomposition allowed by the proposed approach.2026-01-15T14:48:13ZThis work has been submitted to the IEEE for possible publicationFederico NaldiniFabio OddiLeo D'AmatoGrégory MarlièreVito TrianniPaola Pellegrinihttp://arxiv.org/abs/2604.18046v1EvoMarket: A High-Fidelity and Scalable Financial Market Simulator2026-04-20T10:10:58ZHigh-fidelity, scalable market simulation is a key instrument for mechanism evaluation, stress testing, and counterfactual policy analysis. Yet existing simulators rarely achieve \emph{mechanism fidelity} beyond single-asset intraday settings, \emph{microstructure fidelity} against historical limit order books (LOB), and \emph{computational tractability} at market scale in a single system. This paper presents \textit{EvoMarket}, a discrete-event, multi-agent financial market simulator designed for intervention-oriented experiments in multi-asset and cross-day environments. EvoMarket couples a high-throughput execution core (optimized LOB data structures, hierarchical scheduling under propagation delays, and asynchronous per-asset matching) with explicit institutional mechanisms (market calendars, opening call auctions, price limits, and T+1 settlement). To avoid expensive black-box calibration, EvoMarket introduces an Oracle-guided in-run self-calibration mechanism that interprets microstructure discrepancy as missing order flow and synthesizes corrective orders at recording checkpoints. Experiments on China A-share order-flow and LOB data show close replay alignment over five trading days, fidelity gains from budgeted in-run calibration across depth levels, broad agent order-space coverage, and scalable performance under increasing input order rates and market breadth. We further demonstrate cross-asset linkage and event-study style intervention evaluation that produces structured dependence and interpretable event-time responses.2026-04-20T10:10:58ZMuyao ZhongZhenhua YangYuxiang LiuKe TangPeng Yang