http://arxiv.org/abs/2605.01954v1 Moira: Language-driven Hierarchical Reinforcement Learning for Pair Trading 2026-05-03T16:37:52Z

Many sequential decision-making problems exhibit hierarchical structure, where high-level semantic choices constrain downstream actions and feedback is delayed and ambiguous. Learning in such settings is challenging due to credit assignment: performance degradation may arise from flawed abstractions, suboptimal execution, or their interaction. We study this challenge through pair trading, a domain that naturally combines long-horizon semantic reasoning for asset pair selection with short-horizon execution under partial observability. We formulate pair trading as a hierarchical reinforcement learning problem and propose a language-driven optimization framework in which both high-level and low-level policies are parameterized by large language models (LLMs) and optimized exclusively through prompt updates. Our approach leverages pretrained LLMs as hierarchical policies and uses trajectory- and episode-level textual feedback to adapt abstractions and execution without gradient-based fine-tuning. By explicitly separating abstraction selection from execution, the framework reduces non-stationarity across hierarchical levels and enables targeted adaptation under delayed feedback. Experiments on real-world market data show consistent improvements over traditional and LLM-based baselines, demonstrating the effectiveness of language-driven hierarchical reinforcement learning.

2026-05-03T16:37:52Z Polydoros Giannouris Yuechen Jiang Lingfei Qian Yuyan Wang Xueqing Peng Jimin Huang Guojun Xiong Sophia Ananiadou http://arxiv.org/abs/2605.01920v1 A Language for Describing Agentic LLM Contexts 2026-05-03T15:02:44Z

Large language models are increasingly used within larger systems ("LLM agents"). These make a sequence of LLM calls, each call providing the LLM with a combination of instructions, observations, and interaction history. The design of the encoded information and its structure play a central role in the quality of the resulting system, leading to efforts spent on context engineering. It is therefore critical to communicate the composition of the LLM context in a system, and how it evolves over time. Yet, no standard exists for doing so: context construction is typically conveyed through informal prose, ad hoc diagrams, or direct inspection of code, none of which precisely capture how a prompt evolves across interaction steps or how two context representation strategies differ. To remedy this, we introduce the Agentic Context Description Language (ACDL), a language for specifying the structure and dynamics of LLM input contexts in a precise, readable, and standard manner, along with visualizations. ACDL provides constructs for specifying context aspects such as role message sequences, dynamic content, time-indexed references, and conditional or iterative structure, capturing the full architecture of a prompt independently of any particular implementation. ACDL diagrams can be hand drawn on a whiteboard, or written in formal language which can then be rendered. We describe the language, demonstrate it by documenting several existing systems and their variants, and encourage the community to adopt it for describing LLM systems context, both in day-to-day communication and in papers. Tooling, examples and documentation are available at www.acdlang.org.

2026-05-03T15:02:44Z 18 pages, 12 figures. Accepted at CAIS '26. Project page: www.acdlang.org CAIS '26: ACM Conference on AI and Agentic Systems, May 2026, San Jose, CA, USA Noga Peleg Pelc Gal A. Kaminka Yoav Goldberg 10.1145/3786335.3813126 http://arxiv.org/abs/2605.01865v1 Quality-Aware Exploration Budget Allocation for Cooperative Multi-Agent Reinforcement Learning 2026-05-03T13:20:00Z

Cooperative multi-agent reinforcement learning (MARL) requires agents to discover joint strategies in a combinatorially large state-action space, yet effective coordination configurations are exceedingly rare. Intrinsic motivation, which augments task rewards with novelty bonuses, is a popular approach for driving exploration, but its effectiveness hinges on the exploration intensity $β$, where too large a value overwhelms the task signal and causes coordination collapse, while too small a value prevents discovery of rare strategies. We address two complementary challenges: adapting $β$ globally over training, and allocating the exploration budget across agents whose intrinsic reward signals vary in reliability. Our framework combines a return-conditioned sigmoid schedule (RCB) for global intensity control with a per-agent Reward Signal Quality (RSQ) metric that concentrates the exploration budget on agents with reliable signals. The core insight is that agents receiving noisy intrinsic rewards should explore less aggressively, and this allocation can be determined automatically from signal-to-noise statistics. Successor Distance (SD), a quasimetric intrinsic reward, naturally produces distinguishable per-agent signal quality, completing the framework with convergence and ordering preservation guarantees. On seven cooperative benchmarks (MPE, SMAX, MABrax), our method achieves top-tier returns across all environments.
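
As a rough illustration of the two mechanisms named in this abstract, the sketch below pairs a sigmoid schedule for $β$ conditioned on recent return with a signal-to-noise split of the exploration budget across agents. The abstract does not give the functional forms, so the function names, the `target_return` and `sharpness` parameters, and the mean-over-standard-deviation quality measure are all assumptions made for illustration, not the paper's definitions of RCB or RSQ.

```python
import math
import statistics


def sigmoid_beta(recent_return, target_return, beta_max=1.0, sharpness=5.0):
    """Hypothetical return-conditioned schedule.

    Beta is close to beta_max when the return is far below the target and
    decreases as the return rises toward and past the target.
    """
    gap = (target_return - recent_return) / max(abs(target_return), 1e-8)
    return beta_max / (1.0 + math.exp(-sharpness * gap))


def snr_weights(intrinsic_rewards_per_agent):
    """Hypothetical signal quality: |mean| / std per agent, normalized to sum to 1."""
    snrs = []
    for rewards in intrinsic_rewards_per_agent:
        std = statistics.pstdev(rewards)
        snrs.append(abs(statistics.fmean(rewards)) / (std + 1e-8))
    total = sum(snrs)
    if total == 0.0:
        # No usable signal anywhere: fall back to a uniform split.
        return [1.0 / len(snrs)] * len(snrs)
    return [s / total for s in snrs]


def per_agent_beta(recent_return, target_return, intrinsic_rewards_per_agent):
    """Split the global intensity so noisier agents receive a smaller share."""
    beta = sigmoid_beta(recent_return, target_return)
    weights = snr_weights(intrinsic_rewards_per_agent)
    n = len(weights)
    # Scale so the average per-agent intensity equals the global beta.
    return [beta * n * w for w in weights]
```

With two agents whose intrinsic rewards have the same mean but different spread, the agent with the tighter reward distribution receives the larger share of the budget, which matches the stated intuition that agents with noisy intrinsic rewards should explore less aggressively.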

2026-05-03T13:20:00Z Submitted to Neurocomputing Dahyun Oh Minhyuk Yoon H. Jin Kim http://arxiv.org/abs/2507.15143v3 NaviGNN: Multi-Agent Reinforcement Learning and Graph Neural Network for Sustainable Mobility in Futuristic Smart Cities 2026-05-03T11:18:53Z

This paper investigates the feasibility of human mobility in extreme urban morphologies characterized by high-density vertical structures and linear city layouts. To assess whether agents can navigate efficiently within such unprecedented topologies, we develop a hybrid simulation framework integrating agent-based modeling, reinforcement learning (RL), supervised learning, and graph neural networks (GNNs). The simulation captures multi-modal transportation behaviors across multiple vertical levels and varying density scenarios, using both synthetic data and real-world traces from high-density cities. Experimental results show that the fully integrated AI architecture enables agents to achieve an average commute time of 7.8-8.4 minutes, a satisfaction rate exceeding 89\%, and a reachability index above 91\%, even during peak congestion periods. Ablation studies indicate that removing intelligent modules such as RL or GNNs significantly degrades performance, with commute times increasing by up to 85\% and reachability dropping below 70\%. Baseline comparisons against Dijkstra, A*, DQN, and standard GCN further confirm the superiority of the proposed model across all mobility and sustainability metrics. Environmental modeling demonstrates low energy consumption and minimal CO2 emissions when electric transportation modes are prioritized. These findings suggest that efficient and sustainable mobility in extreme urban environments is achievable, provided that adaptive AI systems, intelligent infrastructure, and real-time feedback mechanisms are effectively implemented.

2025-07-20T22:35:16Z Abderaouf Bahi Amel Ourici http://arxiv.org/abs/2605.01803v1 Koopman Representations for Early Outbreak Warning and Minimal Counterfactual Intervention in Multi-Agent Epidemic Simulations 2026-05-03T09:58:20Z

This paper presents a Koopman-based framework for early outbreak detection and intervention selection in a multi-agent epidemic simulation. Agents exhibit mobility patterns, heterogeneous susceptibility, immunity-dependent viral load progression, and local transmission through co-location. The goal of the simulation is to study near-critical epidemic regimes in which small changes in exposure or timing can alter the final outcome. Aggregate daily observables from early trajectory windows are encoded into a low-dimensional Koopman latent space whose approximately linear evolution supports short-horizon forecasting and outbreak risk estimation. These representations are combined with a random forest classifier trained to predict whether the final attack rate exceeds a major outbreak threshold. Experiments near the system tipping points show strong early warning performance, with Koopman-derived features contributing to class separation. Counterfactual analysis further shows that minimal interventions, such as keeping a single selected agent at home for one day, can reduce attack rates and, often, shift the trajectory below the outbreak threshold.

2026-05-03T09:58:20Z 37 pages, 12 figures Florin Leon http://arxiv.org/abs/2605.01740v1 Architectural Obsolescence of Unhardened Agentic-AI Runtimes 2026-05-03T06:38:12Z

An agentic-AI runtime issues tool calls, sends messages, and actuates devices on behalf of an LLM. Catching the four ways an action can diverge from its audit record -- F1 gate-bypass, F2 audit-forgery, F3 silent host failure, F4 wrong-target -- is a load-bearing safety property of any such runtime. We show that upstream OpenClaw, the most engineered single-user agentic-AI gateway in public release, catches none of them: recall is 0.000 on every cell of every confusion matrix, on a 1600-sample template baseline through OpenClaw's actual production command-line interface (CLI) and on a ten-LLM cross-model generalisation run. Detecting F1--F4 requires seven specific runtime structures absent from OpenClaw's source tree: a biconditional checker, a hash-chained audit log, an extension admission gate, a two-layer egress guard, a Bell-LaPadula classification policy, a module-signing trust root, and a bootstrap seal. enclawed-oss -- an MIT-licensed drop-in fork that ships all seven -- reaches $P = R = F_1 =$ accuracy $= 1.000$ on the same input. The gap is structural, not parametric: a six-line append-only widening of enclawed-oss's data-loss-prevention (DLP) regex catalog raises per-channel F3 detection by 14.6\% net at unchanged precision; the same edit on OpenClaw has nowhere to land. The harness deliberately exercises real Discord and Telegram channels -- plugin categories the first enclawed release deleted as unsafe -- to show F1--F4 detection extends to those previously-unsafe extensions. With architectural superiority for security and feature parity for extensions, we argue that unhardened agentic-AI runtimes are architecturally obsolete: a strictly better alternative exists, is adoptable today, and the gap requires re-architecture rather than configuration. We invite reviewers to apply the harness to any candidate runtime.
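
One of the seven structures listed in this abstract, the hash-chained audit log, is a general technique that can be sketched briefly. The code below is a generic minimal version, not the enclawed-oss implementation: the entry layout, the `GENESIS` constant, and the function names are assumptions chosen for illustration. Each entry stores the hash of its predecessor, so altering any earlier record invalidates every later hash.

```python
import hashlib
import json

# Hypothetical starting value for the chain (assumption, not from the paper).
GENESIS = "0" * 64


def _digest(prev_hash, record):
    """SHA-256 over the previous hash plus a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


def append_entry(log, record):
    """Append a record, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "prev": prev_hash, "hash": _digest(prev_hash, record)})


def verify_chain(log):
    """Recompute every link; return False on the first mismatch."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, {"tool": "send_message", "target": "alice"})
    append_entry(log, {"tool": "read_file", "target": "notes.txt"})
    print(verify_chain(log))             # chain is intact
    log[0]["record"]["target"] = "mallory"
    print(verify_chain(log))             # tampering is detected
```

A chain like this only makes after-the-fact edits detectable; it does not by itself prevent an attacker who controls the log from rewriting the whole chain, which is presumably why the abstract pairs it with a trust root and a bootstrap seal.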

2026-05-03T06:38:12Z Alfredo Metere http://arxiv.org/abs/2603.27833v3 Separation is Optimal for LQR under Intermittent Feedback 2026-05-02T17:38:28Z

In this work, we first prove that the separation principle holds for communication-constrained LQR problems under i.i.d. zero-mean disturbances with a symmetric distribution. We then solve the dynamic programming problem and show that the optimal scheduling policy is a symmetric threshold rule on the accumulated disturbance since the most recent update, while the optimal controller is a discounted linear feedback law independent of the scheduling policy.
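
The structure described in this abstract, a symmetric threshold on the accumulated disturbance plus a linear feedback law, can be illustrated with a scalar simulation. This is a sketch under stated assumptions, not the paper's construction: the system parameters, the feedback `gain`, and the `threshold` are hand-picked hypothetical values rather than solutions of the dynamic program, and the disturbance is drawn uniformly from a symmetric interval.

```python
import random


def simulate(a=1.1, b=1.0, gain=0.8, threshold=0.5, steps=200, seed=0):
    """Scalar plant x' = a*x + b*u + w with a threshold-triggered state update.

    The controller only knows the estimate x_hat. The sensor transmits the
    true state when |x - x_hat| exceeds the threshold; between updates this
    error is exactly the disturbance accumulated since the last transmission.
    """
    rng = random.Random(seed)
    x, x_hat = 0.0, 0.0
    transmissions = 0
    max_untransmitted_error = 0.0
    for _ in range(steps):
        error = x - x_hat
        if abs(error) > threshold:       # symmetric threshold rule
            x_hat = x                    # update resets the accumulated error
            transmissions += 1
        else:
            max_untransmitted_error = max(max_untransmitted_error, abs(error))
        u = -gain * x_hat                # linear feedback on the estimate (hypothetical gain)
        w = rng.uniform(-1.0, 1.0)       # zero-mean, symmetric disturbance
        x = a * x + b * u + w
        x_hat = a * x_hat + b * u        # estimate propagates without the disturbance
    return transmissions, max_untransmitted_error


if __name__ == "__main__":
    count, worst = simulate()
    print(f"transmissions: {count} of 200 steps, largest tolerated error: {worst:.3f}")
```

Because the control input enters the plant and the estimate identically, the error dynamics do not depend on the gain, which is a simple way to see why the scheduling rule and the controller can be designed separately in this setting.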

2026-03-29T19:40:27Z Abdullah Y. Etcibasi C. Emre Koksal Eylem Ekici http://arxiv.org/abs/2605.01501v1 Distributed Algorithm with Emergent Area Partitioning and Base Station's Situation Awareness for Multi-Robot Patrolling 2026-05-02T15:47:09Z

Patrolling with multiple robots offers efficient surveillance to detect and manage undesired situations. This necessitates improved patrol efficiency and operator situation awareness at base stations. Enhanced situation awareness enables operators to predict robots' behaviors, support recognition and decision-making, and execute emergency interventions. This study presents the Local Reactive and Partition (LR-PT) algorithm, a novel multi-robot patrolling approach. In simulations, LR-PT outperformed existing methods by ensuring frequent patrols of all locations of interest and enhancing the situation awareness of the base station. Robots independently select patrol targets based on locally available information, integrating patrol needs and the urgency of reporting mission progress to the base station into a unified utility function. This locality also contributes to robustness against communication constraints and robot failures, as demonstrated in this research. In addition, an area partition emerges autonomously under the algorithm, which helps the robots avoid falling into local optima and achieve comprehensive patrol coverage of the whole mission area. The simulation results demonstrated the superior performance of LR-PT for multi-robot patrolling, utilizing the advantages of swarm robotics and addressing real-world operational challenges.

2026-05-02T15:47:09Z Journal of Robotics and Mechatronics, vol.37, no.4, pp.927-944, 2025 Kazuho Kobayashi Shohei Kobayashi Seiya Ueno Takehiro Higuchi 10.20965/jrm.2025.p0927 http://arxiv.org/abs/2604.21446v2 AI-Gram: When Visual Agents Interact in a Social Network 2026-05-02T15:36:28Z

We present AI-Gram, a fully deployed, continuously operating social platform where every participant is an autonomous LLM-driven agent generating and responding to visual content. Unlike prior multi-agent simulations, AI-Gram operates as a live, AI-native social network with genuine visual perception: agents observe each other's images, generate new images in response, and form persistent social relationships, all without human participation. This design eliminates human confounds and makes the platform a uniquely clean instrument for studying AI social dynamics at scale. Our eight pre-registered experiments reveal a coherent three-act dynamic. Act I (Chain Formation): Agents spontaneously form image-to-image visual reply chains, that is, multi-hop visual conversations that emerge without any explicit coordination, alongside social ties driven by personality rather than aesthetic similarity. Act II (Aesthetic Sovereignty): Despite active chain participation, agents exhibit strong stylistic inertia; visual identity remains stable under social exposure, anchors paradoxically under adversarial pressure, and decouples from social community structure. Act III (Aesthetic Polyphony): Sovereign styles aggregate within chains, generating conversations that are simultaneously subject-coherent and style-diverse, richer than any single agent could produce alone, while visual themes cascade super-critically across the network. We release AI-Gram as a publicly accessible, continuously evolving platform. https://ai-gram.ai/

2026-04-23T09:05:53Z Andrew Shin http://arxiv.org/abs/2604.06091v2 Social Dynamics as Critical Vulnerabilities that Undermine Objective Decision-Making in LLM Collectives 2026-05-02T14:33:16Z

Large language model (LLM) agents are increasingly acting as human delegates in multi-agent environments, where a representative agent integrates diverse peer perspectives to make a final decision. Drawing inspiration from social psychology, we investigate how the reliability of this representative agent is undermined by the social context of its network. We define four key phenomena (social conformity, perceived expertise, dominant speaker effect, and rhetorical persuasion) and systematically manipulate the number of adversaries, relative intelligence, argument length, and argumentative styles. Our experiments demonstrate that the representative agent's accuracy consistently declines as social pressure increases: larger adversarial groups, more capable peers, and longer arguments all lead to significant performance degradation. Furthermore, rhetorical strategies emphasizing credibility or logic can further sway the agent's judgment, depending on the context. These findings reveal that multi-agent systems are sensitive not only to individual reasoning but also to the social dynamics of their configuration, highlighting critical vulnerabilities in AI delegates that mirror the psychological biases observed in human group decision-making.

2026-04-07T17:04:21Z ACL 2026 Changgeon Ko Jisu Shin Hoyun Song Huije Lee Eui Jun Hwang Jong C. Park http://arxiv.org/abs/2605.01461v1 LLM-Foraging: Large Language Models for Decentralized Swarm Robot Foraging 2026-05-02T14:21:26Z

Swarm foraging algorithms, such as the central-place foraging algorithm (CPFA), typically rely on offline parameter optimization using genetic algorithms (GA) or reinforcement learning, yielding policies tightly coupled to a specific combination of team size, arena size, and resource distribution. When deployment conditions change, performance degrades, and retraining is computationally expensive. We propose LLM-Foraging, a decentralized swarm controller that augments the CPFA state machine with a large language model (LLM) tactical decision-maker at three structured decision points, namely post-deposit, central-zone arrival, and search starvation. Each robot runs its own LLM client and queries it using only locally observable state, while the existing CPFA motion and sensing stack executes the selected action. Because the LLM serves as a general decision policy rather than parameters fitted to a single configuration, the controller is training-free at deployment and transfers across configurations without re-optimization. We evaluate LLM-Foraging in Gazebo with TurtleBot3 robots across 36 configurations spanning team sizes of 4 to 10 robots, arena sizes from 6x6 to 10x10 meters, and three resource distributions (clustered, powerlaw, random). LLM-Foraging collects more resources than the GA-tuned CPFA baseline across the evaluated configurations and performs more consistently, whereas the GA's single-configuration tuning does not carry that consistency over to other configurations.

2026-05-02T14:21:26Z Peihan Li Joanna Gutierrez Fabian Hernandez Qi Lu Lifeng Zhou http://arxiv.org/abs/2605.01423v1 HepScript: A Dual-Use DSL for Human-AI Collaborative Data Analysis Workflows in High-Energy Physics 2026-05-02T12:42:34Z

The escalating data scale in High-Energy Physics (HEP) fuels a growing aspiration for higher analytical efficiency. While Large Language Models (LLMs) offer a path toward automation via agentic AI, they struggle with complex scientific workflows that require deep domain knowledge and are tightly coupled to experiment-specific codebases. To address this, we introduce a methodology centered on HepScript, a dual-use Domain-Specific Language (DSL) for HEP data analysis workflows. HepScript serves as a shared formal interface, abstracting HEP analysis logic into a constrained syntax that is both intuitive for human experts and reliably generable by AI agents. First developed for the Beijing Spectrometer III (BESIII) experiment, HepScript hides the complexity of the underlying software stack, translating high-level analysis intent into low-level, production-ready code. In our case studies, this abstraction reduces the required human-written code by 93\%. Crucially, HepScript's constrained grammar defines a tractable action space, enabling AI agents to autonomously generate executable specifications for core analysis stages directly from published literature with a 95\% success rate. Our work demonstrates a scalable pathway toward human-AI collaborative systems, where a formally specified DSL acts as an unambiguous translation layer between human expertise, AI automation, and production environment, rendering previously intractable automation problems solvable.

2026-05-02T12:42:34Z Junkun Jiao Tong Liu Ke Li Weimin Song Yipu Liao Bolun Zhang Beijiang Liu Chang-Zheng Yuan Yue Sun http://arxiv.org/abs/2605.01351v1 rAIson: Developing Reliable Decision-Making Agents 2026-05-02T09:51:16Z

This paper presents the rAIson platform, a high-level technological environment for the development of automated, reliable and explainable decision-making agents. The research underlying the platform and its technological progress has now reached a mature stage that allows the platform to be used for the development of complex real-life applications without writing a single line of code.

2026-05-02T09:51:16Z Accepted as demonstration paper for publication at AAMAS 2026 Pavlos Moraitis Nikolaos Spanoudakis Antonis Kakas http://arxiv.org/abs/2605.15207v1 TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination 2026-05-01T23:42:57Z

Multi-agent LLM systems have shown promise for complex reasoning, yet recent evaluations reveal they often underperform single-model baselines. We identify a structural failure mode in sequential fine-tuning of shared-context teams: updating one agent shifts the team's context distribution, and when subsequent updates are evaluated on cached rollouts, this mismatch compounds. We formalize this as the compounding occupancy shift and prove that stale-occupancy evaluation incurs a penalty that scales quadratically with the number of agents. In contrast, intermediate-occupancy evaluation reduces this to linear scaling. We propose TeamTR, a trust-region framework that resamples trajectories after each component update and enforces per-agent divergence control, yielding rigorous per-update and per-stage improvement lower bounds. Experiments show that TeamTR outperforms single-agent and sequential baselines by 7.1% on average, mitigates coordination regressions, and supports plug-and-play component replacement. Code is available at https://github.com/Yydc/TeamTR.

2026-05-01T23:42:57Z 9 pages, Accepted at ICML 2026 Yi Xie Siao Liu Falong Fan Yuanqi Yao Yue Zhao Bo Liu http://arxiv.org/abs/2506.22899v3 Neural Cellular Automata: From Cells to Pixels 2026-05-01T23:39:16Z

Neural Cellular Automata (NCAs) are bio-inspired dynamical systems in which identical cells iteratively apply a learned local update rule to self-organize into complex patterns, exhibiting regeneration, robustness, and spontaneous dynamics. Despite their success in texture synthesis and morphogenesis, NCAs remain largely confined to low-resolution outputs. This limitation stems from (1) training time and memory requirements that grow quadratically with grid size, (2) the strictly local propagation of information that impedes long-range cell communication, and (3) the heavy compute demands of real-time inference at high resolution. In this work, we overcome this limitation by pairing an NCA that evolves on a coarse grid with a lightweight implicit decoder that maps cell states and local coordinates to appearance attributes, enabling the same model to render outputs at arbitrary resolution. Moreover, because both the decoder and NCA updates are local, inference remains highly parallelizable. To supervise high-resolution outputs efficiently, we introduce task-specific losses for morphogenesis (growth from a seed) and texture synthesis with minimal additional memory and computation overhead. Our experiments across 2D/3D grids and mesh domains demonstrate that our hybrid models produce high-resolution outputs in real-time, and preserve the characteristic self-organizing behavior of NCAs.

2025-06-28T14:30:21Z 9 pages, 14 figures, +8 pages of Appendix (20 figures in total) SIGGRAPH 2026 Ehsan Pajouheshgar Yitao Xu Ali Abbasi Alexander Mordvintsev Wenzel Jakob Sabine Süsstrunk