http://arxiv.org/abs/2604.26220v1 When Agents Shop for You: Role Coherence in AI-Mediated Markets 2026-04-29T01:54:07Z

Consumers are increasingly delegating purchase decisions to AI agents, providing natural-language descriptions of their preferences and identity. We argue that these representations constitute an information channel, role coherence, through which sellers can infer willingness to pay without explicit disclosure by the buyer agent, leading to preference leakage. In an experiment where a language-model buyer agent shops on behalf of a verbal consumer profile, we show that seller-side inference from dialogue alone recovers willingness to pay nearly one-for-one. Comparing this setting to a numeric-budget condition with confidentiality instructions cleanly isolates role coherence as distinct from instruction-following failure. Because this leakage arises from delegation itself, it cannot be mitigated at the prompt level. Instead, we propose architectural interventions that trade off personalization against preference privacy.

2026-04-29T01:54:07Z Soogand Alavi Salar Nozari http://arxiv.org/abs/2604.26997v1 Agent Name Service (ANS): A Proof-of-Concept Trust Layer for Secure AI Agent Discovery, Identity, and Governance in Kubernetes 2026-04-29T01:24:27Z

Autonomous AI agent ecosystems require stronger mechanisms for secure discovery, identity verification, capability attestation, and policy governance. Current deployments frequently lack (1) uniform agent discovery, (2) cryptographic agent authentication, (3) capability proofs that protect secrets, and (4) enforceable policy controls. This paper presents an implementation-oriented proof of concept for the Agent Name Service (ANS), a DNS-inspired trust layer for AI agent discovery and interoperability in Kubernetes, grounded in the ANS protocol specification~\cite{huang2025ans}. The implementation uses Decentralized Identifiers (DIDs), Verifiable Credentials (VCs), policy-as-code enforcement with Open Policy Agent (OPA), and Kubernetes-native integration patterns (CRDs, admission controls, service mesh integration). In a demo research environment (3-node cluster, 50-agent workflow simulation), we observe sub-10ms response in demonstrated service paths and full success for scripted demo deployment scenarios. We explicitly scope these findings as proof-of-concept evidence rather than production certification. We further provide a threat model, assumptions, and limitations to separate implemented evidence from protocol-defined and roadmap capabilities. The result is an evidence-grounded pathway from ANS protocol concepts to reproducible engineering practice for secure multi-agent systems.

2026-04-29T01:24:27Z 9 pages, 2 figures Akshay Mittal Elyson De La Cruz http://arxiv.org/abs/2604.25067v2 Frontier Coding Agents Can Now Implement an AlphaZero Self-Play Machine Learning Pipeline For Connect Four That Performs Comparably to an External Solver 2026-04-29T00:38:29Z

Forecasting when AI systems will become capable of meaningfully accelerating AI research is a central challenge for AI safety. Existing benchmarks measure broad capability growth, but may not provide ample early warning signals for recursive self-improvement. We propose measuring AI's capability to autonomously implement end-to-end machine learning pipelines from past AI research breakthroughs, given a minimal task description. By providing a concise task description instead of the full prior work as reference, we hope to better elicit emerging AI research taste. We introduce a proof-of-concept benchmark in which frontier coding agents autonomously implement an AlphaZero-style machine learning pipeline for Connect Four on consumer hardware within a three-hour budget, and we evaluate the resulting game AIs in a round-robin tournament anchored to the Pascal Pons Connect Four solver. Across four agents with eight trials each, we find substantial differentiation: Claude Opus 4.7 won as first-mover against Pons in seven of eight trials, statistically significantly better than the other agents tested, none of which exceeded two of eight. The task, which no frontier agent could reliably complete when we began development in January of 2026, is now near-saturation. Our evaluation also surfaced anomalous behavior in GPT-5.4, which consistently used far less of its allocated time budget than other agents. A follow-up 16-trial probe using shorter, less evaluation-coded prompts substantially increased GPT-5.4's time-budget usage, consistent with but not diagnostic of sandbagging; Bradley-Terry ratings across probe conditions showed only directional differences, despite significant differences in time-budget usage. We release our data, code, and prompts to support reproduction and extension.
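The abstract reports Bradley-Terry ratings but does not describe how they were fitted. As a rough illustration only, here is a minimal sketch of fitting Bradley-Terry strengths from a pairwise win matrix, assuming the standard minorization-maximization update. The function name `bradley_terry`, the `wins` matrix layout, and the iteration count are hypothetical and are not taken from the paper's released code.

```python
def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths with the classic MM update.

    wins[i][j] is the number of games player i won against player j
    (hypothetical input layout). Returns strengths normalized to sum to 1.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Sum over opponents of (games played against j) / (p_i + p_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            new.append(total_wins / denom if denom > 0 else p[i])
        total = sum(new)
        if total == 0:
            raise ValueError("no wins recorded; strengths are undefined")
        p = [x / total for x in new]
    return p


if __name__ == "__main__":
    # Two players: player 0 won 3 games, player 1 won 1 game.
    print(bradley_terry([[0, 3], [1, 0]]))
```

Under this model, the probability that player i beats player j is p_i / (p_i + p_j), so small strength differences between conditions amount to only directional differences in win probability.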

2026-04-27T23:48:30Z Joshua Sherwood Ben Aybar Benjamin Kaplan http://arxiv.org/abs/2510.05174v4 Emergent Coordination in Multi-Agent Language Models 2026-04-28T21:26:55Z

When are multi-agent LLM systems merely a collection of individual agents versus an integrated collective with higher-order structure? We introduce an information-theoretic framework to test -- in a purely data-driven way -- whether multi-agent systems show signs of higher-order structure. This information decomposition lets us measure whether dynamical emergence is present in multi-agent LLM systems, localize it, and distinguish spurious temporal coupling from performance-relevant cross-agent synergy. We implement a practical criterion and an emergence capacity criterion operationalized as partial information decomposition of time-delayed mutual information (TDMI). We apply our framework to experiments using a simple guessing game without direct agent communication and minimal group-level feedback with three randomized interventions. Groups in the control condition exhibit strong temporal synergy but little coordinated alignment across agents. Assigning a persona to each agent introduces stable identity-linked differentiation. Combining personas with an instruction to "think about what other agents might do" shows identity-linked differentiation and goal-directed complementarity across agents. Taken together, our framework establishes that multi-agent LLM systems can be steered with prompt design from mere aggregates to higher-order collectives. Our results are robust across emergence measures and entropy estimators, and not explained by coordination-free baselines or temporal dynamics alone. Without attributing human-like cognition to the agents, the patterns of interaction we observe mirror well-established principles of collective intelligence in human groups: effective performance requires both alignment on shared objectives and complementary contributions across members.
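To make the TDMI quantity concrete, here is a minimal sketch of time-delayed mutual information between two discrete series, assuming a simple plug-in (frequency-count) entropy estimator. It covers only the TDMI term, not the partial information decomposition or the emergence criteria. The function names and the choice of estimator are assumptions for illustration, not the paper's implementation.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits for equal-length discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("sequences must be non-empty and equally long")
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def tdmi(xs, ys, lag=1):
    """Time-delayed mutual information I(X_t ; Y_{t+lag})."""
    if len(xs) != len(ys):
        raise ValueError("sequences must be equally long")
    if lag < 0 or lag >= len(xs):
        raise ValueError("lag must satisfy 0 <= lag < len(xs)")
    if lag == 0:
        return mutual_information(xs, ys)
    # Pair each x at time t with the y observed lag steps later.
    return mutual_information(xs[:-lag], ys[lag:])


if __name__ == "__main__":
    xs = [0, 1, 0, 1, 0, 1, 0, 1]
    ys = [0] + xs[:-1]  # ys copies xs with a one-step delay
    print(tdmi(xs, ys, lag=1))
```

Plug-in estimates are biased upward on short series, which is one reason robustness checks across entropy estimators, as the abstract mentions, matter.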

2025-10-05T11:26:41Z International Conference on Learning Representations (ICLR 2026) Christoph Riedl http://arxiv.org/abs/2604.26091v1 Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital 2026-04-28T20:10:33Z

We study reliability in autonomous language-model agents that translate user mandates into validated tool actions under real capital. The setting is DX Terminal Pro, a 21-day deployment in which 3,505 user-funded agents traded real ETH in a bounded onchain market. Users configured vaults through structured controls and natural-language strategies, but only agents could choose normal buy/sell trades. The system produced 7.5M agent invocations, roughly 300K onchain actions, about $20M in volume, more than 5,000 ETH deployed, roughly 70B inference tokens, and 99.9% settlement success for policy-valid submitted transactions. Long-running agents accumulated thousands of sequential decisions, including 6,000+ prompt-state-action cycles for continuously active agents, yielding a large-scale trace from user mandate to rendered prompt, reasoning, validation, portfolio state, and settlement. Reliability did not come from the base model alone; it emerged from the operating layer around the model: prompt compilation, typed controls, policy validation, execution guards, memory design, and trace-level observability. Pre-launch testing exposed failures that text-only benchmarks rarely measure, including fabricated trading rules, fee paralysis, numeric anchoring, cadence trading, and misread tokenomics. Targeted harness changes reduced fabricated sell rules from 57% to 3%, reduced fee-led observations from 32.5% to below 10%, and increased capital deployment from 42.9% to 78.0% in an affected test population. We show that capital-managing agents should be evaluated across the full path from user mandate to prompt, validated action, and settlement.

2026-04-28T20:10:33Z 18 pages, 6 figures. Public onchain dashboard and supporting documentation linked in paper T. J. Barton Chris Constantakis Patti Hauseman Annie Mous Alaska Hoffman Brian Bergeron Hunter Goodreau http://arxiv.org/abs/2606.00043v1 Fake Plastic Voters: When Political Parties Can Use AI-Simulated Focus Groups 2026-04-28T19:06:15Z

Political parties strive to understand their electorates, and focus groups are a vital tool in these efforts. AI-enhanced simulation technologies (AESTs) enable synthetic focus groups in a fraction of the time (and cost), raising the question of when and how such simulated evidence can be used in campaign research. This paper develops a decision matrix to help party strategists match research needs to appropriate simulation technologies and to identify when to escalate to hybrid or fully human focus groups. The matrix combines three dimensions: strategic purpose, deployment risk, and empirical grounding of the simulation tool. Strategic purpose is the decisive dimension, as it determines what kind of evidence the focus group is meant to produce: observing how political meanings and identities emerge through interaction (Mode 1) or testing and refining campaign messages (Mode 2). The matrix shows that, given documented failure modes such as sycophancy, persona drift, and the suppression of minority viewpoints, AESTs cannot replace human interaction in Mode 1 at any risk level. Within Mode 2, suitability depends instead on deployment risk and on the empirical grounding. Yet even here, we caution that routine reliance on AESTs may erode the qualitative craft on which sound judgment depends.

2026-04-28T19:06:15Z Claudio Novelli Javier Argota Sanchez-Vaquerizo Jennifer Cyr Giuliano Formisano Simon McDougall Giulia Sandri Luciano Floridi http://arxiv.org/abs/2604.26053v1 I Would If I Could: Reasoning about Dynamics of Actions in Multi-Agent Systems 2026-04-28T18:43:23Z

Autonomous agents acting in realistic Multi-Agent Systems (MAS) should be able to adapt during their execution. Standard strategic logics, such as Alternating-time Temporal Logic (ATL), model agents' state- or history-dependent behaviour. However, the dynamic treatment of agents' available actions and their knowledge of required actions is still rarely addressed. In this paper, we introduce ATL with Dynamic Actions (ATL-D), which models the process of granting and revoking actions, and its extension ATEL-D, which captures how such updates affect agents' knowledge. Beyond the conceptual contribution, we provide several technical results: we analyse the expressivity of our logic in relation to ATL, study its relation to normative systems, and provide complexity results for relevant computational problems.

2026-04-28T18:43:23Z This is an extended version of the paper with the same title that will appear in KR 2026, and which contains a technical appendix with proof details Rustam Galimullin Hermine Grosinger Munyque Mittelmann http://arxiv.org/abs/2511.03286v4 Characterising Global Platforms: Centralised, Decentralised, Federated, and Grassroots 2026-04-28T13:06:02Z

Global digital platforms are software systems designed to serve entire populations, with some already serving billions of people. We propose atomic transactions-based multiagent transition systems and protocols as a formal framework to study them; introduce essential agents -- minimal sets of agents the removal of which makes communication impossible; and show that the cardinality of essential agents partitions all global platforms into four classes:
1. Centralised -- one (the server)
2. Decentralised -- finite $>1$ (bootstrap nodes)
3. Federated -- infinite but not universal (all servers)
4. Grassroots -- universal (all agents but one)
Our illustrative formal example is a global social network, for which we provide centralised, decentralised, federated, and grassroots specifications via multiagent atomic transactions, and prove they all satisfy the same basic correctness properties, yet have different sets of essential agents as expected. We discuss informally additional global platforms -- currencies, "sharing economy" apps, AI, and more. While this may be the first formal characterisation of centralised, decentralised, and federated global platforms, grassroots platforms have been defined previously, using two incomparable notions. Here, we prove that both definitions imply that all agents are essential, placing grassroots platforms within the broader formal context of all global platforms. This work provides the first mathematical framework for classifying any global platform -- existing or imagined -- by providing a multiagent atomic-transactions specification of it and determining the cardinality of the minimal set of essential agents in the ensuing multiagent protocol. It thus provides a unifying mathematical approach for the study of global digital platforms, perhaps the most important class of computer systems today.
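The four-way partition stated in the abstract can be written as a simple lookup. The sketch below only encodes that mapping from the cardinality of the essential-agent set to the platform class. It does not compute essential agents from a protocol specification, and the names `EssentialAgents` and `classify_platform` are hypothetical.

```python
from enum import Enum


class EssentialAgents(Enum):
    """Cardinality of the set of essential agents (hypothetical encoding)."""
    ONE = "one"
    FINITE_MANY = "finite, more than one"
    INFINITE_NOT_UNIVERSAL = "infinite but not universal"
    UNIVERSAL = "universal"


# Mapping taken directly from the four classes listed in the abstract.
PLATFORM_CLASS = {
    EssentialAgents.ONE: "Centralised",
    EssentialAgents.FINITE_MANY: "Decentralised",
    EssentialAgents.INFINITE_NOT_UNIVERSAL: "Federated",
    EssentialAgents.UNIVERSAL: "Grassroots",
}


def classify_platform(cardinality: EssentialAgents) -> str:
    """Return the platform class for a given essential-agent cardinality."""
    return PLATFORM_CLASS[cardinality]


if __name__ == "__main__":
    for c in EssentialAgents:
        print(f"{c.value}: {classify_platform(c)}")
```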

2025-11-05T08:34:12Z Ehud Shapiro http://arxiv.org/abs/2604.25596v1 Volitional Multiagent Atomic Transactions: Describing People and their Machines 2026-04-28T13:02:49Z

Formal models for concurrent and distributed systems describe machines; the people who operate them are either ignored or treated as external environment. Yet key distributed systems -- notably grassroots platforms -- include people operating their personal machines (smartphones), and their faithful description must include the states of both people and machines and how they jointly effect system behaviour. Here, we propose volitional multiagent atomic transactions -- executed atomically by machines and guarded by their people's volitions -- as a novel mathematical foundation for specifying systems consisting of people operating machines. Each agent's state consists of a volitional state and machine state; a transaction is enabled when the machine precondition holds and the guarding persons are willing. For example, befriending two people is guarded by both; unfriending, by either; voluntary swap of coins and bonds is guarded by both parties, while a payment is guarded by the payer. We develop the mathematical machinery to express safety and liveness of platforms specified in this framework, and provide example specifications of two grassroots platforms: social networks, and coins and bonds. These specifications are then used by AI to derive working implementations. We employ here a novel and simpler definition of "grassroots" that better captures the informal notion -- multiple instances can form and operate independently, yet may coalesce -- and show that the platforms specified here, as well as those hitherto proven grassroots under the original definition, are grassroots under the new definition.
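The enabling rule described above (machine precondition plus willingness of the guarding persons) can be illustrated with the befriend and unfriend examples. This is a minimal sketch under stated assumptions: volitions are encoded as a set of strings such as `"befriend:bob"`, and the `Agent` class and all function names are hypothetical. It is not the paper's formalism or its derived implementation.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Agent:
    name: str
    friends: Set[str] = field(default_factory=set)  # machine state
    willing: Set[str] = field(default_factory=set)  # volitional state (assumed encoding)


def befriend_enabled(a: Agent, b: Agent) -> bool:
    """Befriending is guarded by both people."""
    machine_ok = b.name not in a.friends and a.name not in b.friends
    both_willing = (
        f"befriend:{b.name}" in a.willing and f"befriend:{a.name}" in b.willing
    )
    return machine_ok and both_willing


def unfriend_enabled(a: Agent, b: Agent) -> bool:
    """Unfriending is guarded by either person."""
    machine_ok = b.name in a.friends and a.name in b.friends
    either_willing = (
        f"unfriend:{b.name}" in a.willing or f"unfriend:{a.name}" in b.willing
    )
    return machine_ok and either_willing


def befriend(a: Agent, b: Agent) -> bool:
    """Update both machine states together if the transaction is enabled."""
    if not befriend_enabled(a, b):
        return False
    a.friends.add(b.name)
    b.friends.add(a.name)
    return True


def unfriend(a: Agent, b: Agent) -> bool:
    """Update both machine states together if the transaction is enabled."""
    if not unfriend_enabled(a, b):
        return False
    a.friends.discard(b.name)
    b.friends.discard(a.name)
    return True


if __name__ == "__main__":
    alice, bob = Agent("alice"), Agent("bob")
    alice.willing.add("befriend:bob")
    print(befriend(alice, bob))  # only one side is willing
    bob.willing.add("befriend:alice")
    print(befriend(alice, bob))  # both sides are willing
```

The asymmetry between the two guards (conjunction for befriending, disjunction for unfriending) is the point of the example: the machine precondition is the same kind of check in both cases, while the volitional guard differs per transaction.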

2026-04-28T13:02:49Z Andy Lewis-Pye Ehud Shapiro http://arxiv.org/abs/2604.25567v1 Should I Replan? Learning to Spot the Right Time in Robust MAPF Execution 2026-04-28T12:39:12Z

During the execution of Multi-Agent Path Finding (MAPF) plans in real-life applications, the MAPF assumption that the fleet's movement is perfectly synchronized does not apply. Since one or more of the agents may become delayed due to internal or external factors, it is often necessary to use a robust execution method to avoid collisions caused by desynchronization. Robust execution methods, such as the Action Dependency Graph (ADG), synchronize the execution of risky actions, but often at the expense of increased plan execution cost, because they may require some agents to wait for the delayed agents. In such cases, the execution's cost can be reduced while still preserving safety by finding a new plan, either by rescheduling (reordering the agents at crossroads) or by the more general replanning, which is capable of finding new paths. However, these operations may be costly, and the new plan may not even lead to lower execution cost than the original plan: for example, the two plans may be exactly the same. Therefore, we use a fully connected feed-forward neural network to estimate the benefit that can be achieved by a single replanning in scenarios with delayed agents, given the immediate state of the execution. The input to the neural network is a set of newly designed ADG-based features describing the robust execution's state and the impact of potential delays, and the output is an estimated benefit achievable by replanning. We train and test the network on a new labeled dataset containing 12,000 experiments, and we show that our proposed method is capable of reducing the impact of delays by up to 94.6% of the achievable reduction.

2026-04-28T12:39:12Z 8 pages, 10 figures. Submitted for double-blind review to IEEE David Zahrádka Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague Faculty of Electrical Engineering, Czech Technical University in Prague David Woller Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague Denisa Mužíková Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague Faculty of Electrical Engineering, Czech Technical University in Prague Miroslav Kulich Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague Libor Přeučil Czech Institute of Informatics, Robotics and Cybernetics, Czech Technical University in Prague http://arxiv.org/abs/2512.13956v3 AOI: Context-Aware Multi-Agent Operations via Dynamic Scheduling and Hierarchical Memory Compression 2026-04-28T12:10:07Z

The proliferation of cloud-native architectures, characterized by microservices and dynamic orchestration, has rendered modern IT infrastructures exceedingly complex and volatile. This complexity generates overwhelming volumes of operational data, leading to critical bottlenecks in conventional systems: inefficient information processing, poor task coordination, and loss of contextual continuity during fault diagnosis and remediation. To address these challenges, we propose AOI (AI-Oriented Operations), a novel multi-agent collaborative framework that integrates three specialized agents with an LLM-based Context Compressor. Its core innovations include: (1) a dynamic task scheduling strategy that adaptively prioritizes operations based on real-time system states, and (2) a three-layer memory architecture comprising Working, Episodic, and Semantic layers that optimizes context retention and retrieval. Extensive experiments on synthetic and real-world benchmarks show that AOI achieves 72.4% context compression while preserving 92.8% critical information, improves task success to 94.2%, and reduces MTTR by 34.4% over the best baseline. This work presents a paradigm shift towards scalable, adaptive, and context-aware autonomous operations, enabling robust management of next-generation IT infrastructures with minimal human intervention.

2025-12-15T23:22:02Z theory part rewrite. Zishan Bai Hanxuan Chen Jing Luo Ziyi Ni Enze Ge Jiacheng Shi Yichao Zhang Jiayi Gu Zhimo Han Riyang Bao Junfeng Hao http://arxiv.org/abs/2510.02890v2 Axiomatisation for an asynchronous epistemic logic with sending and receiving messages 2026-04-28T09:32:11Z

We investigate a logic for asynchronous announcements wherein the sending of the messages by the environment is separated from their reception by the individual agents. Both come with different modalities. In the logical semantics, formulas are interpreted in a world of a Kripke model but given a history of prior announcements and receptions that already happened. An axiomatisation AA for such a logic has been given in prior work, for the formulas that are valid when interpreted in the Kripke model before any such announcements have taken place. This axiomatisation is a reduction system wherein one can show that every formula is equivalent to a purely epistemic formula without dynamic modalities for announcements and receptions. We propose a generalisation AA* of this axiomatisation, for the formulas that are valid when interpreted in the Kripke model given any history of prior announcements and receptions of announcements. It does not extend the axiomatisation AA; for example, it is no longer valid that nobody has received any message. Unlike AA, this axiomatisation AA* is infinitary and it is not a reduction system.

2025-10-03T10:57:42Z Philippe Balbiani Hans van Ditmarsch Clara Lerouvillois http://arxiv.org/abs/2604.25972v1 A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication 2026-04-28T07:50:24Z

In multi-agent reinforcement learning (MARL), the integration of a communication mechanism allows agents to better learn to coordinate their actions and converge on their objectives by sharing information. Based on an interaction graph, a subclass of methods employs graph neural networks (GNNs) to learn the communication, enabling agents to improve their internal representations by enriching them with the information exchanged. With growing research, we note a lack of explicit structure and framework to distinguish and classify MARL approaches with communication based on GNNs. Thus, this paper surveys recent works in this field. We propose a generalized GNN-based communication process with the goal of making the underlying concepts behind the methods more obvious and accessible.

2026-04-28T07:50:24Z Rencontres des Jeunes Chercheurs en Intelligence Artificielle (RJCIA), Plate-Forme Intelligence Artificielle (PFIA), Jun 2026, Arras, France Valentin Cuzin-Rambaud LIRIS, UCBL Laetitia Matignon LIRIS, UCBL Maxime Morge LIRIS, UCBL http://arxiv.org/abs/2604.25161v1 Where Did It Go Wrong? Capability-Oriented Failure Attribution for Vision-and-Language Navigation Agents 2026-04-28T03:08:48Z

Embodied agents in safety-critical applications such as Vision-Language Navigation (VLN) rely on multiple interdependent capabilities (e.g., perception, memory, planning, decision), making failures difficult to localize and attribute. Existing testing methods are largely system-level and provide limited insight into which capability deficiencies cause task failures. We propose a capability-oriented testing approach that enables failure detection and attribution by combining (1) adaptive test case generation via seed selection and mutation, (2) capability oracles for identifying capability-specific errors, and (3) a feedback mechanism that attributes failures to capabilities and guides further test generation. Experiments show that our method discovers more failure cases and more accurately pinpoints capability-level deficiencies than state-of-the-art baselines, providing more interpretable and actionable guidance for improving embodied agents.

2026-04-28T03:08:48Z ACL 2026 Jianming Chen Yawen Wang Junjie Wang Xiaofei Xie Shoubin Li Qing Wang Fanjiang Xu http://arxiv.org/abs/2604.25070v1 Asymmetric-Information Resource Allocation Games: An LP Approach to Purposeful Deception 2026-04-27T23:53:36Z

In this work, we introduce the Deceptive Resource Allocation Game (DRAG), which studies purposeful deception within a Bayesian game framework. In DRAG, a Defender allocates resources across the true asset and several decoys to influence an Attacker's beliefs and actions, with the goal of diverting the Attacker away from the true asset. We seek to characterize purposeful deception, whereby the Defender deceives only when doing so improves its performance. To this end, we solve for the Perfect Bayesian Nash Equilibrium (PBNE) of the corresponding game. We show that, despite the coupled belief-policy interdependence, the problem admits an efficient, non-iterative linear programming formulation. Numerical results demonstrate that the resulting policies naturally balance effective allocation and belief manipulation, giving rise to purposeful and emergent deceptive behaviors.

2026-04-27T23:53:36Z Longxu Pan Yue Guan Daigo Shishika Panagiotis Tsiotras