http://arxiv.org/abs/2605.24538v1 Is Decentralized AI Governable? From Regulative Policy to Constitutive Protocol 2026-05-23T12:09:56Z Every major framework for governing artificial intelligence presupposes an identifiable entity -- a developer, deployer, or operator -- who can be held responsible and compelled to comply. Decentralized AI (DeAI) dissolves this presupposition. We analyze DeAI as a six-layer decentralizing stack -- model, training, compute, harness, identity, and ownership -- and show how partial decentralization across layers compounds into what we call the \emph{governance vacuum}: a condition in which AI systems are consequential enough to require governance but lack the properties that existing frameworks presuppose in their targets. This vacuum takes two analytically distinct forms: an \emph{accountability gap}, where no addressable principal can be identified, and an \emph{incapacitation gap}, where even an identified principal cannot alter the running system. We demonstrate that these failures are not merely jurisdictional but defeat every presupposition of governance through normative address -- the communication of rules to a comprehending, responsive agent. Drawing on Lessig's modalities of regulation and Searle's distinction between regulative and constitutive rules, we argue for a shift in the locus of governance from policy to protocol, from normative address to architectural constraint. Protocol-based constitutive governance does not address the agents operating within a system but shapes the substrate that determines what kinds of actions are possible within it. 
We identify four ethical conditions -- legitimacy, contestability, transparency, and non-domination -- that such governance must satisfy to avoid degenerating into unaccountable technocratic power, and we argue that the central political challenge of governing AI in a decentralized world is reconstructing forms of democratic authorization for architectural choices that persist after the ordinary chain of policy has broken down. 2026-05-23T12:09:56Z Submitted for Ethics and Information Technology Botao Amber Hu Helena Rong http://arxiv.org/abs/2605.24516v1 Adaptive Punishment for Cooperation in Mixed-Motive Games 2026-05-23T11:01:49Z Mixed-motive scenarios are ubiquitous in real-world multi-agent interactions, where self-interested agents often defect for immediate rewards, overlooking the potential of altruistic cooperation to improve long-term gains and collective welfare. Peer punishment can deter defection, but as costly second-order altruism, its persistent imposition may undermine the punisher's interests. Existing approaches often struggle to effectively implement punishment to promote cooperation. To balance the efficacy and cost of punishment, we propose Adaptive Punishment for Cooperation (APC), a distributed method that determines punishment intensity based on both a dynamic punishment probability and the severity of defection. This dynamic probability substantially reduces costly and ineffective punishment while also promoting cooperation. To accurately assess defection and its severity, we use a defection awareness module, whose learning is guided by game reward. Theoretical analysis and empirical results show that APC performs effectively in the iterated public goods game. Empirically, APC also significantly outperforms existing baselines across sequential social dilemmas, learning rational and effective punishment policies that foster cooperation by strategically deterring defection. 
2026-05-23T11:01:49Z Min Tang Fanqi Kong Linyuan Lü Xue Feng http://arxiv.org/abs/2601.16091v2 Delayed Assignments in Online Non-Centroid Clustering with Stochastic Arrivals 2026-05-23T09:55:24Z Clustering is a fundamental problem, aiming to partition a set of elements, like agents or data points, into clusters such that elements in the same cluster are closer to each other than to those in other clusters. In this paper, we present a new framework for studying online non-centroid clustering with delays, where elements, that arrive one at a time as points in a finite metric space, should be assigned to clusters, but assignments need not be immediate. Specifically, upon arrival, each point's location is revealed, and an online algorithm has to irrevocably assign it to an existing cluster or create a new one containing, at this moment, only this point. However, we allow decisions to be postponed at a delay cost, instead of following the more common assumption of immediate decisions upon arrival. This poses a critical challenge: the goal is to minimize both the total distance costs between points in each cluster and the overall delay costs incurred by postponing assignments. In the classic worst-case arrival model, where points arrive in an arbitrary order, no algorithm has a competitive ratio better than sublogarithmic in the number of points. To overcome this strong impossibility, we focus on a stochastic arrival model, where points' locations are drawn independently across time from an unknown and fixed probability distribution over the finite metric space. We offer hope for beyond worst-case adversaries: we devise an algorithm that is constant competitive in the sense that, as the number of points grows, the ratio between the expected overall costs of the output clustering and an optimal offline clustering is bounded by a constant. 
2026-01-22T16:42:05Z To Appear in the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2026 Saar Cohen http://arxiv.org/abs/2605.24436v1 A Reinforcement Learning Inspired Latent Yield Based Adaptive Algorithm Switching Mechanism 2026-05-23T07:12:30Z Selecting the most suitable algorithm for a given problem instance remains a challenging task, particularly in online or dynamic environments where problem characteristics evolve over time. Relying solely on instantaneous performance metrics can result in a reactive and unstable behaviour, often leading to suboptimal algorithm switching. This paper introduces a computationally efficient approach for aggregating an algorithm's performance across multiple problem instances that is fairly immune to erratic variations in instance features. Inspired by features inherent to Reinforcement Learning (RL), this technique encapsulates rewards and penalties into a latent yield that, in turn, triggers exploitation and exploration, consequently resulting in adaptive algorithm switching. The proposed technique employs island models, inspired by Genetic Algorithms, to facilitate parallel exploration and performance exchanges among algorithm populations inhabiting local repertoires. Experimental evaluations on sorting algorithms and robotic obstacle avoidance tasks demonstrate the feasibility and effectiveness of the approach, highlighting its potential in domains where adaptive algorithm selection is critical. 2026-05-23T07:12:30Z Accepted and published in the Proceedings of the 29th European Conference on Applications of Evolutionary Computation (EvoApplications 2026), held as part of EvoStar 2026, Toulouse, France, April 8 to 10, 2026. Lecture Notes in Computer Science (LNCS), Springer Nature Switzerland Applications of Evolutionary Computation, EvoApplications 2026, LNCS, Springer Nature Switzerland, 2026 Jayprakash S. Nair Jimson Mathew Shivashankar B. 
Nair 10.1007/978-3-032-23604-3_8 http://arxiv.org/abs/2508.12479v2 EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization 2026-05-23T04:58:29Z Min-max optimization arises in many domains such as game theory, adversarial machine learning, etc. For these problems, gradient-based methods are well understood and enjoy strong guarantees. However, in the absence of convexity or concavity, existing approaches study convergence to an approximate saddle point or first-order stationary points, which may be arbitrarily far from global optima. In this work, we present an algorithmic framework for computing the global minimax value in convex--non-concave and non-convex--concave min-max optimization. For convex--non-concave min-max problems, we use a reformulation that transforms the problem into a non-concave--convex max-min optimization problem with suitably defined feasible sets and objective function. This reformulation can be viewed as an extension of Sion's minimax theorem to the convex--non-concave setting. We then introduce EXOTIC -- an Exact, Optimistic, Tree-based algorithm for solving the reformulated max-min problem. EXOTIC combines an iterative convex optimization solver for the inner minimization with an optimistic hierarchical tree search for the outer maximization, inspired by StroquOOL~\cite{bartlett2019simple}. Unlike StroquOOL, which assumes stochastic zero-mean noisy evaluations, EXOTIC handles deterministic, biased, and budget-dependent evaluation errors arising from finite-time solutions of the inner convex subproblems. We establish an upper bound on its optimality gap. The same framework also applies to non-convex--concave min-max optimization. Empirically, EXOTIC outperforms gradient-based methods on popular benchmarks from the literature. 
Finally, we demonstrate the utility of EXOTIC by computing security strategies in multi-player games with three or more players -- a computationally challenging task that, to our knowledge, no prior method solves exactly. 2025-08-17T19:39:19Z 35 pages, 2 figures, 3 tables Chinmay Maheshwari Chinmay Pimpalkhare Debasish Chatterjee http://arxiv.org/abs/2602.03955v3 AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent 2026-05-23T00:45:07Z While large language model (LLM) multi-agent systems achieve superior reasoning performance through iterative debate, practical deployment is limited by their high computational cost and error propagation. This paper proposes AgentArk, a novel framework to distill multi-agent dynamics into the weights of a single model, effectively transforming explicit test-time interactions into implicit model capabilities. This equips a single agent with the intelligence of multi-agent systems while remaining computationally efficient. Specifically, we investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios: reasoning-enhanced fine-tuning; trajectory-based augmentation; and process-aware distillation. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents. They further demonstrate enhanced robustness and generalization across diverse reasoning tasks. We hope this work can shed light on future research on efficient and robust multi-agent development. Our code is at https://github.com/AIFrontierLab/AgentArk. 
2026-02-03T19:18:28Z Yinyi Luo Yiqiao Jin Weichen Yu Mengqi Zhang Srijan Kumar Xiaoxiao Li Weijie Xu Xin Chen Jindong Wang http://arxiv.org/abs/2605.22256v2 Emergence of agriculture in an artificial society of reinforcement learning agents 2026-05-22T21:10:22Z The origin of agriculture represents a major evolutionary transition and a paradigmatic example of how complex collective behaviors emerge from simple interactions. Here we introduce an artificial society of reinforcement learning agents embedded in a dynamic ecological environment to identify general principles underlying this transition. Within this system, agricultural practices emerge spontaneously - without explicit instruction - through the coupled dynamics of learning and environmental modification. We show that this transition is governed by four key ingredients: individual planning through the valuation of delayed rewards, social vulnerability to cheaters, stabilization via social learning, and an emergent lock-in effect that renders agriculture effectively irreversible once established. In particular, we demonstrate that social learning acts as a "firewall" that suppresses cheater invasion and enables the propagation of successful strategies, leading to sustained population growth and nonlinear amplification of domesticated resources. Together, these results reveal universal mechanisms linking individual decision-making, social interactions, and ecological feedbacks. More broadly, they highlight the potential of artificial societies as experimental platforms to study the emergence of cultural innovations and major evolutionary transitions. 
2026-05-21T10:00:29Z Gautier Hamon Martí Sánchez-Fibla Clément Moulin-Frier Ricard Solé http://arxiv.org/abs/2605.24134v1 ProofAgent Harness: Open Infrastructure for Adversarial Evaluation of AI Agents 2026-05-22T18:52:34Z AI agents are entering high-risk production settings, where they use tools, retain context, follow policies, handle private data, and interact with users over multiple turns. Yet many evaluation methods still judge isolated outputs or static tasks, missing failures that emerge through trajectory, pressure, and adversarial interaction. We introduce ProofAgent Harness, open infrastructure for scalable, auditable, and adversarial AI agent evaluation. The harness provides evaluation infrastructure around an agent: it curates evaluation intelligence, runs adversarial multi-turn trials, captures behavioral traces, applies post-hoc multi-juror scoring, resolves disagreement, and produces evidence-linked reports. Its open design allows developers and researchers to extend domains, traps, metrics, juror personas, scoring rules, and reporting formats. At its core is Adversarial Multi-Juror Scoring with Turn-Level Audit, which evaluates completed agent behavior under pressure using calibrated juror personas, consensus checks, and turn-level evidence. Experiments across customer support, medical triage, privacy and security, and code generation agents show that strong agents fail selectively through weak metrics, fragile turns, unsafe reframing, and manipulation paths. We also find that a small quantized local Harness LLM can challenge production agents powered by best-in-class large LLMs, suggesting that evaluation capability emerges from the full harness pipeline rather than model scale alone. ProofAgent Harness turns AI agent evaluation from a static score into scalable adversarial evaluation infrastructure: repeatable, evidence-backed, extensible, and actionable before deployment. 
2026-05-22T18:52:34Z 48 pages, 3 figures Fouad Bousetouane http://arxiv.org/abs/2605.23887v1 CHRONOS: Temporally-Aware Multi-Agent Coordination for Evolving Data Marketplaces 2026-05-22T17:47:45Z Temporal knowledge-graph data marketplaces face three coupled failures in static designs: stale hybrid index shortcuts reduce recall as edges evolve, stationary Shapley pricing misattributes value after distribution shifts, and uncoordinated agents over-consume a shared differential-privacy budget. We present CHRONOS, a three-layer architecture providing a unified treatment of these challenges with explicit public and private separation. Layer one applies neural-ODE temporal decay to shortcut edges, providing a per-query expected recall-loss bound of Big-O of Pq lambda delta t, with a monotone-envelope guarantee reducing bound looseness to 1.8 to 3.2 times observed loss. Layer two conditions Shapley valuation on detected changepoints and provides finite-sample error guarantees under noise. Layer three uses EXP3-IX to achieve Big-O of the square root of T log T regret while enforcing epsilon and delta differential privacy via moments accounting. CHRONOS releases a privatized affinity matrix per epoch using the Gaussian mechanism; all retrieval and ranking are post-processing, incurring no extra privacy cost. We provide multi-epoch settlement, scalability analysis for 500 sellers, and comparisons against accelerated baselines. Across four benchmarks, CHRONOS shows 0.937 recall at ten, 2.74 queries per second, 161 ms latency, and total epsilon of 4.25 at delta of 10 to the power of negative 6 under zCDP composition. These results indicate a competitive operating point. A limitation is that at this privacy level, released valuations remain noise-dominated; utility derives primarily from public index routing and adaptive scheduling driven by low-sensitivity statistics. 
2026-05-22T17:47:45Z Joydeep Chandra http://arxiv.org/abs/2605.23771v1 PhotoFlow: Agentic 3D Virtual Photography Missions 2026-05-22T15:40:52Z Virtual photography asks an agent to enter a prepared 3D scene with no preselected camera pose or reference image, infer a suitable shot from scene information and a language intent, choose executable camera parameters, and render the final photograph. Recent progress in vision-language models makes this kind of spatial agent increasingly plausible, but the task stresses two capabilities that remain hard to evaluate together: complex 3D spatial understanding and abstract aesthetic judgment. We introduce PhotoFlow, a Director-Reviewer-Reflector agent for closed-loop camera search. The Director builds a soft photographic blueprint and proposes diverse candidate cameras; the Reviewer combines rule checks, visual critique, and pairwise incumbent selection; and the Reflector converts failures into region memory, dead-zone suppression, and high-explore relocation. We also introduce VPhotoBench, a benchmark of 47 open-license Blender scenes and 141 language-conditioned photography missions spanning subject placement, relational composition, and atmosphere/style. On held-out experiments, PhotoFlow achieves the strongest external quality-alignment composite and success rate among one-shot prediction, single-chain reflection, anchor-bank selection, and random search under a six-round rendering budget. To our knowledge, this is the first work to make language-conditioned virtual photography in arbitrary Blender scenes an executable agent task, and our results show that an LLM-centered spatial agent can already produce strong photographs in a setting designed to challenge both 3D reasoning and aesthetic choice. 
2026-05-22T15:40:52Z Jiarui Guo Haojia Wei Yiming Zhang Yifei Liu Yuning Gong Hongjie Zhang Xue Yang Zhihang Zhong http://arxiv.org/abs/2605.23743v1 The Communication Complexity of Instant-Runoff Voting 2026-05-22T15:18:25Z The communication complexity of a voting rule is the worst-case number of bits that n voters must transmit to a central authority under the most efficient elicitation protocol in an election with m candidates. We study the communication complexity of Instant-Runoff Voting (IRV). Conitzer and Sandholm [2005] established an upper bound of $O(n (\log m)^2)$, but did not provide a matching lower bound beyond $\Omega(n \log m)$. We resolve this open problem by raising the lower bound to $\Omega(n (\log m)^2)$ using the fooling set technique, thereby showing that the communication complexity of IRV is $\Theta(n (\log m)^2)$. We further show that this complexity drops to $\Theta(n \log m)$ under the single-peakedness restriction, and that both the IRV-Average variant and Single Transferable Vote (STV), the multiwinner extension of IRV, have the same asymptotic communication complexity as IRV. 2026-05-22T15:18:25Z Élie de Panafieu LAMSADE, CNRS François Durand LAMSADE, CNRS Jérôme Lang LAMSADE, CNRS http://arxiv.org/abs/2605.23578v1 Safety, Liveness, and Fairness in Quantitative Argumentation Dialogues 2026-05-22T12:46:38Z We introduce notions of safety, liveness, and fairness, as commonly used in temporal reasoning, to quantitative (bipolar) argumentation dialogues where repeated inferences are drawn from argumentation graphs with weighted nodes. Between inferences, these graphs undergo updates. Strong and weak safety capture that arguments' (final) strengths remain above a specific threshold of justification and always reach the threshold eventually, respectively. Liveness requires that arguments' strengths fluctuate across the threshold of justification. Fairness notions assess how safe arguments are spread within a sequence of argumentation graphs. 
We formally show how these notions are related, and discuss some analytical challenges with respect to providing general guarantees for our properties. 2026-05-22T12:46:38Z Arunavo Ganguly Julian Alfredo Mendez Timotheus Kampik http://arxiv.org/abs/2605.23562v1 ARMS: Automatic Reward Shaping for Sparse-Reward Multi-Agent Reinforcement Learning 2026-05-22T12:29:29Z Sparse rewards are a major bottleneck in multi-agent reinforcement learning (MARL), where simultaneous learning induces non-stationarity and makes reward design especially delicate. Reward shaping can accelerate learning, but in the multi-agent setting it must preserve the strategic structure of the problem rather than merely improve short-term optimization. We propose Automatic Reward-shaping in Multi-agent Systems (ARMS), a self-supervised reward shaping framework for MARL that learns dense shaping signals from sparse environmental rewards through trajectory ranking. Since single-agent trajectory-ranking guarantees do not directly transfer to MARL, we reformulate policy invariance through conditional best-response reasoning, and show that if certain conditions hold, then using shaping rewards preserves each agent's best-response set under fixed opponent policies, and consequently preserves the set of Nash equilibria. Guided by this perspective, ARMS alternates between policy learning and reward learning while sharing shaping parameters across agents for efficiency. Experiments in a partially observable multi-agent pathfinding domain show that ARMS improves sampling efficiency under increasing reward sparsity and agent count, generalizes to unseen environments, and reveals a MARL-specific failure mode in which limited exploration and coupled policy--reward dynamics induce oscillatory behavior. Increasing exploration mitigates this effect and stabilizes learning. 
To the best of our knowledge, ARMS is the first automatic reward shaping framework for MARL whose design is motivated by a game-theoretic equilibrium-preservation result. 2026-05-22T12:29:29Z Elie Abboud Oren Gal http://arxiv.org/abs/2605.23481v1 Optimal Design Framework for Distributed Array Using Magnetically-Actuated Satellite Swarm 2026-05-22T10:41:12Z Distributed space antennas using electromagnetic formation flight (EMFF) are a promising architecture for large-aperture, long-life space communication systems. Their feasible aperture, however, is governed by coupled constraints on antenna performance, satellite mass, power generation, coil geometry, and formation-keeping power. This paper proposes a system-level design framework for EMFF-based distributed space antennas. It links phased-array requirements with satellite-level sizing constraints and provides a static grid-based reference for designing feasible apertures under a fixed system mass. Unlike our previous bucket-brigade disturbance-compensation model, the formation-maintenance requirement is incorporated through a control index derived from distributed-control simulations. This index is integrated into an antenna-aperture maximization problem with sizing, power, coil, and sidelobe-envelope constraints. Parametric case studies examine margin magnetic moment, prescribed transmit power, and large inter-satellite spacing. Results show that increasing system mass improves footprint reduction or effective isotropic radiated power only while satellite-level design headroom remains. In direct-to-device cases with 0.15 m spacing, generated-power and coil-geometry constraints dominate the feasible aperture. In the 0.60 m large-spacing case, the required coil burden can exceed satellite-level mass, size, and power capacities, making the design infeasible despite favorable communication performance. 
The proposed framework enables the design and evaluation of feasible static grid-based EMFF distributed antennas under coupled antenna, satellite, and control constraints. 2026-05-22T10:41:12Z Submitted to IEEE Access and currently under review Seang Shim Yuta Takahashi Naoto Usami Shin-ichiro Sakai http://arxiv.org/abs/2510.12787v4 Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics 2026-05-22T10:39:41Z We present Ax-Prover, a multi-agent system for automated theorem proving in Lean that can solve problems across diverse scientific domains and operate either autonomously or collaboratively with human experts. To achieve this, Ax-Prover approaches scientific problem solving through formal proof generation, a process that demands both creative reasoning and strict syntactic rigor. Ax-Prover meets this challenge by equipping Large Language Models (LLMs), which provide knowledge and reasoning, with Lean tools via the Model Context Protocol (MCP), which ensure formal correctness. To evaluate its performance as an autonomous prover, we benchmark our approach against frontier LLMs and specialized prover models on two public math benchmarks and on two Lean benchmarks we introduce in the fields of abstract algebra and quantum theory. On public datasets, Ax-Prover is competitive with state-of-the-art provers, while it largely outperforms them on the new benchmarks. This shows that, unlike specialized systems that struggle to generalize, our tool-based agentic theorem prover approach offers a generalizable methodology for formal verification across diverse scientific domains. Furthermore, we demonstrate Ax-Prover's assistant capabilities in a practical use case, showing how it enabled an expert mathematician to formalize the proof of a complex cryptography theorem. 2025-10-14T17:57:04Z Benjamin Breen Marco Del Tredici Jacob McCarran Javier Aspuru Mijares Weichen Winston Yin Kfir Sulimany Jacob M. Taylor Frank H. L. 
Koppens Dirk Englund