http://arxiv.org/abs/2606.02872v1
Terminal Time and Angle-Constrained Nonlinear Intercept Guidance
2026-06-01T20:37:08Z
This paper considers the problem of simultaneously controlling an interceptor's impact time and impact angle using its lateral acceleration as the sole control input. With a single control input, the nonlinear engagement kinematics is inherently underactuated, which complicates guidance law synthesis. To overcome this challenge, a hierarchical sliding mode-based guidance law is developed to concurrently regulate the two terminal constraints. The proposed architecture consists of a two-layer sliding manifold. The first layer comprises two sub-sliding surfaces corresponding to the impact time and impact angle error dynamics, respectively, while the second layer introduces a composite sliding manifold that combines the two individual sub-surfaces. Then, a variable-gain adaptive guidance law is designed to ensure time and angle-constrained interception against a stationary target, which is further extended to intercept a constant velocity target. Simulations are conducted for various engagement scenarios to attest to the efficacy of the proposed approach.
2026-06-01T20:37:08Z
Shivam Bajpai, Abhinav Sinha

http://arxiv.org/abs/2605.30392v2
Delayed Repression and Emergent Instability in Adaptive Multi-Agent Systems
2026-06-01T20:36:24Z
Regulatory institutions (from content moderation platforms to financial supervisors) observe, deliberate, and intervene only after a characteristic delay. We ask whether this processing lag alone can destabilize a multi-agent system that would otherwise remain stable, without exogenous shocks, coordination among agents, or malicious actors. We study this in two stages. First, we analyze a delayed replicator equation in which autonomous agents benefit from radical behavior but face punishment based on a lagged institutional alarm signal.
We derive a closed-form critical delay beyond which the unique interior equilibrium loses stability through a Hopf bifurcation, and prove via center manifold reduction that the bifurcation is supercritical (bounded oscillations, not explosive growth) for the entire sigmoid response family. Second, we embed N=240 agents on a network with reinforcement learning (tabular Q-learning) and cross institutional delay with three decision architectures: fixed-policy, reactive (a memoryless threshold heuristic), and Q-learning. The hierarchy is opposite to the naive expectation that learning amplifies instability. Reactive agents are perfectly stable without delay yet collapse once delay is introduced (96% runaway by delay >= 8); fixed-policy agents are immune (0% at all delays); Q-learning agents are only partially resilient (66% at delay 20). The destabilizing ingredient is reactivity to delayed signals, not learning: agents that immediately exploit low-alarm windows trigger oscillatory feedback loops, while learning buffers this through punishment memory encoded in value functions. Throughout, "runaway" denotes bounded large-amplitude oscillation crossing a radical-fraction threshold, consistent with the supercritical bifurcation, not unbounded growth.
2026-05-28T12:26:48Z
32 pages, 13 figures, 2 appendices. v2: corrected network parameterization; central result re-anchored on reactive agents; added robustness sweeps; bibliography fixes; structural and language edits. Code: https://github.com/YehudaItkin/delayed-repression-instability
Igor Itkin

http://arxiv.org/abs/2606.02867v1
The Epi-LLM Framework: probing LLM behavioral priors through epidemiological agent-based models
2026-06-01T20:31:06Z
Human behaviour during epidemics affects infectious disease dynamics, but quantifying this remains deeply challenging.
Here we introduce the Epi-LLM framework: a novel integration of agent-based modelling, real-life epigames, and large language models (LLMs) in which a synthetic society of agents reasons and adapts dynamically over an outbreak contact network. Comparing synthetic agent behaviour against a no-intervention SEIR baseline and human participant data from the AUIB epigame study, we find that LLM agents across four different architectures reduced peak active infections, with quarantine compliance peaking at 58-65% on day six of the 15-day simulation. A binomial generalised linear model showed that perceived health severity was the strongest predictor of quarantine behaviour ($β = 0.33, p = 0.002$), yielding a pseudo-$R^2$ of 0.055, comparable to the 0.072 observed in the human trial. LLM architecture is a key determinant of epidemic dynamics: low-variance architectures offer greater internal validity for testing behavioural rules, while high-variance models may better represent real-world decision-making. Geographic labels alone do not induce culturally differentiated behaviour; explicit attitudinal parameterisation is required. This proof-of-principle work lays the groundwork for deploying the Epi-LLM framework as a scalable, risk-free simulation environment for pandemic preparedness research.
2026-06-01T20:31:06Z
Submitted to American Journal of Epidemiology
Petra Ferenz, Ava Keeling, Tobias O'Keefe, Lorenzo Stigliano, Francesco Di Lauro, Andres Colubri, Jasmina Panovska-Griffiths

http://arxiv.org/abs/2606.02866v1
When Helping Hurts and How to Fix It: Multi-Agent Debate for Data Cleaning
2026-06-01T20:29:47Z
When does multi-agent debate help data cleaning, and when does it hurt?
Across three benchmarks, four model families, and over 6,000 task-condition pairs, we find debate's effect reverses sign: it degrades generation across all four models (-1.6 to -15.5pp) through critique-induced confusion (CIC), in which the Generator uncritically accepts hallucinated Critic feedback, yet it improves error detection (+27.4pp F1, d=1.0). We derive a debate benefit condition: debate helps when the probability of rescuing a wrong output (Critic verification odds weighted by fixability) exceeds the probability of destroying a correct one. A factorial experiment proves adversarial separation is essential: self-verification with identical tools fails, while a separate Critic with code-execution grounding and evidence-gated generation produces the first debate configuration to significantly exceed single-agent on a generative task (+5.3pp, p<0.05). The condition correctly predicts all nine task types and generalizes with zero false positives across 19 published comparisons in seven domains.
2026-06-01T20:29:47Z
27 pages, 4 figures, 12 tables. Includes appendix with full experimental results, prompt templates, and dataset statistics
Chirag Parmar, Akshat Mehta, Henglin Wu, Jagadish Ramamurthy, Shweta Medhekar

http://arxiv.org/abs/2510.12837v4
Semantic knowledge guides innovation and drives cultural evolution
2026-06-01T20:24:30Z
Cultural evolution allows ideas and technologies to accumulate across generations, reaching their most complex and open-ended form in humans. While social learning enables the transmission of such innovations, the cognitive processes that generate them remain poorly understood. Classical theories typically treat innovation as random variation, a simplification insufficient for explaining the complexity of human cultural evolution. We propose that semantic knowledge (the associations linking concepts to their properties and functions) guides human innovation and drives cumulative culture.
To test this, we combined an agent-based model, which examines how semantic knowledge shapes cultural evolutionary dynamics, with a large-scale behavioral experiment (N = 1,243) testing its role in human innovation. Across both approaches, we found that semantic knowledge directed exploration toward meaningful solutions, enhanced innovation success, and enabled generalization from prior discoveries. Moreover, semantic knowledge interacted synergistically with social learning to amplify innovation and accelerate cumulative cultural change. In contrast, experimental participants lacking access to semantic knowledge performed no better than chance, even when social learning was possible, and relied on shallow exploration strategies for innovation. Together, these findings suggest that semantic knowledge is a key cognitive process underpinning human cumulative culture.
2025-10-13T16:03:51Z
Proceedings of the National Academy of Sciences, 123(22), e2530750123, 2026
Anil Yaman, Shen Tian, Björn Lindström
10.1073/pnas.2530750123

http://arxiv.org/abs/2606.02862v1
Toward a Modular Architecture for Embedded AI Agent Systems at the Edge
2026-06-01T20:24:18Z
The rise of Large Language Models (LLMs) has enabled agentic AI capable of complex reasoning and tool use; however, deploying such autonomy in pervasive computing environments remains challenging due to the strict memory and energy constraints of embedded microcontrollers. Existing frameworks typically assume server-class resources or continuous connectivity, leaving a gap for deeply embedded systems. This paper proposes a modular reference architecture for Embedded Agent Systems that bridges the divide between deterministic real-time control and agentic intelligence.
We introduce a tiered design that decouples On-Device Agents (executing highly compressed neural networks and rule-based logic for low-latency, privacy-critical tasks) from Cloud-Augmented Agents that leverage Small Language Models (SLMs) for higher-level reasoning and planning. A key contribution is the integration of a cross-cutting Governance Layer, ensuring observability, policy enforcement, and safety across distributed fleets of autonomous devices. Rather than presenting purely empirical benchmarks, we analyze architectural design principles and trade-offs regarding latency, energy, and reliable execution in resource-constrained environments.
2026-06-01T20:24:18Z
Marcus Rüb, Michael Gerhards

http://arxiv.org/abs/2606.02859v1
Economy of Minds: Emerging Multi-Agent Intelligence with Economic Interactions
2026-06-01T20:21:09Z
How can a population of agents self-orchestrate and self-adapt into stronger collective intelligence without centralized control? Inspired by Friedrich Hayek's economic theory of decentralized coordination in markets, we study this question through an agent economy in which agents compete via auctions for the right to act, exchange payments, and accumulate wealth from environmental rewards. These simple economic signals induce decentralized credit assignment, driving planning without global orchestration or explicit communication protocols. The population evolves through economic selection: effective agents accumulate wealth and are mutated via exploitation, while ineffective ones go bankrupt and are replaced via exploration. We show that, initialized with weak agents, the economy produces emergent multi-step reasoning strategies and outperforms stronger monolithic baselines across five agentic tasks, including mathematical reasoning, financial research, scientific research, accelerator design, and distributed-system optimization.
We further provide theoretical insights into how economic dynamics shape agent behaviors, linking local incentives to long-term global performance. Our results suggest a new path to multi-agent intelligence: rather than engineering coordination, we can design decentralized incentive structures under which it automatically emerges.
2026-06-01T20:21:09Z
Zhenting Qi, Huangyuan Su, Ao Qu, Chenyu Wang, Yu Yao, Han Zheng, Kushal Chattopadhyay, Guowei Xu, Zihan Wang, Weirui Ye, Vijay Janapa Reddi, Ju Li, Paul Pu Liang, Himabindu Lakkaraju, Sham Kakade, Yilun Du

http://arxiv.org/abs/2606.02840v1
Self-Regulation through Communication in Evolved Neural Agents
2026-06-01T20:04:50Z
Communication is typically understood as indication: signals that transfer information from sender to receiver. We present a minimal predator avoidance task in which pairs of evolved CTRNN agents use communication for robust survival, and in which agents hear their own vocalizations, as in natural systems. Across 112 perfect-fitness agents from over 2,000 evolutionary runs, three dominant strategies emerge (accounting for 81% of agents): safety calling (39%), where agents signal from safe cover; alarm indication (22%), where agents vocalize when a threat is present without relying on self-hearing; and self-regulatory calling (20%), where agents depend on hearing their own call to sustain escape behavior. Self-hearing dependency is common among agents that call during an active threat (47%), but rare among agents that call only after reaching safe cover (10%; p < 10^-4). This pattern is consistent with a difference in causal order: safety callers act then communicate, while self-regulatory callers communicate in order to act. Removing self-hearing selectively impairs self-regulatory callers (fitness 0.40) while safety callers remain functional (0.90; p < 10^-9).
These results show that communication can evolve to serve the caller's own behavioral regulation, not just information transfer to others.
2026-06-01T20:04:50Z
7 pages, 5 figures. Submitted to ALIFE 2026
Joshua Nunley

http://arxiv.org/abs/2606.02813v1
Democracy on Rugged Landscapes: Phase Transitions in Optimal Voting Rules
2026-06-01T19:30:19Z
Laws and institutions shape individual outcomes through complex interactions with citizens' diverse circumstances, yet how different voting methods navigate this coupled landscape remains poorly understood. We model collective governance as optimization on NK fitness landscapes, where shared bits (laws) are updated by voting while individual bits (personal traits) remain fixed. A cross-dependency parameter $α$ controls how legislation's effects depend on individual circumstances. We compare eight standard voting methods and a generalized scoring family across landscape ruggedness $K \in \{1,\ldots,20\}$ and $α \in [0,1]$ with 1000 runs per configuration.
Under direct democracy, the optimal voting method undergoes sharp phase transitions as a function of landscape complexity: cardinal score voting dominates on smooth landscapes, ordinal scoring with $p=0.35$ at low-to-moderate ruggedness, Borda count across a wide middle range, and STAR voting at the highest complexity. A two-parameter empirical formula reduces the $(K, α)$ plane to a single complexity axis for visualization. Borda count achieves the highest mean fitness and lowest variance across most of the parameter space.
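For readers unfamiliar with the Borda count named above, the following is a minimal sketch of the standard positional rule, assuming every ballot ranks all candidates best-first. The function name `borda_count`, the alphabetical tie-break, and the three example ballots are illustrative assumptions and are not taken from the paper's simulation code.

```python
from collections import defaultdict


def borda_count(ballots):
    """Score ranked ballots with the standard Borda rule.

    Each ballot lists candidates best-first. With n candidates on a ballot,
    the top choice earns n - 1 points, the next n - 2, and the last 0.
    Returns (winner, scores); ties are broken alphabetically (an assumption).
    """
    scores = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - 1 - position
    # max() keeps the first maximum, so sorting makes the tie-break alphabetical.
    winner = max(sorted(scores), key=lambda c: scores[c])
    return winner, dict(scores)


# Hypothetical ballots: each candidate has exactly one first-place vote,
# so plurality is tied, while Borda separates them using full rankings.
ballots = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "B", "A"],
]
winner, scores = borda_count(ballots)
print(winner, scores)
```

In this toy example the broadly acceptable candidate B wins because Borda aggregates the whole ranking rather than first choices only.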
We further introduce a representative democracy model parameterized by identity weight $β$ and candidate self-interest $p_{\mathrm{self}}$. Representation reshapes the complexity-dependent structure even under favorable conditions: cardinal score voting dominates across most regimes, with plurality emerging as the top method at high $β$ and low-to-moderate $p_{\mathrm{self}}$.
2026-06-01T19:30:19Z
8 pages, 3 figures. Submitted to ALIFE 2026
Joshua Nunley

http://arxiv.org/abs/2606.02568v1
ClinEnv: An Interactive Multi-Stage Long Horizon EHR Environment for Agents
2026-06-01T17:56:26Z
Clinical practice is not the selection of an answer from enumerated options: a physician gathers heterogeneous information incrementally and commits to sequential, irreversible decisions under uncertainty. Static benchmarks cannot probe these properties, and existing interactive medical benchmarks each compromise on at least one of them. We present ClinEnv, an interactive benchmark that evaluates LLMs as attending physicians over real inpatient admissions under a paradigm we term Longitudinal Inpatient Simulation. Each case is automatically constructed into an ordered sequence of decision stages; at every stage the model must actively query four specialized agents before committing to medications, procedures, and diagnoses. ClinEnv scores both what the model decides, through deterministic ontology-grounded matching, and how it gathers information. Across seven models, the strongest reaches only 0.31 decision F1, and outcome quality is sharply decoupled from process quality. Difficulty concentrates in management decisions and later stages, where models recover discharge diagnoses far more reliably than management actions (0.51 vs. 0.17 F1) and continue to issue redundant queries as cases progress. ClinEnv makes this information-acquisition gap, invisible to outcome-only evaluation, directly measurable.
2026-06-01T17:56:26Z
20 pages, 6 figures, 12 tables
Yuxing Lu, Yushuhong Lin, Wenqi Shi, J. Ben Tamo, Xukai Zhao, Jinzhuo Wang, May Dongmei Wang

http://arxiv.org/abs/2606.02529v1
A No-Regret Framework for Adaptive Incentive Design
2026-06-01T17:37:11Z
Incentive design studies how a central authority can influence strategic agents through payments, subsidies, or taxes, so that individual objectives align with collective welfare. This paper introduces a No-Regret Adaptive Incentive Design (RAID) framework for nonlinear games with continuous action spaces and private agent costs. In this framework, the authority (planner) designs incentives that regulate the Nash equilibrium toward a socially optimal action profile, while simultaneously learning agents' unknown preferences from repeated strategic responses. We formulate the RAID problem and construct a least-squares estimator whose strong consistency requires only diminishing excitation. Leveraging this weak excitation requirement, we propose a switching incentive policy that alternates between probing (exploration) and estimate-based (exploitation) incentives. The resulting policy achieves an $O(t^{-0.5})$ parameter estimation rate and accumulates $O(t^{0.5}\log t)$ squared social-cost regret, almost surely. We further extend the framework to an endogenous-noise response model, where standard least-squares estimation is biased due to an error-in-variables correlation between the noise and agent responses. We utilize a repeated-sampling estimator and corresponding switching policy that retain the same almost-sure convergence and regret rates. Numerical experiments validate the effectiveness and predicted convergence rates of the method.
2026-06-01T17:37:11Z
21 pages, 5 figures
Georgios Vasileiou, Lantian Zhang, Silun Zhang

http://arxiv.org/abs/2505.09799v3
On Signed Network Games with Binary Actions
2026-06-01T16:16:51Z
We study binary-action pairwise-separable graphical games that encompass both coordination and anti-coordination network games.
Our model is grounded in an underlying directed signed graph, where each link is associated with a signed weight that describes both the nature and the strength of the strategic pairwise interaction. Specifically, a positive link weight corresponds to a strategic-complement interaction, whereas a negative link weight corresponds to a strategic-substitute interaction. The utility for each player is then an aggregation of pairwise terms determined by the weights of the signed graph in addition to an individual bias term. We consider a scenario that assumes the presence of a prominent cohesive subset of players, who are either connected exclusively by positive weights, or form a structurally balanced subset that can be bipartitioned into two adversarial subcommunities with positive intra-community and negative inter-community edges. Under suitable properties of the game restricted to the remaining players, our results guarantee the existence of Nash equilibria characterized by either consensus or polarization within the first group, as well as their stability under best response transitions. Our results can be interpreted as robustness results, building on the super-modular properties of network coordination games and on a novel use of the concept of graph cohesiveness.
2025-05-14T20:51:34Z
15 pages, 8 figures, 1 table
Martina Vanelli, Laura Arditti, Giacomo Como, Fabio Fagnani

http://arxiv.org/abs/2312.03644v3
MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment
2026-06-01T16:16:05Z
Offline Multi-agent Reinforcement Learning (MARL) is valuable in scenarios where online interaction is impractical or risky. While independent learning in MARL offers flexibility and scalability, accurately assigning credit to individual agents in offline settings poses challenges because interactions with an environment are prohibited.
In this paper, we propose a new framework, namely Multi-Agent Causal Credit Assignment (MACCA), to address credit assignment in the offline MARL setting. MACCA characterizes the generative process as a Dynamic Bayesian Network that captures the relationships between environmental variables, states, actions, and rewards. By estimating this model on offline data, MACCA can learn each agent's contribution by analyzing the causal relationship of their individual rewards, ensuring accurate and interpretable credit assignment. Additionally, the modularity of our approach allows it to integrate with various offline MARL methods seamlessly. Theoretically, we prove that, in the offline-dataset setting, the underlying causal structure and the function generating the agents' individual rewards are identifiable, which lays the foundation for the correctness of our modeling. In our experiments, we demonstrate that MACCA not only outperforms state-of-the-art methods but also enhances performance when integrated with other backbones.
2023-12-06T17:59:34Z
21 pages, 4 figures
TMLR 2025
Ziyan Wang, Yali Du, Yudi Zhang, Meng Fang, Biwei Huang

http://arxiv.org/abs/2606.02433v1
ODTQA-FoRe: An Open-Domain Tabular Question Answering Dataset for Future Data Forecasting and Reasoning
2026-06-01T16:06:20Z
The rapid development of LLMs has significantly advanced tabular question answering, but most systems cannot perform future-oriented numerical prediction. To address this gap, we introduce a novel task, Open-Domain Tabular Question Answering for Future Data Forecasting and Reasoning, and propose the first dataset to cover time-series forecasting and forecast-based reasoning scenarios using real estate data. This task poses challenges in retrieving precise historical data, overcoming the forecasting limitations of LLMs, and standardizing responses for diverse queries.
To address these challenges, we propose TimeFore, an LLM agent-based framework that decomposes the problem into three collaborative roles: a Retriever autonomously generates SQL to fetch data, a Forecaster invokes external time-series models for higher accuracy, and an Analyzer synthesizes the results to construct a precise and consistent final answer. Extensive experiments demonstrate the effectiveness of TimeFore.
2026-06-01T16:06:20Z
This paper has been accepted by Findings of ACL 2026
Zhensheng Wang, Xiaole Liu, Wenmian Yang, Kun Zhou, Yiquan Zhang, Weijia Jia

http://arxiv.org/abs/2605.09907v2
RADAR: Redundancy-Aware Diffusion for Multi-Agent Communication Structure Generation
2026-06-01T12:51:51Z
Compared with individual agents, large language model-based multi-agent systems have consistently shown strong capabilities across diverse tasks, including code generation, mathematical reasoning, and planning. Despite their impressive performance, the effectiveness and robustness of these systems rely heavily on their communication topology, which is often fixed or generated in a single step. This restricts fine-grained structural exploration and flexible composition, resulting in excessive token utilization on simple tasks while limiting capability on complicated tasks. To mitigate this challenge, we introduce RADAR, a redundancy-aware and query-adaptive generative framework that actively reduces communication overhead. Motivated by recent progress in conditional discrete graph diffusion models, we formulate communication topology design as a step-by-step generation process, guided by the effective size of the graph. Comprehensive experiments on six benchmarks demonstrate that RADAR consistently outperforms recent baselines, achieving higher accuracy, lower token consumption, and greater robustness across diverse scenarios.
Our code and data are available at https://github.com/cszhangzhen/RADAR.
2026-05-11T02:50:40Z
Accepted by ICML 2026 (fix typos)
Zhen Zhang, Wanjing Zhou, Juncheng Li, Hao Fei, Jun Wen, Wei Ji