https://arxiv.org/api/tIkV/EeNJIamOMXFl6HKplIvtmQ2026-07-01T18:12:11Z12827148515http://arxiv.org/abs/2604.00523v1Lipschitz Dueling Bandits over Continuous Action Spaces2026-04-01T06:07:33ZWe study for the first time, stochastic dueling bandits over continuous action spaces with Lipschitz structure, where feedback is purely comparative. While dueling bandits and Lipschitz bandits have been studied separately, their combination has remained unexplored. We propose the first algorithm for Lipschitz dueling bandits, using round-based exploration and recursive region elimination guided by an adaptive reference arm. We develop new analytical tools for relative feedback and prove a regret bound of $\tilde O\left(T^{\frac{d_z+1}{d_z+2}}\right)$, where $d_z$ is the zooming dimension of the near-optimal region. Further, our algorithm takes only logarithmic space in terms of the total time horizon, best achievable by any bandit algorithm over a continuous action space.2026-04-01T06:07:33ZMudit SharmaShweta JainVaneet AggarwalGanesh Ghalmehttp://arxiv.org/abs/2604.00477v1Logarithmic Scores, Power-Law Discoveries: Disentangling Measurement from Coverage in Agent-Based Evaluation2026-04-01T04:44:21ZLLM-based agent judges are an emerging approach to evaluating conversational AI, yet a fundamental uncertainty remains: can we trust their assessments, and if so, how many are needed? Through 960 sessions with two model pairs across 15 tasks, we show that persona-based agent judges produce evaluations indistinguishable from human raters in a Turing-style validation. We then identify a score-coverage dissociation: quality scores improve logarithmically with panel size, while unique issue discoveries follow a sublinear power law-both exhibit diminishing returns, but scores saturate roughly twice as fast as discoveries. We hypothesize this reflects a power law distribution of the finding space: critical issues are discovered first by small panels, while corner cases require progressively larger panels, analogous to species accumulation curves in ecology. The mechanism traces to ensemble diversity-Big Five personality conditioning makes agents probe different quality dimensions, with expert judges acting as adversarial probes that push discovery into the tail of the finding distribution. A controlled ablation confirms that structured persona conditioning, not simple prompting, is required to produce these scaling properties.2026-04-01T04:44:21ZHyunJoon JungWilliam Nahttp://arxiv.org/abs/2604.00451v1CASCADE: Cascaded Scoped Communication for Multi-Agent Re-planning in Disrupted Industrial Environments2026-04-01T04:01:16ZIndustrial disruption replanning demands multi-agent coordination under strict latency and communication budgets, where disruptions propagate through tightly coupled physical dependencies and rapidly invalidate baseline schedules and commitments. Existing coordination schemes often treat communication as either effectively free (broadcast-style escalation) or fixed in advance (hand-tuned neighborhoods), both of which are brittle once the disruption footprint extends beyond a local region. We present \CASCADE, a budgeted replanning mechanism that makes communication scope explicit and auditable rather than fixed or implicit. Each agent maintains an explicit knowledge base, solves role-conditioned local decision problems to revise commitments, and coordinates through lightweight contract primitives whose footprint expands only when local validation indicates that the current scope is insufficient. This design separates a unified agent substrate (Knowledge Base / Decision Manager / Communication Manager) from a scoped interaction layer that controls who is contacted, how far coordination propagates, and when escalation is triggered under explicit budgets. We evaluate \CASCADE on disrupted manufacturing and supply-chain settings using unified diagnostics intended to test a mechanism-design claim -- whether explicit scope control yields useful quality-latency-communication trade-offs and improved robustness under uncertainty -- rather than to provide a complete algorithmic ranking.2026-04-01T04:01:16ZPublished at ICLR 2026 Workshop on AI for Mechanism Design and Strategic Decision MakingMingjie Bihttp://arxiv.org/abs/2604.00433v1Internal State-Based Policy Gradient Methods for Partially Observable Markov Potential Games2026-04-01T03:25:21ZThis letter studies multi-agent reinforcement learning in partially observable Markov potential games. Solving this problem is challenging due to partial observability, decentralized information, and the curse of dimensionality. First, to address the first two challenges, we leverage the common information framework, which allows agents to act based on both shared and local information. Second, to ensure tractability, we study an internal state that compresses accumulated information, preventing it from growing unboundedly over time. We then implement an internal state-based natural policy gradient method to find Nash equilibria of the Markov potential game. Our main contribution is to establish a non-asymptotic convergence bound for this method. Our theoretical bound decomposes into two interpretable components: a statistical error term that also arises in standard Markov potential games, and an approximation error capturing the use of finite-state controllers. Finally, simulations across multiple partially observable environments demonstrate that the proposed method using finite-state controllers achieves consistent improvements in performance compared to the setting where only the current observation is used.2026-04-01T03:25:21Z6 pages, 2 figures. Submitted to IEEE Control Systems Letters (L-CSS) with CDC optionWonseok YangThinh T. Doanhttp://arxiv.org/abs/2604.00430v1Secure Forgetting: A Framework for Privacy-Driven Unlearning in Large Language Model (LLM)-Based Agents2026-04-01T03:17:35ZLarge language model (LLM)-based agents have recently gained considerable attention due to the powerful reasoning capabilities of LLMs. Existing research predominantly focuses on enhancing the task performance of these agents in diverse scenarios. However, as LLM-based agents become increasingly integrated into real-world applications, significant concerns emerge regarding their accumulation of sensitive or outdated knowledge. Addressing these concerns requires the development of mechanisms that allow agents to selectively forget previously learned knowledge, giving rise to a new term LLM-based agent unlearning. This paper initiates research on unlearning in LLM-based agents. Specifically, we propose a novel and comprehensive framework that categorizes unlearning scenarios into three contexts: state unlearning (forgetting specific states or items), trajectory unlearning (forgetting sequences of actions) and environment unlearning (forgetting entire environments or categories of tasks). Within this framework, we introduce a natural language-based unlearning method that trains a conversion model to transform high-level unlearning requests into actionable unlearning prompts, guiding agents through a controlled forgetting process. Moreover, to evaluate the robustness of the proposed framework, we introduce an unlearning inference adversary capable of crafting prompts, querying agents, and observing their behaviors in an attempt to infer the forgotten knowledge. Experimental results show that our approach effectively enables agents to forget targeted knowledge while preserving performance on untargeted tasks, and prevents the adversary from inferring the forgotten knowledge.2026-04-01T03:17:35ZDayong YeTainqing ZhuCongcong ZhuFeng HeQi HeShang WangBo LiuWanlei Zhouhttp://arxiv.org/abs/2604.00319v1Collaborative AI Agents and Critics for Fault Detection and Cause Analysis in Network Telemetry2026-03-31T23:33:56ZWe develop algorithms for collaborative control of AI agents and critics in a multi-actor, multi-critic federated multi-agent system. Each AI agent and critic has access to classical machine learning or generative AI foundation models. The AI agents and critics collaborate with a central server to complete multimodal tasks such as fault detection, severity, and cause analysis in a network telemetry system, text-to-image generation, video generation, healthcare diagnostics from medical images and patient records, etcetera. The AI agents complete their tasks and send them to AI critics for evaluation. The critics then send feedback to agents to improve their responses. Collaboratively, they minimize the overall cost to the system with no inter-agent or inter-critic communication. AI agents and critics keep their cost functions or derivatives of cost functions private. Using multi-time scale stochastic approximation techniques, we provide convergence guarantees on the time-average active states of AI agents and critics. The communication overhead is a little on the system, of the order of $\mathcal{O}(m)$, for $m$ modalities and is independent of the number of AI agents and critics. Finally, we present an example of fault detection, severity, and cause analysis in network telemetry and thorough evaluation to check the algorithm's efficacy.2026-03-31T23:33:56ZSyed Eqbal AlamZhan Shuhttp://arxiv.org/abs/2604.00284v1Improvisational Games as a Benchmark for Social Intelligence of AI Agents: The Case of Connections2026-03-31T22:16:25ZWe formally introduce a improvisational wordplay game called Connections to explore reasoning capabilities of AI agents. Playing Connections combines skills in knowledge retrieval, summarization and awareness of cognitive states of other agents. We show how the game serves as a good benchmark for social intelligence abilities of language model based agents that go beyond the agents' own memory and deductive reasoning and also involve gauging the understanding capabilities of other agents. Finally, we show how through communication with other agents in a constrained environment, AI agents must demonstrate social awareness and intelligence in games involving collaboration.2026-03-31T22:16:25Zhttps://wordplay-workshop.github.io/wordplay2024/pdfs/16.pdfhttps://wordplay-workshop.github.io/wordplay2024/pdfs/16.pdfGaurav Rajesh ParikhAngikar Ghosalhttp://arxiv.org/abs/2602.01415v4Evidence-Decision-Feedback: Theory-Driven Adaptive Scaffolding for LLM Agents2026-03-31T21:23:01ZLLMs offer tremendous opportunities for pedagogical agents to help students construct knowledge and develop problem-solving skills, yet many of these agents operate on a "one-size-fits-all" basis, limiting their ability to personalize support. To address this, we introduce Evidence-Decision-Feedback (EDF), a theoretical framework for adaptive scaffolding with LLM agents. EDF integrates elements of intelligent tutoring systems (ITS) and agentic behavior by organizing interactions around evidentiary inference, pedagogical decision-making, and adaptive feedback. We instantiate EDF through Copa, a Collaborative Peer Agent for STEM+C problem-solving. In an authentic high school classroom study, we show that EDF-guided interactions align feedback with students' demonstrated understanding and task mastery; promote scaffold fading; and support interpretable, evidence-grounded explanations without fostering overreliance.2026-02-01T19:43:00ZTo appear as a full paper in the proceedings of the 27th International Conference on Artificial Intelligence in Education (AIED26)Clayton CohnSiyuan GuoSurya RayalaHanchen David WangNaveeduddin MohammedUmesh TimalsinaShruti JainAngela EedsMenton DeweesePamela J. Osborn PoppRebekah StantonShakeera WalkerMeiyi MaGautam Biswashttp://arxiv.org/abs/2604.00249v1A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation2026-03-31T21:21:31ZSingle-agent large language model (LLM) systems struggle to simultaneously support diverse conversational functions and maintain safety in behavioral health communication. We propose a safety-aware, role-orchestrated multi-agent LLM framework designed to simulate supportive behavioral health dialogue through coordinated, role-differentiated agents. Conversational responsibilities are decomposed across specialized agents, including empathy-focused, action-oriented, and supervisory roles, while a prompt-based controller dynamically activates relevant agents and enforces continuous safety auditing. Using semi-structured interview transcripts from the DAIC-WOZ corpus, we evaluate the framework with scalable proxy metrics capturing structural quality, functional diversity, and computational characteristics. Results illustrate clear role differentiation, coherent inter-agent coordination, and predictable trade-offs between modular orchestration, safety oversight, and response latency when compared to a single-agent baseline. This work emphasizes system design, interpretability, and safety, positioning the framework as a simulation and analysis tool for behavioral health informatics and decision-support research rather than a clinical intervention.2026-03-31T21:21:31ZHa Na Chohttp://arxiv.org/abs/2604.00237v1AI-Mediated Explainable Regulation for Justice2026-03-31T21:00:42ZPresent practice of deciding on regulation faces numerous problems that make adopted regulations static, unexplained, unduly influenced by powerful interest groups, and stained with a perception of illegitimacy. These well-known problems with the regulatory process can lead to injustice and have substantial negative effects on society and democracy. We discuss a new approach that utilizes distributed artificial intelligence (AI) to make a regulatory recommendation that is explainable and adaptable by design. We outline the main components of a system that can implement this approach and show how it would resolve the problems with the present regulatory system. This approach models and reasons about stakeholder preferences with separate preference models, while it aggregates these preferences in a value sensitive way. Such recommendations can be updated due to changes in facts or in values and are inherently explainable. We suggest how stakeholders can make their preferences known to the system and how they can verify whether they were properly considered in the regulatory decision. The resulting system promises to support regulatory justice, legitimacy, and compliance.2026-03-31T21:00:42ZThomas HofweberAndreas SudmannEvangelos Pournarashttp://arxiv.org/abs/2604.16406v1Heterogeneous Self-Play for Realistic Highway Traffic Simulation2026-03-31T19:39:29ZRealistic highway simulation is critical for scalable safety evaluation of autonomous vehicles, particularly for interactions that are too rare to study from logged data alone. Yet highway traffic generation remains challenging because it requires broad coverage across speeds and maneuvers, controllable generation of rare safety-critical scenarios, and behavioral credibility in multi-agent interactions. We present PHASE, Policy for Heterogeneous Agent Self-play on Expressway, a context-aware self-play framework that addresses these three requirements through explicit per-agent conditioning for controllability, synthetic scenario generation for broad highway coverage, and closed-loop multi-agent training for realistic interaction dynamics. PHASE further supports different vehicle profiles, for example, passenger cars and articulated trailer trucks, within a single policy via vehicle-aware dynamics and context-conditioned actions, and stabilizes self-play with early termination of unrecoverable states, at-fault collision attribution, highway-aware reward shaping, coupled curricula, and robust policy optimization. Despite being trained only on synthetic data, PHASE transfers zero-shot to 512 unseen high-interaction real scenarios in exiD, achieving a 96.3% success rate and reducing ADE/FDE from 6.57/12.07 m to 2.44/5.25 m relative to a prior self-play baseline. In a learned trajectory embedding space, it also improves behavioral realism over IDM, reducing Frechet trajectory distance by 13.1% and energy distance by 20.2%. These results show that synthetic self-play can provide a scalable route to controllable and realistic highway scenario generation without direct imitation of expert logs.2026-03-31T19:39:29Z8 pages, 2026 CVPR SAD WorkshopJinkai QiuAlessandro SavioloChaojie WangMingke WangXiaoyu Huanghttp://arxiv.org/abs/2512.02966v2Lumos: Let there be Language Model System Certification2026-03-31T19:36:43ZWe introduce the first principled framework, Lumos, for specifying and formally certifying Language Model System (LMS) behaviors. Lumos is an imperative probabilistic programming DSL over graphs, with constructs to generate independent and identically distributed prompts for LMS. It offers a structured view of prompt distributions via graphs, forming random prompts from sampled subgraphs. Lumos supports certifying LMS for arbitrary prompt distributions via integration with statistical certifiers. We provide hybrid (operational and denotational) semantics for Lumos, providing a rigorous way to interpret the specifications. Using only a small set of composable constructs, Lumos can encode existing LMS specifications, including complex relational and temporal specifications. It also facilitates specifying new properties - we present the first safety specifications for vision-language models (VLMs) in autonomous driving scenarios developed with Lumos. Using these, we show that the state-of-the-art VLM Qwen-VL exhibits critical safety failures, producing incorrect and unsafe responses with at least 90% probability in right-turn scenarios under rainy driving conditions, revealing substantial safety risks. Lumos's modular structure allows easy modification of the specifications, enabling LMS certification to stay abreast with the rapidly evolving threat landscape. We further integrate a prompt-level deterministic verifier to obtain guarantees over the privacy of the LLM generation distribution over a prompt distribution. Lumos is simple to program in, requiring only a few constructs, as evidenced by state-of-the-art large language models generating correct Lumos specifications in zero-shot settings. Lumos is the first systematic and extensible language-based framework for specifying and certifying LMS behaviors, paving the way for a wider adoption of LMS certification.2025-12-02T17:44:47ZIsha ChaudharyVedaant JainPrineet ParharKavya SachdevaAvaljot SinghSayan RanuGagandeep Singhhttp://arxiv.org/abs/2604.21936v1An Artifact-based Agent Framework for Adaptive and Reproducible Medical Image Processing2026-03-31T19:28:11ZMedical imaging research is increasingly shifting from controlled benchmark evaluation toward real-world clinical deployment. In such settings, applying analytical methods extends beyond model design to require dataset-aware workflow configuration and provenance tracking. Two requirements therefore become central: \textbf{adaptability}, the ability to configure workflows according to dataset-specific conditions and evolving analytical goals; and \textbf{reproducibility}, the guarantee that all transformations and decisions are explicitly recorded and re-executable. Here, we present an artifact-based agent framework that introduces a semantic layer to augment medical image processing. The framework formalizes intermediate and final outputs through an artifact contract, enabling structured interrogation of workflow state and goal-conditioned assembly of configurations from a modular rule library. Execution is delegated to a workflow executor to preserve deterministic computational graph construction and provenance tracking, while the agent operates locally to comply with most privacy constraints. We evaluate the framework on real-world clinical CT and MRI cohorts, demonstrating adaptive configuration synthesis, deterministic reproducibility across repeated executions, and artifact-grounded semantic querying. These results show that adaptive workflow configuration can be achieved without compromising reproducibility in heterogeneous clinical environments.2026-03-31T19:28:11ZLianrui ZuoYihao LiuGaurav RudravaramKarthik RamadassAravind R. KrishnanMichael D. PhillipsYelena G. BodienMayur B. PatelPaula TrujilloYency Forero MartinezStephen A. DeppenEric L. GroganFabien MaldonadoKevin McGannHudson M. HolmesLaurie E. CuttingYuankai HuoBennett A. Landmanhttp://arxiv.org/abs/2604.00085v1One Panel Does Not Fit All: Case-Adaptive Multi-Agent Deliberation for Clinical Prediction2026-03-31T18:00:34ZLarge language models applied to clinical prediction exhibit case-level heterogeneity: simple cases yield consistent outputs, while complex cases produce divergent predictions under minor prompt changes. Existing single-agent strategies sample from one role-conditioned distribution, and multi-agent frameworks use fixed roles with flat majority voting, discarding the diagnostic signal in disagreement. We propose CAMP (Case-Adaptive Multi-agent Panel), where an attending-physician agent dynamically assembles a specialist panel tailored to each case's diagnostic uncertainty. Each specialist evaluates candidates via three-valued voting (KEEP/REFUSE/NEUTRAL), enabling principled abstention outside one's expertise. A hybrid router directs each diagnosis through strong consensus, fallback to the attending physician's judgment, or evidence-based arbitration that weighs argument quality over vote counts. On diagnostic prediction and brief hospital course generation from MIMIC-IV across four LLM backbones, CAMP consistently outperforms strong baselines while consuming fewer tokens than most competing multi-agent methods, with voting records and arbitration traces offering transparent decision audits.2026-03-31T18:00:34ZYuxing LuYushuhong LinJason Zhanghttp://arxiv.org/abs/2510.16187v2Zero-Shot Coordination in Ad Hoc Teams with Generalized Policy Improvement and Difference Rewards2026-03-31T17:21:11ZReal-world multi-agent systems may require ad hoc teaming, where an agent must coordinate with other previously unseen teammates to solve a task in a zero-shot manner. Prior work often either selects a pretrained policy based on an inferred model of the new teammates or pretrains a single policy that is robust to potential teammates. Instead, we propose to leverage all pretrained policies in a zero-shot transfer setting. We formalize this problem as an ad hoc multi-agent Markov decision process and present a solution that uses two key ideas, generalized policy improvement and difference rewards, for efficient and effective knowledge transfer between different teams. We empirically demonstrate that our algorithm, Generalized Policy improvement for Ad hoc Teaming (GPAT), successfully enables zero-shot transfer to new teams in three simulated environments: cooperative foraging, predator-prey, and Overcooked. We also demonstrate our algorithm in a real-world multi-robot setting.2025-10-17T19:55:25Z10 pages, 8 figures. To appear in proceedings of 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)Rupal NigamNiket ParikhHamid OsooliMikihisa YuasaJacob HeglundHuy T. Tran10.65109/TNEX7143