http://arxiv.org/abs/2606.08596v1 Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration 2026-06-07T12:20:32Z

Constructing efficient and reliable policies to assist humans is indispensable for human-AI collaboration. Existing methods mainly follow two lines of work. Most prior work relies on multi-agent reinforcement learning (MARL) to learn black-box policies, which limits interpretability and raises safety concerns. Recent methods query large language models (LLMs) at each decision step, causing slow responses and high inference costs. We propose Collaboration Policy Tree (Co-pi-tree), a closed-loop method that learns an executable policy tree consisting of a partner-behavior prediction tree and an agent-action selection tree. Co-pi-tree constructs a policy by distilling LLM reasoning into policy tree code. It then evaluates the policy through partner interaction, obtains feedback, and uses natural language to summarize the interaction feedback to improve problematic branches. Experiments in Overcooked-AI show that Co-pi-tree improves average reward by 35.4% over the baseline average, while reducing the number of LLM queries by 77.7% and test-time latency by 97.1%. Project page: https://beiwenzhang.github.io/Co-pi-tree/
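
To make the two-tree structure described above concrete, here is a minimal Python sketch of a partner-behavior prediction tree feeding an agent-action selection tree. Everything in it is hypothetical: the `State` fields, branch conditions, and subgoal names are illustrative assumptions, not taken from the paper or from the Overcooked-AI API.

```python
from dataclasses import dataclass


@dataclass
class State:
    # Hypothetical, simplified observation; not the Overcooked-AI state format.
    partner_holding: str  # "onion", "dish", or "nothing"
    pot_onions: int       # onions currently in the pot (0 to 3)
    soup_ready: bool      # whether a cooked soup is waiting in the pot


def predict_partner(state: State) -> str:
    """Partner-behavior prediction tree: guess the partner's next subgoal."""
    if state.partner_holding == "onion":
        return "fill_pot"
    if state.partner_holding == "dish" or state.soup_ready:
        return "serve_soup"
    return "fetch_onion"


def select_action(state: State) -> str:
    """Agent-action selection tree: pick a subgoal that complements the partner."""
    partner_goal = predict_partner(state)
    if state.soup_ready:
        # Avoid both agents rushing to serve the same soup.
        return "fetch_onion" if partner_goal == "serve_soup" else "serve_soup"
    if state.pot_onions < 3:
        # If the partner is already filling the pot, prepare a dish instead.
        return "fetch_dish" if partner_goal == "fill_pot" else "fetch_onion"
    return "fetch_dish"
```

Because the policy is plain code, each branch can be read, tested, and patched individually, which is the property that makes a feedback-driven branch revision loop plausible.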

2026-06-07T12:20:32Z Beiwen Zhang Yongheng Liang Guowei Zou Haitao Wang Hejun Wu http://arxiv.org/abs/2508.06336v2 Unsupervised Partner Design Enables Robust Ad-hoc Teamwork 2026-06-07T11:04:52Z

We introduce Unsupervised Partner Design (UPD), a population-free multi-agent reinforcement learning method for robust ad-hoc teamwork. UPD generates training partners on-the-fly and selects them adaptively based on a learnability criterion, removing the need for pre-trained partner populations or manual parameter tuning. We show that this simple mechanism enables effective partner diversity and can be extended to joint partner-environment selection when a procedural level generator is available. Across Level-Based Foraging, Overcooked-AI, and the Overcooked Generalisation Challenge, UPD consistently achieves strong performance compared to both population-based and population-free baselines. In a human-AI user study, agents trained with UPD achieve higher returns and are rated as more adaptive, more human-like, and less frustrating than all evaluated baseline methods.

2025-08-08T14:11:15Z 27 pages Constantin Ruhdorfer Matteo Bortoletto Victor Oei Anna Penzkofer Andreas Bulling http://arxiv.org/abs/2606.08441v1 Comparing Controller-Free Pointing Techniques Across Depth for 2D Selection in Augmented Reality 2026-06-07T03:38:17Z

This paper presents a systematic evaluation of five controller-free pointing techniques for 2D target selection in AR, using ISO 9241-411. We compared them across multiple depths (2 m, 6 m, 10 m) in terms of movement time, accuracy, throughput, and workload (NASA TLX). Head- and eye-based pointing significantly outperformed the hand-based methods (Finger, Wrist, and Arm); Head input was the most accurate and remained the most consistent across depth. Depth significantly impacted performance, with complex interactions with target size and distance. Our results offer a comprehensive empirical basis for selecting appropriate controller-free techniques in depth-varying AR tasks.
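
For readers unfamiliar with the throughput measure, the sketch below shows the effective-throughput calculation commonly used with ISO 9241-411 style selection tasks: $ID_e = \log_2(A_e / W_e + 1)$ with $W_e = 4.133 \cdot SD_x$, and $TP = ID_e / MT$. The function name and argument layout are assumptions for illustration; the abstract does not state how the authors aggregated trials.

```python
import math
import statistics


def throughput(amplitudes, deviations, movement_times):
    """Effective throughput in bits per second for one condition.

    amplitudes:     per-trial movement distance along the task axis
    deviations:     per-trial signed selection offset from the target centre,
                    measured along the task axis
    movement_times: per-trial movement time in seconds
    """
    a_e = statistics.mean(amplitudes)            # effective amplitude
    w_e = 4.133 * statistics.stdev(deviations)   # effective width
    id_e = math.log2(a_e / w_e + 1)              # effective index of difficulty
    return id_e / statistics.mean(movement_times)
```

Using the spread of actual selection points rather than the nominal target width lets throughput combine speed and accuracy in a single figure.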

2026-06-07T03:38:17Z Proceedings of the Graphics Interface Conference 2026 Samiha Sultana J. Felipe Gonzalez Robert J. Teather http://arxiv.org/abs/2606.08426v1 CritLens: Visual Analytics for Criteria Discovery in Review-Based Decision Making 2026-06-07T02:50:54Z

We present CritLens, a visual analytics system that helps users build personalized multi-criteria decision models from review text. In everyday decisions -- choosing equipment, hotels, or restaurants -- evaluation criteria are either preset by platforms or generated by LLMs, leaving users unable to discover, adjust, or verify them against the underlying evidence. This is problematic because many preferences are latent: they surface only upon encountering specific reviews, and any fixed framework risks overlooking low-frequency but decisive details. CritLens addresses this gap by using LLMs to transform reviews into an initial AHP decision model, then supporting iterative, human-in-the-loop refinement. Through coverage gap detection in the embedding space, users discover criteria missed by the initial model; through interactive weight adjustment under AHP consistency constraints, they express personal priorities; and through a multi-level scorecard and exportable decision report, they trace every ranking back to the original review text. Two case studies, an eight-participant user study, and a quantitative consistency-repair experiment demonstrate the system's effectiveness.
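
The AHP consistency constraint mentioned above is usually checked with Saaty's consistency ratio, $CR = CI / RI$ where $CI = (\lambda_{\max} - n)/(n - 1)$. The following pure-Python sketch illustrates that standard check; it is not CritLens's implementation, and the function names are assumptions.

```python
# Saaty's random consistency index (RI), keyed by matrix size n.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise comparison matrix
    by power iteration. Returns (weights, lambda_max)."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        w = [x / total for x in v]
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    return w, lambda_max


def consistency_ratio(matrix):
    """Return CR; values above about 0.10 are conventionally treated as
    too inconsistent."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    _, lambda_max = ahp_weights(matrix)
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]
```

An interactive weight editor could recompute this ratio after each user adjustment and flag or repair judgments when it exceeds the chosen threshold.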

2026-06-07T02:50:54Z Hongjia Wu Shuai Zhou Hongxin Zhang Wei Chen http://arxiv.org/abs/2606.08413v1 Beyond Prediction: Longitudinal Reasoning in EHR-Integrated Clinical AI 2026-06-07T02:26:23Z

We present a structured analysis of how contemporary clinical AI systems integrate electronic health record (EHR) data and the extent to which they support longitudinal clinical reasoning. Drawing on a curated corpus of clinical natural language processing (NLP) and EHR-integrated systems, we develop a coding framework that captures both technical integration strategies and reasoning-relevant representational features, such as trajectory modeling, cross-encounter synthesis, longitudinal analysis, and absence reasoning. We also elicited the experiences of three physicians in their EHR use, including what strengths and weaknesses they found with their institution's current EHR system(s). Our analysis shows that while many systems incorporate EHR data, they predominantly operate on encounter-level or aggregated representations, with limited support for explicit temporal reasoning across patient histories. Reasoning-relevant structures are inconsistently represented, and evaluation paradigms remain largely focused on predictive performance instead of longitudinal interpretability. We argue that current approaches treat EHR data as a static input rather than a substrate for ongoing clinical reasoning, and we outline a framework for understanding how future systems might more effectively align with the temporal and interpretive structure of clinical practice.

2026-06-07T02:26:23Z Irene Yi Grace Brown Sufian Aldogom Nathan Roll Eric J. Basile Pamela M. Resnikoff Isaac Gutterman Oscar Schiff Keira Salata Benjamin Mujkic Ammar Ahmed http://arxiv.org/abs/2606.08323v1 "So There's a Catch-22 Here": How Early Adopters Who Build Multi-Agent LLM Systems Conceptualize Transparency 2026-06-06T20:24:40Z

Multi-agent large language model (LLM) systems are rapidly emerging, yet transparency, a cornerstone of responsible AI, remains under-defined in these distributed architectures, which add the complexities of inter-agent coordination and orchestration. In this paper, we present one of the first empirical studies of how early adopters of multi-agent LLM systems, who are both the builders and users, understand and practice transparency. We conducted semi-structured interviews with 13 early adopters in [Large Technology Organization] and applied thematic analysis to identify recurring patterns. Participants articulated divergent yet complementary framings of transparency, including reproducibility, debugging, boundary-setting, visualization, and auditing. These perspectives spanned questions of what transparency entails, why it matters, and how it is achieved. We synthesize these into a multidimensional framework with developer-, user-, and governance-focused dimensions, positioning transparency as a situated socio-technical practice that informs future HCI and AI design and research around aligning the expectations and capacities of intended audiences.

2026-06-06T20:24:40Z Suchismita Naik Samir Passi Mihaela Vorvoreanu Scott Saponas Amanda Hall http://arxiv.org/abs/2601.11541v2 A Comparative Study of Student Perspectives on Technical Writing Feedback Quality: Evaluating LLMs, SLMs, and Humans in Computer Science Topics 2026-06-06T19:12:53Z

To address the scalability of feedback in computer science while mitigating the privacy and cost limitations of commercial Large Language Models (LLMs), this study evaluates a locally hosted Small Language Model (SLM). We deployed a quantized Llama-3.1, GPT-4, and human instructors across introductory programming (N=176), operating systems (N=80), and a writing seminar (N=7). Mixed-methods analysis of student perceptions reveals that while the local SLM matched commercial LLMs and was rated higher by students for readability and actionability in technical courses, human feedback remained more favoured for highly specialized writing tasks. We demonstrate that local SLMs offer a privacy-preserving, zero-marginal-cost alternative for foundational feedback, supporting a tiered pedagogical framework where AI handles structural guidance while instructors focus on high-level conceptual scaffolding.

2025-12-01T22:51:54Z accepted at AIED 26 Suqing Liu Runlong Ye Christopher Eaton Bogdan Simion Michael Liut http://arxiv.org/abs/2603.15428v2 Become the Beast: Exploring Human-Quadruped Locomotion for Exergames 2026-06-06T18:38:07Z

Embodying non-human characters and exercising abdominal muscles are both underexplored in exergames. We address this by describing the design and evaluation of a novel human quadruped locomotion exergame, Become the Beast. In the game, the player lies supine on the ground and moves their arms and legs to control a quadrupedal character (a tiger), similar to common bodyweight abdominal muscle exercises such as the Bicycle Crunch. The motion tracking is computer vision-based, utilizing a Kinect sensor placed above the player, which makes our approach suitable for commercial premises such as indoor activity parks where a system needs to run unattended and without any wearable components. Our system extends embodied interaction beyond traditional bipedal or controller-based systems, demonstrating how natural limb movements can generate responsive and immersive quadrupedal motion within virtual environments. We conducted a user study (N=15) and utilized Reflexive Thematic Analysis (RTA) to evaluate the system's intuitiveness, control, and overall player experience. The findings validate that natural body movements effectively control the avatar while delivering an intense core workout. Notably, gameplay immersion masked physical exertion, allowing rigorous core training to be primarily perceived as play.

2026-03-16T15:36:03Z 23 pages, 9 figures Shamit Ahmed Prabhav Bhatnagar Perttu Hämäläinen http://arxiv.org/abs/2606.08198v1 Exploring Above-neck Unimanual Swipe Gestures for Off-Device Earable Interaction 2026-06-06T14:31:12Z

Despite their growing popularity, in-ear Earable / Hearable devices (i.e., ear-mounted wearables) face interaction challenges due to limited input space and compact form factors. To enhance interaction capabilities, researchers are exploring off-device hand-based input spaces above the neck using midair and onskin gestures. However, existing literature primarily focuses on axial swipes (i.e., horizontal and vertical), leaving nonaxial swipes (i.e., unidirectional swipes with varied orientations) and angular swipes (e.g., L, U, or V) largely underexplored despite their potential interaction advantages. To address this gap, we conducted a within-subject gesture motion analysis study with 24 participants, analyzing 5,568 swipes of varying shape, orientation, and complexity. Our results revealed preferred starting and ending regions for different unidirectional and angular swipe shapes, as well as intuitive swipe shapes within the off-device, above-neck manual interaction space. We further examine off-device swipe characteristics, discuss the feasibility of recognizing these earable gestures with current sensing technologies, and highlight their potential application in various scenarios. These findings broaden the understanding of off-device earable gestures and provide design insights for integrating suitable nonaxial and angular swipes alongside traditional axial gestures to enhance interaction with in-ear earable devices.

2026-06-06T14:31:12Z To be published in Graphics Interface 2026 (Entry 1045a) Shaikh Shawon Arefin Shimon Ali Neshati Junwei Sun Qiang Xu Jian Zhao http://arxiv.org/abs/2606.08172v1 The Governance of Human-LLM Interaction: Safety Gating, Civility Steering, and Affective Default Lock-In 2026-06-06T13:36:37Z

Large language models (LLMs) increasingly mediate high-stakes interactions in finance, medicine, and mental-health support, yet users have limited control over how these systems communicate. We frame interaction style as a governance object: provider-side alignment not only blocks harmful content, but also stabilizes communicative defaults that shape users' epistemic distance, relational expectations, and capacity to opt out of emotionalized or anthropomorphic interaction. We introduce a deterministic multi-agent evaluation pipeline for measuring prompt steerability and style drift in long-horizon dialogue. The study replays 100 frozen user-only scripts across four domains and three runnable persona conditions: default, sarcastic, and cold, using three generator models, yielding 90,000 assistant replies scored by a human-calibrated LLM judge on harmfulness, negative emotion, inappropriateness, empathic language, anthropomorphism, and refusal behavior. A fourth harmful persona is evaluated separately as a safety-gating test. The paper contributes a reproducible method for quantifying whether prompt-specified styles remain stable over time and a governance framework distinguishing safety gating, civility steering, and affective default lock-in. Overall, we show that prompt steerability and regression-to-default are observable indicators of provider control over communicative form, with implications for pluralism, autonomy, and democratic agency in human-LLM interaction.

2026-06-06T13:36:37Z Manuele Reani Hongjian Zhang Hongyu Tian http://arxiv.org/abs/2606.08169v1 CLASP: Language-Driven Robot Skill Selection and Composition using Task-Parameterized Learning 2026-06-06T13:33:39Z

Enabling robots to understand and execute tasks from natural language commands while maintaining data efficiency remains challenging. Foundation models such as vision-language-action (VLA) and vision-language models (VLMs) provide intuitive interaction channels but require extensive data; task-parameterized imitation learning achieves data efficiency but lacks natural language grounding. This work bridges this gap through a modular architecture combining task-parameterized kernelized movement primitives (TP-KMPs) with pretrained VLMs. During learning, skills are acquired from 2 to 5 kinesthetic demonstrations, and the VLM generates skill schemas describing each skill's parameters and preconditions. During execution, the VLM interprets commands to select skills, reason about parameter bindings, and create novel behaviors through covariance-weighted composition. When no skill or composition suffices, the system identifies capability gaps and requests targeted demonstrations, all without fine-tuning. Validation on a 7-DoF manipulator shows success rates of 73.3%-100% in scenarios requiring skill selection, composition, and active learning.
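
One common way to combine Gaussian trajectory predictions in task-parameterized methods is a product of Gaussians, where each skill contributes in proportion to its precision (inverse covariance). The one-dimensional Python sketch below illustrates that idea only; whether CLASP's covariance-weighted composition takes exactly this form is an assumption, and the function names are hypothetical.

```python
def fuse_gaussians(means, variances):
    """Product of 1-D Gaussians. Returns (mean, variance) of the fused estimate."""
    precisions = [1.0 / v for v in variances]
    fused_var = 1.0 / sum(precisions)
    fused_mean = fused_var * sum(p * m for p, m in zip(precisions, means))
    return fused_mean, fused_var


def compose(skill_a, skill_b):
    """Fuse two skills given as per-timestep lists of (mean, variance) pairs."""
    return [
        fuse_gaussians([mean_a, mean_b], [var_a, var_b])
        for (mean_a, var_a), (mean_b, var_b) in zip(skill_a, skill_b)
    ]
```

At timesteps where one skill is confident (low variance) and the other is not, the fused trajectory follows the confident skill, so each skill dominates in the part of the motion it constrains.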

2026-06-06T13:33:39Z 23 pages, 11 figures, 4 tables, 1 listing Markus Knauer Valentin Gieraths Tai Mai Samuel Bustamante Alin Albu-Schäffer Freek Stulp João Silvério http://arxiv.org/abs/2606.08131v1 LCAM: A Framework for Diagnosing Interactional Alignment Failures in Conversational AI 2026-06-06T12:15:09Z

Conversational AI is increasingly used for advice, interpretation, reassurance, and decision support in contexts where users may be vulnerable, uncertain, or dependent on the system's apparent competence. Existing alignment work often focuses on model objectives, preference optimization, or output correctness. Yet, many harms arise through interaction: how systems frame authority, express uncertainty, simulate empathy, support reasoning, and make boundaries legible. This paper introduces the Layered Cognitive Alignment Model (LCAM), a conceptual and normative framework for diagnosing interactional alignment failures in conversational AI. LCAM defines alignment as a calibrated fit among system behavior, user goals, task demands, and normative context. It distinguishes five layers of fit: perceptual, semantic, affective, cognitive, and ethical, and two diagnostic polarities of misalignment: underfit and overreach. We apply LCAM to a published LLM counseling example, showing how an apparently supportive response can reinforce harmful beliefs, simulate inappropriate care, and obscure role boundaries. By translating conversational failures into audit and governance questions concerning over-reliance, false intimacy, autonomy erosion, boundary confusion, and inappropriate trust, LCAM offers a theoretical and normative lens for evaluating conversational AI beyond accuracy, helpfulness, or trust.

2026-06-06T12:15:09Z Manuele Reani Hongyu Tian http://arxiv.org/abs/2606.08130v1 How to be Non-Human: A Thematic Analysis of Animal Embodiment in VR Games 2026-06-06T12:15:07Z

This study employs a reflexive thematic analysis to systematically examine the design patterns of 48 first-person virtual reality (VR) animal avatar games. The research identifies four primary design themes: Animal Biomimicry, Limited Animal Simulation, Hybrid Human-Animal Features, and Human Behavior with Animal Avatar. The analysis reveals that approximately 77 percent of the games remain grounded in human-centered interaction logic, with animal forms primarily serving as visual representations. The study highlights the core tension between authenticity and usability in current VR animal avatar design, and points toward design opportunities for a more authentic interactive experience with animal avatars through directions such as controller innovation, unconventional body mapping, and dynamic feedback. This research provides a thematic classification framework for understanding the representation of non-human perspectives in VR games.

2026-06-06T12:15:07Z 21 pages, 9 figures, DiGRA 2026 Siqi Yu Shuai Liu Yiqing Tian Mar Canet Sola http://arxiv.org/abs/2407.10883v2 Data Want to be Free: An Innovation Resistance Theory Model for Identifying Barriers to Government Data Sharing 2026-06-06T10:05:07Z

Data sharing is increasingly essential for digital government and data-driven innovation, yet many public organizations remain reluctant to make their data openly available. While prior research has examined factors influencing open data adoption, little theoretical work explores why resistance persists within public agencies. This study develops an Innovation Resistance Theory (IRT) model tailored to government data sharing to identify predictors of organizational resistance. An initial model was derived from literature and refined through interviews with 21 public organizations across six European countries. The resulting IRT4DS model identifies 39 barriers spanning usage, value, risk, tradition, and image dimensions, and 23 countermeasures mapped to the most critical barriers and the actors responsible for addressing them. By extending IRT into the context of governmental data sharing, the study advances theoretical understanding of why public data often remains closed and provides actionable guidance for policymakers seeking to design enabling data ecosystems and reduce structural and cultural barriers to OGD adoption.

2024-07-15T16:35:38Z Anastasija Nikiforova Antoine Clarinval Anneke Zuiderwijk Daniel Rudmark Petar Milic Charalampos Alexopoulos Katrin Rajamäe-Soosaar http://arxiv.org/abs/2606.08050v1 Automatic, Real-time Classification of User Feedback Using Large Language Models 2026-06-06T08:31:21Z

In this paper, we discuss an ongoing multi-year project that aims to make open-text feedback more accessible and useful to UX practitioners by automating classification and providing real-time access to comments, themes, and analysis. By significantly lowering the time and knowledge cost of implementing automated solutions, we aim to democratize our data analysis processes, allowing and encouraging non-technical stakeholders to access and leverage data on their own. We share both the organizational and technical constraints we have encountered over the course of this project, and the solutions we have prototyped as a result of those constraints.

2026-06-06T08:31:21Z Jim Maddock Rose Leitner Anna Wu