http://arxiv.org/abs/2504.20519v5
Large Language Model Chatbot Conversations vs Public Health Materials and Parental HPV Vaccination Intentions: A Randomized Clinical Trial
2026-06-09T13:02:32Z
Health care systems are increasingly considering large language model (LLM)-based chatbots for vaccine communication, but evidence that they improve durable, behaviorally relevant outcomes beyond existing health materials is limited. This randomized clinical trial tested whether brief, multiturn LLM chatbot interactions increased parental intention to vaccinate children against human papillomavirus (HPV) compared with no intervention and government public health materials, and whether effects persisted. Parents in the US, Canada, and UK were recruited online from March 3 to May 25, 2025, with follow-up at 15 and 45 days. Eligible participants were adults with at least one HPV vaccine-eligible child who was unvaccinated or whose vaccination status was unknown. Participants were randomized to no-message control, country-matched government materials with at least 3 minutes of exposure, or a 3-minute GPT-4o chatbot interaction using either a default persuasive style or a shorter conversational style. The primary outcome was self-reported likelihood of vaccinating the child against HPV within 12 months, measured immediately after intervention on a 0-100 scale. Follow-up outcomes included vaccination intent and self-reported vaccination at 15 and 45 days. In total, 1297 participants were randomized (mean age 42.84 years; 72.1% female). Compared with no intervention, public health materials increased immediate vaccination intent (Cohen d = 0.53; 95% CI, 0.36-0.70), as did the default chatbot (d = 0.48; 95% CI, 0.30-0.65) and conversational chatbot (d = 0.33; 95% CI, 0.17-0.49). At 45 days, neither chatbot increased intent relative to controls, whereas public health materials maintained modest effects.
No intervention increased self-reported vaccination uptake. Findings suggest well-designed public health materials may match or exceed short LLM chatbot conversations for HPV vaccine promotion.
2025-04-29T07:59:46Z
JAMA Network Open 2026
Neil K. R. Sehgal, Sunny Rai, Manuel Tonneau, Anish K. Agarwal, Joseph Cappella, Melanie Kornides, Lyle Ungar, Alison Buttenheim, Sharath Chandra Guntuku
10.1001/jamanetworkopen.2026.16822

http://arxiv.org/abs/2606.10786v1
Being and Time in XR: Other-Presentness Beyond Co-Presence
2026-06-09T12:37:20Z
Research in XR (Extended Reality) has conventionally centred upon concepts such as Presence, Embodiment, Social Presence, and Co-presence. Within these traditions, bodily action, sensory contingencies, synchronous interaction, and possibilities for action have generally been regarded as constitutive conditions for the experience of "being there" and of being with others. XR environments, however, permit the partial separation of conditions that ordinarily co-vary in everyday experience. Bodily co-presence, temporal simultaneity, spatial configuration, and social interaction need not remain inseparable. This paper approaches this possibility as a problem of other-presentness. Other-presentness refers to the conditions under which another individual is experienced as existing "here and now". The contribution of this paper does not lie in arguing that asynchronous others can evoke social responses; such observations have already been addressed within parasocial interaction and social presence research. Rather, the novelty lies in theorising XR as a technological condition capable of separating and operationalising the constitutive elements of other-presentness as design variables.
Reconsidering Bodyless Presence as a methodological precedent and drawing upon experimental findings from Immersive Video research, this paper formulates Bodyless Presentness as a condition in which another individual continues to be experienced as presently existing despite attenuated bodily co-presence and weakened real-time simultaneity.
2026-06-09T12:37:20Z
5 pages, 3 figures
Koichi Toida

http://arxiv.org/abs/2606.10753v1
Deploying Speech-Driven 3D Facial Animation in Unreal Engine for Production-Ready Digital Humans
2026-06-09T12:03:42Z
Speech-driven 3D facial animation research has shown promising results, but most methods rely on representations that are not compatible with production pipelines. In this work, we present a deployable system that bridges this gap by enabling speech-driven 3D facial animation directly in Unreal Engine (UE) using ARKit-compatible representations. We construct the 3DMEAD-ARKit dataset by converting the MEAD corpus into blendshape sequences using MediaPipe, and retrain FaceDiffuser and ProbTalk3D-X to generate stochastic and emotion-controllable animations. We further develop a modular UE plugin with a Python backend that supports model selection and parameter control. We compare the results to two existing commercial tools, Epic Games' MetaHuman speech-driven animator and Nvidia Audio2Face, with a perceptual user study. The results highlight the importance of comparisons among academic and commercial pipelines. We recommend watching the supplementary video. We also plan to do live demonstrations of our work at the SIGGRAPH 2026 conference.
2026-06-09T12:03:42Z
11 pages
Alessandro Busacchi, Kazi Injamamul Haque, Zerrin Yumak
10.1145/3799825.3818695

http://arxiv.org/abs/2605.12100v2
HM-Req: A Framework for Embedding Values within CPS Human Monitoring Requirements
2026-06-09T09:36:19Z
Monitoring humans, for example, their movement or location, is essential for safe and efficient human-machine collaboration in Cyber-Physical Systems (CPS).
This information allows CPS to ensure safety properties, adapt their behaviour dynamically, and coordinate with humans. To ensure that the design of a CPS respects ethical principles and the privacy of its stakeholders, system requirements, particularly those related to human monitoring, must reflect the human values of all involved stakeholders. However, human values are often underrepresented in Software Engineering -- particularly during requirements elicitation and system design, crucial phases when introducing ethically critical functionality. Stakeholder values are often implicit and conflicting, yet rarely systematically captured. Furthermore, unstructured natural language requirements introduce ambiguity and vagueness, complicating conflict resolution. To address these problems, we propose HM-Req, a requirements elicitation framework including a Controlled Natural Language (CNL) for defining human monitoring requirements. These requirements are then augmented with human values from relevant stakeholders and integrated into a Value Dashboard to detect potential conflicts that require further discussion and resolution. Validation results, applying the CNL to different datasets and conducting a survey and expert interview, provide evidence of the CNL's ability to capture diverse human monitoring requirements and demonstrate HM-Req's usefulness for requirements elicitation activities.
2026-05-12T13:15:39Z
Accepted Version for publication at the 34th IEEE International Requirements Engineering Conference (RE'26). 10+2 pages
Zoe Pfister, Ruth Breu, Michael Vierhauser

http://arxiv.org/abs/2606.10627v1
Profy: Interpretable Visualization of Expertise-Dependent Motor Skills Toward Supporting Piano Practice
2026-06-09T09:28:46Z
The quality of piano performance depends on nuanced timing, articulation, and dynamic control, but practice feedback is often summary-based and hard to act on.
We introduce Profy, a weakly supervised system that learns from take-level labels derived from aggregated listener ratings (expert-labeled vs. amateur-labeled) to produce time-aligned highlights for review during piano practice. We collected synchronized 1 kHz key-motion and audio from 73 pianists and used 1,083 valid takes for modeling and evaluation. The model outputs clip-level predictions together with evidence scores on a shared resampled model time base for visualization. On 20 amateur clips from short technique studies annotated by 21 expert pianists, the displayed highlight score aligns with passages that expert pianists marked for review despite training without localized labels (Pearson r=0.61, ROC-AUC 0.75). Rather than summarizing a take with a single global score, Profy helps learners decide where to inspect next by supporting scrubbing, looping, and focused replay of time-localized passages associated with expert-amateur differences.
2026-06-09T09:28:46Z
Designing Interactive Systems Conference (DIS '26), June 13-17, 2026, Singapore, Singapore
Kazuki Kawamura, Fujiki Nakamura, Hayato Nishioka, Momoko Shioki, Shinichi Furuya, Jun Rekimoto
10.1145/3800645.3812903

http://arxiv.org/abs/2605.06234v2
RobotEQ: Transitioning from Passive Intelligence to Active Intelligence in Embodied AI
2026-06-09T07:34:06Z
Embodied AI is a prominent research topic in both academia and industry. Current research centers on completing tasks based on explicit user instructions. However, for robots to integrate into human society, they must understand which actions are permissible and which are prohibited, even without explicit commands. We refer to the user-guided AI as passive intelligence and the unguided AI as active intelligence. This paper introduces RobotEQ, the first benchmark for active intelligence, aiming to assess whether existing models can comprehend and adhere to social norms in embodied scenarios.
First, we construct RobotEQ-Data, a dataset consisting of 1,894 egocentric images, spanning 10 representative embodied categories and 56 subcategories. Through extensive manual annotation, we provide 4,944 action judgment questions and 1,157 spatial grounding questions, specifying appropriate robot actions across diverse scenarios. Furthermore, we establish RobotEQ-Bench to evaluate the performance of state-of-the-art models on this task. Experimental results demonstrate that current models still fall short in achieving reliable active intelligence, particularly in spatial grounding. Meanwhile, leveraging RAG techniques to incorporate external social norm knowledge bases can generally enhance performance. This work can facilitate the transition of robotics from user-guided passive manipulation to active social compliance.
2026-05-07T13:22:26Z
Kuofei Fang, Xinyi Che, Haomin Ouyang, Shufan Zhang, Xuehao Wang, Qi Liu, Liyi Liu, Chenqi Zhang, Wenxi Cai, Wenyu Dai, Jinyang Wu, Fan Zhang, Haoyu Chen, Bin He, Zheng Lian

http://arxiv.org/abs/2605.04254v3
Hierarchical Support Vector State Partitioning for Distilling Black Box Reinforcement Learning Policies
2026-06-09T07:30:52Z
We introduce State Vector Space Partitioning (SVSP), a novel method to mimic a black box reinforcement learning policy using a set of human-interpretable subpolicies. By partitioning a distillation dataset of state action pairs with linear support vector machine splits, SVSP constructs a compact and structured representation of the original policy. Our method improves mean return by +7.4% over previous critic driven state partitioning attempts such as Voronoi State Partitioning (VSP) and +2.8% over the original TD3 policy, while reducing the number of required subpolicies against VSP by 82.1%.
Our results pave the path towards a more flexible form of distillation where both the decision boundary and surrogate models can be chosen within a margin of the original black box behavior.
2026-05-05T19:40:05Z
Accepted for poster presentation at HHAI 2026
Senne Deproost, Mehrdad Asadi, Ann Nowé

http://arxiv.org/abs/2606.11269v1
Traits Run Deeper: Trait-Specific Asymmetric Fusion for Personality Assessment
2026-06-09T06:38:36Z
Personality assessment aims to infer stable personality traits from dynamic behaviors across language, voice, and facial cues. Since different personality dimensions are revealed through distinct behavioral perspectives, modeling trait-specific evidence is challenging. However, most existing approaches adopt a uniform multimodal fusion strategy across all dimensions, assuming identical modality contributions. This overlooks trait-specific modality preferences and introduces cross-modal interference. To address this issue, we propose a novel personality assessment framework called Traits Run Deeper, which consists of three components. Specifically, the Multimodal Foundation Representation (MFR) module constructs personality-oriented multimodal inputs and leverages psychology-informed semantic templates as anchors, enabling foundation models to capture trait-relevant information. Building upon MFR, the Trait-Specific Modality Fusion (TSMF) module acts as an asymmetric fusion mechanism, allowing each dimension to selectively exploit different modality pathways from modality-specific modeling to complementary fusion. Thus, TSMF captures heterogeneous modality preferences while reducing cross-modal contamination. Furthermore, the Distribution-Calibrated Personality Regression (DCPR) module mitigates label imbalance and central tendency bias through target distribution calibration, improving robustness and stability.
Experimental results on the AVI Challenge 2026 validation set demonstrate the effectiveness of the proposed framework, reducing mean squared error (MSE) by approximately 25% compared with the baseline. Consistent improvements are observed on the official test set, where our method achieves the best performance and ranks first in the Personality Assessment Track. The source code will be made available at https://github.com/MSA-LMC/AVI2026.
2026-06-09T06:38:36Z
Jia Li, Qian Chen, Wei Wang, Xinyu Li, Zhenzhen Hu, Dongsheng Shao, Richang Hong, Meng Wang

http://arxiv.org/abs/2606.10434v1
Profiling cognitive offloading in LLM-mediated synthesis writing: Volume vs. content
2026-06-09T05:21:18Z
This study compares two approaches to profiling how learners offload cognitive activity to LLMs during a synthesis writing task. Drawing on Salomon's distributed cognition and the Kintsch and van Dijk model of text comprehension, the study operationalises offloading to an LLM in two ways: as a volume of LLM use and as content of what is offloaded, both along with prior knowledge. Data from 97 university students interacting with a general-purpose LLM via a custom interface were analysed using k-means clustering. To capture the content of offloading, their prompts were interpreted as to who performs the activity (active or passive) and at what level of comprehension (local or global). Volume-based profiling (k=4) differentiated learners primarily by prior knowledge, with volume negatively associated with essay authorship. Content-based profiling (k=5) revealed qualitatively distinct patterns of offloading, from vocabulary clarification to active direction of structuring and generation to passive delegation of comprehension at both levels. These patterns reflect different fragmentation of the cognitive process, with differences in learning strategies, behavioural markers, and essay authorship.
Combining volume and content of offloading could improve future analyses on how LLM use redistributes cognitive activity and its effects on learners.
2026-06-09T05:21:18Z
Accepted to the Proceedings of the European Conference for Technology-Enhanced Learning' 2026
Oleksandra Poquet, Mani Shankar Nanduri, Maria Ximena Salinas Loyer, Matthias Stadler, Michael Sailer, Jelena Jovanovic

http://arxiv.org/abs/2606.10398v1
Selection, Not Salience: The Shape and Limits of Personalization in Social Highlighting
2026-06-09T04:18:08Z
Does personalizing what a reader sees pay off, and where does it stop? Using a social web highlighter and a co-readership identity control (the same document highlighted by many users, which holds document and topic fixed and asks whether a person's own history predicts their marks better than another reader's does), we map the shape and limits of personalization across reading altitudes. At the document altitude we give the clean, leakage-free, identity-controlled measurement that prior next-document evaluations could only upper-bound: a person's history identifies which documents in a co-reading neighborhood are theirs, with an own-versus-other gap of +0.169 against community negatives and +0.119 against topic-matched hard negatives (both highly significant); a content-based arm suggests the signal is not purely title-driven but is largely thematic. This is comparable to the span-level selection signal (+0.14) from our prior work: the selection signal is of comparable magnitude across altitudes (+0.12 to +0.17), most of it stable topic preference.
At the sentence altitude, a two-stage personalized auto-highlight (an impersonal model proposes candidates, a personal model re-ranks them) does not improve on its impersonal baseline: two off-the-shelf zero-shot LLMs, including a frontier model, predict highlight locations worse than a lead baseline, and personal re-ranking is beaten by the salience order even on the highest-recall candidate pool, so the null is not merely a Stage-1 ceiling artifact. Measurable personalization appears primarily at the selection layer: modest (~+0.13), topic-dominated, with no reliable gain at the salience layer. We also surface a control-in-negatives bias that inflated our document gap to a spurious +0.227 until audited. Going beyond the shared salience layer may be better approached by aggregating individuals than by personalizing them harder.
2026-06-09T04:18:08Z
9 pages, 1 figure, 3 tables
Kazuki Nakayashiki, Keisuke Watanabe

http://arxiv.org/abs/2603.20511v2
CARE: A Capability-Based Measurement Framework for Reproductive Equity in Human-AI Interaction
2026-06-09T02:51:27Z
Algorithmic systems mediate sexual and reproductive health (SRH) information seeking. Standard HCI and AI evaluation centers on usability, accuracy, and interaction quality, measures designed to assess task performance and interaction quality at the system level. We introduce CARE, the Capability Approach for Reproductive Equity, a measurement framework for human-AI interaction that adds capability outcomes as a unit of evaluation above task performance. CARE functions in two parts. The Normative Design Lens identifies the resources, conversion factors, capabilities, and functionings a system should support. The Evaluation Lens assesses how design features, interaction patterns, and social conditions shape capability outcomes, tradeoffs, and lived experiences in use.
We apply CARE to SRH-specific chatbots, general-purpose LLMs, and search engine features in a study with 12 participants, demonstrating that it surfaces capability outcomes standard metrics aggregate away. The same design features expanded capabilities for some users while constraining them for others: source-level organization, response format, tone, and SRH-specific features all shaped which capabilities expanded for which users and in which direction. Participants' professional backgrounds, gender identities, and prior AI familiarity further shaped these effects, producing capability outcomes that usability and accuracy metrics, aggregated across users, would not surface. These findings demonstrate capability outcomes as a measurable unit for human-AI interaction evaluation, extending existing metrics with a capability layer above task performance.
2026-03-20T21:25:56Z
Alice Zhong, Phoebe Chen, Punya Aragula, Anika Sharma, Kandyce Brennan, Snehalkumar 'Neil' S. Gaikwad
10.1145/3772363.3799046

http://arxiv.org/abs/2507.09788v3
TinyTroupe: An LLM-powered Multiagent Persona Simulation Toolkit
2026-06-09T02:50:22Z
Recent advances in Large Language Models (LLM) have led to a new class of autonomous agents, renewing and expanding interest in the area. LLM-powered Multiagent Systems (MAS) have thus emerged, both for assistive and simulation purposes, yet tools for realistic human behavior simulation -- with its distinctive challenges and opportunities -- remain underdeveloped. Existing MAS libraries and tools lack fine-grained persona specifications, population sampling facilities, experimentation support, and integrated validation, among other key capabilities, limiting their utility for behavioral studies, social simulation, and related applications.
To address these deficiencies, in this work we introduce TinyTroupe, a simulation toolkit enabling detailed persona definitions (e.g., nationality, age, occupation, personality, beliefs, behaviors) and programmatic control via numerous LLM-driven mechanisms. This allows for the concise formulation of behavioral problems of practical interest, either at the individual or group level, and provides effective means for their solution. TinyTroupe's components are presented using representative working examples, such as brainstorming and market research sessions, thereby simultaneously clarifying their purpose and demonstrating their usefulness. Quantitative and qualitative evaluations of selected aspects are also provided, including preliminary experiments with real human behavior as control. Results highlight possibilities, limitations, and trade-offs. The approach, though realized as a specific Python implementation, is meant as a novel conceptual contribution, which can be partially or fully incorporated in other contexts. The library is available as open source at https://github.com/microsoft/tinytroupe.
2025-07-13T21:00:27Z
9 pages
Paulo Salem, Robert Sim, Christopher Olsen, Prerit Saxena, Rafael Barcelos, Yi Ding

http://arxiv.org/abs/2606.10325v1
Design and Implementation of a Real-time Multi-site Immersive Learning System Using Photon Fusion
2026-06-09T02:19:55Z
In this paper, we develop a Virtual Reality-based immersive learning environment that allows teachers to conduct a lesson in a virtual space using Photon Fusion. The proposed system allows teachers and students to be present in the same virtual space regardless of their actual physical locations. The teachers can verbally communicate with students in real-time, interacting with 3D learning materials. By adopting Photon Fusion, the system achieves stable real-time communication and synchronization among multiple players.
Evaluation results demonstrate that the proposed system provides stable communication performance, good usability, and minimal VR sickness, confirming its effectiveness as an immersive learning environment.
2026-06-09T02:19:55Z
Iwai Wataru, Duc V. Nguyen

http://arxiv.org/abs/2606.10182v1
Creativity in the BioFoundry: Supporting scientific creativity in the age of automation
2026-06-08T21:17:53Z
Biofoundries automate biological experimentation at unprecedented scale, promising speed, reproducibility, and access. Yet automation also reshapes how scientists experience experimentation and creativity. Through in-depth interviews with nine scientists and experts across academia and industry (including biofoundry developers, automation engineers, and end-users), we examine how scientific creativity is enacted under automation. Biofoundries displace sensory cues, redistribute responsibility between humans and machines, and transform troubleshooting from an embodied, local practice into a predictive, social, and interpretive one. Rather than framing biofoundries as automation factories, we argue that they should be understood as Creativity Support Tools, whose design directly shapes how researchers notice breakdowns, exercise judgment, learn from failure, and progress through success. By connecting biofoundry practice with prior HCI work on automation, debugging, and distributed creativity, this paper demonstrates biofoundries as a distinctive and timely site for creativity research in science.
2026-06-08T21:17:53Z
13 pages, 6 figures, 2 tables, ACM Creativity and Cognition Conference 2026
Mingyan Claire Tian, Sarah Sterman
10.1145/3803784.3807549

http://arxiv.org/abs/2606.10180v1
Flow Control: Steering Vision-Language-Action Models with Simple Real-Time Inputs
2026-06-08T21:16:37Z
We introduce flow control of vision-language-action (VLA) models, a simple and effective way to steer VLA actions in real-time through generic inputs, such as a keyboard.
This method can be used out-of-the-box and does not require retraining or fine-tuning VLAs. It enables relatively crude user inputs to steer a VLA to align with user intent. The VLA transforms these inputs into action samples drawn from the VLA expert action distribution learned during training, so that the generated actions are high quality (conformity to the action expert distribution) and high fidelity (reflecting the user's intent). We demonstrate that flow control has many desirable properties: (1) flow control accurately and responsively steers robot actions with user inputs, (2) it is robust to suboptimal user inputs, (3) it enables users to steer VLAs to achieve significantly higher success rates and faster task completion, and (4) fine-tuning a VLA on flow control trajectories improves the autonomous policy. Together, these results provide a simple and intuitive way for users to help steer VLA actions, increasing task performance.
2026-06-08T21:16:37Z
10 pages, 5 figures
Jonathan C. Kao, Jason Chan, Andy Wang