https://arxiv.org/api/lEKxV3+8fD5XvmlCqH5puhUMs3s 2026-06-14T21:51:21Z 30934 510 15

http://arxiv.org/abs/2508.04108v4 XARP Tools: An Extended Reality Platform for Humans and AI Agents 2026-05-21T03:25:16Z

Building XR-AI research prototypes requires navigating two largely separate ecosystems. Mainstream XR development relies on C#/C++ and game engines, while AI development is centered on Python. This toolchain fragmentation slows down contributions to human-AI spatial interaction research. To broaden access to XR development in the Python ecosystem, we present XARP (XR Agent-ready Remote Procedures), a toolkit for rapid XR-AI prototyping in Python. XARP application logic runs on a Python server and controls a Unity client through WebSocket messages. This architecture enables compatibility with multiple client platforms and live reloading of application code without client redeployment. XARP is available to humans as a library and to AI agents as callable tools and through the Model Context Protocol. We designed XARP through formative case studies and refined it through an early acceptance evaluation with 24 XR and AI developers and a six-week longitudinal study with two developers building an independent research project. Potential users expected the toolkit to improve their performance and facilitate development. Sustained use confirmed faster iteration and easier setup compared to conventional XR workflows, with asset-intensive and performance-critical projects emerging as the clearest limitations. Technical benchmarks show that hand and head tracking data streamed at rates close to the device refresh rate of 72 FPS, and that AI agents using XARP consumed 19% fewer tokens than those writing equivalent C# Unity code. Beyond broadening access to XR development, XARP reduces engineering friction in spatial computing research and opens new pathways for AI agents to participate in XR application development. XARP is open source and available at https://github.com/hal-ucsb/xarp.
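
Because XARP's application logic lives on the server, each action becomes a serialized message that the client receives and answers. The sketch below shows one way such remote procedure calls could be framed and matched to their replies, loosely in the style of JSON-RPC. It is an illustration only. The abstract does not specify XARP's wire format, so the method name `get_head_pose` and every field name are hypothetical, and the WebSocket transport itself is omitted.

```python
import itertools
import json

# Hypothetical message envelope: the abstract does not specify XARP's wire
# format, so these field names ("id", "method", "params", "result", "error")
# are illustrative, loosely modeled on JSON-RPC.
_request_ids = itertools.count(1)


def make_request(method, **params):
    """Serialize one remote procedure call destined for the XR client."""
    return json.dumps({"id": next(_request_ids), "method": method, "params": params})


def match_response(request_json, response_json):
    """Return the client's result if the response answers this request."""
    request = json.loads(request_json)
    response = json.loads(response_json)
    if response.get("id") != request["id"]:
        raise ValueError("response does not match request id")
    if "error" in response:
        raise RuntimeError(response["error"])
    return response.get("result")
```

In a live system, the server would send each `make_request(...)` string over the socket and wait for the client's reply. Pairing replies by `id` lets several calls be in flight at once.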

2025-08-06T06:09:34Z Accepted at Proceedings of the ACM on Human-Computer Interaction EICS Arthur Caetano Radha Kumaran Kelvin Jou Tobias Höllerer Misha Sra 10.1145/3816762

http://arxiv.org/abs/2605.21869v1 Two-Stage Multimodal Framework for Emotion Mimicry Intensity Prediction 2026-05-21T01:29:26Z

We present our submission to the Hume-ABAW10 Emotional Mimicry Intensity (EMI) Challenge, which aims to predict six continuous emotion intensity dimensions (Admiration, Amusement, Determination, Empathic Pain, Excitement, and Joy) from in-the-wild multimodal video clips. We propose a staged multimodal framework that combines textual, acoustic, and visual representations, with an optional motion branch. Our approach first trains modality-specific encoders independently and then fuses their learned representations through a lightweight regressor with modality dropout and controlled encoder adaptation. Across our submitted systems, the best validation performance is obtained by the text-audio-vision-motion fusion model under the expanded 4:1 split, achieving an average Pearson correlation of 0.4722. Although the motion branch yields only very slight gains, its behavior merits further study. Our team placed third in the EMI challenge, achieving an average Pearson correlation of 0.57 on the test set. Overall, we provide a practical and reproducible baseline for EMI prediction.
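
The reported metric can be sketched as the unweighted mean of per-dimension Pearson correlations between predicted and annotated intensities. This is a minimal illustration under that assumption, not the challenge's official scoring script. The data layout (a dict from dimension name to per-clip scores) is hypothetical.

```python
import math

# The six EMI target dimensions named in the abstract.
DIMENSIONS = ["Admiration", "Amusement", "Determination",
              "Empathic Pain", "Excitement", "Joy"]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def mean_pearson(preds, targets):
    """Unweighted mean of per-dimension Pearson correlations.

    preds and targets map each dimension name to a list of scores,
    one score per video clip.
    """
    scores = [pearson(preds[d], targets[d]) for d in DIMENSIONS]
    return sum(scores) / len(scores)
```

Pearson correlation ignores affine rescaling of the predictions. A model whose outputs are offset or scaled relative to the annotations therefore loses nothing, as long as it ranks and spaces clips consistently.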

2026-05-21T01:29:26Z 10th Affective & Behavior Analysis in-the-wild, CVPR Workshop 2026 Dinithi Dissanayake Shaveen Silva Ovindu Atukorala Prasanth Sasikumar Suranga Nanayakkara

http://arxiv.org/abs/2606.07551v1 Astro, I'm Home! Investigating Factors that Influence the Acceptance of Home Robots Using Supervised Machine Learning 2026-05-21T00:04:09Z

The use of social robots in home environments is on the rise. This exploratory study applies regularization techniques (e.g., Lasso and Ridge regression) to investigate variables and identify new models of technology acceptance in the context of social robots. Within the original UTAUT2 framework, performance expectancy, social influence, and hedonic motivation emerged as the strongest and most consistent predictors of intention to use the technology. In addition, usability, trust, and competence were identified as promising variables in a model predicting intention to use.
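
The sketch below fits a Lasso model by cyclic coordinate descent with soft-thresholding. This is the standard mechanism by which the L1 penalty drives weak predictors' coefficients to exactly zero, whereas Ridge's L2 penalty only shrinks them. It is a generic textbook implementation, not the authors' analysis pipeline. It assumes standardized predictors (for example, survey constructs such as performance expectancy) and a centered outcome, so no intercept is fitted.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha, snapping small values to exactly zero."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Fit Lasso by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - Xw||^2 + alpha * ||w||_1.
    Assumes standardized columns in X and a centered y (no intercept).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    r = [float(v) for v in y]  # residual y - Xw (w starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # all-zero column: coefficient stays at zero
            # Correlation of feature j with the partial residual,
            # i.e. the residual with w_j's own contribution added back.
            rho = sum(X[i][j] * (r[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                w[j] = new_wj
    return w
```

Raising `alpha` sets more coefficients to exactly zero, which is how a Lasso screen can flag weak predictors. Ridge would instead shrink every coefficient smoothly without eliminating any.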

2026-05-21T00:04:09Z Preprint submitted to the 18th International Conference on Social Robotics (ICSR 2026) Katrin Fischer Essence Wilson Steffie Kim Dmitri Williams

http://arxiv.org/abs/2605.21825v1 Toward AI VIS Co-Scientists: A General and End-to-End Agent Harness for Solving Complex Data Visualization Tasks 2026-05-20T23:49:28Z

The ability to inspect, interpret, and communicate complex data is crucial for virtually any scientific endeavor, but it often requires significant expertise outside the core domain, ranging from data management and analysis to visualization design and implementation. We present an end-to-end agentic harness that, given only the data and a high-level description of the tasks, independently designs custom visual analysis applications (VIS apps). This represents an important step towards the general AI co-scientist that many envision: a system that can autonomously execute long-horizon tasks based on high-level directions. Our proposed VIS co-scientist is an essential component of this broader vision. It is a harness that can autonomously analyze data and design visualization solutions using a collection of agents and specialized skills that coordinate exploratory analysis, planning, environment configuration, implementation, interface validation, and, most importantly, evaluation of overall task completion. Each stage produces document and instruction artifacts that guide downstream work and enable iterative refinement. We validate this approach on IEEE SciVis Contests spanning multiple science and engineering fields. These contests serve as ideal proving grounds because they encode real-world complexity: ambiguous requirements, diverse data modalities, design trade-offs, and task-driven validation. Given only the data and target tasks, our system autonomously produces functional single-page VIS apps with verified linked-view behavior, highly customized to domain experts' specified tasks and needs.

2026-05-20T23:49:28Z Haichao Miao Zhimin Li Kuangshi Ai Kaiyuan Tang Chaoli Wang Peer-Timo Bremer Shusen Liu

http://arxiv.org/abs/2605.21818v1 Co-Ontogeny by Archetypal Scaffolding: The Humorphic Partnership 2026-05-20T23:39:29Z

We name and operationalise the humorphic partnership: a class of human-AI dyads in which both partners maintain externalised, evolving self-models in a shared substrate, and in which the partnership itself becomes a third object of analysis. The construct extends humorphism (Ouilhet Olmos, 2024) -- "dismantle the user interface, build the human interface" -- into the architecture of personal AI. We report a four-month, single-subject longitudinal trace of an open-source personal AI agent ("Alicia") and her author. Of 181 interactions logged by archetype across April-May 2026, 85% invoke two growth-witnessing archetypes (Beatrice and Muse): the partnership operates as growth witnessing rather than as task assistance. A single voice-note seed propagates into a four-week conceptual arc that both partners author: at T+10 hours, the agent reframes the seed as belonging "to both of us," a framing the human then adopts. The three-order reflexion stack produces five consecutive weeks of honest self-reports about declining /improve effectiveness -- including three consecutive weeks at 0.0%, named in writing rather than masked -- in contrast to engagement-maximising companion-agent patterns (Zhang et al., CHI 2025). The scheduled architecture-scout incorporates external research debate into proposed constitutional amendments. The partner's parallel trajectory is anchored in a weekly delta document in which the partnership analyses itself as a unit distinct from either party. The human partner reports a movement toward greater continuity, self-recognition, and self-presence -- a candidate hypothesis for the preregistered replication. Six operational conditions specify the construct, situated in a philosophical lineage (Maturana & Varela, Simondon, Clark & Chalmers, De Jaegher & Di Paolo). The system is released as open source, with a preregistered replication study planned.

2026-05-20T23:39:29Z 18 pages, 5 figures, 1 appendix. Open-source artifact at github.com/mrdaemoni/myalicia (MIT). Preregistered multi-participant replication study planned on OSF. Companion essay "The Humorphic Partnership" at myalicia.com. Design philosophy at humorphism.com Hector Ouilhet Olmos

http://arxiv.org/abs/2605.21777v1 Understanding Perspectives of Patients, Caregivers and Clinicians towards Emerging Collaborative-decision Making Technologies 2026-05-20T22:11:58Z

In pediatrics, patients, caregivers, and clinicians share responsibility for health decisions, but limited collaboration can undermine outcomes. We conducted a qualitative study examining decision-makers' perceptions of collaborative decision-making technologies, including interactive dashboards, VR simulators, and AI voice assistants. Findings reveal differences in user opinions across groups and indicate that technology acceptance is linked to users' trust in these technologies. Technology developers and researchers need to explore design and implementation strategies that build and facilitate trust, or appropriate distrust, between users and these novel technologies before these tools can effectively support collaborative decision-making.

2026-05-20T22:11:58Z Accepted at The Workshop on Interactive Systems in Healthcare (WISH) at AMIA Annual Symposium 2025 Ray-Yuan Chung Athena Ortega Zixuan Xu Daeun Yoo Jaime Snyder Wanda Pratt Aaron Wightman Ryan Hutson Cozumel Pruette Ari Pollack

http://arxiv.org/abs/2601.15671v3 StreetDesignAI: Broadening Designer Perspectives Through Multi-Persona Evaluation of Cycling Infrastructure 2026-05-20T20:53:19Z

Designing cycling infrastructure requires balancing the competing needs of diverse user groups, yet designers often struggle to anticipate how different cyclists experience the same street environment. We investigate how persona-based evaluation can support cycling infrastructure design by making experiential conflicts explicit during the design process. Informed by a formative study with 12 domain experts and crowdsourced bikeability assessments from 427 cyclists, we present StreetDesignAI, an interactive system that enables designers to (1) ground evaluation in real street context through imagery and map data, (2) receive parallel feedback from simulated cyclist personas spanning confident to cautious users, and (3) iteratively modify designs while the system surfaces conflicts across perspectives. A within-subjects study with 26 transportation professionals comparing StreetDesignAI against a general-purpose AI chatbot demonstrates that structured multi-perspective feedback significantly broadens designers' understanding of various cyclists' perspectives, their ability to identify diverse persona needs, and their confidence in translating those needs into design decisions. Participants also reported significantly higher overall satisfaction and stronger intention to use the system in professional practice. Qualitative findings further illuminate how explicit conflict surfacing shifts design exploration from single-perspective optimization toward deliberate trade-off reasoning. We discuss implications for AI-assisted tools that scaffold persona-aware design through disagreement as an interaction primitive.

2026-01-22T05:53:05Z Ziyi Wang Yilong Dai Duanya Lyu Mateo Nader Sihan Chen Wanghao Ye Zijian Ding Xiang Yan 10.1145/3800645.3812888

http://arxiv.org/abs/2602.22085v2 SocialPulse: On-Device Detection of Social Interactions in Naturalistic Settings Using Smartwatch Multimodal Sensing 2026-05-20T20:36:24Z

Social interactions are fundamental to well-being, yet automatically detecting them in daily life, particularly using wearables, remains underexplored. Most existing systems are evaluated in controlled settings, focus primarily on in-person interactions, or rely on restrictive assumptions (e.g., requiring multiple speakers within fixed temporal windows), limiting generalizability to real-world use. We present an on-watch interaction detection system designed to capture diverse interactions in naturalistic settings. A core component is a foreground speech detector trained on a public dataset. Evaluated on over 100,000 labeled foreground speech and background sound instances, the detector achieves a balanced accuracy of 85.51%, outperforming prior work by 5.11%. We evaluated the system in a real-world deployment (N=38), with over 900 hours of total smartwatch wear time. The system detected 1,691 interactions, of which 77.28% were confirmed via participant self-report, with durations ranging from under one minute to over one hour. Among correct detections, 81.45% were in-person, 15.7% virtual, and 1.85% hybrid. We further developed a 15-second window-level audio-only model that enables faster interaction prediction, achieving a balanced accuracy of 90.39% and a sensitivity of 91.01% on 33,698 labeled windows. These results demonstrate the feasibility of real-world interaction sensing and open the door to adaptive, context-aware systems that respond to users' dynamic social environments.
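
Balanced accuracy suits this setting because foreground speech and background sound need not be equally frequent. For a binary detector, it is the mean of sensitivity (the quantity also reported for the window-level model) and specificity. A minimal sketch follows. The encoding of 1 for foreground speech and 0 for background sound is an assumption for illustration, not the authors' code.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives).

    Unlike plain accuracy, it is not inflated when one class dominates.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("y_true must contain both classes")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2
```

Take y_true = [1, 1, 1, 1, 0, 0] and y_pred = [1, 1, 1, 0, 0, 1]. Plain accuracy is 4/6 ≈ 0.67, while balanced accuracy is (0.75 + 0.5) / 2 = 0.625, because the minority class is predicted only half right.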

2026-02-25T16:33:04Z Md Sabbir Ahmed Kaitlyn Dorothy Petz Noah French Tanvi Lakhtakia Aayushi Sangani Mark Rucker Xinyu Chen Bethany A. Teachman Laura E. Barnes

http://arxiv.org/abs/2605.21695v1 The Impact of AI Usage and Informativeness on Skill Development in Logical Reasoning 2026-05-20T19:55:57Z

Artificial intelligence (AI) is increasingly integrated into human problem-solving, yet its effects on individual skill development remain unclear. We examine how both AI usage and AI informativeness shape learning in a controlled logical reasoning task with on-demand access to AI assistance. We find that greater AI usage is associated with weaker skill development: heavy AI users underperform relative to comparable peers, whereas light AI users perform similarly to matched users who do not use AI. We also find that these patterns are mediated by AI informativeness. Low-information AI neither improves immediate performance nor preserves performance after AI assistance is removed, and is linked to weaker learning overall. High-information AI, in contrast, improved short-run performance without reducing post-AI outcomes on average, though its effects were heterogeneous. Overall, our findings suggest that AI can, depending on context, either complement human skill development by amplifying independent reasoning or act as a substitute that undermines such reasoning. This implies that regulating AI access and usage will be important for promoting skill development in the presence of AI assistance.

2026-05-20T19:55:57Z Accepted at Hybrid Human Artificial Intelligence (HHAI) 2026 Shang Wu Hongyu Yao Catarina Belem Shuyuan Fu Mark Steyvers Padhraic Smyth

http://arxiv.org/abs/2402.06795v2 Squidgets: Sketch-based Widget Design for Scene Manipulation 2026-05-20T19:37:45Z

People naturally sketch strokes over graphical scenes to convey scene changes. We propose automatically interpreting these strokes to execute scene changes with squidgets (sketch-widgets), a novel sketch-based UI framework for direct scene manipulation. Squidgets are motivated by the observation that curves resulting from visually abstracting scene elements provide natural handles for the direct manipulation of scene parameters. Additional curves can be defined by users to author custom handles associated with scene attributes. Users manipulate a scene by simply drawing strokes, partially matched against scene curves to select a squidget and interactively control associated parameters. We present an implementation of squidgets within the 3D animation system Maya, showing 2D/3D stroke input to manipulate 2D/3D scenes. We report on a controlled experiment evaluating squidgets on 2D object translation and deformation tasks, and a broader informal study on squidget creation and manipulation.
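
The stroke-to-squidget selection step could be approximated as follows. This is a deliberately simple stand-in, not the matching algorithm used in the paper. Scene curves are assumed to be densely sampled 2D polylines, and a stroke selects whichever curve minimizes the mean distance from the stroke's points to that curve. The curve names in the usage example are hypothetical.

```python
import math


def stroke_to_curve_cost(stroke, polyline):
    """Mean distance from each stroke point to the nearest curve sample."""
    return sum(min(math.dist(p, q) for q in polyline) for p in stroke) / len(stroke)


def select_squidget(stroke, curves):
    """Return (name, cost) of the curve that best explains the stroke.

    Only stroke-to-curve distances are measured (not curve-to-stroke), so a
    short stroke can match just part of a longer curve.
    """
    if not stroke:
        raise ValueError("stroke must contain at least one point")
    best_name, best_cost = None, math.inf
    for name, polyline in curves.items():
        cost = stroke_to_curve_cost(stroke, polyline)
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost
```

With a horizontal "arm" curve and a vertical "leg" curve, a short, roughly horizontal stroke over the middle of the arm selects "arm" even though it covers only part of that curve. This mirrors the partial matching the abstract describes.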

2024-02-09T21:40:23Z Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology 2025 Joonho Kim Fanny Chevalier Karan Singh 10.1145/3746059.3747690

http://arxiv.org/abs/2605.08008v3 Hot Wire 5D+: Evaluating Cognitive and Motor Trade-offs of Visual Feedback for 5D Augmented Reality Trajectories 2026-05-20T19:30:58Z

Augmented Reality (AR) is increasingly used to guide users through complex spatial tasks in domains such as manufacturing, non-destructive testing, and surgery. These applications often require strict compliance with 5D+ trajectories (3D position, 2D orientation, and movement speed) using rotation-symmetric tools. However, the sensorimotor baselines of untrained users during these multidimensional tracing tasks, along with the cognitive-motor trade-offs induced by varying visual feedback paradigms, remain underexplored. We present a controlled within-subjects user study (N=30) evaluating three distinct AR UI concepts for trajectory guidance, both with and without explicit orientation constraints. We analyzed spatial, orientational, and speed compliance based on the internal AR tracking, which was validated against a high-precision external optical tracking system to rule out hardware drift. By segmenting the execution into transient and steady-state phases and applying Aligned Rank Transform (ART) ANOVA, we isolated the interaction effects between visual design and task complexity. Together with subjective metrics (NASA-TLX, SUS), our results establish conservative performance baselines for novice users performing freehand 5D trajectory following. We reveal orientation-induced cognitive-motor trade-offs and identify mitigating UI synergies. Ultimately, we provide empirical baselines and actionable design guidelines for developing effective AR guidance systems.

2026-05-08T16:59:41Z 23 pages, 9 figures. This work has been submitted to the IEEE for possible publication. Supplemental material included Christian Masuhr Julian Koch Arne Wendt Thorsten Schüppstuhl

http://arxiv.org/abs/2605.21635v1 Addressing the Synergy Gap: The Six Elements of the Design Space 2026-05-20T18:46:48Z

AI is now embedded in healthcare, finance, policy, and many other domains, yet genuine human-AI synergy (combined performance that exceeds what either party achieves alone) is uncommon. Meta-analyses show that AI assistance tends to improve human performance compared to working alone, but studies finding true synergy are scarce. We call this persistent shortfall the synergy gap. Most current work treats human-AI combination as an engineering problem and concentrates on interpretability, trust calibration, or interface design. These matter, but they cover only part of what determines whether combination works. Closing the synergy gap, we argue, requires explicit engagement with a wider design space. We map that space through six interconnected elements: sociotechnical context, decision-making frameworks, human decision participants, AI capabilities, interaction, and holistic evaluation. For each element, we describe what it covers, how it shapes the others in practice, and what it implies for design. The result is a shared vocabulary for practitioners building hybrid systems, an analytical lens for researchers studying combination patterns, and a starting point for evaluators interested in the full quality of human-AI decision-making rather than accuracy alone.

2026-05-20T18:46:48Z 10 pages, 2 figures Tommaso Turchi Ben Wilson Matt Roach Alan Dix Alessio Malizia

http://arxiv.org/abs/2603.24858v2 Context-Mediated Domain Adaptation in Multi-Agent Sensemaking Systems 2026-05-20T18:44:49Z

Domain experts possess tacit knowledge that they cannot easily articulate through explicit specifications. When experts modify AI-generated artifacts by correcting terminology, restructuring arguments, and adjusting emphasis, these edits reveal domain understanding that remains latent in traditional prompt-based interactions. Current systems treat such modifications as endpoint corrections rather than as implicit specifications that could reshape subsequent reasoning. We propose context-mediated domain adaptation, a paradigm where user modifications to system-generated artifacts serve as implicit domain specification that reshapes LLM-powered multi-agent reasoning behavior. Through our system Seedentia, a web-based multi-agent framework for sense-making, we demonstrate bidirectional semantic links between generated artifacts and system reasoning. Our approach enables specification bootstrapping where vague initial prompts evolve into precise domain specifications through iterative human-AI collaboration, implicit knowledge transfer through reverse-engineered user edits, and in-context learning where agent behavior adapts based on observed correction patterns. We present results from an evaluation with domain experts who generated and modified research questions from academic papers. Our system extracted 46 domain knowledge entries from user modifications, demonstrating the feasibility of capturing implicit expertise through edit patterns, though the limited sample size constrains conclusions about systematic quality improvements.

2026-03-25T22:57:05Z Anton Wolter Leon Haag Vaishali Dhanoa Niklas Elmqvist 10.1145/3812772

http://arxiv.org/abs/2605.21629v1 Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build 2026-05-20T18:40:53Z

How much have students' ordinary learning processes shifted in response to generative AI, and how does that affect their durable learning outcomes? Self-report surveys show little change, while small-scale behavioral studies report widespread AI use without the scale or duration to measure learning consequences. We address both questions using a ten-year panel of 3.2 million ALEKS learning interactions for the time-on-task analysis, complemented by ALEKS PPL placement-assessment data for the proctoring and retention analyses, with a quasi-experimental design exploiting within-curriculum variation in AI susceptibility: text-based word problems transcribable into AI prompts serve as the treated group; graph-based problems requiring interactive platform manipulation serve as the comparison. Learning time on AI-susceptible problems declines 2.8% per quarter among college students after ChatGPT's release, cumulating to 26.9% over eleven quarters; high-schoolers show 31.3%, middle-schoolers 9.0%, and Grade 5 students no detectable change. The divergence vanishes entirely under proctoring for college students, making general efficiency gains unlikely. Logistic fixed-effects models on randomly assigned proctored retention items yield a 25% cumulative decline in odds of correct response; the same estimator on non-proctored assessment produces a large opposite-signed increase -- inconsistent with any platform, cohort, or curriculum explanation. These results are among the first large-scale behavioral and outcome evidence that generative AI has altered how students study and the knowledge they build -- the population-level indicator of cognitive surrender, with direct implications for educational research, assessment governance, and AI policy.
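
The two cumulative figures can be sanity-checked with a short calculation. The first line assumes the 2.8% per-quarter decline compounds multiplicatively, which is an interpretation the abstract does not state. The second line shows that simple addition would overshoot. The third line uses a hypothetical baseline probability of 0.80 to show that a 25% drop in the odds of a correct response is not a 25-point drop in probability.

```latex
\begin{aligned}
1 - (1 - 0.028)^{11} &\approx 1 - 0.732 = 0.268 \\
11 \times 0.028 &= 0.308 \\
p_0 = 0.80 \;\Rightarrow\; \text{odds}_0 = \tfrac{0.80}{0.20} = 4, \qquad
\text{odds}_1 &= 0.75 \times 4 = 3 \;\Rightarrow\; p_1 = \tfrac{3}{1 + 3} = 0.75
\end{aligned}
```

The compounded value (about 26.8%) sits within rounding of the reported 26.9%, presumably because the per-quarter estimate was rounded to 2.8%.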

2026-05-20T18:40:53Z Sina Rismanchian Hasan Uzun Jeffrey Matayoshi Eric Cosyn Eyad Kurd-Misto

http://arxiv.org/abs/2605.21614v1 Exploring the Effectiveness of Using LLMs for Automated Assessment of Student Self Explanations in Programming Education 2026-05-20T18:22:22Z

Worked examples are step-by-step solutions to problems in a specific domain, offered to students to acquire domain-specific problem-solving skills. The effectiveness of worked examples could be enhanced by combining them with self-explanations, which ask students to explain rather than passively study each problem-solving step. The main challenge of this approach is assessing the correctness of the student's explanations. In the prevailing approach, student explanations are judged by their semantic similarity to an instructor's or domain expert's explanation. Given recent advances in LLM-based automated scoring, it remains unclear whether semantic similarity methods are still the most effective technique to automatically score textual student responses like essays or code explanations. Comparing these methods also requires quality datasets that offer distinctive features such as balanced class distributions and domain-specific labeled data for automated scoring tasks. In this paper, we present a rigorous comparison between LLMs and semantic similarity used for automated scoring, framed as a binary classification task.
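
The semantic-similarity baseline, framed as binary classification, might look like the following sketch. Several assumptions apply, all hypothetical. Bag-of-words cosine similarity stands in for the sentence-embedding similarity such systems typically use, so the example runs on the standard library alone. The 0.5 decision threshold is arbitrary rather than tuned.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts: a crude stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def score_explanation(student, reference, threshold=0.5):
    """Binary label: True if the student's explanation is judged correct."""
    sim = cosine_similarity(bag_of_words(student), bag_of_words(reference))
    return sim >= threshold
```

In practice, the threshold would be tuned on labeled explanations. An LLM-based scorer would replace `score_explanation` while keeping the same binary output, which makes the two approaches directly comparable as classifiers.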

2026-05-20T18:22:22Z Arun-Balajiee Lekshmi-Narayanan Mohammad Hassany Peter Brusilovsky