http://arxiv.org/abs/2606.09171v1
Title: sketch-plot: Progressive Editing for Text-to-Image Academic Figures
Updated: 2026-06-08T08:08:36Z
Abstract: Text-to-image (T2I) models such as gpt-image-2 can now generate publication-grade academic figures from a short prompt, but the output is a flat raster: a user who wants to change one arrow, one label, or one icon has to regenerate the whole image, which also disturbs the parts they wanted to keep. We present sketch-plot, an interactive system that closes this controllability gap with a three-layer progressive editing pipeline: a generated PNG, an addressable puzzle of editable pieces, and a per-piece SVG. The user stops at the layer that gives enough control for the change at hand, so the cost of decomposition and vectorisation is paid only for the pieces that need it. Realising this pipeline is not trivial: general segmentation models lack the semantic discriminability to decompose a research figure cleanly, and end-to-end image vectorisation produces incomplete shapes and loses semantic structure. We therefore route both stages through a human-in-the-loop interface that lets the user accept, refine, or reject decomposition and vectorisation decisions piece by piece. We validate the design with an expert user study, in which participants found sketch-plot effective for making targeted edits to AI-generated academic figures and preferred it to regenerating the whole image. A demonstration video is available at https://anonymous.4open.science/r/SketchPlotVideo/.
Published: 2026-06-08T08:08:36Z
Comment: 5 pages, 3 figures. Demonstration paper
Authors: Yinghao Tang, Yupeng Xie, Yingchaojie Feng, Tingfeng Lan, Wei Chen

http://arxiv.org/abs/2606.09024v1
Title: Personal Salience: Highlighting Is Social, but Individuality Lives in Selection
Updated: 2026-06-08T04:44:51Z
Abstract: Social highlighters let people mark passages that matter to them.
We ask how much of an individual is recoverable from these naturalistic traces, using a co-readership identity control (the same document highlighted by many users) that holds document and topic fixed and asks whether a person's own history predicts their marks better than another reader's history does. We separate generic salience (structure), crowd salience (what others marked), and personal salience (the individual residual). First, highlighting is social: which sentences you mark is predicted far better by the crowd than by structure or by a personal model. Even a well-estimated crowd (an information-privileged baseline that sees others' marks on the same document) beats a frontier LLM twin built from your other-document history, and the within-document personal signal is at most a whisper (own-vs-other gap of +0.017 with an embedding scorer: small but significant). Second, in sharp contrast, individuality lives in selection: when asked which of the already-salient passages are yours, your own history is a strong, leakage-free predictor (gap +0.14). A topic decomposition shows that this is largely a stable thematic preference: the gap shrinks ~6-8× against a topically matched peer, and the thin residual cannot be separated from finer-grained topic. The non-obvious finding is an asymmetry: under the same scorer, the individual signal is ~6-8× weaker in salience than in selection. Methodologically, naive history-conditioning evaluations leak (the target's own marks enter the profile in ~42% of pairs, inflating personal scores by up to +0.15 AP), and small crowds overstate personalization; our results are leakage-free and use a dense crowd and a model-matched control.
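The leakage problem and the own-vs-other comparison described above can be made concrete with a minimal sketch. It assumes a toy word-overlap (Jaccard) scorer standing in for the paper's embedding scorer, and a hypothetical data layout in which each reader's history maps document IDs to the sentences they highlighted; all function names are illustrative, not the authors' code. The key step is `build_profile`, which drops the target document so the reader's own marks on it cannot enter the profile; the gap is the difference in average precision (AP) between rankings driven by the reader's own profile and by another reader's profile.

```python
from typing import Dict, List, Set

History = Dict[str, List[str]]  # doc_id -> highlighted sentence texts (hypothetical layout)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """AP of a ranked list of sentence ids against the ids the reader actually marked."""
    hits, total = 0, 0.0
    for rank, sid in enumerate(ranked, start=1):
        if sid in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def build_profile(history: History, target_doc: str) -> History:
    """Leakage-free profile: exclude the target document entirely."""
    return {doc: texts for doc, texts in history.items() if doc != target_doc}


def _jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sentences(sentences: Dict[str, str], profile: History) -> List[str]:
    """Rank target-document sentences by max word overlap with any profile highlight."""
    profile_words = [set(t.lower().split()) for texts in profile.values() for t in texts]

    def score(sid: str) -> float:
        words = set(sentences[sid].lower().split())
        return max((_jaccard(words, p) for p in profile_words), default=0.0)

    # Higher score first; ties broken by sentence id for determinism.
    return sorted(sentences, key=lambda sid: (-score(sid), sid))


def own_vs_other_gap(sentences: Dict[str, str], marks: Set[str],
                     own_history: History, other_history: History,
                     target_doc: str) -> float:
    """AP with the reader's own profile minus AP with another reader's profile."""
    own = rank_sentences(sentences, build_profile(own_history, target_doc))
    other = rank_sentences(sentences, build_profile(other_history, target_doc))
    return average_precision(own, marks) - average_precision(other, marks)


demo_sentences = {
    "s1": "neural networks learn features",
    "s2": "the weather was sunny",
    "s3": "gradient descent trains networks",
}
demo_marks = {"s1", "s3"}
demo_own = {
    "docA": ["deep neural networks", "gradient based training"],
    "target": ["neural networks learn features"],  # would leak if not excluded
}
demo_other = {"docB": ["sunny weather today"]}
print(own_vs_other_gap(demo_sentences, demo_marks, demo_own, demo_other, "target"))  # ≈ 0.417 (5/12)
```

Computing the gap per reader and averaging it over many co-read documents would be the natural next step; the toy scorer only stands in for whatever model produces the ranking.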
Highlights carry a genuine individual signature, but it is a thin layer over a strong shared one, surfacing far more in which salient passages a person selects than in what is salient.
Published: 2026-06-08T04:44:51Z
Comment: 12 pages, 5 figures, 2 tables
Authors: Kazuki Nakayashiki, Keisuke Watanabe

http://arxiv.org/abs/2606.08965v1
Title: Before You Scroll Again: Predicting Regretful Social Media Sessions from In-the-Wild Contextual and Wearable Sensing
Updated: 2026-06-08T03:04:29Z
Abstract: Users often feel regret after using social media, making regret a more ecologically valid target than screen time for understanding when phone use becomes problematic. Existing self-monitoring tools cannot anticipate regret before it occurs, and prior physiological work on social media use has been confined to the lab, with research-grade sensors and curated content, leaving in-the-wild prediction an open question. We deployed a 7-day in-the-wild experience sampling study with 21 participants, combining passive smartphone logging, a low-cost consumer smartwatch (Bangle.js 2, $80), session-level surveys (1,445 sessions), and exit interviews to investigate when and why social media sessions become regretful, and whether regret can be anticipated before a session begins. Three findings stand out: (i) the gap between intended and actual use predicts regret far more strongly than session duration, and duration's apparent effect collapses once intention is modeled; (ii) regret is amplified when sessions displace a valued alternative, particularly at night and following productivity-app use; and (iii) pre-session contextual features generalize across participants, while physiological signals add person-specific lift, pointing toward a two-layer architecture for just-in-time adaptive interventions.
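Finding (iii) suggests a two-layer design for a just-in-time intervention. A minimal sketch, assuming a logistic form: a shared layer scores pre-session context for any user, and an optional per-participant layer adds physiological lift only for users who have a fitted personal model. The feature names (`night`, `after_productivity_app`, `hr_delta`) and all weights are hypothetical placeholders, not values from the study.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional

Features = Dict[str, float]


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class TwoLayerRegretModel:
    # Layer 1: pre-session contextual weights shared across all participants.
    shared_weights: Features
    shared_bias: float = 0.0
    # Layer 2: per-participant weights on physiological features (hypothetical).
    personal_weights: Dict[str, Features] = field(default_factory=dict)

    def regret_probability(self, user: str, context: Features,
                           physio: Optional[Features] = None) -> float:
        z = self.shared_bias + sum(w * context.get(k, 0.0)
                                   for k, w in self.shared_weights.items())
        # Personal lift applies only if this user has a fitted personal layer.
        if physio is not None and user in self.personal_weights:
            z += sum(w * physio.get(k, 0.0)
                     for k, w in self.personal_weights[user].items())
        return _sigmoid(z)


model = TwoLayerRegretModel(
    shared_weights={"night": 1.0, "after_productivity_app": 0.5},  # illustrative only
    personal_weights={"p01": {"hr_delta": 0.5}},                    # illustrative only
)
context = {"night": 1.0, "after_productivity_app": 0.0}
print(model.regret_probability("new_user", context))                # shared layer only
print(model.regret_probability("p01", context, {"hr_delta": 2.0}))  # shared + personal layer
```

The design choice this illustrates is graceful degradation: a new user still gets a prediction from context alone, and personal physiology refines it once available. The intervention threshold is a separate decision the abstract does not specify.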
Interview themes of scrolling-as-avoidance and time blindness contextualize these patterns and surface design opportunities beyond timer-based interventions.
Published: 2026-06-08T03:04:29Z
Authors: Sally Ahmed, Jan Enkmann, Kye Shimizu, Ivy Yip, Vincent Beermann, Ayse Alomar, Falk Uebernickel, Pattie Maes

http://arxiv.org/abs/2603.13679v2
Title: Toward Scalable Co-located Practical Learning: Assisting with Computer Vision and Multimodal Analytics
Updated: 2026-06-08T02:34:07Z
Abstract: Co-located practical learning leaves evidence in visible actions around patients, task resources and room zones, but these traces are often recovered through live observation or retrospective video review. Fixed wide-angle video could reduce the sensing burden, yet a debriefing pipeline must do more than detect behaviours: it must maintain detection after small camera-position shifts, relate the detector-derived behaviour trace to instructor-labelled outcomes, and preserve room-zone context. This study evaluates a fixed-camera pipeline in repeated nursing simulations. Using a harmonised six-code taxonomy, we tested YOLO26 target-only training and two-stage source-to-target adaptation across two same-room side-view data sources. We then converted detections from 51 instructor-labelled sessions into one-second behaviour and behaviour-zone traces for rate, ordered-network, transition-network and sequence analyses.
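The conversion from detections to one-second behaviour-zone traces can be sketched as follows. This is a minimal illustration under stated assumptions: detections arrive as (timestamp, behaviour code, zone) tuples, each second keeps its most frequent behaviour-zone token, and transitions are counted between consecutive distinct tokens. The behaviour codes and zone names are hypothetical stand-ins for the paper's six-code taxonomy, and the aggregation rule is an assumption rather than the authors' pipeline.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Detection = Tuple[float, str, str]  # (time in seconds, behaviour code, room zone)


def to_second_trace(detections: Iterable[Detection], duration_s: int) -> List[Optional[str]]:
    """One behaviour@zone token per second; None where nothing was detected."""
    buckets = [Counter() for _ in range(duration_s)]
    for t, behaviour, zone in detections:
        second = int(t)
        if 0 <= second < duration_s:
            buckets[second][f"{behaviour}@{zone}"] += 1
    # Most frequent token wins; ties broken alphabetically for determinism.
    return [min(b, key=lambda tok: (-b[tok], tok)) if b else None for b in buckets]


def transition_counts(trace: List[Optional[str]]) -> Counter:
    """Count transitions between consecutive distinct tokens, skipping empty seconds."""
    collapsed: List[str] = []
    for tok in trace:
        if tok is not None and (not collapsed or collapsed[-1] != tok):
            collapsed.append(tok)
    return Counter(zip(collapsed, collapsed[1:]))


detections = [
    (0.2, "patient_interaction", "primary"),
    (0.7, "patient_interaction", "primary"),
    (1.1, "phone_use", "secondary"),
    (2.5, "patient_interaction", "primary"),
    (3.9, "patient_interaction", "primary"),
]
trace = to_second_trace(detections, duration_s=5)
print(trace)
print(transition_counts(trace))
```

Folding the zone into the token is what lets the same behaviour carry different meanings in different room zones, which is the distinction the abstract's zone-aware analyses rely on.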
Two-stage adaptation improved mean mAP50 from 0.815 to 0.848 for the 2021 target view and from 0.690 to 0.855 for the smaller 2022 target view; with a balanced target quota of N = 22, the 2022 model reached 0.850 mAP50. In the detector-derived behaviour trace analyses, higher phone use characterised low task-performance sessions. Zone labels changed the interpretation of patient interaction: primary patient-care-zone interaction was stronger in higher-performance sessions, while secondary-zone interaction was stronger in lower-performance sessions. Ordered and transition network models showed that ordered room-zone relations contributed beyond behaviour frequency, with the strongest task-performance classifier using zoned and co-presence features. The resulting trace is best suited to searchable simulation debriefing, where instructors inspect detected moments rather than receive automated assessment scores.
Published: 2026-03-14T01:04:58Z
Authors: Xinyu Li, Linxuan Zhao, Yueqiao Jin, Yuchen Liu, Jin Zhou, Roberto Martinez-Maldonado, Dragan Gasevic, Lixiang Yan

http://arxiv.org/abs/2606.08936v1
Title: Report on CHIIR 2026 Workshop on Generative AI and Academic Search (GAI&AS)
Updated: 2026-06-08T02:31:14Z
Abstract: This report summarizes the CHIIR 2026 Workshop on Generative AI and Academic Search (GAI&AS), which examined how GenAI is reshaping academic search systems and research practices. The workshop brought together researchers in human information interaction and information retrieval to explore key challenges and opportunities in designing and evaluating future academic search systems that integrate GenAI, moving beyond traditional document retrieval to support summarization, recommendation, synthesis, and conversational interaction. Participants' interests and discussions focused on three thematic clusters: foundations and principles, applications and opportunities, and search-as-learning.
Across these themes, the workshop highlighted the importance of academic search systems in supporting transparency, credibility, research integrity, and long-term scholarly needs, as well as in fostering higher-order cognitive processes. Participants discussed guiding theories, design principles, methodological approaches, partnerships, and community-building efforts aimed at advancing human-centered, GenAI-enhanced academic search systems. Overall, the workshop demonstrated strong community interest and a diverse range of ongoing and emerging research initiatives at the intersection of GenAI and academic search.
Published: 2026-06-08T02:31:14Z
Authors: Yifan Liu, Jaime Arguello, Orland Hoeber, Chang Liu, Soo Young Rieh, Luanne Sinnamon, Dean Alvarez, Susan Archambault, Rob Capra, Henson Chen, Charles Costa, Anita Crescenzi, Zhitong (Klara) Guan, Jacek Gwizdka, Pao-Pei Huang, Gavindya Jayawardena, Ghazal Kalhor, Dagmar Kern, Oliver Koop, Alice Li, Afra Mashhadi, Gaohui Meng, Marta Micheli, Anil B. Murthy, Kevin Schott, Sebastian Schultheiß, Jiwoo Seo, Phaneendra Sivangula, Frans van der Sluis, Xiaoxuan Song, Silang Wang, Dan Zhang

http://arxiv.org/abs/2606.08927v1
Title: In-Situ Immersive Analytics Authoring through Ergonomic Keyboard Support
Updated: 2026-06-08T02:09:11Z
Abstract: Immersive analytics uses augmented reality (AR) to integrate data analysis and authoring within physical environments. However, the extensive text entry required for immersive analytics authoring remains a fundamental challenge in AR, as popular natural user interfaces often hinder expressive input. This paper presents the Body-Supported Keyboard (BSK), an ergonomic system that enables mobile use of a Bluetooth keyboard in AR. We conducted a controlled study with 20 participants to compare the BSK with a standing desk during text transcription and a mobile AR scenario. The results showed slightly higher error rates but comparable task completion times.
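The usability result for the BSK is reported as a mean System Usability Scale (SUS) score. For readers unfamiliar with the scale, here is a minimal sketch of standard SUS scoring: ten items rated 1 to 5, odd (positively worded) items contribute `r - 1`, even (negatively worded) items contribute `5 - r`, and the sum is multiplied by 2.5 to give a 0 to 100 score. The function names are illustrative, and the sample responses are made up; the study's per-participant responses are not reproduced here.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Score one participant's SUS questionnaire (items 1..10, each rated 1..5)."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("SUS needs exactly ten responses on a 1-5 scale")
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5


def mean_sus(participants: Sequence[Sequence[int]]) -> float:
    """Mean SUS across participants, the kind of figure reported as 'mean SUS'."""
    return sum(sus_score(p) for p in participants) / len(participants)


print(sus_score([5, 1] * 5))                 # 100.0 (best possible answers)
print(mean_sus([[5, 1] * 5, [3] * 10]))      # 75.0
```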
Participants reported comfort improvements during mobile use and positive usability ratings (mean SUS = 74.5). The BSK allows users to move freely and maintain stable postures while authoring in AR. Overall, the findings provide evidence that body-supported input can enhance expressive and ergonomic workflows in immersive analytics, and they underline the importance of comfort and mobility in the design of AR authoring tools.
Published: 2026-06-08T02:09:11Z
Comment: 31 pages, 7 tables, 5 figures
Journal reference: International Journal of Human-Computer Interaction, 1-27. 2026
Authors: Leonel Merino, Begoña Juliá-Nehme, Santiago Viana
DOI: 10.1080/10447318.2026.2676765

http://arxiv.org/abs/2606.08914v1
Title: Vibe Visualizing: How Visualization Novices Try (and Fail) to Generate and Interpret Visualizations with Conversational AI
Updated: 2026-06-08T01:29:22Z
Abstract: Conversational AI has enabled users to generate and interpret visualizations through natural language, significantly lowering the technical barrier to entry. This accessibility brings visualization novices into data visualization, but also exposes them to misinformation and misinterpretation. We examine what issues can arise in interactions with current conversational AI, whether visualization novices can recognize such issues, and how they respond to them. To answer these questions, we conducted a user study on ChatGPT with 20 visualization novices, collecting their conversation logs, semi-structured interview transcripts, and Likert-scale questionnaire responses. Through thematic analysis, we developed a codebook that covers AI execution compliance, issues of AI-generated visualizations, patterns of AI responses, and users' prompting patterns. We summarized four themes, including the quality of outcomes, recurring errors from ChatGPT, misuse by users, factors that affect user trust, confidence, and verification behavior, and human-AI collaboration dynamics.
To demonstrate the generalizability of our codebook and findings, we replayed the initial user prompts on Gemini and Claude and compared the outcomes, which revealed distinct failure modes for each model. Based on the results of all analyses, we derive a set of design recommendations for future AI-assisted visualization systems. We conclude by discussing literacy gaps, diverse human-AI collaboration dynamics, and implications for agentic visualization.
Published: 2026-06-08T01:29:22Z
Authors: Sam Yu-Te Lee, Yun-Hsin Kuo, Chifang Chou, Matthew Ward, Xiwei Xuan, Kwan-Liu Ma

http://arxiv.org/abs/2606.08912v1
Title: Enhancing Presence, Deepening Fan Intensity: How Presence in Immersive Video Shapes Psychological Closeness to Performers
Updated: 2026-06-08T01:26:23Z
Abstract: Immersive video differs from conventional flat 2D video in that it is experienced as 180-degree stereoscopic video on a head-mounted display, thereby eliciting bodily and spatial subjective experiences. Previous studies have shown that viewing and interpersonal distance affect Presence; however, how differences in Presence relate to psychological closeness to content remains insufficiently understood. In the present study, we examined whether differences in Presence could increase viewers' psychological closeness to performers within the content, operationally defined as fan intensity. Specifically, a live performance by a Japanese idol group was recorded as 180-degree immersive video, and a high-Presence condition (1.2 m) and a low-Presence condition (7.6 m) were established by manipulating filming distance. Twenty-four participants with different levels of prior involvement, comprising Avid fans and Casual fans, experienced both conditions in a counterbalanced within-participants design. Fan intensity was measured before and after the experience as perceived psychological overlap between the self and the performers.
The results showed that, compared with the low-Presence condition, the high-Presence condition significantly increased all Presence-related measures except the Slater-Usoh-Steed questionnaire, with the largest condition differences observed for Possible Actions, Social Presence, and Observability. Moreover, a mixed analysis of variance on changes in fan intensity revealed a significant main effect of Presence condition, indicating that the high-Presence video produced a greater increase in fan intensity than the low-Presence video. These findings suggest that filming distance in immersive video is not merely a factor that determines angle of view or composition, but a design variable that can enhance Presence and deepen fan intensity.
Published: 2026-06-08T01:26:23Z
Comment: 20 pages, including 6 pages of supplementary materials; 10 figures, 2 tables
Authors: Koichi Toida, Hideto Hiranuma, Shimpei Miura, Norihiro Yamamoto, Yuki Kobayashi, Shingo Meguro

http://arxiv.org/abs/2603.29495v2
Title: All-in-One Augmented Reality Guided Head and Neck Tumor Resection
Updated: 2026-06-07T19:53:36Z
Abstract: Positive margins are common in head and neck squamous cell carcinoma, yet intraoperative re-resection is often imprecise because margin locations are typically communicated verbally by pathology. We present an all-in-one augmented reality (AR) system that relocalizes positive margins from a resected specimen to the resection bed and visualizes them in situ using HoloLens 2 depth sensing and fully automated markerless surface registration. In a silicone phantom study with six medical trainees, markerless registration achieved target registration errors comparable to a marker-based baseline (median 1.8 mm vs. 1.7 mm; maximum < 4 mm). In a margin relocalization task, AR guidance reduced localization error from a median of 14.2 mm with verbal guidance to a median of 3.2 mm, with all AR localizations within 5 mm.
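Target registration error (TRE), the metric behind the phantom results, is the distance between where an estimated registration maps a target point and that target's ground-truth location, measured on targets not used to fit the registration. A minimal sketch, assuming the registration has already produced a rigid transform (rotation `R`, translation `t`) and that coordinates are in millimetres; it does not reproduce the paper's markerless surface registration, only the median and maximum error summary reported for it. The sample points are made up.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def apply_rigid(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Map point p with x' = R x + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def target_registration_errors(R: Mat3, t: Vec3,
                               targets: Sequence[Vec3],
                               ground_truth: Sequence[Vec3]) -> List[float]:
    """Euclidean error at each target after applying the estimated transform."""
    return [math.dist(apply_rigid(R, t, p), q) for p, q in zip(targets, ground_truth)]


def summarize(errors: Sequence[float]) -> Tuple[float, float]:
    """(median, maximum) error, the summary style used in the abstract."""
    return median(errors), max(errors)


identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
errors = target_registration_errors(
    identity, (1.0, 0.0, 0.0),
    targets=[(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.0, 0.0)],
    ground_truth=[(1.0, 0.0, 0.0), (2.0, 1.0, 3.0), (3.0, 3.0, 4.0)],
)
print(errors, summarize(errors))  # [0.0, 2.0, 5.0] (2.0, 5.0)
```

Reporting the median alongside the maximum, as the abstract does, matters clinically: a low median can hide a single large miss, while the maximum bounds the worst case.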
These results support the feasibility of markerless AR margin guidance for more precise intraoperative re-excision.
Published: 2026-03-31T09:38:52Z
Authors: Yue Yang, Matthieu Chabanas, Carrie Reale, Annie Benson, Jason Slagle, Matthew Weinger, Michael Topf, Jie Ying Wu

http://arxiv.org/abs/2508.10239v3
Title: Breaking the Curse of Knowledge: Designing Personalized Jargon Support for Real-Time Online Meetings
Updated: 2026-06-07T19:49:25Z
Abstract: Cross-disciplinary communication is often hindered by specialized language (i.e., jargon) and uneven background knowledge. Recent advances in speech-to-text and large language models make it possible to provide jargon support during online meetings, but generic support (i.e., defining the same terms for everyone) can overwhelm listeners with definitions they do not need. We present ParseJargon, a system for personalized jargon support in real-time online meetings. We begin with an initial prototype that probes the use of single-sentence user profiles for personalization. A controlled study showed that even this minimal personalization improved listeners' comprehension and engagement over generic support, owing to more precise jargon identification. Guided by participant feedback, we refined the system with more advanced personalization techniques, including in-session user feedback and portable glossary-based profiles. Using data collected in the controlled study to simulate personalization over time, we evaluated how these techniques can further improve jargon identification precision. We also conducted a latency test, complemented by a lightweight deployment, to analyze the system's real-time capability and usability.
Published: 2025-08-13T23:42:12Z
Comment: Portions of this work appeared in CHI '26 Extended Abstracts ("Breaking the Curse of Knowledge: Toward Personalized Jargon Support in Online Meetings") and ACL '26 System Demonstrations ("ParseJargon: Personalized Real-time Jargon Support in Online Meetings")
Authors: Yifan Song, Yijun Liu, Wing Yee Au, Hon Yung Wong, Brian P. Bailey, Tal August

http://arxiv.org/abs/2605.16972v2
Title: WhiteTesseract: Reframing the Interpretation of Cultural Heritage through XR and Conversational AI
Updated: 2026-06-07T13:19:01Z
Abstract: Cultural heritage exhibitions often struggle to sustain attention and support reflective engagement. Physical exhibitions rely on fixed interpretive aids that cannot adapt to individual backgrounds or curiosity, and their effectiveness depends heavily on a visitor's Personal Context, prior knowledge, and cultural literacy. Meanwhile, digital exhibitions prioritize convenience and accessibility but risk weakening the Physical and Social Contexts that define embodied cultural experience.
WhiteTesseract addresses this gap by enabling in-situ interpretation through high-resolution XR and conversational AI. The system uses artwork recognition as spatial intelligence, allowing visitors to selectively reduce environmental distractions (via diminished reality) and to engage in context-aware dialogue (via large language models). The goal is to preserve the richness of the physical and social environment while providing a flexible space for personal reflection, enhancing Personal Context without compromising physical authenticity.
We deployed the system in a Claude Monet exhibition and conducted a controlled user study with 26 participants. Quantitative results showed that WhiteTesseract modulation significantly increased average viewing duration from 35.3 to 98.3 seconds (p < 0.001). Analysis of 529 visitor-AI interactions revealed that 60% extended beyond factual queries to include analytical, emotional, and comparative inquiries. These findings demonstrate how XR and AI can enrich the physical exhibition experience by supporting deeper, more personalized engagement without displacing the embodied value of cultural heritage. We discuss technical and social constraints for real-world deployment and the limitations of our controlled setting.
Published: 2026-05-16T12:50:37Z
Comment: 38 pages, 13 figures. Accepted for publication in ACM Journal on Computing and Cultural Heritage (JOCCH)
Authors: Jingjing Li, Zhi Liu, Xiyao Jin, Tatsuki Fushimi, Yoichi Ochiai

http://arxiv.org/abs/2606.08596v1
Title: Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration
Updated: 2026-06-07T12:20:32Z
Abstract: Constructing efficient and reliable policies to assist humans is indispensable for human-AI collaboration. Existing methods mainly follow two lines of work. The first relies on multi-agent reinforcement learning (MARL) to learn black-box policies, which limits interpretability and raises safety concerns. The second, more recent line queries large language models (LLMs) at each decision step, causing slow responses and high inference costs. We propose Collaboration Policy Tree (Co-pi-tree), a closed-loop method that learns an executable policy tree consisting of a partner-behavior prediction tree and an agent-action selection tree. Co-pi-tree constructs a policy by distilling LLM reasoning into policy tree code. It then evaluates the policy through partner interaction, collects feedback, and summarizes that feedback in natural language to refine problematic branches.
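The two-tree structure can be illustrated with a small hand-written example. This is a hypothetical sketch of what an executable policy tree for an Overcooked-style kitchen might look like, not code produced by Co-pi-tree: the state keys (`partner_holding`, `soup_ready`, `pot_onions`), the intents, and the actions are all assumptions. The first tree predicts the partner's intent; the second selects the agent's action conditioned on that prediction, so each decision costs a few comparisons rather than an LLM call.

```python
from typing import Dict

State = Dict[str, object]


def predict_partner_intent(state: State) -> str:
    """Partner-behavior prediction tree (hypothetical branches)."""
    holding = state.get("partner_holding")
    if holding == "onion":
        return "deliver_onion"
    if holding == "dish":
        return "plate_soup"
    return "fetch_onion"


def select_action(state: State, partner_intent: str) -> str:
    """Agent-action selection tree, conditioned on the predicted partner intent."""
    if state["soup_ready"]:
        # Avoid duplicated work: let a dish-carrying partner plate the soup.
        return "wait" if partner_intent == "plate_soup" else "fetch_dish"
    if state["pot_onions"] < 3:
        # If the partner is already bringing an onion, prepare a dish instead.
        return "fetch_dish" if partner_intent == "deliver_onion" else "fetch_onion"
    return "wait"  # pot is full and cooking


def policy(state: State) -> str:
    return select_action(state, predict_partner_intent(state))


print(policy({"partner_holding": "onion", "soup_ready": False, "pot_onions": 1}))  # fetch_dish
```

In the closed loop described above, branches like these would be revised when natural-language summaries of partner interactions flag them as problematic; that refinement loop is not shown here.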
Experiments in Overcooked-AI show that Co-pi-tree improves average reward by 35.4% over the baseline average, while reducing the number of LLM queries by 77.7% and test-time latency by 97.1%. Project page: https://beiwenzhang.github.io/Co-pi-tree/
Published: 2026-06-07T12:20:32Z
Authors: Beiwen Zhang, Yongheng Liang, Guowei Zou, Haitao Wang, Hejun Wu

http://arxiv.org/abs/2508.06336v2
Title: Unsupervised Partner Design Enables Robust Ad-hoc Teamwork
Updated: 2026-06-07T11:04:52Z
Abstract: We introduce Unsupervised Partner Design (UPD), a population-free multi-agent reinforcement learning method for robust ad-hoc teamwork. UPD generates training partners on the fly and selects them adaptively based on a learnability criterion, removing the need for pre-trained partner populations or manual parameter tuning. We show that this simple mechanism enables effective partner diversity and can be extended to joint partner-environment selection when a procedural level generator is available. Across Level-Based Foraging, Overcooked-AI, and the Overcooked Generalisation Challenge, UPD consistently achieves strong performance compared with both population-based and population-free baselines. In a human-AI user study, agents trained with UPD achieve higher returns and are rated as more adaptive, more human-like, and less frustrating than all evaluated baseline methods.
Published: 2025-08-08T14:11:15Z
Comment: 27 pages
Authors: Constantin Ruhdorfer, Matteo Bortoletto, Victor Oei, Anna Penzkofer, Andreas Bulling

http://arxiv.org/abs/2606.08441v1
Title: Comparing Controller-Free Pointing Techniques Across Depth for 2D Selection in Augmented Reality
Updated: 2026-06-07T03:38:17Z
Abstract: This paper presents a systematic evaluation of five controller-free pointing techniques for 2D target selection in AR, following ISO 9241-411. We compared them across multiple depths (2 m, 6 m, 10 m) in terms of movement time, accuracy, throughput, and workload (NASA TLX).
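Throughput in ISO 9241-411 style evaluations is usually computed from effective values rather than nominal target geometry. A minimal sketch of that calculation: the effective width is 4.133 times the standard deviation of selection endpoints along the task axis, the effective index of difficulty is log2(Ae / We + 1) using the mean actual movement amplitude Ae, and throughput is that index divided by mean movement time. The input layout and sample numbers are assumptions; distances and deviations only need to share a unit (for example millimetres or degrees of visual angle), and the abstract does not state which unit the authors used.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def effective_throughput(endpoint_deviations: Sequence[float],
                         amplitudes: Sequence[float],
                         movement_times_s: Sequence[float]) -> float:
    """Effective throughput (bits/s) for one sequence of trials at one condition.

    endpoint_deviations: signed deviation of each selection from the target centre,
        projected onto the task axis (same unit as amplitudes).
    amplitudes: actual movement distance for each trial.
    movement_times_s: movement time per trial, in seconds.
    """
    we = 4.133 * stdev(endpoint_deviations)  # effective target width
    ae = mean(amplitudes)                     # effective amplitude
    ide = math.log2(ae / we + 1)              # effective index of difficulty (bits)
    return ide / mean(movement_times_s)


tp = effective_throughput([-1.0, 1.0, -1.0, 1.0],
                          [100.0, 98.0, 102.0, 100.0],
                          [0.9, 1.1, 1.0, 1.0])
print(round(tp, 2))
```

Analyses of this kind commonly compute throughput per participant and condition (here, technique and depth) and then average those values, rather than pooling all trials into one estimate.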
Head- and eye-based pointing significantly outperformed the hand-based methods (Finger, Wrist, and Arm); Head input was the most accurate and remained the most consistent across depths. Depth significantly affected performance, with complex interactions involving target size and distance. Our results offer a comprehensive empirical basis for selecting appropriate controller-free techniques in depth-varying AR tasks.
Published: 2026-06-07T03:38:17Z
Journal reference: Proceedings of the Graphics Interface Conference 2026
Authors: Samiha Sultana, J. Felipe Gonzalez, Robert J. Teather

http://arxiv.org/abs/2606.08426v1
Title: CritLens: Visual Analytics for Criteria Discovery in Review-Based Decision Making
Updated: 2026-06-07T02:50:54Z
Abstract: We present CritLens, a visual analytics system that helps users build personalized multi-criteria decision models from review text. In everyday decisions (choosing equipment, hotels, or restaurants), evaluation criteria are either preset by platforms or generated by LLMs, leaving users unable to discover, adjust, or verify them against the underlying evidence. This is problematic because many preferences are latent: they surface only upon encountering specific reviews, and any fixed framework risks overlooking low-frequency but decisive details. CritLens addresses this gap by using LLMs to transform reviews into an initial AHP decision model and then supporting iterative, human-in-the-loop refinement. Through coverage gap detection in the embedding space, users discover criteria missed by the initial model; through interactive weight adjustment under AHP consistency constraints, they express personal priorities; and through a multi-level scorecard and an exportable decision report, they trace every ranking back to the original review text. Two case studies, an eight-participant user study, and a quantitative consistency-repair experiment demonstrate the system's effectiveness.
Published: 2026-06-07T02:50:54Z
Authors: Hongjia Wu, Shuai Zhou, Hongxin Zhang, Wei Chen
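CritLens keeps user-adjusted weights under AHP consistency constraints. The usual consistency test in AHP is Saaty's consistency ratio; whether CritLens uses exactly this test is an assumption. A minimal sketch: derive priority weights from a pairwise comparison matrix via the principal eigenvector (here by power iteration), estimate λmax, compute CI = (λmax − n)/(n − 1), and divide by Saaty's random index RI(n). A CR below 0.10 is conventionally treated as acceptably consistent. The example matrices are made up.

```python
from typing import List, Sequence, Tuple

# Saaty's random consistency index RI(n) for matrices of size n = 1..9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights_and_cr(A: Sequence[Sequence[float]],
                       iterations: int = 100) -> Tuple[List[float], float]:
    """Priority weights (principal eigenvector, summing to 1) and consistency ratio."""
    n = len(A)
    if n not in RANDOM_INDEX:
        raise ValueError("RI table covers n = 1..9 only")
    w = [1.0 / n] * n
    for _ in range(iterations):  # power iteration; converges for positive reciprocal matrices
        v = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        w = [x / s for x in v]
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Aw[i] / w[i] for i in range(n)) / n
    if n <= 2:
        return w, 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    ci = (lambda_max - n) / (n - 1)
    return w, ci / RANDOM_INDEX[n]


# Perfectly consistent judgments built from weights 0.5, 0.3, 0.2.
true_w = [0.5, 0.3, 0.2]
consistent = [[wi / wj for wj in true_w] for wi in true_w]
# Cyclic judgments (A beats B, B beats C, C beats A), which AHP should flag.
cyclic = [[1, 9, 1 / 9], [1 / 9, 1, 9], [9, 1 / 9, 1]]

for name, matrix in [("consistent", consistent), ("cyclic", cyclic)]:
    weights, cr = ahp_weights_and_cr(matrix)
    print(name, [round(x, 3) for x in weights], round(cr, 3),
          "ok" if cr < 0.10 else "revise")
```

A consistency-repair step, like the one CritLens evaluates, would adjust judgments until CR falls below the threshold; the abstract does not describe how the paper performs that repair.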