http://arxiv.org/abs/2512.13253v2
Title: Fostering human learning is crucial for boosting human-AI synergy
Updated: 2026-05-27T08:58:04Z
Abstract: The collaboration between humans and artificial intelligence (AI) holds the promise of achieving superior outcomes compared to either acting alone, a phenomenon called human-AI synergy. Nevertheless, our understanding of the conditions that facilitate such human-AI synergy when humans are advised by AI remains limited. A recent meta-analysis showed that, on average, human-AI combinations do not outperform the better individual agent. We argue that this pessimistic conclusion arises from insufficient attention to human learning in the experimental designs. To substantiate this claim, we re-analyzed all 74 studies included in the original meta-analysis, yielding two new findings. First, most previous research overlooked design features that foster human learning, such as providing outcome feedback to participants. Second, our re-analysis demonstrated that studies providing outcome feedback show tentatively higher synergy than those without outcome feedback. Crucially, feedback paired with AI explanations tends to yield positive synergy, while explanations without feedback were linked to negative synergy, indicating that explanations increase synergy only when humans can learn to verify the AI's reliability through feedback. We conclude that the current literature underestimates the potential of human-AI collaboration because it predominantly relies on paradigms that do not facilitate human learning, thus hindering humans from effectively adapting their collaboration strategies. We therefore advocate for a paradigm shift in human-AI interaction research that explicitly addresses human learning and thus enhances our understanding of and support for successful human-AI collaboration.
Published: 2025-12-15T12:08:23Z
Authors: Julian Berger, Jason W. Burton, Ralph Hertwig, Thomas Kosch, Ralf H. J. M.
Kurvers, Benito Kurzenberger, Christopher Lazik, Linda Onnasch, Tobias Rieger, Anna I. Thoma, Dirk U. Wulff, Stefan M. Herzog

http://arxiv.org/abs/2605.28154v1
Title: Robo-Blocks: Generative Scaffolding in End-User Design and Programming of Social Robots
Updated: 2026-05-27T08:39:01Z
Abstract: Programming social robots is challenging for novice robot programmers due to required expertise in planning, interaction design, and programming. While large language models (LLMs) hold significant promise through code generation from natural-language descriptions, they can obscure critical elements of programming and supplant designer intent, eventually resulting in over-reliance instead of developing programming skills. In this paper, we explore how LLM-based social-robot-programming tools can support novice robot programmers through a Research through Design (RtD) process. We designed and prototyped Robo-Blocks, a block-based programming environment that leverages LLMs to offer novice robot programmers generative scaffolding through structured narratives that connect high-level ideas to executable robot behaviors. Through deployment with novices, we discovered emerging user personas and usage patterns for generative scaffolding and showed how this scaffolding shapes end-user design and programming strategies. We present design insights for the effective use of generative scaffolding and its integration into the practice of social-robot programming.
Published: 2026-05-27T08:39:01Z
Authors: Arissa J. Sato, Callie Y. Kim, Nathan Thomas White, Abhinav Maneesh, Yuqing Wang, Hui-Ru Ho, Bilge Mutlu
DOI: 10.1145/3800645.3812997

http://arxiv.org/abs/2605.28064v1
Title: I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors
Updated: 2026-05-27T07:16:02Z
Abstract: Automatic deepfake detection has received considerable research attention, yet the socio-technical environment in which humans actually encounter synthetic speech remains poorly understood.
We investigate voice deepfake detection as a perceptual and contextual process, presenting a localization task in which 47 participants marked suspected synthetic segments across authentic, fully synthetic, and partially synthetic utterances under three manipulated trust cues: instructional framing, affective priming, and provenance labeling. Participants provided quality ratings on mechanicalness, expressiveness, intelligibility, clarity, calmness, and confidence of evaluation. Utterance class was the primary determinant of detection accuracy and perceptual quality; trust cues produced no main effects but motivated detection behavior. Fully synthetic speech was detected at below-chance levels. Quality ratings tracked utterance type, indicating implicit discrimination where overt detection failed.
Published: 2026-05-27T07:16:02Z
Comment: To be included in Odyssey 2026: The Speaker and Language Recognition Workshop, Session 4.2, 23-26 June, Lisbon, Portugal
Authors: Lelia Erscoi (Computational Speech Group, University of Eastern Finland), Tomi Kinnunen (Computational Speech Group, University of Eastern Finland)

http://arxiv.org/abs/2605.27999v1
Title: Learning to Assign Prediction Tasks to Agents with Capacity Constraints
Updated: 2026-05-27T05:45:47Z
Abstract: We address the problem of learning to assign prediction tasks to one agent from a set of available human or AI agents. In particular, we focus on the sequential learning of agent expertise and assignment policies where each agent is constrained to handle a fraction of tasks. We provide a general theoretical characterization of this problem in terms of agent capacities, differences in agent expertise, and task context. We then develop a framework of sequential explore-exploit policy-learning algorithms that seek to maximize overall performance.
Experimental results over a variety of tabular, image, and text prediction tasks demonstrate systematic gains from our policy-learning algorithms relative to non-contextual baselines across different types of agents, including LLMs and humans.
Published: 2026-05-27T05:45:47Z
Authors: Shang Wu, Saatvik Kher, Padhraic Smyth

http://arxiv.org/abs/2605.16237v2
Title: Inside Baseball: The Automated Ball-Strike System as an Object Lesson in Technological Rule Enforcement
Updated: 2026-05-27T04:47:58Z
Abstract: Clearly defined rules are often assumed to be straightforward to automate and evaluate. We challenge this assumption through an in-depth study of Major League Baseball's (MLB) seven-year experimentation with the Automated Ball-Strike System (ABS). ABS is envisioned to call balls and strikes accurately: a seemingly straightforward use of technology to objectively determine the distance between a pitch and the strike zone. Although the strike zone is an area clearly defined in the rulebook, it took MLB seven years to figure out how to automate calling balls and strikes with ABS, showing how even seemingly straightforward rules require a complex translation process to operationalize via technological systems. In this paper, we trace the design decisions that led to the current implementation of ABS. Our case study reveals that "distance" exists even between a clear rule and its technological implementation. Using analytic frameworks from Science and Technology Studies (STS), we show that such distance exists because (1) historically, the "ground truth" of the strike zone is contested: the rule in practice has always reflected a hybrid between the rulebook definition and umpires' enforcement decisions; and (2) the use of ABS is embedded in an existing ecosystem, where the implementation of a technological enforcement system needs to balance multiple stakeholder values.
This perspective challenges conventional evaluation paradigms that center on the distance between a formalized rule and its technological implementation, and instead calls for evaluating how such systems are experienced in practice. Addressing this question requires in-depth social science approaches, contributing to ongoing conversations in FAccT about the implementation and evaluation of sociotechnical systems.
Published: 2026-05-15T17:45:04Z
Authors: Andrea Wen-Yi Wang, Waki Kamino, David Mimno, Karen Levy, Malte F. Jung
DOI: 10.1145/3805689.3812385

http://arxiv.org/abs/2605.27939v1
Title: EyeSpy: Inferring Eye Gaze via Side-Channel Attacks Against Foveated Rendering
Updated: 2026-05-27T04:20:37Z
Abstract: While eye tracking provides valuable capabilities for virtual reality, such as gaze interaction and dynamic foveated rendering (DFR), eye-tracking data can inadvertently reveal sensitive user information if not properly protected. Current protections, such as adding permission prompts or gatekeeping gaze data, are insufficient on DFR-enabled systems because gaze data is used internally to drive DFR. When DFR is implemented, objects in the fovea (i.e., immediate gaze area) incur a higher GPU workload than those in the periphery. This gaze-contingent workload creates a novel side channel, which can be leveraged to reconstruct gaze positions. Specifically, we design a novel attack that sweeps imperceptible high-cost objects (HCOs) across the user's field of view and logs rendering performance metrics (e.g., frame rate or frame time) commonly exposed through standard game engines. Then, we correlate variation in these metrics (caused by HCO-foveal overlap) with the known HCOs' positions to infer gaze coordinates directly without using eye-tracking APIs. Our experimental results show that mean gaze prediction errors (1.1-4.4 degrees) across the Meta Quest Pro, Varjo XR-4, and desktop platforms are comparable to typical eye-tracker accuracy.
We demonstrate that the attack generalizes across various hardware platforms, standard game engines, and foveated rendering pipelines. Finally, we design defense mechanisms based on supervised and unsupervised detectors that can flag the attack reliably (F1 of 0.99) over short time windows.
Published: 2026-05-27T04:20:37Z
Comment: 20 pages, 12 figures. Accepted to the 47th IEEE Symposium on Security and Privacy (IEEE S&P 2026). Artifacts: https://bmdj-vt.github.io/project_pages/xr_side_channels
Journal reference: Proceedings of the 47th IEEE Symposium on Security and Privacy (SP), pp. 2646-2665, 2026
Authors: Paul Maynard, Harris Amjad, Camila Molinares, Bo Ji, Brendan David-John
DOI: 10.1109/SP63933.2026.00145

http://arxiv.org/abs/2605.27921v1
Title: Show, Don't TELL: Explainable AI-Generated Text Detection
Updated: 2026-05-27T03:47:25Z
Abstract: Research on AI-generated text detection has presented a number of approaches to discern human from AI prose, some of which achieve high in-distribution performance. However, real-world applicability has stalled because their outputs are misaligned with the needs of users, such as professors, who are presented with a numeric score that has no attached explanation. We tackle this issue with a novel architecture, TELL, that builds in explainability from the ground up. While our system still offers a numerical score like other detectors for comparability, TELL takes a fundamentally different approach where we aim to show the user the "tells" by which the model believes a text is AI or human-written, to empower the user to decide who wrote a text using their own judgment and understanding of the context of the writing and its alleged author. We train TELL on a custom SFT dataset of domain-specific authorship annotations, and further refine the system using GRPO with curriculum learning to improve performance. We achieve competitive performance with state-of-the-art detectors (AUROC 0.927) while natively providing annotations that explain the basis for the detector's decision.
We further evaluate the quality of our explanations using a dataset of human annotations and report a high (mean 72.3%) win-rate on annotation concreteness, falsifiability, coherence, plausibility and grounding, allowing users to think critically and decide for themselves. Our work thus reframes the problem of AI-generated text detection in a human-centric perspective and paves the way for a new family of detectors that focus on native explainability.
Published: 2026-05-27T03:47:25Z
Authors: Aldan Creo, Suraj Ranganath

http://arxiv.org/abs/2605.27801v1
Title: Local Privacy Laws in a Globalized World
Updated: 2026-05-27T00:41:31Z
Abstract: Personal data has emerged as a highly valuable yet sensitive asset that drives business decisions, enables targeted advertising, and generates substantial revenue for companies, while simultaneously facilitating invasive monitoring of users. In recent years, research on digital privacy violations, including undue access, collection, and sharing of user data, has grown significantly. Much of this research adopts the European General Data Protection Regulation (GDPR) as the primary reference framework. This is reasonable, as GDPR was pioneering legislation, and many of its stipulations are clear and unambiguous. However, we argue that focusing solely on GDPR (and a small set of other Western regulatory frameworks) ignores privacy-related concerns, attitudes, and problems faced by users from other locales, creating a significant research blind spot. This work systematically normalizes the heterogeneous legal requirements of multiple data protection laws into a unified abstraction aligned with the data lifecycle, which forms the foundation for the implementation of such regulations. We further investigate the implications of these laws on different stakeholders, including users, organizations, and governments.
Overall, this work aims to broaden the digital privacy research community's perspective and to serve as a set of guiding principles for developing technological privacy solutions spanning multiple countries.
Published: 2026-05-27T00:41:31Z
Comment: Accepted in ACM Conference on Data and Application Security and Privacy (CODASPY) 2026
Authors: Shantanu Sharma, Ethan Myers, Lorenzo De Carli, Ritwik Banerjee, Indrakshi Ray

http://arxiv.org/abs/2605.27749v1
Title: Chameleon Clippers: A Tool for Developing Fine Motor Skills in Remote Education Settings
Updated: 2026-05-26T22:51:32Z
Abstract: Art education plays a significant role in K-2 learners' physical and cognitive development. However, teachers struggle to translate in-person activities to remote settings and to give necessary feedback to help learners develop fine motor skills. Previous research shows the benefits of tangible technology and real-time system feedback for supporting teachers and students in digital environments, but little research explores their affordances for remote art education. We developed Chameleon Clippers: interactive scissors that give real-time feedback to learners as they cut along a line. In preliminary tests, learners felt engaged and responded to feedback, enjoying their experience. Our low-cost design augments existing classroom artifacts and practices, supporting classroom integration. Testing also revealed directions for future study, including the frequency of feedback and assimilation into a broader art education platform. Through our study, we demonstrate the potential for tangible technology to create more interactive, engaging, and supportive remote K-2 learning experiences.
Published: 2026-05-26T22:51:32Z
Comment: 4 pages, 1 figure, https://repository.isls.org//handle/1/8300
Journal reference: Proceedings of the 15th International Conference on Computer-Supported Collaborative Learning - CSCL 2022 (pp. 332-335).
International Society of the Learning Sciences
Authors: Gennie Mansi, Ashley Boone, Sue Reon Kim, Jessica Roberts
DOI: 10.22318/cscl2022.332

http://arxiv.org/abs/2605.27685v1
Title: Decoupled Intelligence: A Multi-Agent LLM Framework for Controllable Traffic Scenario Generation in SUMO
Updated: 2026-05-26T21:03:09Z
Abstract: The integration of Large Language Models (LLMs) with microscopic traffic simulation offers a promising path toward autonomous urban planning and intelligent transportation analysis. However, existing monolithic agent architectures often struggle with the complexity of end-to-end simulation workflows, leading to reasoning failures, parameter inconsistency, and a lack of systematic state management. This paper proposes a novel multi-agent collaborative framework designed to automate the entire lifecycle of traffic simulation in SUMO (Simulation of Urban Mobility). Our approach decouples the simulation pipeline into specialized roles, including Planner, Builder, Demand, Runner, and Analyst, coordinated by a high-level reasoning engine. We introduce a state-persistent Orchestrator leveraging the Model Context Protocol (MCP) to ensure seamless data handover and environmental consistency across distributed agent actions. This architecture enables a robust closed-loop refinement process, where simulation outcomes are iteratively analyzed and optimized to satisfy user-defined Key Performance Indicators (KPIs). Experimental results through role ablation studies demonstrate that the proposed multi-agent framework significantly enhances task success rates and parameter accuracy compared to single-agent baselines.
Furthermore, case studies on real-world network extraction and traffic optimization highlight the system's capability to bridge the gap between high-level natural language intent and low-level simulation execution.
Published: 2026-05-26T21:03:09Z
Authors: Shuyang Li, Ruimin Ke

http://arxiv.org/abs/2510.03559v3
Title: PrivacyMotiv: Vulnerability-Centered Persona Journeys for Empathic Privacy Reviews in UX Design
Updated: 2026-05-26T20:35:32Z
Abstract: UX professionals routinely conduct design reviews, yet privacy concerns are often overlooked, not only due to limited tools, but more fundamentally from low intrinsic motivation, driven by limited privacy knowledge, weak empathy for unexpectedly affected users, and low autonomy in identifying harms. We present PrivacyMotiv, an LLM-powered system that generates vulnerability-centered personas, persona journey stories, and traceable design diagnoses grounded in lo-fi user flows to support privacy-oriented UX design review. In a within-subjects study with professional UX practitioners (N=16), PrivacyMotiv significantly improved empathy, intrinsic motivation, and perceived usefulness, with participants identifying 59% more privacy issues and proposing 70% more redesign solutions compared to self-proposed methods. This work contributes empirical insight into motivational barriers in privacy-aware UX and a structured, narrative-driven approach for integrating privacy review into early-stage UX practice.
Published: 2025-10-03T23:14:22Z
Comment: 33 pages, 17 figures
Journal reference: Proceedings of the 2026 ACM Designing Interactive Systems Conference (DIS 2026)
Authors: Zeya Chen, Jianing Wen, Yaxing Yao, Toby Jia-Jun Li, Tianshi Li
DOI: 10.1145/3800645.3813014

http://arxiv.org/abs/2605.27666v1
Title: Explanations as Dialogues: Toward Human-Centered Conversational Explainable AI
Updated: 2026-05-26T20:33:51Z
Abstract: As AI systems become increasingly conversational, a gap emerges wherein explanations are studied as static artifacts, yet in practice, are experienced as dialogue.
In this provocation, we argue that the conversational layer around an explanation is not incidental to its effectiveness, but a critical constituent. Drawing on three illustrative scenarios, we invite the CUI community to study explanations as interactive, conversational exchanges shaped by timing, tone, persona and conversational history, and introduce our vision for Human-Centered Conversational XAI (HC2XAI).
Published: 2026-05-26T20:33:51Z
Comment: To be published in the ACM Conversational User Interfaces (CUI)'26 Conference as Provocation
Authors: Niharika Mathur, Smit Desai
DOI: 10.1145/3816046.3816314

http://arxiv.org/abs/2605.27634v1
Title: Structuring Human-AI Productive Interdependence by Strategic Level of Automation Selection for Qualitative Inquiry
Updated: 2026-05-26T19:54:26Z
Abstract: While Large Language Models (LLMs) offer a solution to the scale-versus-depth dilemma in qualitative analysis, the paradigm of maximizing automation is fundamentally at odds with the interpretive nature of qualitative inquiry. We argue that effective Human-AI collaboration is not an automation problem, but an interdependence problem. This paper reframes the design of "co-data" systems through the lens of Interdependence Theory, proposing a formal framework to structure human-AI productive interdependence. The framework guides the selection of an appropriate Level of Automation (LoA) for different stages of the qualitative analysis process by assessing task risk and the cost of validation. We present a case study where this framework led to a deliberately interdependent workflow, fostering the calibrated trust necessary for rigorous analysis.
We conclude by presenting three design principles that instantiate this framework, demonstrating how to leverage AI as a powerful partner while preserving the human researcher's irreplaceable role in the transformation process of meaning-making.
Published: 2026-05-26T19:54:26Z
Authors: Feng Zhou, Jacqueline Meijer-Irons, Ambar Murillo

http://arxiv.org/abs/2605.16578v3
Title: Voice "Cloning" is Style Transfer
Updated: 2026-05-26T19:32:15Z
Abstract: Artificially generated speech is increasingly embedded in everyday life. Voice cloning in particular enables applications where identity preservation is important, such as completing a recording, dubbing in a new language, or preserving the voices of individuals with speech loss. However, in our work, we find that despite the term, voice cloning does not faithfully "clone" an individual's voice. Instead, we find that widely used voice cloning models systematically apply style transfer to source voices. As rated by human annotators, cloned voices are perceived as more authoritative, warm, customer-service-like, and human-like compared to their sources. Human annotators also report greater trust in cloned voices than source voices, and a greater willingness to disclose sensitive personal information to them. Our work furthermore shows that voice cloning leads to homogenization of speaker characteristics, as measured by reduced variance in accent, speaking rate, and the audio embedding space. Together, our results highlight a new set of limitations and risks of voice cloning technology and their potential impact on human behavior.
Published: 2026-05-15T19:32:28Z
Authors: Kaitlyn Zhou, Federico Bianchi, Martijn Bartelds, Anna Pot, Yongchan Kwon, James Zou

http://arxiv.org/abs/2605.27610v1
Title: Eliot: Interactively $\underline{E}$xploring Fast-Changing Scientific $\underline{Li}$terature Trends with $\underline{O}$nline Da$\underline{t}$a and Learning
Updated: 2026-05-26T19:25:43Z
Abstract: The rapid growth of scientific publishing has made it increasingly difficult to track how fast-moving areas evolve.
Search engines and LLM-based assistants retrieve or summarize papers, but often hide how the corpus was selected, organized, or connected to temporal patterns. We present $\texttt{Eliot}$, a publicly deployed interactive system for traceable exploration of evolving scientific literature. Motivated by two studies on Large Language Models (LLMs) and Automated Planning and Scheduling (APS), $\texttt{Eliot}$ generalizes literature-evolution analysis beyond hand-built taxonomies and domain-specific scripts. Given explicit query terms and filters, it retrieves arXiv papers at query time, represents each paper by title and abstract, clusters the corpus into themes, assigns representative keywords, and visualizes each cluster's publication-year distribution. We evaluate $\texttt{Eliot}$ as both an applied system and an interactive research aid. An offline configuration study across eight arXiv domains compares document representations, dimensionality reduction methods, and clustering algorithms using intrinsic clustering and topic-coherence metrics; the results support MiniLM embeddings with 10-dimensional UMAP and Agglomerative Clustering as a practical default. A scenario-based survey and expert focus group assess interpretability and use contexts: participants rated cluster labels as meaningful in 85% of scenario responses, and feedback indicated that $\texttt{Eliot}$ is most valuable for auditable overviews of rapidly changing technical areas. These results suggest that query-time clustering and temporal inspection can complement search and generation tools by helping researchers inspect and refine the evidence behind literature trends.
Published: 2026-05-26T19:25:43Z
Comment: Under review at CIKM Applied Research 2026
Authors: Bernardo A. Denkvitts, Nitin Gupta, Biplav Srivastava