http://arxiv.org/abs/2512.13253v2 Fostering human learning is crucial for boosting human-AI synergy 2026-05-27T08:58:04Z

The collaboration between humans and artificial intelligence (AI) holds the promise of achieving superior outcomes compared to either acting alone, a phenomenon called human-AI synergy. Nevertheless, our understanding of the conditions that facilitate such human-AI synergy when humans are advised by AI remains limited. A recent meta-analysis showed that, on average, human-AI combinations do not outperform the better individual agent. We argue that this pessimistic conclusion arises from insufficient attention to human learning in the experimental designs. To substantiate this claim, we re-analyzed all 74 studies included in the original meta-analysis, yielding two new findings. First, most previous research overlooked design features that foster human learning, such as providing outcome feedback to participants. Second, our re-analysis demonstrated that studies providing outcome feedback show tentatively higher synergy than those without outcome feedback. Crucially, feedback paired with AI explanations tends to yield positive synergy, while explanations without feedback were linked to negative synergy, indicating that explanations increase synergy only when humans can learn to verify the AI's reliability through feedback. We conclude that the current literature underestimates the potential of human-AI collaboration because it predominantly relies on paradigms that do not facilitate human learning, thus hindering humans from effectively adapting their collaboration strategies. We therefore advocate for a paradigm shift in human-AI interaction research that explicitly addresses human learning and thus enhances our understanding of and support for successful human-AI collaboration.

2025-12-15T12:08:23Z Julian Berger Jason W. Burton Ralph Hertwig Thomas Kosch Ralf H. J. M. Kurvers Benito Kurzenberger Christopher Lazik Linda Onnasch Tobias Rieger Anna I. Thoma Dirk U. Wulff Stefan M. Herzog http://arxiv.org/abs/2605.28154v1 Robo-Blocks: Generative Scaffolding in End-User Design and Programming of Social Robots 2026-05-27T08:39:01Z

Programming social robots is challenging for novice robot programmers due to required expertise in planning, interaction design, and programming. While large language models (LLMs) hold significant promise through code generation from natural-language descriptions, they can obscure critical elements of programming and supplant designer intent, eventually resulting in over-reliance instead of developing programming skills. In this paper, we explore how LLM-based social-robot-programming tools can support novice robot programmers through a Research through Design (RtD) process. We designed and prototyped Robo-Blocks, a block-based programming environment that leverages LLMs to offer novice robot programmers generative scaffolding through structured narratives that connect high-level ideas to executable robot behaviors. Through deployment with novices, we discovered emerging user personas and usage patterns for generative scaffolding and showed how this scaffolding shapes end-user design and programming strategies. We present design insights for the effective use of generative scaffolding and its integration into the practice of social-robot programming.

2026-05-27T08:39:01Z Arissa J. Sato Callie Y. Kim Nathan Thomas White Abhinav Maneesh Yuqing Wang Hui-Ru Ho Bilge Mutlu 10.1145/3800645.3812997 http://arxiv.org/abs/2605.28064v1 I Hear, Therefore I Trust: A Socio-Technical Investigation of Humans as Synthetic Speech Detectors 2026-05-27T07:16:02Z

Automatic deepfake detection has received considerable research attention, yet the socio-technical environment in which humans actually encounter synthetic speech remains poorly understood. We investigate voice deepfake detection as a perceptual and contextual process, presenting a localization task in which 47 participants marked suspected synthetic segments across authentic, fully synthetic, and partially synthetic utterances under three manipulated trust cues: instructional framing, affective priming, and provenance labeling. Participants provided quality ratings on mechanicalness, expressiveness, intelligibility, clarity, calmness, and confidence of evaluation. Utterance class was the primary determinant of detection accuracy and perceptual quality; trust cues produced no main effects but motivated detection behavior. Fully synthetic speech was detected at below-chance levels. Quality ratings tracked utterance type, indicating implicit discrimination where overt detection failed.

2026-05-27T07:16:02Z To be included in Odyssey 2026: The Speaker and Language Recognition Workshop, Session 4.2, 23-26 June, Lisbon, Portugal Lelia Erscoi Computational Speech Group, University of Eastern Finland Tomi Kinnunen Computational Speech Group, University of Eastern Finland http://arxiv.org/abs/2605.27999v1 Learning to Assign Prediction Tasks to Agents with Capacity Constraints 2026-05-27T05:45:47Z

We address the problem of learning to assign prediction tasks to one agent from a set of available human or AI agents. In particular, we focus on the sequential learning of agent expertise and assignment policies where each agent is constrained to handle a fraction of tasks. We provide a general theoretical characterization of this problem in terms of agent capacities, differences in agent expertise, and task context. We then develop a framework of sequential explore-exploit policy-learning algorithms that seek to maximize overall performance. Experimental results over a variety of tabular, image, and text prediction tasks demonstrate systematic gains from our policy-learning algorithms relative to non-contextual baselines across different types of agents, including LLMs and humans.
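
The sequential explore-exploit idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a simplified, non-contextual UCB rule, and the function name `assign_tasks`, the simulated accuracies, and the capacity fractions are all hypothetical inputs chosen for illustration.

```python
import math
import random


def assign_tasks(n_tasks, capacities, true_accuracy, seed=0):
    """Sequentially assign tasks to agents with a UCB rule under capacity limits.

    capacities: maximum fraction of tasks each agent may handle (hypothetical).
    true_accuracy: simulated per-agent probability of a correct prediction.
    """
    rng = random.Random(seed)
    n_agents = len(capacities)
    # Hard per-agent budgets derived from the capacity fractions.
    budget = [math.floor(c * n_tasks) for c in capacities]
    counts = [0] * n_agents   # tasks handled so far by each agent
    correct = [0] * n_agents  # correct predictions so far by each agent
    assignments = []

    for t in range(1, n_tasks + 1):
        # Only agents with remaining capacity are eligible.
        available = [a for a in range(n_agents) if counts[a] < budget[a]]
        if not available:
            raise ValueError("capacities are too small to cover all tasks")

        def ucb(a):
            if counts[a] == 0:
                return float("inf")  # try every agent at least once
            mean = correct[a] / counts[a]
            return mean + math.sqrt(2 * math.log(t) / counts[a])

        chosen = max(available, key=ucb)
        # Simulated outcome feedback: was the chosen agent's prediction correct?
        outcome = rng.random() < true_accuracy[chosen]
        counts[chosen] += 1
        correct[chosen] += int(outcome)
        assignments.append(chosen)

    return assignments, counts, budget


assignments, counts, budget = assign_tasks(100, [0.5, 0.4, 0.3], [0.9, 0.7, 0.6])
```

The capacity check happens before the UCB comparison, so once the strongest agent exhausts its budget the remaining tasks fall to the next-best eligible agent. A contextual variant would replace the per-agent mean with a model conditioned on task features.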

2026-05-27T05:45:47Z Shang Wu Saatvik Kher Padhraic Smyth http://arxiv.org/abs/2605.16237v2 Inside Baseball: The Automated Ball-Strike System as an Object Lesson in Technological Rule Enforcement 2026-05-27T04:47:58Z

Clearly defined rules are often assumed to be straightforward to automate and evaluate. We challenge this assumption through an in-depth study of Major League Baseball's (MLB) seven-year experimentation with the Automated Ball-Strike System (ABS). ABS is envisioned to call balls and strikes accurately: a seemingly straightforward use of technology to objectively determine the distance between a pitch and the strike zone. Although the strike zone is an area clearly defined in the rulebook, it took MLB seven years to figure out how to automate calling balls and strikes with ABS, showing how even seemingly straightforward rules require a complex translation process to operationalize via technological systems. In this paper, we trace the design decisions that led to the current implementation of ABS. Our case study reveals that "distance" exists even between a clear rule and its technological implementation. Using analytic frameworks from Science and Technology Studies (STS), we show that such distance exists because (1) historically, the "ground truth" of the strike zone is contested: the rule in practice has always reflected a hybrid between the rulebook definition and umpires' enforcement decisions; and (2) the use of ABS is embedded in an existing ecosystem, where the implementation of a technological enforcement system needs to balance multiple stakeholder values. This perspective challenges conventional evaluation paradigms that center on the distance between a formalized rule and its technological implementation, and instead calls for evaluating how such systems are experienced in practice. Addressing this question requires in-depth social science approaches, contributing to ongoing conversations in FAccT about the implementation and evaluation of sociotechnical systems.

2026-05-15T17:45:04Z Andrea Wen-Yi Wang Waki Kamino David Mimno Karen Levy Malte F. Jung 10.1145/3805689.3812385 http://arxiv.org/abs/2605.27939v1 EyeSpy: Inferring Eye Gaze via Side-Channel Attacks Against Foveated Rendering 2026-05-27T04:20:37Z

While eye tracking provides valuable capabilities for virtual reality, such as gaze interaction and dynamic foveated rendering (DFR), eye-tracking data can inadvertently reveal sensitive user information if not properly protected. Current protections, such as adding permission prompts or gatekeeping gaze data, are insufficient on DFR-enabled systems because gaze data is used internally to drive DFR. When DFR is implemented, objects in the fovea (i.e., immediate gaze area) incur a higher GPU workload than those in the periphery. This gaze-contingent workload creates a novel side channel, which can be leveraged to reconstruct gaze positions. Specifically, we design a novel attack that sweeps imperceptible high-cost objects (HCOs) across the user's field of view and logs rendering performance metrics (e.g., frame rate or frame time) commonly exposed through standard game engines. Then, we correlate variation in these metrics (caused by HCO-foveal overlap) with the known HCOs' positions to infer gaze coordinates directly without using eye-tracking APIs. Our experimental results show that mean gaze prediction errors (1.1-4.4 degrees) across the Meta Quest Pro, Varjo XR-4, and desktop platforms are comparable to typical eye-tracker accuracy. We demonstrate that the attack generalizes across various hardware platforms, standard game engines, and foveated rendering pipelines. Finally, we design defense mechanisms based on supervised and unsupervised detectors that can flag the attack reliably (F1 of 0.99) over short time windows.
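
To make the correlation step concrete, the following toy simulation shows how frame-time variation during an HCO sweep could point to a gaze position. It is a sketch under strong assumptions, not the paper's attack: the frame-time model, the fovea radius, the timing constants, and the weighted-centroid estimator are all hypothetical, and no real engine or eye-tracking API is involved.

```python
def frame_time(hco_pos, gaze_pos, fovea_radius=10.0, base_ms=11.0, extra_ms=4.0):
    """Toy model: an HCO costs extra render time only when it overlaps the fovea."""
    dx = hco_pos[0] - gaze_pos[0]
    dy = hco_pos[1] - gaze_pos[1]
    inside = (dx * dx + dy * dy) ** 0.5 <= fovea_radius
    return base_ms + (extra_ms if inside else 0.0)


def infer_gaze(samples):
    """Estimate gaze as the centroid of HCO positions weighted by excess frame time.

    samples: list of ((x, y), frame_time_ms) pairs logged during one sweep.
    """
    baseline = min(ft for _, ft in samples)
    weights = [(pos, ft - baseline) for pos, ft in samples]
    total = sum(w for _, w in weights)
    if total == 0:
        return None  # no HCO overlapped the fovea during this sweep
    x = sum(pos[0] * w for pos, w in weights) / total
    y = sum(pos[1] * w for pos, w in weights) / total
    return (x, y)


# Sweep a 5-degree grid across a hypothetical 100x100 degree field of view.
true_gaze = (40.0, 60.0)
grid = [(float(x), float(y)) for x in range(0, 101, 5) for y in range(0, 101, 5)]
samples = [(pos, frame_time(pos, true_gaze)) for pos in grid]
estimate = infer_gaze(samples)
```

In this noise-free toy setting the estimate coincides with the simulated gaze point because the slowed-down HCO positions are symmetric around it. Real measurements would be noisy, which is presumably why the reported errors are on the order of degrees rather than zero.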

2026-05-27T04:20:37Z 20 pages, 12 figures. Accepted to the 47th IEEE Symposium on Security and Privacy (IEEE S&P 2026). Artifacts: https://bmdj-vt.github.io/project_pages/xr_side_channels Proceedings of the 47th IEEE Symposium on Security and Privacy (SP), pp. 2646-2665, 2026 Paul Maynard Harris Amjad Camila Molinares Bo Ji Brendan David-John 10.1109/SP63933.2026.00145 http://arxiv.org/abs/2605.27921v1 Show, Don't TELL: Explainable AI-Generated Text Detection 2026-05-27T03:47:25Z

Research on AI-generated text detection has presented a number of approaches to discern human from AI prose, some of which achieve high in-distribution performance. However, real-world applicability has stalled because their outputs are misaligned with the needs of users, such as professors, who are presented with a numeric score that has no attached explanation. We tackle this issue with a novel architecture, TELL, that builds in explainability from the ground up. While our system still offers a numerical score like other detectors for comparability, TELL takes a fundamentally different approach: we aim to show the user the "tells" by which the model believes a text is AI- or human-written, empowering the user to decide who wrote a text using their own judgment and understanding of the context of the writing and its alleged author. We train TELL on a custom SFT dataset of domain-specific authorship annotations, and further refine the system using GRPO with curriculum learning to improve performance. We achieve competitive performance with state-of-the-art detectors (AUROC 0.927) while natively providing annotations that explain the basis for the detector's decision. We further evaluate the quality of our explanations using a dataset of human annotations and report a high (mean 72.3%) win-rate on annotation concreteness, falsifiability, coherence, plausibility and grounding, allowing users to think critically and decide for themselves. Our work thus reframes the problem of AI-generated text detection from a human-centric perspective and paves the way for a new family of detectors that focus on native explainability.

2026-05-27T03:47:25Z Aldan Creo Suraj Ranganath http://arxiv.org/abs/2605.27801v1 Local Privacy Laws in a Globalized World 2026-05-27T00:41:31Z

Personal data has emerged as a highly valuable yet sensitive asset that drives business decisions, enables targeted advertising, and generates substantial revenue for companies, while simultaneously facilitating invasive monitoring of users. In recent years, research on digital privacy violations, including undue access, collection, and sharing of user data, has grown significantly. Much of this research adopts the European General Data Protection Regulation (GDPR) as the primary reference framework. This is reasonable, as GDPR was a pioneering piece of legislation, and many of its stipulations are clear and unambiguous. However, we argue that focusing solely on GDPR (and a small set of other Western regulatory frameworks) ignores privacy-related concerns, attitudes, and problems faced by users from other locales, creating a significant research blind spot. This work systematically normalizes the heterogeneous legal requirements of multiple data protection laws into a unified abstraction aligned with the data lifecycle, which forms the foundation for the implementation of such regulations. We further investigate the implications of these laws on different stakeholders, including users, organizations, and governments. Overall, this work aims to broaden the digital privacy research community's perspective and to serve as a set of guiding principles for developing technological privacy solutions spanning multiple countries.

2026-05-27T00:41:31Z Accepted in ACM Conference on Data and Application Security and Privacy (CODASPY) 2026 Shantanu Sharma Ethan Myers Lorenzo De Carli Ritwik Banerjee Indrakshi Ray http://arxiv.org/abs/2605.27749v1 Chameleon Clippers: A Tool for Developing Fine Motor Skills in Remote Education Settings 2026-05-26T22:51:32Z

Art education plays a significant role in K-2 learners' physical and cognitive development. However, teachers struggle to translate in-person activities to remote settings and to give necessary feedback to help learners develop fine motor skills. Previous research shows the benefits of tangible technology and real-time system feedback for supporting teachers and students in digital environments, but little research explores their affordances for remote art education. We developed Chameleon Clippers: interactive scissors that give real-time feedback to learners as they cut along a line. In preliminary tests, learners felt engaged, responded to feedback, and enjoyed their experience. Our low-cost design augments existing classroom artifacts and practices, supporting classroom integration. Testing also revealed directions for future study, including the frequency of feedback and assimilation into a broader art education platform. Through our study, we demonstrate the potential for tangible technology to create more interactive, engaging, and supportive remote K-2 learning experiences.

2026-05-26T22:51:32Z 4 pages, 1 figure, https://repository.isls.org//handle/1/8300 Proceedings of the 15th International Conference on Computer-Supported Collaborative Learning - CSCL 2022 (pp. 332-335). International Society of the Learning Sciences Gennie Mansi Ashley Boone Sue Reon Kim Jessica Roberts 10.22318/cscl2022.332 http://arxiv.org/abs/2605.27685v1 Decoupled Intelligence: A Multi-Agent LLM Framework for Controllable Traffic Scenario Generation in SUMO 2026-05-26T21:03:09Z

The integration of Large Language Models (LLMs) with microscopic traffic simulation offers a promising path toward autonomous urban planning and intelligent transportation analysis. However, existing monolithic agent architectures often struggle with the complexity of end-to-end simulation workflows, leading to reasoning failures, parameter inconsistency, and a lack of systematic state management. This paper proposes a novel multi-agent collaborative framework designed to automate the entire lifecycle of traffic simulation in SUMO (Simulation of Urban Mobility). Our approach decouples the simulation pipeline into specialized roles, including Planner, Builder, Demand, Runner, and Analyst, coordinated by a high-level reasoning engine. We introduce a state-persistent Orchestrator leveraging the Model Context Protocol (MCP) to ensure seamless data handover and environmental consistency across distributed agent actions. This architecture enables a robust closed-loop refinement process, where simulation outcomes are iteratively analyzed and optimized to satisfy user-defined Key Performance Indicators (KPIs). Experimental results through role ablation studies demonstrate that the proposed multi-agent framework significantly enhances task success rates and parameter accuracy compared to single-agent baselines. Furthermore, case studies on real-world network extraction and traffic optimization highlight the system's capability to bridge the gap between high-level natural language intent and low-level simulation execution.

2026-05-26T21:03:09Z Shuyang Li Ruimin Ke http://arxiv.org/abs/2510.03559v3 PrivacyMotiv: Vulnerability-Centered Persona Journeys for Empathic Privacy Reviews in UX Design 2026-05-26T20:35:32Z

UX professionals routinely conduct design reviews, yet privacy concerns are often overlooked, not only due to limited tools, but more fundamentally from low intrinsic motivation, driven by limited privacy knowledge, weak empathy for unexpectedly affected users, and low autonomy in identifying harms. We present PrivacyMotiv, an LLM-powered system that generates vulnerability-centered personas, persona journey stories, and traceable design diagnoses grounded in lo-fi user flows to support privacy-oriented UX design review. In a within-subjects study with professional UX practitioners (N=16), PrivacyMotiv significantly improved empathy, intrinsic motivation, and perceived usefulness, with participants identifying 59% more privacy issues and proposing 70% more redesign solutions compared to self-proposed methods. This work contributes empirical insight into motivational barriers in privacy-aware UX and a structured, narrative-driven approach for integrating privacy review into early-stage UX practice.

2025-10-03T23:14:22Z 33 pages, 17 figures Proceedings of the 2026 ACM Designing Interactive Systems Conference (DIS 2026) Zeya Chen Jianing Wen Yaxing Yao Toby Jia-Jun Li Tianshi Li 10.1145/3800645.3813014 http://arxiv.org/abs/2605.27666v1 Explanations as Dialogues: Toward Human-Centered Conversational Explainable AI 2026-05-26T20:33:51Z

As AI systems become increasingly conversational, a gap emerges wherein explanations are studied as static artifacts, yet in practice, are experienced as dialogue. In this provocation, we argue that the conversational layer around an explanation is not incidental to its effectiveness, but a critical constituent. Drawing on three illustrative scenarios, we invite the CUI community to study explanations as interactive, conversational exchanges shaped by timing, tone, persona and conversational history, and introduce our vision for Human-Centered Conversational XAI (HC2XAI).

2026-05-26T20:33:51Z To be published in the ACM Conversational User Interfaces (CUI)'26 Conference as Provocation Niharika Mathur Smit Desai 10.1145/3816046.3816314 http://arxiv.org/abs/2605.27634v1 Structuring Human-AI Productive Interdependence by Strategic Level of Automation Selection for Qualitative Inquiry 2026-05-26T19:54:26Z

While Large Language Models (LLMs) offer a solution to the scale-versus-depth dilemma in qualitative analysis, the paradigm of maximizing automation is fundamentally at odds with the interpretive nature of qualitative inquiry. We argue that effective Human-AI collaboration is not an automation problem, but an interdependence problem. This paper reframes the design of "co-data" systems through the lens of Interdependence Theory, proposing a formal framework to structure human-AI productive interdependence. The framework guides the selection of an appropriate Level of Automation (LoA) for different stages of the qualitative analysis process by assessing task risk and the cost of validation. We present a case study where this framework led to a deliberately interdependent workflow, fostering the calibrated trust necessary for rigorous analysis. We conclude by presenting three design principles that instantiate this framework, demonstrating how to leverage AI as a powerful partner while preserving the human researcher's irreplaceable role in the transformation process of meaning-making.

2026-05-26T19:54:26Z Feng Zhou Jacqueline Meijer-Irons Ambar Murillo http://arxiv.org/abs/2605.16578v3 Voice "Cloning" is Style Transfer 2026-05-26T19:32:15Z

Artificially generated speech is increasingly embedded in everyday life. Voice cloning in particular enables applications where identity preservation is important, such as completing a recording, dubbing in a new language, or preserving the voices of individuals with speech loss. However, in our work, we find that despite the term, voice cloning does not faithfully "clone" an individual's voice. Instead, we find that widely-used voice cloning models systematically apply style transfer to source voices. As rated by human annotators, cloned voices are perceived as more authoritative, warm, customer-service-like, and human-like compared to their sources. Human annotators also report greater trust in cloned voices than source voices, and a greater willingness to disclose sensitive personal information to them. Our work furthermore shows that voice cloning leads to homogenization of speaker characteristics, as measured by reduced variance in accent, speaking rate, and the audio embedding space. Together, our results highlight a new set of limitations and risks of voice cloning technology and their potential impact on human behavior.

2026-05-15T19:32:28Z Kaitlyn Zhou Federico Bianchi Martijn Bartelds Anna Pot Yongchan Kwon James Zou http://arxiv.org/abs/2605.27610v1 Eliot: Interactively $\underline{E}$xploring Fast-Changing Scientific $\underline{Li}$terature Trends with $\underline{O}$nline Da$\underline{t}$a and Learning 2026-05-26T19:25:43Z

The rapid growth of scientific publishing has made it increasingly difficult to track how fast-moving areas evolve. Search engines and LLM-based assistants retrieve or summarize papers, but often hide how the corpus was selected, organized, or connected to temporal patterns. We present $\texttt{Eliot}$, a publicly deployed interactive system for traceable exploration of evolving scientific literature. Motivated by two studies on Large Language Models (LLMs) and Automated Planning and Scheduling (APS), $\texttt{Eliot}$ generalizes literature-evolution analysis beyond hand-built taxonomies and domain-specific scripts. Given explicit query terms and filters, it retrieves arXiv papers at query time, represents each paper by title and abstract, clusters the corpus into themes, assigns representative keywords, and visualizes each cluster's publication-year distribution. We evaluate $\texttt{Eliot}$ as both an applied system and an interactive research aid. An offline configuration study across eight arXiv domains compares document representations, dimensionality reduction methods, and clustering algorithms using intrinsic clustering and topic-coherence metrics; the results support MiniLM embeddings with 10-dimensional UMAP and Agglomerative Clustering as a practical default. A scenario-based survey and expert focus group assess interpretability and use contexts: participants rated cluster labels as meaningful in 85% of scenario responses, and feedback indicated that $\texttt{Eliot}$ is most valuable for auditable overviews of rapidly changing technical areas. These results suggest that query-time clustering and temporal inspection can complement search and generation tools by helping researchers inspect and refine the evidence behind literature trends.
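
The final step of the pipeline described above, a per-cluster publication-year distribution, can be sketched in a few lines. This is only an illustration: the function name `year_distribution`, the paper records, and the cluster labels are hypothetical, and the labels are assumed to come from an upstream clustering step (such as the embedding, UMAP, and agglomerative clustering configuration the abstract reports) that is not reproduced here.

```python
from collections import Counter, defaultdict


def year_distribution(papers, labels):
    """Count publications per year within each cluster.

    papers: list of dicts with a "year" key (hypothetical record layout).
    labels: one cluster label per paper, assumed to come from an upstream
            clustering step that is not shown here.
    """
    distribution = defaultdict(Counter)
    for paper, label in zip(papers, labels):
        distribution[label][paper["year"]] += 1
    # Sort years so each cluster's distribution reads chronologically.
    return {
        label: dict(sorted(counts.items()))
        for label, counts in distribution.items()
    }


# Placeholder records standing in for retrieved arXiv papers.
papers = [
    {"title": "Paper A", "year": 2023},
    {"title": "Paper B", "year": 2023},
    {"title": "Paper C", "year": 2024},
    {"title": "Paper D", "year": 2025},
    {"title": "Paper E", "year": 2025},
]
labels = [0, 0, 1, 1, 0]
dist = year_distribution(papers, labels)
```

Keeping the counts as plain dictionaries makes it straightforward to pass each cluster's distribution to whatever plotting layer renders the temporal view.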

2026-05-26T19:25:43Z Under-review at CIKM Applied Research 2026 Bernardo A. Denkvitts Nitin Gupta Biplav Srivastava