https://arxiv.org/api/InTp1wpGeUZjqtrk4Pfr3eRCSnA
2026-03-22T08:57:28Z
29041015

http://arxiv.org/abs/2603.19215v1
$R$-equivalence on Cubic Surfaces I: Existing Cases with Non-Trivial Universal Equivalence
2026-03-19T17:57:38Z
Let $V$ be a smooth cubic surface over a $p$-adic field $k$ with good reduction. Swinnerton-Dyer (1981) proved that $R$-equivalence is trivial on $V(k)$ except perhaps if $V$ is one of three special types--those whose $R$-equivalence he could not bound by proving the universal (admissible) equivalence is trivial. We consider all surfaces $V$ currently known to have non-trivial universal equivalence. Beyond being intractable to Swinnerton-Dyer's approach, we observe that if these surfaces also had non-trivial $R$-equivalence, they would contradict Colliot-Thélène and Sansuc's conjecture regarding the $k$-rationality of universal torsors for geometrically rational surfaces.
By devising new methods to study $R$-equivalence, we prove that for 2-adic surfaces with all-Eckardt reductions (the third special type, which contains every existing case of non-trivial universal equivalence), $R$-equivalence is trivial or of exponent 2. For the explicit cases, we confirm triviality: the diagonal cubic $X^3+Y^3+Z^3+ζ_3 T^3=0$ over $\mathbb{Q}_2(ζ_3)$--answering a long-standing question of Manin's (Cubic Forms, 1972)--and the cubic with universal equivalence of exponent 2 (Kanevsky, 1982).
This is the first in a series of works derived from a year of interactions with generative AI models such as AlphaEvolve and Gemini 3 Deep Think, with the latter proving many of our lemmas. We disclose the timeline and nature of their use towards this paper, and describe our broader AI-assisted research program in a companion report (in preparation).
2026-03-19T17:57:38Z
23 pages
Dimitri Kanevsky, Julian Salazar, Matt Harvey

http://arxiv.org/abs/2603.19213v1
Constitutive vs. Corrective: A Causal Taxonomy of Human Runtime Involvement in AI Systems
2026-03-19T17:57:07Z
As AI systems increasingly permeate high-stakes decision-making, the terminology regarding human involvement - Human-in-the-Loop (HITL), Human-on-the-Loop (HOTL), and Human Oversight - has become vexingly ambiguous. This ambiguity complicates interdisciplinary collaboration between computer science, law, philosophy, psychology, and sociology and can lead to regulatory uncertainty. We propose a clarification grounded in causal structure, focused on human involvement during the runtime of AI systems. The distinction between HITL and HOTL, we argue, is not primarily spatial but causal: HITL is constitutive (a human contribution is necessary for the decision output), while HOTL is corrective (external to the primary causal chain, capable of preventing or modifying outputs). Within HOTL, we distinguish three temporal modes - synchronous, asynchronous, and anticipatory - situated within a nested model of provider and deployer runtime that clarifies their different capacities for intervention. A second, orthogonal dimension captures cognitive integration: whether human and machine operate as complementary or hybrid intelligence, yielding four structurally distinct configurations.
Finally, we distinguish these descriptive categories from the normative requirements they serve: statutory "Human Oversight" is a specific normative mode of HOTL that demands not merely a corrective causal position, but genuine preparedness and capacity for effective intervention. Because the same person may occupy both HITL and HOTL roles simultaneously, we argue that this role duality must be treated as a design problem requiring architectural and epistemic mitigation rather than mere acknowledgment.
2026-03-19T17:57:07Z
Kevin Baum, Johann Laux

http://arxiv.org/abs/2603.19196v1
Exploring the Role of Interaction Data to Empower End-User Decision-Making In UI Personalization
2026-03-19T17:50:56Z
User interface personalization enhances digital efficiency, usability, and accessibility. However, in user-driven setups, limited support for identifying and evaluating worthwhile opportunities often leads to underuse. We explore a reflexive personalization approach where individuals engage with their digital interaction data to identify meaningful personalization opportunities and benefits. We interviewed 12 participants, using experimental vignettes as design probes to support reflection on different forms of using interaction data to empower decision-making in personalization and the preferred level of system support. We found that people can independently identify personalization opportunities but prefer system support through visual personalization suggestions. Interaction data can shape how users perceive and approach personalization by reinforcing the perceived value of change and data collection, helping them weigh benefits against effort, and increasing the transparency of system suggestions.
We discuss opportunities for designing personalization software that raises end-users' agency over interfaces through reflective engagement with their interaction data.
2026-03-19T17:50:56Z
Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
Sérgio Alves, Carlos Duarte, Kyle Montague, Tiago Guerreiro
10.1145/3772318.3791022

http://arxiv.org/abs/2512.08193v2
ClinicalTrialsHub: Bridging Registries and Literature for Comprehensive Clinical Trial Access
2026-03-19T17:24:58Z
We present ClinicalTrialsHub, an interactive search-focused platform that consolidates all data from ClinicalTrials.gov and augments it by automatically extracting and structuring trial-relevant information from PubMed research articles. Our system effectively increases access to structured clinical trial data by 83.8% compared to relying on ClinicalTrials.gov alone, with potential to make access easier for patients, clinicians, researchers, and policymakers, advancing evidence-based medicine. ClinicalTrialsHub uses large language models such as GPT-5.1 and Gemini-3-Pro to enhance accessibility. The platform automatically parses full-text research articles to extract structured trial information, translates user queries into structured database searches, and provides an attributed question-answering system that generates evidence-grounded answers linked to specific source sentences.
We demonstrate its utility through a user study involving clinicians, clinical researchers, and PhD students of pharmaceutical sciences and nursing, and a systematic automatic evaluation of its information extraction and question answering capabilities.
2025-12-09T02:52:06Z
Jiwoo Park, Ruoqi Liu, Avani Jagdale, Andrew Srisuwananukorn, Jing Zhao, Lang Li, Ping Zhang, Sachin Kumar

http://arxiv.org/abs/2603.19134v1
Introducing M: A Modular, Modifiable Social Robot
2026-03-19T16:51:37Z
We present M, an open-source, low-cost social robot platform designed to reduce platform friction that slows social robotics research by making robots easier to reproduce, modify, and deploy in real-world settings. M combines a modular mechanical design, multimodal sensing, and an expressive yet mechanically simple actuation architecture with a ROS2-native software package that cleanly separates perception, expression control, and data management. The platform includes a simulation environment with interface equivalence to hardware to support rapid sim-to-real transfer of interaction behaviors. We demonstrate extensibility through additional sensing/actuation modules and provide example interaction templates for storytelling and two-way conversational coaching. Finally, we report real-world use in participatory design and week-long in-home deployments, showing how M can serve as a practical foundation for longitudinal, reproducible social robotics research.
2026-03-19T16:51:37Z
Victor Nikhil Antony, Zhili Gong, Yoonjae Kim, Chien-Ming Huang

http://arxiv.org/abs/2603.19030v1
LLMs Aren't Human: A Critical Perspective on LLM Personality
2026-03-19T15:29:07Z
A growing body of research examines personality traits in Large Language Models (LLMs), particularly in human-agent collaboration. Prior work has frequently applied the Big Five inventory to assess LLM behavior analogous to human personality, without questioning the underlying assumptions.
This paper critically evaluates whether LLM responses to personality tests satisfy six defining characteristics of personality. We find that none are fully met, indicating that such assessments do not measure a construct equivalent to human personality. We propose a research agenda for shifting from anthropomorphic trait attribution toward functional evaluations, clarifying what personality tests actually capture in LLMs and developing LLM-specific frameworks for characterizing stable, intrinsic behavior.
2026-03-19T15:29:07Z
4 pages
Kim Zierahn, Cristina Cachero, Anna Korhonen, Nuria Oliver

http://arxiv.org/abs/2603.19000v1
SVLAT: Scientific Visualization Literacy Assessment Test
2026-03-19T15:04:56Z
Scientific visualization (SciVis) has become an essential means for exploring, understanding, and communicating complex scientific phenomena. However, the field still lacks a validated instrument assessing how well people read, understand, and interpret scientific visualizations. We present a scientific visualization literacy assessment test (SVLAT) that measures the general public's SciVis literacy. Covering a range of visualization forms and interpretation demands, SVLAT comprises 49 items grounded in 18 scientific visualizations and illustrations spanning eight visualization techniques and 11 tasks. Instrument development followed a staged, psychometrically grounded pipeline. We defined the construct and blueprint, followed by item generation, and expert review with five SciVis experts using the content validity ratio (mean CVR = 0.79). We subsequently administered a pilot test (30 participants) and a large-scale test tryout (485 participants) to evaluate the instrument's psychometric properties. For validation, we performed item analysis and refinement using both classical test theory (CTT) and item response theory (IRT) to examine item functioning and overall test quality. SVLAT demonstrates high reliability in the tryout sample (McDonald's omega_t = 0.82, Cronbach's alpha = 0.81).
The assessment materials are available at https://osf.io/hr3nw/.
2026-03-19T15:04:56Z
Patrick Phuoc Do, Kaiyuan Tang, Kuangshi Ai, Chaoli Wang

http://arxiv.org/abs/2510.20558v2
From Far and Near: Perceptual Evaluation of Crowd Representations Across Levels of Detail
2026-03-19T15:04:19Z
In this paper, we investigate how users perceive the visual quality of crowd character representations at different levels of detail (LoD) and viewing distances. Each representation, including geometric meshes, image-based impostors, Neural Radiance Fields (NeRFs), and 3D Gaussians, exhibits distinct trade-offs between visual fidelity and computational performance. Our qualitative and quantitative results provide insights to guide the design of perceptually optimized LoD strategies for crowd rendering.
2025-10-23T13:39:18Z
Xiaohan Sun, Carol O'Sullivan

http://arxiv.org/abs/2603.18981v1
Book your room in the Turing Hotel! A symmetric and distributed Turing Test with multiple AIs and humans
2026-03-19T14:44:47Z
In this paper, we report our experience with ``TuringHotel'', a novel extension of the Turing Test based on interactions within mixed communities of Large Language Models (LLMs) and human participants. The classical one-to-one interaction of the Turing Test is reinterpreted in a group setting, where both human and artificial agents engage in time-bounded discussions and, interestingly, are both judges and respondents. This community is instantiated in the novel platform UNaIVERSE (https://unaiverse.io), creating a ``World'' which defines the roles and interaction dynamics, facilitated by the platform's built-in programming tools. All communication occurs over an authenticated peer-to-peer network, ensuring that no third parties can access the exchange. The platform also provides a unified interface for humans, accessible via both mobile devices and laptops, that was a key component of the experience in this paper.
Results of our experimentation involving 17 human participants and 19 LLMs revealed that current models are still sometimes mistaken for humans. Interestingly, there are several unexpected mistakes, suggesting that human fingerprints are still identifiable but not fully unambiguous, despite the high-quality language skills of artificial participants. We argue that this is the first experiment conducted in such a distributed setting, and that similar initiatives could be of national interest to support ongoing experiments and competitions aimed at monitoring the evolution of large language models over time.
2026-03-19T14:44:47Z
Christian Di Maio, Tommaso Guidi, Luigi Quarantiello, Jack Bell, Marco Gori, Stefano Melacci, Vincenzo Lomonaco

http://arxiv.org/abs/2603.18960v1
Sketch2Topo: Using Hand-Drawn Inputs for Diffusion-Based Topology Optimization
2026-03-19T14:26:25Z
Topology optimization (TO) is employed in engineering to optimize structural performance while maximizing material efficiency. However, traditional TO methods incur significant computational and time costs. Although research has leveraged generative AI to predict TO outcomes and validated feasibility and accuracy, existing approaches still suffer from limited customizability and impose a high cognitive load on users. Furthermore, balancing structural performance with aesthetic attributes remains a persistent challenge. We developed Sketch2Topo, which augments a diffusion-based TO model with image-to-image generation and image editing capabilities. With Sketch2Topo, users can use sketching to customize geometries and specify physical constraints. The tool also supports mask input, enabling users to perform TO on selected regions only, thereby supporting higher levels of customization. We summarize the workflow and details of the tool and conduct a brief quantitative evaluation.
Finally, we explore application scenarios and discuss how hand-drawn input improves usability while balancing functionality and aesthetics.
2026-03-19T14:26:25Z
5 pages, 4 figures, accepted at CHI 2026 as a poster
Shuyue Feng, Cedric Caremel, Yoshihiro Kawahara
10.1145/3772363.3798434

http://arxiv.org/abs/2603.18950v1
What We Talk About When We Talk About Frameworks in HCI
2026-03-19T14:22:56Z
In HCI, frameworks function as a type of theoretical contribution, often supporting ideation, design, and evaluation. Yet, little is known about how they are actually used, what functions they serve, and which scholarly practices shape them. To address this gap, we conducted a systematic review of 615 papers from a decade of CHI proceedings (2015-2024) that prominently featured the term framework. We classified these papers into six engagement types. We then examined the role, form, and essential components of newly proposed frameworks through a functional typology, analyzing how they are constructed, validated, and articulated for reuse. Our results show that enthusiasm for proposing new frameworks exceeds the willingness to iterate on existing ones. They also highlight the ambiguity in the function of frameworks and the scarcity of systematic validation. Based on these insights, we call for more rigorous, reflective, and cumulative practices in the development and use of frameworks in HCI.
2026-03-19T14:22:56Z
25 pages, 8 figures, The ACM CHI conference on Human Factors in Computing Systems 2026
Shitao Fang, Koji Yatani, Kasper Hornbæk
10.1145/3772318.3791400

http://arxiv.org/abs/2603.18895v1
From Accuracy to Readiness: Metrics and Benchmarks for Human-AI Decision-Making
2026-03-19T13:35:29Z
Artificial intelligence (AI) systems are deployed as collaborators in human decision-making. Yet, evaluation practices focus primarily on model accuracy rather than whether human-AI teams are prepared to collaborate safely and effectively.
Empirical evidence shows that many failures arise from miscalibrated reliance, including overuse when AI is wrong and underuse when it is helpful.
This paper proposes a measurement framework for evaluating human-AI decision-making centered on team readiness. We introduce a four-part taxonomy of evaluation metrics spanning outcomes, reliance behavior, safety signals, and learning over time, and connect these metrics to the Understand-Control-Improve (U-C-I) lifecycle of human-AI onboarding and collaboration.
By operationalizing evaluation through interaction traces rather than model properties or self-reported trust, our framework enables deployment-relevant assessment of calibration, error recovery, and governance. We aim to support more comparable benchmarks and cumulative research on human-AI readiness, advancing safer and more accountable human-AI collaboration.
2026-03-19T13:35:29Z
ACM CHI 2026 Poster
Min Hun Lee
10.1145/3772363.3798377

http://arxiv.org/abs/2603.18873v1
Evaluating LLM-Generated Lessons from the Language Learning Students' Perspective: A Short Case Study on Duolingo
2026-03-19T13:17:50Z
Popular language learning applications such as Duolingo use large language models (LLMs) to generate lessons for their users. Most lessons focus on general real-world scenarios such as greetings, ordering food, or asking directions, with limited support for profession-specific contexts. This gap can hinder learners from achieving professional-level fluency, which we define as the ability to comfortably communicate various work-related and domain-specific information in the target language. We surveyed five employees from a multinational company in the Philippines on their experiences with Duolingo. Results show that respondents encountered general scenarios more frequently than work-related ones, and that the former are relatable and effective in building foundational grammar, vocabulary, and cultural knowledge. The latter helps bridge the gap toward professional fluency as it contains domain-specific vocabulary. Each participant suggested lesson scenarios that diverge in contexts when analyzed in aggregate.
With this understanding, we propose that language learning applications should generate lessons that adapt to an individual's needs through personalized, domain-specific lesson scenarios while maintaining foundational support through general, relatable lesson scenarios.
2026-03-19T13:17:50Z
5 pages, 3 figures, presented at the 3rd HEAL Workshop at CHI 2026
Carlos Rafael Catalan, Patricia Nicole Monderin, Lheane Marie Dizon, Gap Estrella, Raymund John Sarmimento, Marie Antoinette Patalagsa

http://arxiv.org/abs/2603.18868v1
Through the Looking-Glass: AI-Mediated Video Communication Reduces Interpersonal Trust and Confidence in Judgments
2026-03-19T13:13:17Z
AI-based tools that mediate, enhance or generate parts of video communication may interfere with how people evaluate trustworthiness and credibility. In two preregistered online experiments (N = 2,000), we examined whether AI-mediated video retouching, background replacement and avatars affect interpersonal trust, people's ability to detect lies and confidence in their judgments. Participants watched short videos of speakers making truthful or deceptive statements across three conditions with varying levels of AI mediation. We observed that perceived trust and confidence in judgments declined in AI-mediated videos, particularly in settings in which some participants used avatars while others did not. However, participants' actual judgment accuracy remained unchanged, and they were no more inclined to suspect those using AI tools of lying. Our findings provide evidence against concerns that AI mediation undermines people's ability to distinguish truth from lies, and against cue-based accounts of lie detection more generally. They highlight the importance of trustworthy AI mediation tools in contexts where not only truth, but also trust and confidence matter.
2026-03-19T13:13:17Z
Nelson Navajas Fernández, Jeffrey T. Hancock, Maurice Jakesch

http://arxiv.org/abs/2603.11667v2
A technology-oriented mapping of the language and translation industry: Analysing stakeholder values and their potential implication for translation pedagogy
2026-03-19T12:44:02Z
This paper examines how value is constructed and negotiated in today's increasingly automated language and translation industry. Drawing on interview data from twenty-nine industry stakeholders collected within the LT-LiDER project, the study analyses how human value, technological value, efficiency, and adaptability are articulated across different professional roles. Using Chesterman's framework of translation ethics and associated values as an analytical lens, the paper shows that efficiency-oriented technological values aligned with the ethics of service have become baseline expectations in automated production environments, where speed, scalability, and deliverability dominate evaluation criteria. At the same time, human value is not displaced but repositioned, emerging primarily through expertise, oversight, accountability, and contextual judgment embedded within technology-mediated workflows. A central finding is the prominence of adaptability as a mediating value linking human and technological domains. Adaptability is constructed as a core professional requirement, reflecting expectations that translators continuously adjust their skills, roles, and identities in response to evolving tools and organisational demands. The paper argues that automation reshapes rather than replaces translation value, creating an interdependent configuration in which technological efficiency enables human communicative work.
2026-03-12T08:34:05Z
Under review
María Isabel Rivas Ginel, Janiça Hackenbuchner, Alina Secară, Ralph Krüger, Caroline Rossi