https://arxiv.org/api/tqMJcVt79TgPpdbDKfWMRvFI6fs
2026-06-14T16:06:55Z
3093442015

http://arxiv.org/abs/2605.25489v1
ATWL: A Formal Language for Representing, Comparing, and Reusing Visual Analytics Workflows
2026-05-25T06:51:02Z
Visual analytics (VA) workflows are inherently complex, involving data transformation, feature engineering, visual representation, and human interpretation. They are typically described in unstructured prose, hindering systematic comparison, reuse of proven strategies, and training of novices. We present Artifact-Transform Workflow Language (ATWL), a domain-agnostic, declarative language that formally represents VA workflows by capturing their structure and underlying analytical intent. ATWL is built upon a modular ontology of eight artifact types (entities, features, arrangements, visualisations, patterns, models, knowledge, specifications) and transforms characterised by standardised intents (e.g., define-unit, characterise, contextualise, abstract). To show that formalisation effort need not impede adoption, we extract workflows from research papers through supervised interaction with LLM agents, reducing the human role to review and refinement. Using this process, we constructed a library of seventeen ATWL workflows from published VA papers. Cross-workflow analysis reveals structural regularities -- a recurrent meta-structure, recurring motifs, reusable building blocks, diverse iterative strategies, and cross-domain equivalences -- that remain invisible in prose. We further evaluate practical utility through a controlled experiment in which the same LLM addressed two analytical problems with the library supplied either as original papers or as ATWL representations. Both forms enabled useful recommendations, but the formal representation systematically added explicit iteration structure, typed data flow, fragment-level adaptation provenance, and compactness supporting scaling beyond what prose libraries can fit in an LLM's context.
ATWL enables a transition from narrative descriptions to formally represented, comparable, and reusable analytical knowledge.
2026-05-25T06:51:02Z
Natalia Andrienko, Gennady Andrienko, Jürgen Bernard, Michael Sedlmair

http://arxiv.org/abs/2605.25454v1
AI Content Moderation in Therapy Conversations
2026-05-25T06:05:16Z
Large language models (LLMs) are increasingly being used for emotional support. They are also being developed for formal therapy purposes. However, LLMs like ChatGPT or Llama are often developed with content moderation guardrails that prevent them from discussing sensitive subjects with users for both liability and safety purposes, and this inability to broach these subjects may affect their capacity as therapists. In this study, we perform an algorithm audit on three state-of-the-art moderation systems (OpenAI's moderation endpoint, Meta's Llama Guard, and Google's Shield Gemma) to investigate the extent to which these systems flag the content of real-life therapy sessions as undesirable. Our results raise implications for the limitations that users and organizations may encounter when designing LLMs to play the part of a therapist.
2026-05-25T06:05:16Z
Jiwon Kim, Claire Wang, Taeung Yoon, Sabelle Huang, Koustuv Saha

http://arxiv.org/abs/2508.08043v2
False Reality: Uncovering Sensor-induced Human-VR Interaction Vulnerability
2026-05-25T03:27:49Z
Virtual Reality (VR) techniques, serving as the bridge between the real and virtual worlds, have boomed and are widely used in manufacturing, remote healthcare, gaming, etc. Specifically, VR systems offer users immersive experiences that include both perceptions and actions. Various studies have demonstrated that attackers can manipulate VR software to influence users' interactions, including perception and actions. However, such attacks typically require strong access and specialized expertise.
In this paper, we are the first to present a systematic analysis of physical attacks against VR systems and introduce False Reality, a new attack threat to VR devices without requiring access to or modification of their software. False Reality disturbs VR system services by tampering with sensor measurements, and further spoofing users' perception even inducing harmful actions, e.g., inducing dizziness or causing users to crash into obstacles, by exploiting perceptual and psychological effects. We formalize these threats through an attack pathway framework and validate three representative pathways via physical experiments and user studies on five commercial VR devices. Finally, we further propose a defense prototype to mitigate such threats. Our findings shall provide valuable insights for enhancing the security and resilience of future VR systems.
2025-08-11T14:47:23Z
The paper is being extensively rewritten
Yancheng Jiang, Yan Jiang, Ruochen Zhou, Yi-Chao Chen, Xiaoyu Ji, Wenyuan Xu

http://arxiv.org/abs/2605.25296v1
Subjective Code Preferences in Experts and Large Language Models
2026-05-24T23:20:12Z
Large Language Models (LLMs) have become increasingly popular for coding tasks, with subjective coding preferences being an essential element to adapt to programmers' personal needs. Existing work overlooks such characteristics and mainly focuses on code correctness. In this study, we propose a typification of four subjective coding preference axes - complexity, commenting, modularity, and readability - motivated by common engineering habits and validated by 25 software engineers. We collect a dataset of ~3,000 paired Python code snippets reflecting these axes, annotated by 73 experts who rate their preferences on a Likert scale. Using our dataset, we study how LLMs handle subjective coding preferences. We present 13 LLMs with pairs of solutions to the same programming task, first as textual descriptions and then as concrete code snippets.
We find that models often prefer one option in natural language but the opposite when evaluating code. More consistent models (i.e., those that are coherent in their choices between deeds and words) frequently reveal positional bias: swapping the order of options changes the preferred alternative. We then use the five most consistent models to re-annotate the dataset. Compared to humans, models show polarized Likert distributions and notable divergence in ratings. A case study on GPT-5 reveals reliance on external assumptions and brittle reasoning.
2026-05-24T23:20:12Z
Anna Mokhova, Subhabrata Dutta, Iryna Gurevych, Simone Balloccu

http://arxiv.org/abs/2605.25260v1
Working Relations
2026-05-24T21:23:28Z
This paper offers a concept of working relations as a complement and extension to existing theories of maintenance, care and repair. Building on the cases of an umbrella, a tractor and a pond, it advances seven propositions that might guide and inform further work and thinking in this space. It concludes with the challenging figures of Chernobyl, nickel extraction, and AI, and argues for the centrality of working relations to more generative and pluralistic relations with the things and worlds around us.
2026-05-24T21:23:28Z
Steven J. Jackson

http://arxiv.org/abs/2605.25120v1
Evidence-Linked Radiology Reporting: A Human-Supervised Reference Architecture for Structured Imaging Intelligence
2026-05-24T15:07:14Z
Radiology reports remain the primary mechanism by which imaging findings are communicated to clinical teams. However, much of the structured information behind these reports, including measurements, image evidence, prior comparisons, lesion identity, uncertainty, and terminology, often remains trapped in free text or fragmented across picture archiving and communication systems, radiology information systems, reporting workstations, worksheets, advanced visualization tools, and electronic health records.
This paper proposes a human-supervised, evidence-linked reference architecture for structured radiology reporting. The framework combines exam-specific templates, speech-to-structure processing, measurement and segmentation capture, controlled AI-assisted drafting, and standards-based interoperability using DICOM, DICOM Structured Reporting, DICOM Segmentation, HL7 FHIR, RadLex, SNOMED CT, LOINC, and UCUM. The system is positioned not as an autonomous report generator, but as a structured intelligence layer for enterprise imaging that supports reviewed reporting, longitudinal comparison, clinical data reuse, governance, and integration with PACS, RIS, EHR, analytics, and registry workflows. The paper also discusses modality-specific deployment considerations, clinical safety risks, validation requirements, cybersecurity, privacy, quality management, and regulatory boundaries for AI-assisted radiology reporting systems.
2026-05-24T15:07:14Z
Technical report, 27 pages, 2 figures, 12 tables, 1 listing; reference architecture paper; does not report clinical outcomes or validated diagnostic performance
Houman Kazemzadeh, Kamyar Naderi

http://arxiv.org/abs/2507.10644v4
From Multi-Agent Systems and the Semantic Web to Agentic AI: A Unified Narrative of the Web of Agents
2026-05-24T14:18:12Z
The Web of Agents (WoA) transforms the document-centric Web into an environment of autonomous agents acting on users' behalf, a vision newly tractable as large language models (LLMs) mature. We argue that across three decades the WoA has undergone a \emph{semantic-effort migration} in chronological order: from platform-side coordination (Multi-Agent Systems, Generation~I), through data-side annotation (Semantic Web, Generation~II), to model-side interpretation (LLM-era, Generation~III).
The central Gen~II~$\rightarrow$~Gen~III transition within this trajectory, which we call the \emph{semantics-in-data $\rightarrow$ semantics-in-models} shift, is predictive: each generation's failure modes and current open problems follow from where that generation located its semantic effort. The survey makes five contributions: (i)~a unified evolutionary narrative spanning 1990--2026; (ii)~a four-dimensional comparative framework (semantic foundation, communication paradigm, locus of intelligence, discovery mechanism) applied uniformly across all three generations; (iii)~classification of sixteen representative systems on these dimensions, including hybrid LLM--knowledge-graph and computer-use agents; (iv)~coverage of the November~2024--August~2026 institutional convergence (Linux Foundation's Agentic AI Foundation, A2A v1.0, MCP November~2024 launch and November~2025 specification, Visa/Mastercard/Stripe payment-network protocols, EU AI Act phased enforcement, the NIST AI Agent Standards Initiative, International AI Safety Report 2026); and (v)~seven named lessons grounded in cross-generational evidence paired with seven generation-invariant challenges that persist regardless of which protocol prevails. 
Further progress depends less on protocol design than on the socio-technical infrastructure now being assembled by standards bodies, regulators, and commercial payment networks.
2025-07-14T16:47:19Z
Tatiana Petrova (SEDAN SnT, University of Luxembourg, Luxembourg, Luxembourg), Boris Bliznioukov (SEDAN SnT, University of Luxembourg, Luxembourg, Luxembourg), Aleksandr Puzikov (SEDAN SnT, University of Luxembourg, Luxembourg, Luxembourg), Radu State (SEDAN SnT, University of Luxembourg, Luxembourg, Luxembourg)

http://arxiv.org/abs/2605.25058v1
Intent Signal Theory: A Computational Framework for Intent-State Control in Human-AI Interaction
2026-05-24T13:10:33Z
Current AI interaction models treat the prompt as the primary object of exchange, omitting a critical layer: the user's latent source intent, the goal state preceding and motivating the prompt. Here we introduce Intent Signal Theory (IST), a computational framework that formalises this missing intent layer. IST distinguishes four objects routinely conflated: latent source intent (I*), observable intent proxy (I-hat), encoded carrier (P), and model output (O). It formalises dimensional weights, encoding masks, structural and fidelity recovery scores, and public-private intent decomposition. The Theorem of Irreversible Intent Loss establishes that private intent absent from the carrier cannot be recovered beyond generic substitution. Evidence from four companion studies spanning six LLMs, three languages and three task domains shows structural-fidelity splits, human-validated metric dissociation, and weight-tolerance plateaus consistent with IST's predictions. IST reframes prompt engineering as intent-protocol design and identifies a computational layer that current AI systems lack.
2026-05-24T13:10:33Z
10 pages, 2 figures. Theoretical framework paper grounded in four companion empirical studies.
Data and code repository: https://github.com/PGlarry/prompt-protocol-specification
Gang Peng

http://arxiv.org/abs/2512.10961v2
AI as Equalizer or Amplifier? Task Complexity as the Moderating Factor for Human Expertise in Hybrid Intelligence Systems
2026-05-24T03:46:01Z
A growing body of empirical research suggests that generative AI narrows performance gaps between novice and expert workers on routine tasks--the so-called "equalizer" effect. This paper challenges the generality of that conclusion. Drawing on cognitive augmentation theory, expert-novice research, and structured observations of in-house generative-AI use across a small software product team, we argue that AI functions primarily as a cognitive amplifier: a system whose output quality depends fundamentally on the expertise of the human who directs it. We present a framework comprising three layers of human contribution (problem definition, quality evaluation, iterative refinement) and three levels of engagement (passive acceptance, iterative collaboration, cognitive direction), demonstrating that domain expertise--not prompt engineering skill--determines amplification effectiveness. We reconcile the equalizer and amplifier perspectives by proposing that AI equalizes performance on well-structured, routine tasks while amplifying pre-existing differences on complex tasks requiring deep judgment. This reconciliation carries direct implications for hybrid human-AI system design: rather than building AI that replaces expertise, we should build AI that rewards and develops it. We outline a research agenda for the HHAI community centered on expertise-sensitive AI design, adaptive collaboration interfaces, and longitudinal studies of human capability development in AI-augmented work.
2025-10-30T11:55:34Z
9 pages, 3 figures, 1 table. v2 matches the camera-ready version accepted at HHAI 2026. Removed v1 aggregated projections (training timeline figure, n=580).
Empirical basis is structured field observations of 10 to 20 colleagues at a single organization (Beijing Feimu) since mid-2024. Conceptual framework unchanged. To appear in Frontiers in Artificial Intelligence and Applications (IOS Press)
Tao An

http://arxiv.org/abs/2605.24830v1
Macaron-A2UI: A Model for Generative UI in Personal Agents
2026-05-24T02:51:07Z
As personal agents evolve to handle complex, user-centric tasks, static plain-text chat is rapidly becoming a bottleneck. Generative UI emerges as the necessary new interface layer, dynamically synthesizing the right controls, options, and state from the interaction context in real time. We present Macaron-A2UI, a model for Generative UI in personal agents. Our goal is to move beyond text-only interaction by enabling agents to generate natural language together with lightweight, executable UI actions for information collection, preference refinement, confirmation, and multi-goal organization. We build a large-scale Generative UI corpus from heterogeneous dialogue sources, introduce A2UI-Bench for controlled evaluation, and train 30B, 235B and 754B models with parameter-efficient LoRA-based supervised fine-tuning followed by reward-driven reinforcement learning. The best Macaron-A2UI model reaches 75.6 overall on A2UI-Bench without explicit schema hints, surpassing the strongest full-schema frontier baseline.
We release the models, benchmark, and evaluation protocol to support future work on Generative UI for personal agents.
2026-05-24T02:51:07Z
Fancy Kong, Congjie Zheng, Murphy Zhuang, Rio Yang, Sueky Zhang, Hao Fu, Gene Jin, Song Cao, Kaijie Chen, Andrew Chen, Pony Ma

http://arxiv.org/abs/2605.24729v1
"It Felt a Bit Eerie": Exploring Humanlike Interactions During Collaborative Writing with an Artificial Agent
2026-05-23T20:48:34Z
While human-AI collaboration systems have increasingly been built to increase efficiency or support creativity, little work has examined how the design of interactions shapes the social connection between human and artificial agent. We examine how the temporal and visual dimensions of collaboration shape the experience of a writing task. Specifically, we built three variants of an AI-assisted text editor along a spectrum of simulated humanlike interaction (synchronous and with a cursor) to machinelike interaction (asynchronous and without a cursor), and conducted a comparative user study (n=48). Our exploratory findings suggest that synchronous suggestions increased efficiency but led to contextual misalignment, while a visual cursor increased intent understanding but evoked feelings of surveillance. Taken together, humanlike design of artificial agents can create positive social expectations but also elicit social costs, especially without the alignment present in human-human collaboration.
We extend our findings into design implications and ethical considerations when building human-AI collaboration systems.
2026-05-23T20:48:34Z
29 pages, 3 figures
Michael Yin, Angela Chiang, Samuel Rhys Cox, Robert Xiao

http://arxiv.org/abs/2605.24712v1
Hardware-Aware Federated Learning for Speech Emotion Recognition
2026-05-23T19:52:38Z
Federated learning (FL) enables privacy-preserving collaborative training across distributed edge devices, but real deployments involve heterogeneous clients with different processing power, memory capacity, and communication latency, which often increase round duration and system cost. This paper proposes a hardware-aware federated learning framework for emotion recognition on session-partitioned IEMOCAP that integrates hardware profiling, top-K client selection, and adaptive local epochs within a unified training loop. We compare the method against FedAvg, FedProx, and random top-K selection under a non-IID setup and show that, across 50 federated rounds and 5 independent trials, the proposed approach achieves competitive validation accuracy (0.352), reduces total training time by about 36.5% compared to FedAvg, and lowers cumulative communication cost by 40%.
2026-05-23T19:52:38Z
4 pages, 3 figures, 4 Tables
Beyazit Bestami Yuksel, Emrah Dikbiyik

http://arxiv.org/abs/2605.07185v2
Metaphors as Scaffolds: Spatial, Embodied, Fantastical, and Relational Framings for Youth Usable Privacy Design
2026-05-23T13:47:51Z
Drawing on observations from three prior studies with youth aged 13--24, we examine how metaphor shapes the way young people reason about privacy and imagine privacy designs beyond settings panels. Spatial metaphors made complex permission structures feel like movement through rooms and the placing of objects within them. Embodied metaphors gave youth language for shared norms around presence, access, and intrusion.
Fantastical metaphors turned privacy work into something playful and discoverable, prompting more generative and granular design ideas. Relational metaphors, however, exposed the same mechanism's downside: when a system feels like a loyal companion while data passes through an institution, youth may disclose more than they otherwise would. This provocation does not argue that some metaphors are good and others bad. It argues that metaphors meaningfully scaffold both the design process and the user experience of usable privacy, and that choosing one is an ethical decision about which norms a privacy interface makes easy to see, imagine, and act on.
2026-05-08T03:24:56Z
JaeWon Kim, Alexis Hiniker
10.1145/3802974.3809449

http://arxiv.org/abs/2605.22715v2
AnyMo: Geometry-Aware Setup-Agnostic Modeling of Human Motion in the Wild
2026-05-23T13:34:44Z
As wearable and mobile devices become increasingly embedded in daily life, they offer a practical way to continuously sense human motion in the wild. But inertial signals are highly dependent on the sensing setup, including body location, mounting position, sensor orientation, device hardware, and sampling protocol. This setup dependence makes it difficult to learn motion representations that transfer across devices and datasets, and limits the broader use of wearable IMUs beyond closed-set recognition. We introduce AnyMo, a geometry-aware framework for setup-agnostic human motion modeling. AnyMo uses physics-grounded IMU simulation over dense body-surface placements to generate diverse and plausible synthetic signals, pre-trains a graph encoder from paired synthetic placement views and masked partial observations, tokenizes multi-position IMU into full-body motion tokens, and aligns these tokens with an LLM for motion-language understanding.
We evaluate AnyMo on three complementary tasks: zero-shot activity recognition across 14 unseen downstream datasets, cross-modal retrieval, and wearable IMU motion captioning, where it improves average Accuracy/F1/R@2 by 11.7\%/11.6\%/22.6\% on HAR, increases zero-shot IMU-to-text and text-to-IMU retrieval MRR by 15.9\% and 28.6\%, respectively, and improves zero-shot captioning BERT-F1 by 18.8\%. These results support AnyMo as a generalist model for wearable motion understanding in the wild. Project page: https://baiyuchen.com/project/AnyMo.
2026-05-21T16:52:10Z
Baiyu Chen, Zechen Li, Wilson Wongso, Lihuan Li, Xiachong Lin, Hao Xue, Benjamin Tag, Flora Salim

http://arxiv.org/abs/2603.00177v3
Detecting Cognitive Signatures in Typing Behavior for Non-Intrusive Authorship Verification
2026-05-23T13:00:40Z
The proliferation of AI-generated text has intensified the need for reliable authorship verification, yet current output-based methods are increasingly unreliable. We observe that the ordinary typing interface captures rich cognitive signatures, measurable patterns in keystroke timing that reflect the planning, translating, and revising stages of genuine composition. Drawing on large-scale keystroke datasets comprising over 136 million events, we define the Cognitive Load Correlation (CLC) and show it distinguishes genuine composition from mechanical transcription. We present a non-intrusive verification framework that operates within existing writing interfaces, collecting only timing metadata to preserve privacy. Our analytical evaluation estimates 85 to 95 percent discrimination accuracy under stated assumptions, while limiting biometric leakage via evidence quantization. We analyze the adversarial robustness of cognitive signatures, showing they resist timing-forgery attacks that defeat motor-level authentication because the cognitive channel is entangled with semantic content.
We conclude that reframing authorship verification as a human-computer interaction problem provides a privacy-preserving alternative to invasive surveillance.
2026-02-26T20:02:55Z
7 pages
David Condrey