Feed: https://arxiv.org/api/QMZL/51hqTAkV9h6KBO89t01hII (updated 2026-06-18T20:47:25Z; total results: 28983; start index: 435; items per page: 15)

http://arxiv.org/abs/2605.31556v1 Vision-Language Models Suppress Female Representations Under Ambiguous Input (updated 2026-05-29T17:20:02Z)

Alignment teaches vision-language models (VLMs) to avoid expressing demographic biases, and when gender is clearly visible they largely succeed. Far less is known about ambiguous inputs (a worker in full gear, a figure seen from behind), cases that are common in practice yet rarely studied. We find that, on ambiguous input images, minimal prompting pressure exposes occupation-gender defaults, with models collapsing to male even for strongly female-stereotyped occupations. But do these outputs reflect what models actually encode internally? We introduce LALS (Latent Association Leaning Score), a zero-shot metric that projects visual-token activations into the model's text-embedding space to measure concept associations per token and layer. Across 15 occupations, over 800 gender-ambiguous images, and four VLMs, internal representations and outputs are systematically decoupled: models often encode a female association internally yet output male. Layer-wise analysis reveals an asymmetric filter, in which male signal amplifies end-to-end while female signal peaks mid-network and is suppressed before generation, and a color ablation shows that culturally loaded visual cues such as clothing color further modulate these internal associations.
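The abstract describes LALS only at a high level: visual-token activations are projected into the text-embedding space and compared with concept associations per token and layer. The sketch below is a minimal illustration under our own assumptions, not the authors' definition: the projection is taken as already applied, each concept ("female", "male") is a single text-embedding vector, and the score is a difference of cosine similarities. The functions `cosine_to` and `leaning_scores` are hypothetical names.

```python
import numpy as np


def cosine_to(vectors: np.ndarray, concept: np.ndarray) -> np.ndarray:
    """Cosine similarity between each vector (last axis) and one concept vector."""
    v = vectors / np.linalg.norm(vectors, axis=-1, keepdims=True)
    c = concept / np.linalg.norm(concept)
    return v @ c


def leaning_scores(projected: np.ndarray, female: np.ndarray, male: np.ndarray) -> np.ndarray:
    """Hypothetical per-layer, per-token leaning score (an assumption, not LALS itself).

    projected: (layers, visual_tokens, d) visual-token activations, assumed to be
               already mapped into the text-embedding space.
    female, male: (d,) text embeddings for the two concepts (assumed inputs).
    Returns (layers, visual_tokens); > 0 leans female, < 0 leans male.
    """
    return cosine_to(projected, female) - cosine_to(projected, male)


# Toy 2-D example: one layer, three visual tokens.
projected = np.array([[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]])
scores = leaning_scores(projected, np.array([1.0, 0.0]), np.array([0.0, 1.0]))
per_layer = scores.mean(axis=1)  # average leaning per layer, for a layer-wise profile
```

Here `scores` is `[[1.0, -1.0, 0.0]]`: the first token aligns with the female direction, the second with the male direction, and the third is equidistant. Tracking `per_layer` across real layers is one way to picture the "peaks mid-network, suppressed before generation" pattern the abstract reports.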

Published: 2026-05-29T17:20:02Z
Comments: 16 pages, 12 figures, 1 table
Authors: Arnau Marin-Llobet, Simon Henniger, Mahzarin R. Banaji

http://arxiv.org/abs/2606.00200v1 Student Competency Assessment and Presentation Methods Based on Algorithm Courses (updated 2026-05-29T17:03:59Z)

This full research paper describes the assessment and presentation of student competencies in algorithm courses, grounded in the CC2020 competency model. With growing emphasis on bridging the gap between academic training and industry demands, competency-based education, which integrates knowledge, skills, and dispositions, has become pivotal in computer science education. Bridging this gap requires a comprehensive framework for evaluating these three competency dimensions in computer science education. The research aims to analyze learning behavior patterns, design methods for competency assessment in algorithm courses, and evaluate the difficulty of course experiments to inform curriculum design. We collected programming experiment and written assignment data from 169 students and adapted them to the xAPI specification for unified analysis. We used Markov process modeling to analyze behavioral sequences, revealing cognitive patterns during programming tasks. Multiple methods were applied to quantify competencies (knowledge, skills, dispositions) and to identify distinct student clusters. Course difficulty was quantified using proactiveness metrics derived from submission timeliness. This work contributes a scalable framework for competency assessment in algorithm courses and offers actionable insights for personalized teaching and curriculum optimization. In practice, it enables instructors to tailor interventions to student clusters and to optimize task difficulty. Future work will incorporate performance data from more students to validate the competency models and extend the framework to broader computer science curricula.
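The abstract does not specify the Markov model used, so the following is only a generic sketch of the simplest variant: a first-order transition matrix estimated by counting consecutive events in each student's sequence. The event labels (`edit`, `compile`, `error`, `submit`) are hypothetical stand-ins for xAPI verbs, not the paper's coding scheme.

```python
from collections import Counter, defaultdict


def transition_matrix(sequences):
    """Estimate first-order Markov transition probabilities from event sequences.

    Returns {state: {next_state: probability}}; states that never precede
    another event (e.g. a final 'submit') have no outgoing row.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        probs[state] = {nxt: n / total for nxt, n in nexts.items()}
    return probs


# Hypothetical xAPI-style verbs for two students' programming sessions.
sessions = [
    ["edit", "compile", "error", "edit", "compile", "submit"],
    ["edit", "compile", "submit"],
]
P = transition_matrix(sessions)
```

In this toy data `P["compile"]` is about `{"error": 0.333, "submit": 0.667}`: a third of compilations lead to an error. Comparing such rows across student clusters is one way behavioral patterns of this kind can be read off a fitted chain.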

Published: 2026-05-29T17:03:59Z
Comments: Published in: 2025 IEEE Frontiers in Education Conference (FIE). 9 pages, 10 figures, 3 tables. Author accepted manuscript
Journal ref: Proc. 2025 IEEE Frontiers in Education Conference (FIE), Nashville, TN, USA, 2025, pp. 1-9
Authors: Yingqi Zhang, Ninghan Zheng, Shanshan Li, Weidong Liu
DOI: 10.1109/FIE63693.2025.11328247

http://arxiv.org/abs/2605.26309v2 Visual Matters: Connecting Aesthetic Appeal and Production Quality of Photos, Infographics and Data Visualizations to Credibility of Social Media Posts (updated 2026-05-29T16:52:09Z)

The rapid proliferation of visual content raises fundamental questions about how different visual formats and features shape perceived credibility. Drawing on processing fluency theory, this research examines how visuals shape credibility judgments. We focus on three popular formats (photos, infographics, and data visualizations), compare them with text-only posts, and test how two visual features, aesthetic appeal and production quality, influence credibility through processing fluency as a mediating mechanism. In a preregistered experiment with 1,200 US participants, we found that visual posts are generally perceived as more credible than text-only posts, but this advantage applies only to photos and infographics, not to data visualizations. Aesthetic appeal increased perceived credibility, an effect partially mediated by processing fluency, whereas production quality had no significant effect on credibility across formats. These findings differentiate visual formats, advance conceptualizations of visual features, and identify processing fluency as a key mechanism for theorizing credibility across multimodal contexts.
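"Partially mediated" means the effect of aesthetic appeal on credibility runs both through processing fluency (an indirect path) and around it (a direct path). As a generic illustration only, the sketch below estimates both paths with the product-of-coefficients approach on simulated data; the estimator, the variable names, and the effect sizes (0.5, 0.6, 0.2) are our assumptions and do not reproduce the study's analysis or results.

```python
import numpy as np


def mediation_paths(x, m, y):
    """Product-of-coefficients estimate for a simple mediation model x -> m -> y.

    Returns (indirect, direct): indirect = a * b, where a is the slope of m on x
    and b is the slope of y on m controlling for x; direct is the slope of y on x
    controlling for m.
    """
    ones = np.ones_like(x)
    a = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    coef = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0]
    direct, b = coef[1], coef[2]
    return a * b, direct


# Simulated data (assumed): appeal -> fluency -> credibility, plus a direct path.
rng = np.random.default_rng(0)
n = 10_000
appeal = rng.normal(size=n)
fluency = 0.5 * appeal + rng.normal(size=n)
credibility = 0.6 * fluency + 0.2 * appeal + rng.normal(size=n)
ind, direct = mediation_paths(appeal, fluency, credibility)
```

With these settings the indirect estimate should land near 0.5 × 0.6 = 0.3 and the direct estimate near 0.2; both being nonzero is what "partial" mediation denotes. A real analysis would add a bootstrap confidence interval for the indirect effect.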

Published: 2026-05-25T20:03:59Z
Authors: Salman Khawar, Yingdan Lu, Yilang Peng, Jiyoung Yeon, Cuihua Shen

http://arxiv.org/abs/2606.07612v1 Position: Anthropomorphic Misalignment Research Needs Stronger Evidence (updated 2026-05-29T16:38:53Z)

We argue that many Anthropomorphic Misalignment Research (AMR) studies need stronger evidence to ensure that they can provide a robust foundation for critical safety decisions, such as model deployment and regulation. By evaluating failure modes across different misalignment concepts, such as deception, emergent misalignment, and sycophancy, we show how conceptual ambiguity, non-robust datasets, experimental design, and insufficient causal interventions can lead to overinterpretation of model behaviors. This position paper aims to offer guidance on evidentiary considerations that can help improve methodological rigor in AMR. To achieve this, we provide a clear call to action through a proposed framework of evidence levels and a diagnostic checklist. These shared standards will enable more productive scientific discourse and ensure that claims about AI risks rest on solid empirical foundations.

Published: 2026-05-29T16:38:53Z
Authors: Vansh Gupta, Peter Nutter, Samuel Stante, Andreas Krause, Florian Tramèr, Lukas Fluri, Xin Chen, Anna Hedström

http://arxiv.org/abs/2605.03873v2 Bodyless Presence: Reconsidering the Minimal Self in Immersive Video (updated 2026-05-29T14:24:35Z)

Immersive video, namely 180-degree and 360-degree video designed to be viewed through head-mounted displays, constitutes an important boundary case between interactive VR and conventional two-dimensional video viewing for reconsidering self-experience in XR. In immersive video, the user can select the direction of the viewpoint through head rotation, while being unable to actively change the recorded environment through walking, approaching, grasping, or manipulating. In many cases, no explicit body or avatar corresponding to the user is provided. This paper reinterprets presence in immersive video not as bodily extension or body ownership of an avatar, but as a form of self-experience in which self-location becomes relatively dominant under conditions of reduced body schema availability. This paper calls this condition a self-location-dominant state. In this state, viewpoint-directed agency is retained, whereas environment-directed agency and body ownership are constrained. Nevertheless, events such as viewpoint motion, impact, contact, and direct address may be experienced not merely as changes within an image, but as events concerning the viewpoint position at which the self is located. This paper examines this structure by connecting research on presence, the sense of embodiment, bodily self-consciousness, and the minimal self. The minimal self in immersive video is thereby redescribed not primarily in terms of agency or ownership, but in terms of viewpoint-based self-location established under conditions in which the contribution of the body schema is reduced. This perspective provides a basis for theorising self-experience in non-interactive immersive media and for reconsidering the relation between body, viewpoint, and presence in XR.

Published: 2026-05-05T15:34:39Z
Comments: 12 pages, 3 figures. Revised version with expanded theoretical discussion of self-location, agency, body schema availability, and bodily self-consciousness. Project page: https://sites.google.com/view/bodylesspresence/
Authors: Koichi Toida

http://arxiv.org/abs/2603.00068v2 The Global Landscape of Environmental AI Regulation: From the Cost of Reasoning to a Right to Green AI (updated 2026-05-29T14:12:23Z)

Artificial intelligence (AI) systems impose substantial and growing environmental costs, yet transparency about these impacts has declined even as their deployment has accelerated. This paper makes three contributions. First, we collate empirical evidence that generative Web search and reasoning models, which proliferated in 2025, carry much higher cumulative environmental impacts than previous generations of AI approaches. Second, we map the global regulatory landscape across eleven jurisdictions and find that the way environmental governance operates (predominantly at the facility level rather than the model level, with a focus on training rather than inference, and with limited AI-specific energy disclosure requirements outside the EU) limits its applicability. Third, to address this, we propose a three-pronged policy response: mandatory model-level transparency that covers inference consumption, benchmarks, and compute locations; user rights to opt out of unnecessary generative AI integration and to select environmentally optimized models; and international coordination to prevent regulatory arbitrage. We conclude with concrete legislative proposals, including amendments to the EU AI Act, Consumer Rights Directive, and Digital Services Act, that could serve as templates for other jurisdictions.

Published: 2026-02-10T19:20:22Z
Comments: 23 pages, 1 table, preprint
Authors: Kai Ebert, Boris Gamazaychikov, Philipp Hacker, Sasha Luccioni

http://arxiv.org/abs/2605.04127v2 Position: the Stochastic Parrot in the Coal Mine. Model Collapse is a Threat to Low-Resource Communities (updated 2026-05-29T13:50:19Z)

Model collapse, the degradation in performance that arises when generative models are trained on the outputs of prior models, is an increasing concern as artificially generated content proliferates. Related critiques of large language models have highlighted their tendency to reproduce frequent patterns in training data, their reliance on vast datasets, and their substantial environmental cost. Together, these factors contribute to data degradation, the reinforcement of cultural biases, and inefficient resource use. In this position paper we aim to combine these views and argue that model collapse threatens current efforts to democratize AI. By reducing training efficiency and skewing data distributions away from the tails of their support, model collapse disproportionately impacts low-resource and marginalized communities. We examine both the environmental and cultural implications of this phenomenon, situate our position within recent position papers on model collapse, and conclude with a call to action. Finally, we outline initial directions for mitigating these effects.
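The claim that model collapse skews distributions "away from the tails of their support" can be seen in a toy simulation. This is our own illustrative sketch, not an experiment from the paper: each generation "trains" by re-estimating category frequencies from a finite sample of the previous generation's distribution. Once a rare category draws zero samples it can never reappear, so the support shrinks from the tail inward. The long-tailed 200-category distribution and the sample size are arbitrary assumptions.

```python
import numpy as np


def recursive_resample(probs, n_samples, generations, seed=0):
    """Simulate repeated training on the previous model's own samples.

    Each 'model' is the empirical category frequencies of n_samples draws from
    the previous generation. Returns the number of categories with nonzero
    probability at each generation (including generation 0).
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(probs, dtype=float)
    support_sizes = [int(np.count_nonzero(p))]
    for _ in range(generations):
        counts = rng.multinomial(n_samples, p)
        p = counts / n_samples
        support_sizes.append(int(np.count_nonzero(p)))
    return support_sizes


# A long-tailed (Zipf-like) distribution: a few frequent categories, many rare ones.
weights = 1.0 / np.arange(1, 201)
probs = weights / weights.sum()
sizes = recursive_resample(probs, n_samples=500, generations=20)
```

The returned support sizes can only stay flat or fall from one generation to the next, which mirrors, in miniature, how content from low-resource groups sitting in the tail is the first to disappear.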

Published: 2026-05-05T15:42:31Z
Comments: 14 pages, 1 figure, 1 table, International Conference on Machine Learning
Authors: Devon Jarvis, Richard Klein, Benjamin Rosman, Steven James, Stefano Sarao Mannelli

http://arxiv.org/abs/2605.31316v1 Governance-Aware Software Architecture for Multi-Stakeholder Platforms (updated 2026-05-29T13:47:08Z)

Multi-stakeholder platforms (MSPs) coordinate diverse stakeholder groups, often with competing or conflicting requirements. As these platforms increasingly take digital form, engineers building them make architectural decisions about data visibility, service decomposition, and algorithm design that directly determine which stakeholder requirements are prioritized when conflicts arise. Software architecture literature provides patterns for data isolation and access control among tenants but does not address how architectural decisions resolve conflicts among stakeholders with structurally divergent interests. MSP governance literature identifies the principles at stake but treats technology as neutral infrastructure. Neither addresses the translation between governance principles and architectural decision spaces. This paper proposes a governance-architecture correspondence framework that surfaces implicit governance decisions, making them explicit and debatable before deployment. The framework maps five MSP governance principles to the architectural decision spaces where they must be addressed, identifying for each the governance-aware design choice and the technically convenient default it overrides. We illustrate the framework in a constructed knowledge platform for pig farming in Rwanda, where five stakeholder types present structurally conflicting requirements. As work in progress, the framework is proposed but not yet empirically validated; a planned pre/post judgment study with platform users across all stakeholder types will test falsifiable predictions about governance outcomes.

Published: 2026-05-29T13:47:08Z
Authors: Michael Nwankwo, Eric Umuhoza

http://arxiv.org/abs/2605.31287v1 Neither Replacement nor Panacea: Comparing LLM-Based Conversational and Graphical Decision Support in Industrial Tasks (updated 2026-05-29T13:22:58Z)

Managers in manufacturing settings rely on digital interfaces to interpret operational data for decision-making, but growing data volume and complexity can make relevant insights difficult to identify efficiently. While dashboards remain dominant in industrial contexts, Large Language Model (LLM)-based conversational agents (CAs), accessed through conversational user interfaces (CUIs), may provide more direct access to such data. However, their effectiveness may depend on the information-processing demands of the task. This study compares an LLM-based CA delivered through a CUI with a dashboard in a manufacturing decision-support scenario. In a mixed factorial experiment with a 2x3 design, 134 industrial decision-makers were assigned to one interface condition and completed three tasks of increasing complexity. We examined perceived Mental Workload (MWL), decision accuracy, completion time, and intended reliance, and tested self-reported data literacy as a moderator. Results showed that the CUI reduced perceived MWL overall and supported faster completion in less demanding tasks, but both advantages diminished as task complexity increased. Neither interface produced a consistent overall advantage in decision accuracy, and the CUI was not preferred as a sole basis for subsequent decisions. Furthermore, data literacy did not reliably moderate interface effects. These findings indicate that conversational interaction offers conditional rather than universal benefits for industrial decision support. LLM-based CAs may reduce information-access effort, whereas complex decisions continue to benefit from persistent, inspectable visual representations.

Published: 2026-05-29T13:22:58Z
Authors: Roberto Figliè, Simone Caputo, Alan Serrano, Daria Mikhaylova, Tommaso Turchi, Daniele Mazzei

http://arxiv.org/abs/2506.12060v2 Organizational Adaptation to Generative AI in Cybersecurity (updated 2026-05-29T13:22:11Z)

Cybersecurity organizations are adapting to GenAI integration through modified frameworks and hybrid operational processes, with success influenced by existing security maturity, regulatory requirements, and investments in human capital and infrastructure. This qualitative research employs systematic document analysis and comparative case study methodology to examine how 25 studies from 2022 to 2025 document organizational adaptation of threat modeling frameworks, revealing a shift away from traditional signature-based systems toward AI-capable frameworks across three primary patterns: LLM integration for security applications, GenAI frameworks for risk detection and response automation, and AI/ML integration for threat hunting and matching. Organizations with mature infrastructures, particularly in finance and critical infrastructure, demonstrate higher readiness through structured governance, dedicated AI teams, and robust incident response processes, with central banks and financial institutions leading adaptation efforts under regulatory pressure. Successful integration requires human oversight of automated systems, attention to data quality and explainability, and sector-specific governance, though ongoing difficulties with privacy protection, bias reduction, personnel training, and adversarial defense persist. Notable imbalances between offensive and defensive GenAI capabilities create strategic concerns for security planning. The findings offer actionable insights for cybersecurity professionals and underscore the need for adaptive approaches, ethical frameworks, and staff development when managing AI-enhanced threats.

Published: 2025-05-31T18:16:11Z
Comments: 38 pages, 1 table, 1 figure. Revised title, abstract, and formatting for journal submission, corrected heading numbers, no substantive changes in content
Authors: Christopher Nott

http://arxiv.org/abs/2605.31224v1 Comparing LLM-Based Conversational and Graphical Interfaces for Industrial Decision Tasks: An Exploratory Mixed-Methods Study (updated 2026-05-29T12:27:39Z)

The use of Generative AI Conversational User Interfaces (CUIs) as a new way to access and analyze data is growing in all sectors, and industry is no exception. There, large volumes of data produced by IoT devices flow through user interfaces, which may need to adapt to decision-makers' new analysis needs. LLM-based CUIs promise a new way to interact with these data directly through natural language, without the learning costs that every GUI design entails. Moreover, the capabilities and agency of LLMs open up the possibility of automating some tasks and supporting reasoning during decision-making. But are these promises well founded? We scope this general question with a mixed-methods study comparing a state-of-the-art dashboard with a conversational agent. A total of 20 participants used both interfaces to complete four simulated industrial decision tasks of varying complexity. We combined measures of mental workload, completion time, and decision accuracy with a post-study questionnaire and semi-structured interviews analyzed through thematic analysis. The findings suggest that the conversational agent can reduce interactional effort by supporting more direct access to information, while the dashboard remains valuable for overview and verification. However, these benefits may vary across tasks and require validation through larger-scale studies.

Published: 2026-05-29T12:27:39Z
Authors: Roberto Figliè, Simone Caputo, Alan Serrano, Tommaso Turchi, Daniele Mazzei

http://arxiv.org/abs/2605.30930v1 TUX: Measuring Human--AI Tacit Understanding (updated 2026-05-29T07:19:58Z)

As large language models (LLMs) increasingly act as collaborative partners, human-AI alignment is often evaluated through explicit task success, accuracy, or reward optimization. Yet many collaborative settings depend on tacit understanding: whether an agent can align with a human's evaluative stance or representational priors without clear objectives, communication, or feedback. To study this capacity, we develop a spectrum-placement task inspired by the social party game Wavelength, in which humans and agents independently place concepts along subjective spectra. We operationalize the Tacit Understanding Index (TUX) as a pairwise measure of similarity between human and agent judgments, and evaluate it with 241 human participants and 200 profile-conditioned LLM agents across four models. We find that the nearest human-agent pairs in trait space achieve significantly higher TUX, suggesting that tacit alignment is structured by person-level characteristics rather than random similarity. Regression analyses show that TUX becomes more explainable as predictor sets become richer, with individual traits, decision-making styles, and confidence improving over aggregate trait-distance baselines. These findings suggest that tacit understanding between humans and LLMs is measurable, while revealing the limits of profile-based conditioning for capturing deeper representational alignment.
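The abstract defines TUX only as a pairwise similarity between human and agent judgments, and pairs humans with their nearest agents in trait space. The sketch below shows one plausible instantiation under stated assumptions: placements are scaled to [0, 1], similarity is 1 minus the mean absolute placement gap, and "nearest" means smallest Euclidean distance between trait vectors. `tux_score`, `nearest_agent`, and all toy data are hypothetical and are not the paper's formula.

```python
import numpy as np


def tux_score(human: np.ndarray, agent: np.ndarray) -> float:
    """Hypothetical pairwise score: 1 minus mean absolute placement gap on [0, 1] spectra."""
    return float(1.0 - np.mean(np.abs(human - agent)))


def nearest_agent(human_traits: np.ndarray, agent_traits: np.ndarray) -> int:
    """Index of the agent profile closest to the human in trait space (Euclidean)."""
    return int(np.argmin(np.linalg.norm(agent_traits - human_traits, axis=1)))


# Toy data: placements of three concepts, and two-dimensional trait profiles.
human_place = np.array([0.1, 0.5, 0.9])
agent_place = np.array([[0.2, 0.5, 0.8], [0.9, 0.1, 0.3]])
human_traits = np.array([0.3, 0.7])
agent_traits = np.array([[0.35, 0.65], [0.9, 0.1]])

best = nearest_agent(human_traits, agent_traits)
score = tux_score(human_place, agent_place[best])
```

Here the trait-nearest agent (index 0) also places concepts close to the human, giving a score of about 0.933, whereas the distant agent would score 0.4. The paper's finding is the population-level version of this contrast.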

Published: 2026-05-29T07:19:58Z
Authors: Yueshen Li, Hanyi Min, Vedant Das Swain, Koustuv Saha

http://arxiv.org/abs/2603.27052v3 Multi-Level Barriers to Generative AI Adoption Across Disciplines and Professional Roles in Higher Education (updated 2026-05-29T07:06:32Z)

Generative Artificial Intelligence (GenAI) is rapidly reshaping higher education, yet barriers to its adoption across different disciplines and institutional roles remain underexplored. Existing literature frequently attributes adoption barriers to individual-level factors such as perceived usefulness and ease of use. This study instead investigates whether such barriers are structurally produced. Drawing on a multi-method survey analysis of 272 academic and professional services (PSs) staff at a Russell Group university, we examine how disciplinary contexts and institutional roles shape perceived barriers. By integrating multinomial logistic regression (MLR), structural equation modelling (SEM), and semantic clustering of open-ended responses, we move beyond descriptive accounts to provide a multi-level explanation of GenAI adoption. Our findings reveal clear, systematic differences: non-STEM academics primarily report ethical and cultural barriers related to academic integrity, whereas STEM and PSs staff disproportionately emphasize institutional, governance, and infrastructure constraints. We conclude that GenAI adoption barriers are deeply embedded in organizational ecosystems and epistemic norms, suggesting that universities must move beyond generalized training to develop role-specific governance and support frameworks.

Published: 2026-03-27T23:48:25Z
Comments: 21 pages, 3 figures, 6 tables
Journal ref: Educ. Sci. 2026, 16(6), 838
Authors: Jianhua Yang, Kerem Öge, Adrian von Mühlenen, Abdullah Bilal Akbulut, Tanya Suzanne Carey, Chidi Okorro
DOI: 10.3390/educsci16060838

http://arxiv.org/abs/2605.30913v1 Toxic HallucinAItions: Perturbing Prompts and Tracing LLM Circuits (updated 2026-05-29T06:58:47Z)

Large language models (LLMs) are increasingly deployed in conversational settings where user tone ranges from polite to adversarial or toxic, yet less is known about whether toxic language in otherwise semantically equivalent prompts can degrade factual reliability. We study how lexical and tone-based prompt perturbations affect the factual reliability of LLMs. Using controlled prompt variations across polite, random, and three toxicity levels, we evaluate five LLMs on ARC-Easy, GSM8K, and MMLU. We find that toxic lexical perturbations consistently reduce factual accuracy and increase uncertainty, while polite phrasing yields limited and inconsistent changes. To examine whether these answer inconsistencies correspond to internal changes, we conduct attribution-graph analyses of model activations and influences. We find that increasing toxicity selectively amplifies perturbation-sensitive variant nodes while relatively stable core reasoning nodes remain more invariant. These findings position prompt tone as a critical dimension of LLM reliability and provide behavioral and mechanistic evidence that surface-level lexical variation can alter factual outputs and internal computation.

Published: 2026-05-29T06:58:47Z
Authors: Soorya Ram Shimgekar, Agam Goyal, Amruta Parulekar, Joshua Chen, Yian Wang, Navin Kumar, Hari Sundaram, Eshwar Chandrasekharan, Koustuv Saha

http://arxiv.org/abs/2507.05488v2 OLG++: A Semantic Extension of Obligation Logic Graph (updated 2026-05-29T06:54:11Z)

We present OLG++, a semantic extension of the Obligation Logic Graph (OLG) for modeling regulatory and legal rules in municipal and interjurisdictional contexts. OLG++ introduces richer node and edge types, including spatial, temporal, party group, defeasibility, and logical grouping constructs, enabling nuanced representations of legal obligations, exceptions, and hierarchies. The model supports structured representation of rules with contextual conditions, precedence, and complex triggers. We demonstrate its use through examples from food-business regulations, showing how OLG++ supports legal question answering using property-graph queries. We also discuss how OLG++ can complement LegalRuleML by providing graph-native constructs for subclass relations, spatial constraints, and reified exception structures. The worked examples and first-pass coverage analysis show that, on the dimensions studied, OLG++ is more expressive than the baseline OLG model for municipal regulatory representation.
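The abstract describes spatial constraints, party groups, and reified exceptions (defeasibility) that are queried as a property graph. The sketch below mimics that idea with plain in-memory Python objects rather than a graph database. The `Rule` fields, the `defeats` relation, and the example food-business rules are hypothetical and do not reproduce OLG++'s actual node or edge types.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    rule_id: str
    text: str
    party_group: str
    zones: set = field(default_factory=set)    # spatial condition; empty means everywhere
    defeats: set = field(default_factory=set)  # ids of rules this exception overrides


def applies(rule: Rule, party_group: str, zone: str) -> bool:
    """True if the rule's party-group and spatial conditions match the query."""
    return rule.party_group == party_group and (not rule.zones or zone in rule.zones)


def obligations_for(rules, party_group, zone):
    """Ids of matching rules that are not defeated by another matching rule."""
    active = [r for r in rules if applies(r, party_group, zone)]
    defeated = set().union(*(r.defeats for r in active))
    return sorted(r.rule_id for r in active if r.rule_id not in defeated)


# Hypothetical food-business rules, invented for illustration.
rules = [
    Rule("R1", "Hold a municipal food-handling permit", "food_vendor"),
    Rule("R2", "Display the latest inspection grade", "food_vendor"),
    Rule("E1", "Market-district vendors use the district permit instead of R1",
         "food_vendor", zones={"market_district"}, defeats={"R1"}),
]
```

Querying a food vendor in `"market_district"` yields `["E1", "R2"]` because the exception defeats R1 there, while `"downtown"` yields `["R1", "R2"]`. This sketch resolves only one level of defeat; exceptions to exceptions, precedence, and temporal conditions would need the richer constructs the paper describes.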

Published: 2025-07-07T21:24:52Z
Authors: Subhasis Dasgupta, Jon Stephens, Amarnath Gupta