http://arxiv.org/abs/2603.19213v1
Constitutive vs. Corrective: A Causal Taxonomy of Human Runtime Involvement in AI Systems
Updated: 2026-03-19T17:57:07Z
As AI systems increasingly permeate high-stakes decision-making, the terminology regarding human involvement - Human-in-the-Loop (HITL), Human-on-the-Loop (HOTL), and Human Oversight - has become vexingly ambiguous. This ambiguity complicates interdisciplinary collaboration between computer science, law, philosophy, psychology, and sociology and can lead to regulatory uncertainty. We propose a clarification grounded in causal structure, focused on human involvement during the runtime of AI systems. The distinction between HITL and HOTL, we argue, is not primarily spatial but causal: HITL is constitutive (a human contribution is necessary for the decision output), while HOTL is corrective (external to the primary causal chain, capable of preventing or modifying outputs). Within HOTL, we distinguish three temporal modes - synchronous, asynchronous, and anticipatory - situated within a nested model of provider and deployer runtime that clarifies their different capacities for intervention. A second, orthogonal dimension captures cognitive integration: whether human and machine operate as complementary or hybrid intelligence, yielding four structurally distinct configurations. Finally, we distinguish these descriptive categories from the normative requirements they serve: statutory "Human Oversight" is a specific normative mode of HOTL that demands not merely a corrective causal position, but genuine preparedness and capacity for effective intervention.
Because the same person may occupy both HITL and HOTL roles simultaneously, we argue that this role duality must be treated as a design problem requiring architectural and epistemic mitigation rather than mere acknowledgment.
Published: 2026-03-19T17:57:07Z
Authors: Kevin Baum, Johann Laux

http://arxiv.org/abs/2603.19093v1
Follow the Rules (or Not): Community Norms and AI-Generated Support in Online Health Communities
Updated: 2026-03-19T16:19:29Z
Generative AI (GenAI) is increasingly being integrated into the online ecosystem, including online health communities (OHCs), where people with diverse health conditions exchange social support. For example, in OHCs, support providers are beginning to share content generated, directly or indirectly, by popular GenAI-based tools. OHCs are governed by norms that define appropriate behavior when providing support. How AI-generated support interacts with these norms remains underexplored. Inappropriate conformance or outright violation can erode seekers' trust, distort decision-making, and threaten community sustenance. In this work, we examine whether (and how) AI-generated support conforms to norms, using popular opioid-use recovery subreddits as our testbed. First, we provide an inventory of norms regulating text-based support provision in OHCs. Next, using human-validated LLM judges, we assess the prevalence of AI's conformity to these norms. Finally, through an expert review, we identify risks to seekers (and OHCs) resulting from norm (non)conformity. Our analysis revealed that, while AI-generated support conforms to norms, such conformity may be inappropriate or insufficient, for example, by over- or under-validating seekers in distress. Moreover, we observed instances of outright norm violation.
This work provides insights that can help moderators and OHC designers adapt existing norms and develop new ones to regulate AI integration, protecting both seekers and the communities they rely on.
Published: 2026-03-19T16:19:29Z
Authors: Shravika Mittal, Erin Kasson, Layna Paraboschi, Eleanor Laufenberg, Jiawei Zhou, Patricia A. Cavazos-Rehg, Tanushree Mitra, Munmun De Choudhury

http://arxiv.org/abs/2603.19084v1
On The Effectiveness of the UK NIS Regulations as a Mandatory Cybersecurity Reporting Regime
Updated: 2026-03-19T16:10:15Z
Existing cybersecurity literature lacks a source of empirical, representative data on the true nature of cyberattacks on Critical National Infrastructure. We obtained UK-wide data on incidents reported under the Network and Information Systems (NIS) Regulations in 2024 as causing "a significant impact on the continuity" of essential services, together with comparator data from intelligence agencies. We find that 29% of NIS reports already concern cybersecurity incidents. As the UK Government seeks to extend cybersecurity reporting, we find that the NIS Regulations are limited in their effectiveness: whilst our requests revealed 30 cybersecurity incidents reported under the NIS Regulations, the National Cyber Security Centre captured 89 incidents classified as "highly significant and significant" in the 2024 reporting year. Whereas 36% of attacks reported by the Cybersecurity and Infrastructure Security Agency concerned espionage, the NIS data show that 100% of NIS-reportable cyberattacks on healthcare systems in England in 2024 were ransomware.
Published: 2026-03-19T16:10:15Z
Authors: Junade Ali, Chris Hicks

http://arxiv.org/abs/2603.17899v2
Crisis-induced differences in attention towards Ukraine in Twitter 2008-2023
Updated: 2026-03-19T14:45:41Z
Aggression against Ukraine has drawn widespread international attention, particularly in the wake of the two Russian invasions into Ukrainian territory in 2014 and 2022.
Although previous studies have examined social-media dynamics around these events, a comparative, longitudinal, data-driven view across languages is still missing. This article fills this gap by mapping added attention to "Ukraine" on Twitter in 28 languages from 2008 to 2023, using a deceptively simple DNA microarray-inspired cartography of log over-expression relative to each language's baseline frequency. This macro-scale visualization makes familiar events stand out while uncovering subtler patterns beyond the cognitive reach of any single-language audience. Most strikingly, two nearly non-overlapping language clusters emerge, one peaking around 2014 and the other around 2022, with distinct onset and decay profiles that mirror national readiness (or reluctance) to support Ukraine. By capturing attention at local, meso, and global scales, our approach offers a versatile tool for comparing relative bias across languages, user subgroups, platforms, or even historical print corpora. Ultimately, our cartographic approach reveals a troubling asymmetry: while publicly accessible data allow an approximation of global attention patterns, the complete and unfiltered view remains largely hidden behind the closed, proprietary algorithms of major social media platforms, which thereby retain far more comprehensive access to understanding global information flows.
Published: 2026-03-18T16:31:39Z
Authors: Mark Mets, Peter Sheridan Dodds, Maximilian Schich
Comment: Submitted to Humanities and Social Sciences Communications

http://arxiv.org/abs/2603.18964v1
Terms of (Ab)Use: An Analysis of GenAI Services
Updated: 2026-03-19T14:30:00Z
Generative AI services like ChatGPT and Gemini are some of the fastest-growing consumer services. Individuals using such services must accept their terms of use before gaining access and must conform to these terms for continued use of the service.
Established literature has shown that, despite their status as legally binding agreements, terms of use are not well understood and may contain implications that surprise consumers. In this paper, we analyse the terms of 6 generative AI services from the perspective of an EU-based consumer. Our findings, based on a codebook that we developed and provide in the paper, reiterate known issues regarding generative AI services, such as the default use of user data for training, and surface new concerns regarding responsibility, liability, and rights. All terms in our analysis contained language that explicitly disclaims assurances regarding the quality, availability, and appropriateness of the service, regardless of whether the service is free or paid. The terms also make users solely responsible for ensuring that outputs meet norms dictated by the provider, even though users are given no information about, or control over, how the model functions, and at the risk of account termination. The terms further restrict how users can use outputs, while service providers utilise both user-provided inputs and user-liable outputs for a wide variety of purposes at their discretion. The implications of these practices are severe: we find that consumers lack necessary information, face a significant imbalance of power, and bear responsibilities they cannot materially fulfil without violating the terms. To remedy this situation, we make concrete recommendations for authorities and policymakers to urgently upgrade existing consumer protection mechanisms to tackle this growing issue.
Published: 2026-03-19T14:30:00Z
Authors: Harshvardhan J. Pandit, Dick A. H. Blankvoort, Sasha Luccioni, Abeba Birhane
Comment: Peer-reviewed, to be presented at ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2026

http://arxiv.org/abs/2603.18945v1
A conceptual framework for ideology beyond the left and right
Updated: 2026-03-19T14:20:38Z
NLP+CSS work has operationalized ideology almost exclusively on a left/right partisan axis. This approach obscures the fact that people hold interpretations of many different, complex, and more specific ideologies on issues like race, climate, and gender. We introduce a framework that understands ideology as an attributed, multi-level socio-cognitive concept network, and explains how ideology manifests in discourse in relation to other relevant social processes like framing. We demonstrate how this framework can clarify overlaps between existing NLP tasks (e.g., stance detection and natural language inference) and also how it reveals new research directions. Our work provides a unique and important bridge between computational methods and ideology theory, enabling richer analysis of social discourse in a way that benefits both fields.
Published: 2026-03-19T14:20:38Z
Authors: Kenneth Joseph, Kim Williams, David Lazer

http://arxiv.org/abs/2603.18914v1
Security, privacy, and agentic AI in a regulatory view: From definitions and distinctions to provisions and reflections
Updated: 2026-03-19T13:50:52Z
The rapid proliferation of artificial intelligence (AI) technologies has led to a dynamic regulatory landscape, where legislative frameworks strive to keep pace with technical advancements. As AI paradigms shift towards greater autonomy, specifically in the form of agentic AI, it becomes increasingly challenging to precisely articulate regulatory stipulations. This challenge is even more acute in the domains of security and privacy, where the capabilities of autonomous agents often blur traditional legal and technical boundaries. This paper reviews the evolving European Union (EU) AI regulatory provisions by analyzing 24 relevant documents published between 2024 and 2025. From this review, we clarify critical definitions.
We deconstruct the regulatory interpretations of security, privacy, and agentic AI, distinguishing them from closely related concepts to resolve ambiguity. We synthesize the reviewed documents to articulate the current state of regulatory provisions targeting different types of AI, particularly those related to security and privacy. We analyze and reflect on the existing regulatory provisions to better align security and privacy obligations with AI and agentic behaviors. These insights serve to inform policymakers, developers, and researchers about compliance and AI governance in a society with increasing algorithmic agency.
Published: 2026-03-19T13:50:52Z
Authors: Shiliang Zhang, Sabita Maharjan
Comment: Accepted by the 2026 Governing Agentic AI Symposium

http://arxiv.org/abs/2603.18881v1
Geography According to ChatGPT -- How Generative AI Represents and Reasons about Geography
Updated: 2026-03-19T13:24:09Z
Understanding how AI will represent and reason about geography should be a key concern for all of us, as the broader public increasingly interacts with spaces and places through these systems. Similarly, in line with the nature of foundation models, our own research often relies on pre-trained models. Hence, understanding the world that AI systems construct is as important as evaluating their accuracy, including factual recall. To motivate the need for such studies, we provide three illustrative vignettes, i.e., exploratory probes, in the hope that they will spark lively discussions and follow-up work: (1) Do models form strong defaults, and how brittle are model outputs to minute syntactic variations? (2) Can distributional shifts resurface from the composition of individually benign tasks, e.g., when using AI systems to create personas?
(3) Do we overlook deeper questions of understanding when focusing solely on the ability of systems to recall facts such as geographic principles?
Published: 2026-03-19T13:24:09Z
Authors: Krzysztof Janowicz, Gengchen Mai, Rui Zhu, Song Gao, Zhangyu Wang, Yingjie Hu, Lauren Bennett
Comment: Accepted book chapter (introduction to volume)

http://arxiv.org/abs/2506.10586v2
Size-adaptive Hypothesis Testing for Fairness
Updated: 2026-03-19T12:38:37Z
Determining whether an algorithmic decision-making system discriminates against a specific demographic group typically involves comparing a single point estimate of a fairness metric against a predefined threshold. This practice is statistically brittle: it ignores sampling error and treats small demographic subgroups the same as large ones. The problem intensifies in intersectional analyses, where multiple sensitive attributes are considered jointly, giving rise to a larger number of smaller groups. As these groups become more granular, the data representing them becomes too sparse for reliable estimation, and fairness metrics yield excessively wide confidence intervals, precluding meaningful conclusions about potentially unfair treatment.
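To make the sampling-error point concrete, here is a minimal Python sketch. It computes the statistical parity difference (SPD) for two groups that have identical observed positive rates (60% vs 50%) but very different sizes. It uses two intervals. The first is a textbook normal-approximation (Wald) interval for a difference of two proportions. The second, for comparison, is a Monte Carlo credible interval built from independent Beta(1 + k, 1 + n - k) posteriors, which is the two-outcome case of a Dirichlet-multinomial model with a uniform prior. The function names, prior, and counts are illustrative assumptions. This is not the authors' implementation, their exact test, or their intersectional estimator.

```python
import random
from statistics import NormalDist


def spd_wald_ci(k_a, n_a, k_b, n_b, alpha=0.05):
    """SPD = P(Y=1 | group a) - P(Y=1 | group b), with a Wald
    (normal-approximation) confidence interval at level 1 - alpha."""
    p_a, p_b = k_a / n_a, k_b / n_b
    spd = p_a - p_b
    # Standard error of a difference of two independent proportions.
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return spd, (spd - z * se, spd + z * se)


def spd_credible_interval(k_a, n_a, k_b, n_b, alpha=0.05, draws=20_000, seed=0):
    """Monte Carlo credible interval for the SPD under independent
    Beta(1 + k, 1 + n - k) posteriors (uniform prior) for each group."""
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(1 + k_a, 1 + n_a - k_a)
        - rng.betavariate(1 + k_b, 1 + n_b - k_b)
        for _ in range(draws)
    )
    # Empirical quantiles of the sampled differences.
    lo = diffs[round(alpha / 2 * (draws - 1))]
    hi = diffs[round((1 - alpha / 2) * (draws - 1))]
    return lo, hi


# Same observed positive rates (60% vs 50%), very different group sizes.
print(spd_wald_ci(600, 1000, 500, 1000))  # interval excludes 0
print(spd_wald_ci(6, 10, 5, 10))          # much wider interval, spans 0
print(spd_credible_interval(6, 10, 5, 10))
```

With ten individuals per group, the same 0.1 gap cannot be distinguished from zero. A fixed point-estimate threshold would miss this.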
In this paper, we introduce a unified, size-adaptive, hypothesis-testing framework that turns fairness assessment into an evidence-based statistical decision. Our contribution is twofold. (i) For sufficiently large subgroups, we prove a Central-Limit result for the statistical parity difference, leading to analytic confidence intervals and a Wald test whose type-I (false positive) error is guaranteed at level $\alpha$. (ii) For the long tail of small intersectional groups, we derive a fully Bayesian Dirichlet-multinomial estimator; its Monte Carlo credible intervals are calibrated for any sample size and naturally converge to Wald intervals as more data becomes available. We validate our approach empirically on benchmark datasets, demonstrating how our tests provide interpretable, statistically rigorous decisions under varying degrees of data availability and intersectionality.
Published: 2025-06-12T11:22:09Z
Authors: Antonio Ferrara, Francesco Cozzi, Alan Perotti, André Panisson, Francesco Bonchi
Comment: 39th Conference on Neural Information Processing Systems (NeurIPS 2025)

http://arxiv.org/abs/2603.18827v1
Student views in AI Ethics and Social Impact
Updated: 2026-03-19T12:23:14Z
We investigate, from a gender perspective, how students view the ethical implications and societal effects of artificial intelligence, examining concepts that could strongly influence how artificial intelligence is taught in the future. To this end, we surveyed a cohort of 230 second-year computer science students about their opinions. The results revealed that, from the students' perspective, AI will significantly impact daily life, particularly in areas such as medicine, education, and media. Men are more aware of potential changes in computer science, autonomous driving, image and video processing, and chatbot usage, while women more often mention the impact on social media.
Men and women perceive potential threats similarly, although men are more aware of war, AI-controlled drones, terrain recognition, and information war. Women seem to have a stronger tendency towards ethical considerations and helping others.
Published: 2026-03-19T12:23:14Z
Authors: Tudor-Dan Mihoc, Manuela-Andreea Petrescu, Emilia-Loredana Pop
DOI: 10.5220/0013139500003932

http://arxiv.org/abs/2511.11599v3
SynBullying: A Multi LLM Synthetic Conversational Dataset for Cyberbullying Detection
Updated: 2026-03-19T10:58:50Z
We introduce SynBullying, a synthetic multi-LLM conversational dataset for studying and detecting cyberbullying (CB). SynBullying provides a scalable and ethically safe alternative to human data collection by leveraging large language models (LLMs) to simulate realistic bullying interactions. The dataset offers (i) conversational structure, capturing multi-turn exchanges rather than isolated posts; (ii) context-aware annotations, where harmfulness is assessed within the conversational flow considering context, intent, and discourse dynamics; and (iii) fine-grained labeling, covering various CB categories for detailed linguistic and behavioral analysis. We evaluate SynBullying across five dimensions, including conversational structure, lexical patterns, sentiment/toxicity, role dynamics, harm intensity, and CB-type distribution. We further examine its utility by evaluating its performance as standalone training data and as an augmentation source for CB classification.
Published: 2025-10-30T09:27:36Z
Authors: Arefeh Kazemi, Hamza Qadeer, Joachim Wagner, Hossein Hosseini, Sri Balaaji Natarajan Kalaivendan, Brian Davis

http://arxiv.org/abs/2603.18741v1
Beyond the Code: A Multi-Modal Assessment Strategy for Fostering Professional Competencies via Introductory Programming Projects
Updated: 2026-03-19T10:42:34Z
As the landscape of software engineering evolves, introductory programming courses must go beyond teaching syntax to foster comprehensive technical competencies and professional soft skills.
This paper reports on a pedagogical experience in a "Fundamentals of Programming" course that used a Project-Based Learning (PBL) framework to develop a 2D "Maze Runner"-style game. While game development serves as a high-engagement vehicle for mastering core concepts such as multidimensional arrays, control structures, and logic, this study focuses on implementing a rigorous, multifaceted assessment model structured across four distinct dimensions: (1) an in-situ technical demonstration, evaluating real-time code execution and algorithmic robustness; (2) a technical screencast, requiring students to articulate their work in a concise audiovisual format; (3) a formal presentation to instructors, defending their project's design patterns and problem-solving strategies; and (4) a structured peer-review process, where students evaluated their colleagues' projects.
Our findings suggest that this multi-dimensional approach not only improves student retention of programming fundamentals but also significantly enhances communication skills and critical thinking. By integrating peer evaluation and multimedia documentation, the course successfully bridges the gap between basic coding and the collaborative requirements of modern software engineering. This paper details the curriculum design, the challenges of implementing diverse assessment pillars, and the measurable impact on student performance and engagement, providing a scalable roadmap for educators looking to modernize introductory computing curricula.
Published: 2026-03-19T10:42:34Z
Authors: Santiago Berrezueta-Guzman, Vanesa Metaj, Stefan Wagner
Comment: Article submitted to IEEE

http://arxiv.org/abs/2510.08663v3
Augmenting Rating-Scale Measures with Text-Derived Items Using the Information-Determined Scoring (IDS) Framework
Updated: 2026-03-19T09:58:28Z
Psychological assessments commonly rely on rating-scale items, which require respondents to condense complex experiences into predefined categories. Although rich, unstructured text is often captured alongside these scales, it rarely contributes to measuring the target trait because it lacks a direct mapping to the latent scale. We introduce the Information-Determined Scoring (IDS) framework, where large language models (LLMs) score free-text responses with simple prompts to generate candidate items that are co-calibrated with a baseline scale and retained based on the psychometric information they provide about the target trait. This marks a conceptual departure from traditional automated text scoring by prioritising information gain over fidelity to expert rubrics or human-annotated data. Using depression as a case study, we developed and tested the method with upper-secondary students (n = 693) and a matched synthetic dataset (n = 3,000).
Across held-out test sets, augmenting a 19-item rating-scale measure with LLM-derived items yielded significant improvements in measurement precision and accuracy, and stronger convergent validity with an external suicidality measure throughout the adaptive test. In adaptive simulations, LLM-derived items contributed information equivalent to adding up to 6.3 and 16.0 rating-scale items in real and synthetic data, respectively. This enabled earlier high-precision measurement: after 10 items, 46.3% of respondents in the real data reached SE ≤ 0.3 under the strongest augmented test, compared with 35.5% at baseline; in the synthetic data, the figures were 60.4% versus 34.7%. These findings illustrate how the IDS framework leverages unstructured text to enhance existing psychological measures, with applications in clinical health and beyond.
Published: 2025-10-09T15:37:24Z
Authors: Joe Watson, Ivan O'Connor, Chia-Wen Chen, Luning Sun, Fang Luo, David Stillwell

http://arxiv.org/abs/2603.18677v1
Cognitive Amplification vs Cognitive Delegation in Human-AI Systems: A Metric Framework
Updated: 2026-03-19T09:39:24Z
Artificial intelligence is increasingly embedded in human decision-making, where it can either enhance human reasoning or induce excessive cognitive dependence. This paper introduces a conceptual and mathematical framework for distinguishing cognitive amplification, in which AI improves hybrid human-AI performance while preserving human expertise, from cognitive delegation, in which reasoning is progressively outsourced to AI systems.
To characterize these regimes, we define a set of operational metrics: the Cognitive Amplification Index (CAI*), the Dependency Ratio (D), the Human Reliance Index (HRI), and the Human Cognitive Drift Rate (HCDR). Together, these quantities provide a low-dimensional metric space for evaluating not only whether human-AI systems achieve genuine synergistic performance, but also whether such performance is cognitively sustainable for the human component over time.
The framework highlights a central design tension in human-AI systems: maximizing short-term hybrid capability does not necessarily preserve long-term human cognitive competence. We therefore argue that human-AI systems should be designed under a cognitive sustainability constraint, such that gains in hybrid performance do not come at the cost of degradation in human expertise.
Published: 2026-03-19T09:39:24Z
Authors: Eduardo Di Santi
Comment: 16 pages, 2 figures. Conceptual and mathematical framework for human-AI collaboration, cognitive amplification, cognitive delegation, and cognitive sustainability

http://arxiv.org/abs/2603.18530v1
When Names Change Verdicts: Intervention Consistency Reveals Systematic Bias in LLM Decision-Making
Updated: 2026-03-19T06:21:08Z
Large language models (LLMs) are increasingly used for high-stakes decisions, yet their susceptibility to spurious features remains poorly characterized. We introduce ICE-Guard, a framework applying intervention consistency testing to detect three types of spurious feature reliance: demographic (name/race swaps), authority (credential/prestige swaps), and framing (positive/negative restatements). Across 3,000 vignettes spanning 10 high-stakes domains, we evaluate 11 LLMs from 8 families and find that (1) authority bias (mean 5.8%) and framing bias (5.0%) substantially exceed demographic bias (2.2%), challenging the field's narrow focus on demographics; (2) bias concentrates in specific domains: finance shows 22.6% authority bias, whereas criminal justice shows only 2.8%; (3) structured decomposition, where the LLM extracts features and a deterministic rubric decides, reduces flip rates by up to 100% (median 49% across 9 models). We demonstrate an ICE-guided detect-diagnose-mitigate-verify loop that achieves a cumulative 78% bias reduction via iterative prompt patching. Validation against real COMPAS recidivism data shows that COMPAS-derived flip rates exceed pooled synthetic rates, suggesting our benchmark provides a conservative estimate of real-world bias.
Code and data are publicly available.
Published: 2026-03-19T06:21:08Z
Authors: Abhinaba Basu, Pavan Chakraborty
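The flip rate behind intervention consistency testing can be illustrated with a small Python sketch. It assumes the simplest definition: the share of vignette pairs whose verdict changes when exactly one feature (a name, a credential, or a framing) is swapped. `VignettePair`, `flip_rate`, and `toy_decider` are hypothetical names. The toy decider is a deterministic stand-in for an LLM call. This is not the released ICE-Guard code, and the paper's metric may aggregate results differently.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class VignettePair:
    """An original vignette and its counterfactual, differing in one
    feature (e.g. a swapped name, credential, or framing)."""
    original: str
    counterfactual: str


def flip_rate(decide: Callable[[str], str], pairs: Sequence[VignettePair]) -> float:
    """Fraction of pairs whose decision changes under the intervention."""
    if not pairs:
        raise ValueError("need at least one vignette pair")
    flips = sum(decide(p.original) != decide(p.counterfactual) for p in pairs)
    return flips / len(pairs)


# Hypothetical stand-in for an LLM: it approves whenever "PhD" appears,
# i.e. it relies on a spurious authority cue.
def toy_decider(text: str) -> str:
    return "approve" if "PhD" in text else "deny"


pairs = [
    # Authority swap: credential removed.
    VignettePair("Applicant (PhD) requests a loan.", "Applicant requests a loan."),
    # Demographic swap: name changed.
    VignettePair("Applicant Maria requests a loan.", "Applicant John requests a loan."),
]
print(flip_rate(toy_decider, pairs))  # 0.5
```

Only the authority swap changes the toy verdict here, so one of the two pairs flips. In a real evaluation, `decide` would wrap a model query.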