LLMs and Childhood Safety: Identifying Risks and Proposing a Protection Framework for Safe Child-LLM Interaction

2026-05-22T19:52:45Z

Large Language Models (LLMs) are increasingly embedded in child-facing contexts such as education, companionship, and creative tools, but their deployment raises safety, privacy, developmental, and security risks. We conduct a systematic literature review of child-LLM interaction risks and organize findings into a structured map that separates (i) parent-reported concerns, (ii) empirically documented harms, and (iii) gaps between perceived and observed risk. Moving beyond descriptive listing, we compare how different evidence streams (surveys, incident reports, youth interaction logs, and governance guidance) operationalize "harm," where they conflict, and what mitigations they imply. Based on this synthesis, we propose a protection framework that couples child-specific content safety and developmental sensitivity with security-grade controls for adversarial misuse, including prompt injection and multimodal jailbreak pathways. The framework specifies measurable evaluation targets (e.g., harmful-content avoidance, age-calibrated readability, bias parity checks, prompt-injection robustness, and monitoring transparency) to support developers, educators, and policymakers in assessing and improving child-safe LLM deployments.

LLM Harms: A Taxonomy and Discussion

2026-05-22T19:51:14Z

This study addresses categories of harm surrounding Large Language Models (LLMs) in the field of artificial intelligence. It covers five categories of harm that arise before, during, and after the development of AI applications: pre-development, direct output, misuse and malicious application, and downstream application. It underscores the need to define the risks of the current landscape in order to ensure accountability and transparency and to navigate bias when adapting LLMs for practical applications. It proposes mitigation strategies and future directions for specific domains, along with a dynamic auditing system to guide the responsible development and integration of LLMs, in a standardized proposal.

Safe-Child-LLM: A Developmental Benchmark for Evaluating LLM Safety in Child-LLM Interactions

2026-05-22T19:49:58Z

As Large Language Models (LLMs) increasingly power applications used by children and adolescents, ensuring safe and age-appropriate interactions has become an urgent ethical imperative. Despite progress in AI safety, current evaluations predominantly focus on adults, neglecting the unique vulnerabilities of minors engaging with generative AI. We introduce Safe-Child-LLM, a comprehensive benchmark and dataset for systematically assessing LLM safety across two developmental stages: children (7-12) and adolescents (13-17). Our framework includes a novel multi-part dataset of 200 adversarial prompts, curated from red-teaming corpora (e.g., SG-Bench, HarmBench), with human-annotated labels for jailbreak success and a standardized 0-5 ethical refusal scale. Evaluating leading LLMs -- including ChatGPT, Claude, Gemini, LLaMA, DeepSeek, Grok, Vicuna, and Mistral -- we uncover critical safety deficiencies in child-facing scenarios. This work highlights the need for community-driven benchmarks to protect young users in LLM interactions. To promote transparency and collaborative advancement in ethical AI development, we are publicly releasing both our benchmark datasets and evaluation codebase at https://github.com/The-Responsible-AI-Initiative/Safe_Child_LLM_Benchmark.git

A Task-Driven Human-AI Collaboration: When to Automate, When to Collaborate, When to Challenge

2026-05-22T19:47:45Z

According to several empirical investigations, despite enhancing human capabilities, human-AI cooperation frequently falls short of expectations and fails to reach true synergy. We propose a task-driven framework that reverses prevalent approaches by assigning AI roles according to how the task's requirements align with the capabilities of AI technology. Three major AI roles are identified through task analysis across risk and complexity dimensions: autonomous, assistive/collaborative, and adversarial. We show how proper human-AI integration maintains meaningful agency while improving performance by methodically mapping these roles to various task types based on current empirical findings. This framework lays the foundation for practically effective and morally sound human-AI collaboration that unleashes human potential by aligning task attributes to AI capabilities. It also provides structured guidance for context-sensitive automation that complements human strengths rather than replacing human judgment.

What Medicine Taught Us About Fairness and What It Missed: Lessons from Reconsidering Race-Specific Lung Function Reference Algorithms

2026-05-22T19:16:24Z

Since 2019, medical societies have reconsidered race-specific clinical equations, often in parallel to and largely independent from algorithmic fairness research. Focusing on lung function reference algorithms that affect medical care, insurance, and employment for hundreds of millions globally, we analyze the transition from race-specific GLI-2012 to race-averaged GLI-Global through a fairness lens. Drawing on historical context, citation analysis, and quantitative evaluation, we show (i) that there is limited cross-citation between FAccT and clinical guideline revision efforts; (ii) that GLI-Global implicitly encodes assumptions about social determinants of health, behaving as if ~62% of the Black-White gap in FEV1 is exposure-related; and (iii) that clinical validation studies operationalized a sufficiency-like fairness criterion long before its formalization in the fairness literature, while neglect of foundational results such as the impossibility theorem has led to inefficiencies in clinical research. Overall, our analysis highlights the value of deeper, mutually beneficial engagement between medical and fairness communities and the public to accelerate progress toward equitable healthcare algorithms.

Divergent Paths to Depolarization: Dialogue Design Determines the Prosocial Benefits of AI-Assisted Political Argumentation

2026-05-22T17:51:10Z

Argumentative dialogues across political divides can reduce polarization, yet opportunities for citizens to engage with opposing views in accessible and structured ways remain limited. AI dialogue partners offer a scalable framework for such open-mindedness exercises, but how the format of human-AI dialogues shapes their benefits remains unclear. In a two-session online experiment, 469 US participants were assigned to argue either for or against their own attitude on a contested political issue with an AI chatbot. Our experimental findings show attitude-congruent dialogues produced greater immediate reduction in both affective and opinion polarization than attitude-incongruent dialogues. By contrast, attitude-incongruent dialogues elicited weaker cognitive state empathy than the non-AI reference task but increased cognitive trait empathy in the two-week period between sessions, suggesting the effects of active generation of attitude-incongruent arguments may emerge over time. These findings highlight dialogue design as a key determinant of effective AI-mediated behavioral interventions.

Towards an Evaluation Methodology for AI in Second Language Education: Lessons Learned from Developing L2-Bench

2026-05-22T17:33:30Z

The rapid adoption of large language models in AI-powered language education has created an urgent need for evaluations that assess pedagogical effectiveness, particularly in language learning--one of the most common LLM use cases (Tamkin et al. 2024; Costa-Gomes et al. 2025). With only narrowly defined task-specific evaluations of AI system capabilities in second language (L2) education existing in the literature, we require more holistic approaches in this AI for education space. To address this gap, we describe the iteration of the methodology we developed to build L2-Bench, a novel, context-specific evaluation benchmark grounded in a validated "language learning experience designer" construct to assess AI capabilities across L2 education contexts. Our methodology integrates pedagogical theory and sociotechnical AI evaluation methods, and operationalizes a hierarchical taxonomy to structure an expert-curated dataset of over 1,000 authentic rubric-scored task-response pairs with a measurement and scoring pipeline. We report the results of a pilot validation exercise (N = 39) on an initial sample of our dataset (tasks were validated as authentic [M = 4.23/5], but criteria scores were lower [M = 3.94], with universally poor inter-annotator agreement despite good internal consistency), alongside the experimental design for our follow-up practitioner data validation study as we iterate and scale to the full dataset. Ultimately, this research not only offers methodological lessons towards a more context-specific AI evaluations ecosystem, but also works towards better design of reproducible evaluations for AI systems deployed to educational contexts.

Inferential Privacy Leakage in Anonymized Conversational AI Logs

2026-05-22T16:22:14Z

Hundreds of millions of users now hold detailed, multi-turn conversations with ChatGPT and similar LLM assistants. We measure two privacy-relevant features of these conversations on a corpus of complete ChatGPT histories donated by over 1,000 users in four Global South countries (Brazil, India, Nigeria, Pakistan). First, on explicit disclosure: 34.5% of user messages contain personal information across a twenty-category taxonomy, with the median user first revealing identifying content within the first 14% of their conversation history. Second, on inference beyond explicit disclosure: we restrict to a cohort whose conversations contain no messages flagged by an LLM-based filter for explicit demographic self-identification (a separate NER pass marks PII for the disclosure audit but does not drive cohort exclusion). On this filtered cohort, an off-the-shelf large language model still recovers each user's age, gender, and country at a weighted F1 of 0.84, 0.90, and 0.88, respectively, with the median user identified from the first 5% of their conversation history. Reading the model's natural-language reasoning traces, we identify four recurring stereotype patterns that drive both successful inference and an asymmetric error distribution concentrating on women in technical fields, older users with contemporary skills, and Global South tech professionals. We also compare ChatGPT against the same users' Google Search and YouTube histories as inference surfaces, and find it competitive with these older substrates that have driven behavioral advertising for two decades. Message-level PII removal is insufficient on its own as a privacy intervention for conversational AI data.
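
For readers unfamiliar with the metric, the following is a minimal sketch of weighted F1, assuming the standard definition (per-class F1 averaged with weights equal to each class's share of the true labels). The function name `weighted_f1` and the toy labels are illustrative assumptions and are not taken from the paper, whose exact evaluation code is not described in the abstract.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    support = Counter(y_true)      # number of true instances per class
    predicted = Counter(y_pred)    # number of predictions per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    total = 0.0
    for label, n_true in support.items():
        precision = true_pos[label] / predicted[label] if predicted[label] else 0.0
        recall = true_pos[label] / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        total += f1 * n_true       # weight each class by its support
    return total / len(y_true)


# Toy example with made-up labels (not data from the study).
toy_true = ["a", "a", "b", "b"]
toy_pred = ["a", "b", "b", "b"]
toy_score = weighted_f1(toy_true, toy_pred)
```

Because classes are weighted by support, a frequent class contributes more to the score than a rare one, which matters when demographic groups in a cohort are unevenly sized.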

Engagement-Optimized Care: When LLMs become Mental Health Infrastructure

2026-05-22T15:50:26Z

General-purpose LLMs are increasingly functioning as mental health infrastructure due to gaps in care left by provider shortages, inadequate insurance coverage, social isolation, and stigma around formal help-seeking. This shift poses a distinct problem for AI ethics: systems neither designed nor governed as care technologies are being used as such, while their dominant design incentives optimize for engagement rather than user well-being. We present findings from a qualitative, longitudinal study with 18 US-based participants who use general-purpose LLMs for socioemotional support and participated in one or more of our study phases, including initial interviews, a four-week diary study, focus groups, and exit interviews. Participants turned to LLMs because other forms of support were unavailable, unaffordable, socially costly, or inadequate. As they continued to use these systems, design features such as anthropomorphic cues, default validation, persistent responsiveness, and weak disengagement mechanisms shaped their ongoing reliance. Participants described meaningful support alongside dependency, epistemic distortion through one-sided validation, privacy expectations without corresponding legal protection, and continued use despite awareness of these risks. We argue these dynamics reflect a structurally unfair tradeoff: users accept risks because support is otherwise absent, while available systems are optimized to deepen engagement and lack care-based accountability. The paper makes three contributions: it traces the arc through which LLMs become care infrastructure and identifies distinct ethical tensions at each stage, shifts analysis from turn-based exchanges to longitudinal trajectories of use, and argues that accountability belongs at the design and incentive conditions through which these systems become care infrastructure rather than at the output or crisis-response layer.

Synthetic Sources?: Auditing Generative Search Engine Citations for Evidence of AI-Generated Sources

2026-05-22T14:33:52Z

The growing accessibility of Large Language Models via conversational interfaces capable of responding to users' questions by drawing on, synthesizing, and citing information from the web (i.e., Generative Search Engines) has simplified the information-seeking process for users. However, with the proliferation of AI-generated content on the web, it is unclear whether these engines can reliably avoid citing synthetic sources (i.e., AI-generated sources). Should these engines be unable to do so, users are put at risk of harm, because information from AI-generated sources synthesized in the responses of generative search engines may be treated as equivalent to information from authoritative or official sources. In a step towards identifying whether AI-generated sources are being cited by these engines, this work presents an audit of four generative search engines (ChatGPT, Copilot, Gemini, Perplexity) using a total of 712 real-world human-generated queries spanning domains of public importance: politics, health, and the environment. Our findings show evidence of AI-generated sources being cited across all four generative search engines (~16% of cited sources) and identify key source web domains these sources belong to that are frequently cited across these engines and topics. In addition, we observed that generative search engines include a somewhat narrow set of repeatedly cited domains while predominantly surfacing a large number of minimally cited domains in responses to users' queries. These findings contribute to the growing body of work on assessing the risks of generative search engines with the objective of increasing public awareness of their limitations and encouraging appropriate measures to improve information quality and governance of these systems.

Can the Recovery Mechanism Survive AI? Skill Formation, Labor, and What Current Measurement Misses

2026-05-22T12:26:41Z

Throughout the modern era, when new technologies displaced workers, societies adapted through the same mechanism: education raised the cognitive ceiling, producing workers capable of tasks machines could not yet reach. Generative AI may be the first technology to break this cycle, because it now operates at the top of that ceiling. Drawing on labor economics, deployment data from millions of AI conversations across multiple platforms, original reanalysis of two public datasets, and skill-formation experiments, this paper develops three contributions. First, a stock-versus-flow framework showing that economic data and education data tell divergent stories about the same technology: augmentation dominates current workers, but the developmental pipeline producing the next generation is under strain. Second, a systematic gap analysis of the evidence base, revealing that the knowledge dimension of cognition is unmeasured across all major studies, that the three studies measuring learning outcomes (each $n < 200$) consistently find AI improves performance without improving learning ($d = 1.21$ in our cross-platform reanalysis), and that no study bridges professional and student populations. Third, an extended cognitive taxonomy (judgment under uncertainty, epistemic identity, and epistemic agency) applied to three cases from the evidence to distinguish AI interaction patterns that preserve learning from structurally similar ones that erode it. The paper argues that AI's societal risk lies not in replacing teachers but in eliminating the productive struggle through which the next generation's capacity forms, and proposes a research and design agenda targeting what current measurement systems miss.

Unjust Enrichment as a Remedy for AI's Unauthorised Use of Protected Data

2026-05-22T11:06:11Z

The unauthorised use of data in the training of generative AI models presents significant legal challenges, particularly under intellectual property (IP) and privacy laws. These frameworks frequently grapple with the intricate relationship between data ownership and AI innovation, resulting in ongoing debates regarding optimal protection and enforceability. This article examines the considerable potential of unjust enrichment as an alternative legal doctrine for resolving disputes arising from such unauthorised data use. We explore how the concept of unjust enrichment captures the wrongfulness of unauthorised data use in a manner distinct from IP infringement and privacy violations. Furthermore, we analyse the extent to which gain-based restitution for unjust enrichment may prove more advantageous than existing remedies, including legal, equitable, and statutory options. We contend that by shifting the emphasis from establishing wrongful conduct to recovering benefits obtained unjustly, unjust enrichment offers a pragmatic and equitable framework that reconciles the rights of data owners with the interests of AI developers.

Strategic Stalemates: The Paradox of Export Controls in the U.S.-China AI Race

2026-05-22T10:36:36Z

Export control is a policy and legal tool to protect national interests by regulating exports of sensitive goods and technology to foreign nations. It has become central to U.S.-China tech rivalry, especially in AI. Controls cover advanced chips, capital, personnel, and critical minerals for semiconductors. Since October 2022, the U.S. BIS has progressively tightened restrictions on advanced computing components to China. China responded with export curbs on critical minerals and filed a WTO complaint against the U.S. under GATT. This article argues that while export controls are strategic in U.S.-China AI competition, their long-term effectiveness is questionable. They often unintentionally boost China's self-reliance and R&D. Moreover, overly strict or arbitrary controls may violate WTO obligations, complicating dispute resolution and hindering AI progress. The study further examines legal implications of overusing export controls. It advocates for a restrained interpretation of security interests, arguing that commercial or dual-use AI models and semiconductors do not meet the security exception criteria under GATT Article XXI(b).

IyàwóBench: A Benchmark for Evaluating Large Language Model Clinical Triage Accuracy on Undifferentiated Febrile Illness in Nigerian Primary Health Settings

2026-05-22T10:25:51Z

Background. Undifferentiated febrile illness is the leading cause of primary care outpatient visits in Nigeria, yet no validated benchmark exists for evaluating large language model (LLM) clinical triage reasoning in West African primary health settings. Methods. We introduce IyàwóBench v1.0, a dataset of 200 synthetic clinical vignettes across eight febrile illness categories derived from statistical distributions of 1,200 real patient encounters at 19 primary health centres (PHCs) in Oyo State, Nigeria. Six LLMs were evaluated on structured triage classification across two metrics: triage accuracy and safety score. Results. All six models achieved 100% safety scores (95% CI: 96.4-100.0%), never downgrading a critical REFER NOW case to TREAT HERE. Triage accuracy varied substantially: Claude Sonnet (claude-sonnet-4-5) 67.5% (95% CI: 60.8-73.7%), Llama 4 Scout 59.5% (52.5-66.2%), Llama 3.3 70B 43.0% (36.2-50.0%), and Llama 3.1 8B 39.0% (32.4-45.9%). Two models demonstrated near-zero accuracy attributable to structured output non-compliance. Conclusions. Modern LLMs exhibit safe triage behaviour but vary substantially in structured clinical accuracy. Clinically engineered systems with embedded WHO guidelines outperform general-purpose models by up to 28.5 percentage points. IyàwóBench provides the first reproducible evaluation framework for LLM clinical decision support in West African primary care.
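
The confidence intervals quoted above are binomial proportion intervals. The abstract does not say which interval method the authors used, so the following is only a sketch under the assumption of a Wilson score interval; its bounds may differ slightly from the reported figures. The function name `wilson_interval` is illustrative, and the count of 135 correct cases is derived here from the reported 67.5% of 200 vignettes.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Assumed count: 67.5% of 200 vignettes = 135 correct triage decisions.
low, high = wilson_interval(135, 200)
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and remains usable when the observed proportion is 0% or 100%, as with the safety scores above.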

AI Evaluation Should Require Standardized Item-Level Data Releases

2026-05-22T10:20:50Z

This position paper argues that standardized item-level benchmark data should become the default infrastructure for AI evaluation. Current evaluations suffer from underspecified item selection, construct misalignment, and poor generalization. The root cause of these failures is a misplaced focus on aggregate model scores. Without item-level evidence, validity claims cannot be assessed, resulting in inflated capability claims, misdirected research, and unwarranted trust in deployed systems. Our position is that designing valid evaluations requires empirical evidence from item-level model responses, and the standardized release of such data should be treated as core AI evaluation infrastructure. Such a release, in addition, enables transparency, replicability, and auditability of evaluation results. To show the norm is both feasible and consequential, we construct OpenEval, an item-level archive of 10M responses across 155k items from widely-used benchmarks, under a unified schema that the AI evaluation community can build upon. We demonstrate how item-level data can identify low-quality items, document construct misalignment, and recover validity evidence about benchmarks' internal structure. We address objections around contamination and author burden, and show each is tractable relative to the cost of decisions made on claims that cannot be trusted.
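
To illustrate what item-level data makes possible that aggregate scores do not, here is a minimal sketch of one simple screening heuristic: computing per-item pass rates and flagging items that every model passes or every model fails, since such items cannot distinguish between models. The record fields and function names below are hypothetical and are not OpenEval's actual schema or the authors' method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemResponse:
    """One model's graded response to one benchmark item (hypothetical schema)."""
    benchmark: str
    item_id: str
    model: str
    correct: bool


def item_pass_rates(responses):
    """Map (benchmark, item_id) to the fraction of responses graded correct."""
    counts = defaultdict(lambda: [0, 0])  # key -> [n_correct, n_total]
    for r in responses:
        key = (r.benchmark, r.item_id)
        counts[key][0] += int(r.correct)
        counts[key][1] += 1
    return {key: c / n for key, (c, n) in counts.items()}


def uninformative_items(responses):
    """Items with a pass rate of exactly 0 or 1 across all recorded responses."""
    rates = item_pass_rates(responses)
    return sorted(key for key, rate in rates.items() if rate in (0.0, 1.0))


# Made-up records for illustration only.
sample = [
    ItemResponse("bench", "q1", "model_a", True),
    ItemResponse("bench", "q1", "model_b", True),
    ItemResponse("bench", "q2", "model_a", True),
    ItemResponse("bench", "q2", "model_b", False),
]
flagged = uninformative_items(sample)
```

An aggregate accuracy of 75% over these four responses would hide the fact that only one of the two items separates the models; the item-level view exposes it directly.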