Ozone: A Unified Platform for Transportation Research

2026-05-19T07:10:31Z

Intelligent Transportation Systems increasingly depend on heterogeneous data from roadside cameras, UAV imagery, LiDAR, and in-vehicle sensors, yet the lack of unified data standards, model interfaces, and evaluation protocols across these sources hampers reproducibility, cross-dataset benchmarking, and cross-region transferability of research findings. Existing trajectory datasets follow incompatible conventions for coordinate systems, object representations, and metadata fields, forcing researchers to build custom preprocessing pipelines for each dataset and simulator combination. To address these challenges, we propose Ozone, a unified platform for transportation research organized around five interconnected layers -- Hardware, Data, Model, Evaluation, and Prototype -- each with standardized schemas, automated conversion pipelines, and interoperable interfaces. In the first release, the data schema unifies four trajectory datasets -- NGSIM, highD, CitySim, and UTE -- into a canonical format with oriented bounding boxes, kinematic variables, and pre-computed surrogate safety measures. Digital-twin maps in CARLA and calibrated traffic models provide integrated benchmarking environments. Case studies in human-factor research, traffic scene generation, and safety-critical modeling demonstrate that Ozone reduces experiment setup time by 85%, achieves 91% cross-city transfer efficiency for safety models, and improves cross-dataset reproducibility to within 3% variance. The source code and datasets are publicly available.

Digital Voices of Survival: From Social Media Disclosures to Support Provisions for Domestic Violence Victims

2026-05-19T06:36:03Z

Domestic Violence (DV) is a pervasive public health problem characterized by patterns of coercive and abusive behavior within intimate relationships. With the rise of social media as a key outlet for DV victims to disclose their experiences, online self-disclosure has emerged as a critical yet underexplored avenue for support-seeking. In addition, existing research lacks a comprehensive and nuanced understanding of DV self-disclosure, support provisions, and their connections. To address these gaps, this study proposes a novel computational framework for modeling DV support-seeking behavior alongside community support mechanisms. The framework consists of four key components: self-disclosure detection, post clustering, topic summarization, and support extraction and mapping. We implement and evaluate the framework with data collected from relevant social media communities. Our findings not only advance existing knowledge on DV self-disclosure and online support provisions but also enable victim-centered digital interventions.

Rethinking the A in STEAM: Insights from and for AI Literacy Education

2026-05-19T06:28:01Z

This article rethinks the role of arts in STEAM education, emphasizing its importance in AI literacy within K-12 contexts. Arguing against the marginalization of arts, the paper is structured around four key domains: language studies, philosophy, social studies, and visual arts. Each section addresses critical AI-related phenomena and provides pedagogical strate-gies for effective integration into STEAM education. Language studies focus on media representations and the probabilistic nature of AI language models. The philosophy section examines anthropomorphism, ethics, and the misconstrued human-like capabilities of AI. Social studies discuss AI's societal impacts, biases, and ethical considerations in data prac-tices. Visual arts explore the implications of generative AI on artistic processes and intellec-tual property. The article concludes by advocating for a robust inclusion of arts in STEAM to foster a holistic, equitable, and sustainable understanding of AI, ultimately inspiring technologies that promote fairness and creativity.

ALSO: Adversarial Online Strategy Optimization for Social Agents

2026-05-19T06:16:22Z

Social simulation provides a compelling testbed for studying social intelligence, where agents interact through multi-turn dialogues under evolving contexts and strategically adapting opponents. Such environments are inherently non-stationary, requiring agents to dynamically adjust their strategies over time. However, most Large Language Model (LLM) based social agents rely on static personas, while existing approaches for enhancing social intelligence, such as offline reinforcement learning or external planners, are ill-suited to these settings, typically assuming stationarity and incurring substantial training overhead. To bridge this gap, we propose \textbf{ALSO} (\textbf{A}dversarial on\textbf{L}ine \textbf{S}trategy \textbf{O}ptimization), the first framework for online strategy optimization in multi-agent social simulation. ALSO advances social adaptation through two key contributions. (1) ALSO formulates multi-turn interaction as an adversarial bandit problem, where combinations of static personas and dynamic strategy instructions are treated as arms, providing a principled solution to non-stationarity without relying on environmental stability assumptions. (2) To predict rewards and generalize sparse feedback in multi-turn dialogues, ALSO introduces a lightweight neural surrogate to predict rewards from interaction histories, enabling sample-efficient exploration and continuous online adaptation. Experiments on the Sotopia benchmark demonstrate that ALSO consistently outperforms static baselines and existing optimization methods in dynamic environments, validating the effectiveness of adversarial online strategy optimization for building robust social agents.

Locked Out at 8,000 Miles: Why UK-China Partnership Students Are Suffering

2026-05-19T04:59:59Z

University cybersecurity protocols have intensified dramatically in response to rising threats of data breaches, ransomware, and credential theft. While necessary, these measures have created a parallel crisis of accessibility - even for students physically on campus. This paper argues that domestic, on-campus students already face significant barriers: mandatory multi-factor authentication (MFA), device compliance rules, browser and operating system restrictions, and administrative remote-management permissions on personal phones and laptops. However, these difficulties are magnified to near-breaking point in the context of international partnerships, such as the increasingly common UK-China transnational education programmes. For a student in China accessing a UK university's virtual learning environment (VLE) from an 8-hour time difference, with no on-hand IT support during their active hours, the same security architecture becomes functionally disabling. Drawing on testimonies from public forums (Reddit's r/college, r/UniUK, r/Professors), higher education IT help boards, and student accounts from UK-China partnership programmes, this paper documents how over-engineering digital security disproportionately harms remote international learners. We show that while on-campus students can at least visit an IT desk or borrow a library terminal, their counterparts in partner institutions abroad face authentication failures, device lockouts, and unsupported browsers with no real-time remedy. The paper concludes that current university security models assume a co-located, 9-to-5, English-time-zone user - an assumption that fails both domestic students and, catastrophically, international partnership cohorts.

The Economics of Model Collapse: Equilibrium, Welfare, and Optimal Provenance Subsidies in Synthetic Data Markets

2026-05-19T04:41:39Z

Generative artificial intelligence is rapidly transforming the supply side of training data: an increasing share of new tokens, images, and structured records is produced by previous-generation models rather than by human originators. Recursive training on such synthetic content induces a measurable and often irreversible loss of distributional fidelity, a phenomenon known as model collapse. We develop the first unified microeconomic theory of synthetic data markets under model collapse. We introduce the Synthetic Data Contamination Equilibrium (SDCE), prove existence and generic uniqueness, derive a welfare decomposition W = W_prod + W_cons - L_coll - L_info, establish a Wasserstein-gradient-flow mean-field collapse limit, prove an impossibility of information-constrained implementation, and obtain closed-form expressions for the welfare-maximizing provenance subsidy s* = KL(q||p)/(2 kappa) and the welfare-maximizing watermark strength w* = (1 - psi) KL(q||p)/(2 kappa psi). We prove an information-theoretic Cramer-Rao lower bound on any provenance estimator using only producer-side observations and show that the Provenance-Market Iterative Retraining (PMIR) algorithm attains this bound up to constants while converging to an epsilon-SDCE in O(epsilon^-2 log T) iterations. A reduced-form OLS estimation on a C4-synthetic benchmark over ten retraining generations yields a collapse-rate coefficient b-hat = 0.181 (HAC s.e. 0.024), within one standard error of the structural prediction 0.183. Calibrated experiments raise generation-ten model quality by 23.1 percent over the unregulated benchmark while lowering the 2-Wasserstein drift on a held-out diversity probe from 0.318 to 0.142. Scaling experiments over generations t in {1,...,10} recover a logarithmic-in-t collapse law log Q_t = log Q_0 - 0.183 t rho^2 with R^2 = 0.962.

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

2026-05-19T03:03:32Z

The rapid spread of misinformation on social media platforms has become a formidable challenge. To mitigate its proliferation, Misinformation Detection (MD) has emerged as a critical research topic. Traditional MD approaches based on small models typically perform binary classification through a black-box process. Recently, the rise of Large Language Models (LLMs) has enabled explainable MD, where models generate rationales that explain their decisions, thereby enhancing transparency. Existing explainable MD methods primarily focus on crafting sophisticated prompts to elicit rationales from off-the-shelf LLMs. In this work, we propose a pipeline to fine-tune a dedicated LLM specifically for explainable MD. Our pipeline begins by collecting large-scale fact-checked articles, and then uses multiple strong LLMs to produce veracity predictions and rationales. To ensure high-quality training data, we leverage a filtering strategy that selects only the correct instances for fine-tuning. While this pipeline is intuitive and prevalent, our experiments reveal that naive filtering based solely on label correctness is insufficient in practice and suffers from two critical limitations: (1) Coarse-grained labels cause insufficient rationales: Rationales filtered solely based on binary labels are insufficient to adequately support their decisions; (2) Over-verification behavior causes unnecessary rationales: Stronger LLMs tend to exhibit over-verification behavior, producing excessively verbose and unnecessary rationales. To address these issues, we introduce LONSREX, a novel data synthesis pipeline to Locate Necessary and Sufficient Rationales for Explainable MD. Specifically, we propose a metric that quantifies the contribution of each verification step to the final prediction, thereby evaluating its necessity and sufficiency. Experimental results demonstrate the effectiveness of LONSREX.

Going PLACES: Participatory Localized Red Teaming for Text-to-Image Safety in the Global South

2026-05-18T23:34:53Z

Despite the global deployment of text-to-image (T2I) models, their safety frameworks are largely calibrated to a Western-centric default, creating significant vulnerabilities for the rest of the world. To embrace cultural pluralism and bring historically under-represented perspectives in T2I safety, we conduct localised community-centered red teaming studies in the Global South. Our two-fold approach prioritizes localization and participation, by focusing on secondary urban centers in these regions, and conducting community engagement and training workshops to contextualize local norms. As a result, we present PLACES, a dataset comprising over 26,000 examples of T2I model failures collected in partnership with universities in Ghana, Nigeria, and two regions of India (Karnataka and Punjab). Analysis of prompts collected reveals a wide-ranging diversity in socio-cultural and linguistic attributes, when compared to existing geography-agnostic crowdsourced red-teaming data. We observe unique adversarial patterns enabled by local cultural and linguistic nuances, and distinct clusters within region around specific themes, such as religion in India. Moreover, we uncover structural contextual gaps in existing safety frameworks by identifying novel harms showing normative dissonance (e.g., violating religious norms, ignoring local customs, and ominous symbolism). This work argues that expanding T2I safety requires moving beyond mere scale to incorporate deeply localised, participatory methodologies for data collection and contextualization. Content warning: This paper includes examples containing potentially harmful or offensive content.

Context-Aware Detection and Victim-Centered Response Generation for Online Harassment in Private Messaging

2026-05-18T23:08:19Z

Online harassment is a widespread social and public health concern, yet most computational approaches for detecting and addressing harassment focus on publicly visible social media content rather than private messaging environments. Private conversations present unique challenges because harmful interactions often unfold through context-dependent, multi-turn exchanges, while victims may lack timely support during moments of harassment. In this study, we investigate how large language models (LLMs) can support both the detection of and response to online harassment in private messaging. Using a dataset of 80,053 Instagram direct messages donated by 26 adolescents aged 12-18, including youth with suicide risk factors, we first construct a human-labeled dataset of online harassment in private conversations and develop a context-aware cascading LLM classification pipeline. The proposed pipeline outperforms baseline toxicity classifiers trained primarily on public social media data. We then develop a victim-centered response framework that produces context-sensitive and psychologically-grounded AI-generated responses to online harassment messages. Human evaluators perceived the AI-generated responses as significantly more helpful than the original participant responses (95% CI: 0.767--0.815, p < .001), particularly in terms of emotional support and de-escalation. Our findings highlight the potential of context-aware and victim-centered AI systems to provide just-in-time support during harassment in private messaging environments.

How Far Are We From True Auto-Research?

2026-05-18T22:20:33Z

Recent auto-research systems can produce complete papers, but feasibility is not the same as quality, and the field still lacks a systematic study of how good agent-generated papers actually are. We introduce ResearchArena, a minimal scaffold that lets off-the-shelf agents (Claude Code using Opus 4.6, Codex using GPT-5.4, and Kimi Code using K2.5) carry out the full research loop themselves (ideation, experimentation, paper writing, self-refinement) under only lightweight guidance. Across 13 computer science seeds and 3 trials per agent-domain pair, ResearchArena yields 117 agent-generated papers, each evaluated under three complementary lenses: a manuscript-only reviewer (SAR), an artifact-aware peer review (PR) in which agents inspect the workspace alongside the manuscript, and an human conducted meta-review. Under SAR alone the picture is optimistic: Claude Code obtains the highest score, outperforms Analemma's FARS, and matches the weighted-average human ICLR 2025 submission, suggesting that minimally scaffolded agents can produce papers that look competitive on manuscript-only review. Manual inspection, however, reveals this picture is overstated: SAR scores are poorly aligned with its actual acceptance decisions and reward plausible framing without verifying experimental substance. Under artifact-aware PR scores drop sharply, and manual auditing identifies experimental rigor as the major bottleneck, decomposing into three failure modes (fabricated results, underpowered experiments, and plan/execution mismatch) that are highly agent-dependent: Codex 5%/8% paper-vs-artifact mismatch / fabricated references versus Kimi Code 77%/72%, a $\sim$15$\times$ spread that tracks distinct research personas the agents develop. None of the 117 agent-generated papers reaches the acceptance bar of a top-tier venue. This suggests that we are still gapped from the true auto-research.

CAPC-CG: A Large-Scale, Expert-Directed LLM-Annotated Corpus of Adaptive Policy Communication in China

2026-05-18T22:12:24Z

We introduce CAPC-CG, the Chinese Adaptive Policy Communication (Central Government) Corpus, the first open dataset of Chinese policy directives annotated with a five-color taxonomy of clear and ambiguous language categories, building on Ang's theory of adaptive policy communication. Spanning 1949-2023, this corpus includes national laws, administrative regulations, and ministerial rules issued by China's top authorities. Each document is segmented into paragraphs, producing a total of 3.3 million units. Alongside the corpus, we release comprehensive metadata, a two-round labeling framework, and a gold-standard annotation set developed by expert and trained coders. Inter-annotator agreement achieves a Fleiss's kappa of K = 0.86 on directive labels, indicating high reliability for supervised modeling. We provide baseline classification results with several large language models (LLMs), together with our annotation codebook, and describe patterns from the dataset. This release aims to support downstream tasks and multilingual NLP research in policy communication.

GRASP: Deterministic argument ranking in interaction graphs

2026-05-18T21:49:02Z

Large language models are increasingly deployed as automated judges to evaluate the strength of arguments. As this role expands, their legitimacy depends on consistency, transparency, and the ability to separate argumentative structure from rhetorical appeal. However, we show that holistic judging - a common LLM-as-a-Judge practice where a model provides a global verdict on a debate - suffers from substantial inter-model disagreement. We argue that this instability arises from collapsing a debate's complex interaction structure into a single opaque score. To address this, we propose GRASP (Gradual Ranking with Attacks and Support Propagation), a deterministic framework that aggregates stable local interaction judgments into a global ranking via a convergent attack--defense propagation operator. We show that local interaction judgments are more reproducible than holistic rankings in LLM-as-a-Judge evaluations, allowing GRASP to produce more consistent global rankings. We further show that GRASP scores do not correlate with human "convincingness" labels, highlighting a vital sociotechnical distinction: GRASP does not measure persuasion, factuality, or rhetorical appeal, but structural sufficiency - a defense-aware notion of argument robustness over the explicit interaction graph. Overall, GRASP offers a transparent and auditable alternative to holistic LLM judging.

Addressing the Reality Gap: A Three-Tension Framework for Agentic AI Adoption

2026-05-18T20:55:38Z

Generative AI has rapidly entered education through free consumer tools, outpacing the ability of schools and universities to respond. Now a new wave of more autonomous agentic AI systems--with the capacity to plan and act towards goals--promises both greater educational personalization and greater disruption. This chapter argues that successfully navigating these innovations requires balancing three core tensions: (1) Implementation Feasibility, or the practical capacity to integrate AI sustainably into real classrooms; (2) Adaptation Speed, or the mismatch between fast-evolving AI capabilities and the slower pace of educational change; and (3) Mission Alignment, or the need to ensure AI applications uphold educational values such as equity, privacy, and pedagogical integrity. First, we review early evidence of generative and agentic AI in various sectors and in frontline education to illustrate these tensions in context. Then, we present a three-tension framework to guide decision-makers in evaluating and designing AI initiatives across K-12 and higher education. We provide examples of how the framework can be applied to plan responsible AI deployments, and we identify emerging trends--such as curriculum-linked AI agents and educator-informed AI design--along with open research directions. We conclude the chapter with recommendations for educational leaders to proactively engage with the opportunities and challenges of AI, so that this technology can be harnessed to enhance teaching and learning in the decade ahead.

Artificial Intelligence, conceptual metaphors and conceptual engineering: Are AI-based framings of human behaviour and cognition successful?

2026-05-18T20:31:52Z

Understanding human behaviour, neuroscience and psychology using concepts from the domain of AI is increasing in popularity. Given the massive integration of AI technologies into our daily lives, AI-related concepts are being used to compare AI systems with human behaviour, brain functions, and cognitive abilities like language acquisition. But scientists and philosophers are also increasingly tempted to take the AI-framing of the human conceptual domain as a literal one. This paper investigates the epistemic and practical success of these 'AI-framings': What does it mean to apply the conceptual constellation of AI to the human conceptual domain? We consider and compare two possible answers: either these examples are conceptual metaphors, or they are attempts at conceptual engineering. Firstly, we argue that when viewed as conceptual metaphors, the AI-framed descriptions risk committing the ''map-territory fallacy''. Secondly, we argue the comparisons also contain a misleading 'double metaphor' because of the metaphorical connection between human psychology and computation at the conceptual foundation of computation. But we also argue that there is a possible semantic catch to the AI-framing, which is captured by the conceptual engineering view. This is that the AI-framings point towards avenues for forms of conceptual engineering. If the challenges of conceptual ethics and reductionism are overcome, some AI-framings might enrich our epistemic and practical lives. So, at its worst - as implicit conceptual metaphor - the AI-framing leads us completely astray; at its best, it prompts us to reflect anew on how the boundaries of our current concepts serve us and how they could be improved.

Beyond Nutrition Labels: How Analogical Reasoning Shapes Synthetic Media Disclosure Design

2026-05-18T19:10:22Z

As synthetic media proliferates, AI policymakers and practitioners have increasingly turned to disclosures--signals describing how media has been created or modified by AI--to help audiences evaluate media credibility. While there is a growing body of research on user interpretations, the upstream decision-making processes that affect users remain underexplored. This study therefore examines how AI policymakers and practitioners design synthetic media disclosures under complex sociotechnical constraints. Drawing on 23 expert interviews and 13 case studies from organizations participating in the Partnership on AI's Synthetic Media Framework, analysis identifies key disclosure goals, including process transparency and harm reduction, and two central tensions that emerge when pursuing those goals: normativity versus neutrality and proactivity versus precision. Findings highlight the role of analogical reasoning, from nutrition labels to Prop 65 warnings, in managing, but not resolving tensions. Ultimately, this study emphasizes the need for scholarship focused on AI transparency decision-makers and their use of analogical reasoning to support audiences encountering media in the AI age.