https://arxiv.org/api/e0NJ0FFbTELgC20SyRJAAw4n+7k
2026-06-21T16:21:43Z
2899769015

http://arxiv.org/abs/2604.10959v2
Ozone: A Unified Platform for Transportation Research
2026-05-19T07:10:31Z
Intelligent Transportation Systems increasingly depend on heterogeneous data from roadside cameras, UAV imagery, LiDAR, and in-vehicle sensors, yet the lack of unified data standards, model interfaces, and evaluation protocols across these sources hampers reproducibility, cross-dataset benchmarking, and cross-region transferability of research findings. Existing trajectory datasets follow incompatible conventions for coordinate systems, object representations, and metadata fields, forcing researchers to build custom preprocessing pipelines for each dataset and simulator combination. To address these challenges, we propose Ozone, a unified platform for transportation research organized around five interconnected layers (Hardware, Data, Model, Evaluation, and Prototype), each with standardized schemas, automated conversion pipelines, and interoperable interfaces. In the first release, the data schema unifies four trajectory datasets (NGSIM, highD, CitySim, and UTE) into a canonical format with oriented bounding boxes, kinematic variables, and pre-computed surrogate safety measures. Digital-twin maps in CARLA and calibrated traffic models provide integrated benchmarking environments. Case studies in human-factor research, traffic scene generation, and safety-critical modeling demonstrate that Ozone reduces experiment setup time by 85%, achieves 91% cross-city transfer efficiency for safety models, and improves cross-dataset reproducibility to within 3% variance.
The source code and datasets are publicly available.
2026-04-13T03:55:39Z
Ou Zheng, Ruyi Feng, Yufeng Yang, Shengxuan Ding, Lishengsa Yue, Ye Li, Yunhan Zheng, Minwei Kong, Dingyi Zhuang, Ao Qu, Zhibin Li, Meng Li, Dongjie Wang, Wangyang Ying

http://arxiv.org/abs/2509.12288v2
Digital Voices of Survival: From Social Media Disclosures to Support Provisions for Domestic Violence Victims
2026-05-19T06:36:03Z
Domestic Violence (DV) is a pervasive public health problem characterized by patterns of coercive and abusive behavior within intimate relationships. With the rise of social media as a key outlet for DV victims to disclose their experiences, online self-disclosure has emerged as a critical yet underexplored avenue for support-seeking. In addition, existing research lacks a comprehensive and nuanced understanding of DV self-disclosure, support provisions, and their connections. To address these gaps, this study proposes a novel computational framework for modeling DV support-seeking behavior alongside community support mechanisms. The framework consists of four key components: self-disclosure detection, post clustering, topic summarization, and support extraction and mapping. We implement and evaluate the framework with data collected from relevant social media communities. Our findings not only advance existing knowledge on DV self-disclosure and online support provisions but also enable victim-centered digital interventions.
2025-09-15T05:32:42Z
9 pages, 4 figures and 4 tables. Accepted to The 59th Hawaii International Conference on System Sciences (HICSS) 2026
Kanlun Wang, Zhe Fu, Wangjiaxuan Xin, Lina Zhou, Shashi Kiran Chandrappa

http://arxiv.org/abs/2405.18179v2
Rethinking the A in STEAM: Insights from and for AI Literacy Education
2026-05-19T06:28:01Z
This article rethinks the role of arts in STEAM education, emphasizing its importance in AI literacy within K-12 contexts.
Arguing against the marginalization of arts, the paper is structured around four key domains: language studies, philosophy, social studies, and visual arts. Each section addresses critical AI-related phenomena and provides pedagogical strategies for effective integration into STEAM education. Language studies focus on media representations and the probabilistic nature of AI language models. The philosophy section examines anthropomorphism, ethics, and the misconstrued human-like capabilities of AI. Social studies discuss AI's societal impacts, biases, and ethical considerations in data practices. Visual arts explore the implications of generative AI on artistic processes and intellectual property. The article concludes by advocating for a robust inclusion of arts in STEAM to foster a holistic, equitable, and sustainable understanding of AI, ultimately inspiring technologies that promote fairness and creativity.
2024-05-28T13:46:22Z
2 figures
CHAPTERS in Education 2026
Pekka Mertala, Janne Fagerlund, Tomi Slotte Dufva
10.1007/60490_2026_16

http://arxiv.org/abs/2605.15768v2
ALSO: Adversarial Online Strategy Optimization for Social Agents
2026-05-19T06:16:22Z
Social simulation provides a compelling testbed for studying social intelligence, where agents interact through multi-turn dialogues under evolving contexts and strategically adapting opponents. Such environments are inherently non-stationary, requiring agents to dynamically adjust their strategies over time. However, most Large Language Model (LLM) based social agents rely on static personas, while existing approaches for enhancing social intelligence, such as offline reinforcement learning or external planners, are ill-suited to these settings, typically assuming stationarity and incurring substantial training overhead.
To bridge this gap, we propose ALSO (Adversarial onLine Strategy Optimization), the first framework for online strategy optimization in multi-agent social simulation. ALSO advances social adaptation through two key contributions. (1) ALSO formulates multi-turn interaction as an adversarial bandit problem, where combinations of static personas and dynamic strategy instructions are treated as arms, providing a principled solution to non-stationarity without relying on environmental stability assumptions. (2) To generalize sparse feedback in multi-turn dialogues, ALSO introduces a lightweight neural surrogate that predicts rewards from interaction histories, enabling sample-efficient exploration and continuous online adaptation. Experiments on the Sotopia benchmark demonstrate that ALSO consistently outperforms static baselines and existing optimization methods in dynamic environments, validating the effectiveness of adversarial online strategy optimization for building robust social agents.
2026-05-15T09:25:15Z
Accepted at ICML 2026
Xiang Li, Liping Yi, Mingze Kong, Min Zhang, Zhongxiang Dai, QingHua Hu

http://arxiv.org/abs/2605.19367v1
Locked Out at 8,000 Miles: Why UK-China Partnership Students Are Suffering
2026-05-19T04:59:59Z
University cybersecurity protocols have intensified dramatically in response to rising threats of data breaches, ransomware, and credential theft. While necessary, these measures have created a parallel crisis of accessibility - even for students physically on campus. This paper argues that domestic, on-campus students already face significant barriers: mandatory multi-factor authentication (MFA), device compliance rules, browser and operating system restrictions, and administrative remote-management permissions on personal phones and laptops.
However, these difficulties are magnified to near-breaking point in the context of international partnerships, such as the increasingly common UK-China transnational education programmes. For a student in China accessing a UK university's virtual learning environment (VLE) across an 8-hour time difference, with no on-hand IT support during their active hours, the same security architecture becomes functionally disabling. Drawing on testimonies from public forums (Reddit's r/college, r/UniUK, r/Professors), higher education IT help boards, and student accounts from UK-China partnership programmes, this paper documents how over-engineering digital security disproportionately harms remote international learners. We show that while on-campus students can at least visit an IT desk or borrow a library terminal, their counterparts in partner institutions abroad face authentication failures, device lockouts, and unsupported browsers with no real-time remedy. The paper concludes that current university security models assume a co-located, 9-to-5, English-time-zone user - an assumption that fails both domestic students and, catastrophically, international partnership cohorts.
2026-05-19T04:59:59Z
Benjamin Kenwright

http://arxiv.org/abs/2605.20279v1
The Economics of Model Collapse: Equilibrium, Welfare, and Optimal Provenance Subsidies in Synthetic Data Markets
2026-05-19T04:41:39Z
Generative artificial intelligence is rapidly transforming the supply side of training data: an increasing share of new tokens, images, and structured records is produced by previous-generation models rather than by human originators. Recursive training on such synthetic content induces a measurable and often irreversible loss of distributional fidelity, a phenomenon known as model collapse. We develop the first unified microeconomic theory of synthetic data markets under model collapse.
We introduce the Synthetic Data Contamination Equilibrium (SDCE), prove existence and generic uniqueness, derive a welfare decomposition W = W_prod + W_cons - L_coll - L_info, establish a Wasserstein-gradient-flow mean-field collapse limit, prove an impossibility of information-constrained implementation, and obtain closed-form expressions for the welfare-maximizing provenance subsidy s* = KL(q||p)/(2 kappa) and the welfare-maximizing watermark strength w* = (1 - psi) KL(q||p)/(2 kappa psi). We prove an information-theoretic Cramer-Rao lower bound on any provenance estimator using only producer-side observations and show that the Provenance-Market Iterative Retraining (PMIR) algorithm attains this bound up to constants while converging to an epsilon-SDCE in O(epsilon^-2 log T) iterations. A reduced-form OLS estimation on a C4-synthetic benchmark over ten retraining generations yields a collapse-rate coefficient b-hat = 0.181 (HAC s.e. 0.024), within one standard error of the structural prediction 0.183. Calibrated experiments raise generation-ten model quality by 23.1 percent over the unregulated benchmark while lowering the 2-Wasserstein drift on a held-out diversity probe from 0.318 to 0.142. Scaling experiments over generations t in {1,...,10} recover a logarithmic-in-t collapse law log Q_t = log Q_0 - 0.183 t rho^2 with R^2 = 0.962.
2026-05-19T04:41:39Z
7 pages, 5 tables, 1 algorithm; IEEEtran conference format; submitted to IEEE BigData 2026
Gustav Olaf Yunus Laitinen-Fredriksson Lundström-Imanov

http://arxiv.org/abs/2605.19285v1
Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection
2026-05-19T03:03:32Z
The rapid spread of misinformation on social media platforms has become a formidable challenge. To mitigate its proliferation, Misinformation Detection (MD) has emerged as a critical research topic. Traditional MD approaches based on small models typically perform binary classification through a black-box process.
Recently, the rise of Large Language Models (LLMs) has enabled explainable MD, where models generate rationales that explain their decisions, thereby enhancing transparency. Existing explainable MD methods primarily focus on crafting sophisticated prompts to elicit rationales from off-the-shelf LLMs. In this work, we propose a pipeline to fine-tune a dedicated LLM specifically for explainable MD. Our pipeline begins by collecting large-scale fact-checked articles, and then uses multiple strong LLMs to produce veracity predictions and rationales. To ensure high-quality training data, we leverage a filtering strategy that selects only the correct instances for fine-tuning. While this pipeline is intuitive and prevalent, our experiments reveal that naive filtering based solely on label correctness is insufficient in practice and suffers from two critical limitations: (1) Coarse-grained labels cause insufficient rationales: Rationales filtered solely based on binary labels are insufficient to adequately support their decisions; (2) Over-verification behavior causes unnecessary rationales: Stronger LLMs tend to exhibit over-verification behavior, producing excessively verbose and unnecessary rationales. To address these issues, we introduce LONSREX, a novel data synthesis pipeline to Locate Necessary and Sufficient Rationales for Explainable MD. Specifically, we propose a metric that quantifies the contribution of each verification step to the final prediction, thereby evaluating its necessity and sufficiency. Experimental results demonstrate the effectiveness of LONSREX.
2026-05-19T03:03:32Z
Accepted by KDD 2026. 12 pages, 8 figures.
Code: https://github.com/wangbing1416/LONSREX
Bing Wang, Rui Miao, Ximing Li, Chen Shen, Shaotian Yan, Changchun Li, Kaiyuan Liu, Xiaosong Yuan, Jieping Ye

http://arxiv.org/abs/2605.19190v1
Going PLACES: Participatory Localized Red Teaming for Text-to-Image Safety in the Global South
2026-05-18T23:34:53Z
Despite the global deployment of text-to-image (T2I) models, their safety frameworks are largely calibrated to a Western-centric default, creating significant vulnerabilities for the rest of the world. To embrace cultural pluralism and bring historically under-represented perspectives into T2I safety, we conduct localised community-centered red teaming studies in the Global South. Our two-fold approach prioritizes localization and participation, by focusing on secondary urban centers in these regions, and conducting community engagement and training workshops to contextualize local norms. As a result, we present PLACES, a dataset comprising over 26,000 examples of T2I model failures collected in partnership with universities in Ghana, Nigeria, and two regions of India (Karnataka and Punjab). Analysis of prompts collected reveals a wide-ranging diversity in socio-cultural and linguistic attributes, when compared to existing geography-agnostic crowdsourced red-teaming data. We observe unique adversarial patterns enabled by local cultural and linguistic nuances, and distinct clusters within regions around specific themes, such as religion in India. Moreover, we uncover structural contextual gaps in existing safety frameworks by identifying novel harms showing normative dissonance (e.g., violating religious norms, ignoring local customs, and ominous symbolism). This work argues that expanding T2I safety requires moving beyond mere scale to incorporate deeply localised, participatory methodologies for data collection and contextualization.
Content warning: This paper includes examples containing potentially harmful or offensive content.
2026-05-18T23:34:53Z
Published at ACM Conference on FAccT 2026
Charvi Rastogi, Mukul Bhutani, Minsuk Kahng, Shamsuddeen Hassan Muhammad, Evgeniia Razumovskaia, Priyanka Suresh, Ibrahim Said Ahmad, Charu Kalia, Yaaseen Mahomed, Madhurima Maji, Minjae Lee, Alicia Parrish, Jessica Quaye, Vijay Janapa Reddi, Aishwarya Verma, Lora Aroyo

http://arxiv.org/abs/2512.14700v2
Context-Aware Detection and Victim-Centered Response Generation for Online Harassment in Private Messaging
2026-05-18T23:08:19Z
Online harassment is a widespread social and public health concern, yet most computational approaches for detecting and addressing harassment focus on publicly visible social media content rather than private messaging environments. Private conversations present unique challenges because harmful interactions often unfold through context-dependent, multi-turn exchanges, while victims may lack timely support during moments of harassment. In this study, we investigate how large language models (LLMs) can support both the detection of and response to online harassment in private messaging. Using a dataset of 80,053 Instagram direct messages donated by 26 adolescents aged 12-18, including youth with suicide risk factors, we first construct a human-labeled dataset of online harassment in private conversations and develop a context-aware cascading LLM classification pipeline. The proposed pipeline outperforms baseline toxicity classifiers trained primarily on public social media data. We then develop a victim-centered response framework that produces context-sensitive and psychologically-grounded AI-generated responses to online harassment messages. Human evaluators perceived the AI-generated responses as significantly more helpful than the original participant responses (95% CI: 0.767-0.815, p < .001), particularly in terms of emotional support and de-escalation.
Our findings highlight the potential of context-aware and victim-centered AI systems to provide just-in-time support during harassment in private messaging environments.
2025-11-28T00:18:47Z
16 pages, 2 figures
Pinxian Lu, Nimra Ishfaq, Emma Win, Morgan Rose, Sierra R Strickland, Candice L Biernesser, Jamie Zelazny, Munmun De Choudhury

http://arxiv.org/abs/2605.19156v1
How Far Are We From True Auto-Research?
2026-05-18T22:20:33Z
Recent auto-research systems can produce complete papers, but feasibility is not the same as quality, and the field still lacks a systematic study of how good agent-generated papers actually are. We introduce ResearchArena, a minimal scaffold that lets off-the-shelf agents (Claude Code using Opus 4.6, Codex using GPT-5.4, and Kimi Code using K2.5) carry out the full research loop themselves (ideation, experimentation, paper writing, self-refinement) under only lightweight guidance. Across 13 computer science seeds and 3 trials per agent-domain pair, ResearchArena yields 117 agent-generated papers, each evaluated under three complementary lenses: a manuscript-only reviewer (SAR), an artifact-aware peer review (PR) in which agents inspect the workspace alongside the manuscript, and a human-conducted meta-review. Under SAR alone the picture is optimistic: Claude Code obtains the highest score, outperforms Analemma's FARS, and matches the weighted-average human ICLR 2025 submission, suggesting that minimally scaffolded agents can produce papers that look competitive on manuscript-only review. Manual inspection, however, reveals this picture is overstated: SAR scores are poorly aligned with its actual acceptance decisions and reward plausible framing without verifying experimental substance.
Under artifact-aware PR, scores drop sharply, and manual auditing identifies experimental rigor as the major bottleneck, decomposing into three failure modes (fabricated results, underpowered experiments, and plan/execution mismatch) that are highly agent-dependent: Codex shows 5%/8% (paper-vs-artifact mismatch / fabricated references) versus 77%/72% for Kimi Code, a roughly 15x spread that tracks distinct research personas the agents develop. None of the 117 agent-generated papers reaches the acceptance bar of a top-tier venue. This suggests that we are still far from true auto-research.
2026-05-18T22:20:33Z
Zhengxin Zhang, Ning Wang, Sainyam Galhotra, Claire Cardie

http://arxiv.org/abs/2510.08986v3
CAPC-CG: A Large-Scale, Expert-Directed LLM-Annotated Corpus of Adaptive Policy Communication in China
2026-05-18T22:12:24Z
We introduce CAPC-CG, the Chinese Adaptive Policy Communication (Central Government) Corpus, the first open dataset of Chinese policy directives annotated with a five-color taxonomy of clear and ambiguous language categories, building on Ang's theory of adaptive policy communication. Spanning 1949-2023, this corpus includes national laws, administrative regulations, and ministerial rules issued by China's top authorities. Each document is segmented into paragraphs, producing a total of 3.3 million units. Alongside the corpus, we release comprehensive metadata, a two-round labeling framework, and a gold-standard annotation set developed by expert and trained coders. Inter-annotator agreement achieves a Fleiss's kappa of K = 0.86 on directive labels, indicating high reliability for supervised modeling. We provide baseline classification results with several large language models (LLMs), together with our annotation codebook, and describe patterns from the dataset.
This release aims to support downstream tasks and multilingual NLP research in policy communication.
2025-10-10T04:11:57Z
Accepted for publication in the Proceedings of ACL Main 2026
Bolun Sun, Charles Chang, Yuen Yuen Ang, Ruotong Mu, Yuchen Xu, Zhengxin Zhang, Pingxu Hao

http://arxiv.org/abs/2605.19141v1
GRASP: Deterministic argument ranking in interaction graphs
2026-05-18T21:49:02Z
Large language models are increasingly deployed as automated judges to evaluate the strength of arguments. As this role expands, their legitimacy depends on consistency, transparency, and the ability to separate argumentative structure from rhetorical appeal. However, we show that holistic judging - a common LLM-as-a-Judge practice where a model provides a global verdict on a debate - suffers from substantial inter-model disagreement. We argue that this instability arises from collapsing a debate's complex interaction structure into a single opaque score. To address this, we propose GRASP (Gradual Ranking with Attacks and Support Propagation), a deterministic framework that aggregates stable local interaction judgments into a global ranking via a convergent attack-defense propagation operator. We show that local interaction judgments are more reproducible than holistic rankings in LLM-as-a-Judge evaluations, allowing GRASP to produce more consistent global rankings. We further show that GRASP scores do not correlate with human "convincingness" labels, highlighting a vital sociotechnical distinction: GRASP does not measure persuasion, factuality, or rhetorical appeal, but structural sufficiency - a defense-aware notion of argument robustness over the explicit interaction graph.
Overall, GRASP offers a transparent and auditable alternative to holistic LLM judging.
2026-05-18T21:49:02Z
Preprint
Diganta Misra, Antonio Orvieto, Rediet Abebe, Volkan Cevher

http://arxiv.org/abs/2604.27245v2
Addressing the Reality Gap: A Three-Tension Framework for Agentic AI Adoption
2026-05-18T20:55:38Z
Generative AI has rapidly entered education through free consumer tools, outpacing the ability of schools and universities to respond. Now a new wave of more autonomous agentic AI systems, with the capacity to plan and act towards goals, promises both greater educational personalization and greater disruption. This chapter argues that successfully navigating these innovations requires balancing three core tensions: (1) Implementation Feasibility, or the practical capacity to integrate AI sustainably into real classrooms; (2) Adaptation Speed, or the mismatch between fast-evolving AI capabilities and the slower pace of educational change; and (3) Mission Alignment, or the need to ensure AI applications uphold educational values such as equity, privacy, and pedagogical integrity. First, we review early evidence of generative and agentic AI in various sectors and in frontline education to illustrate these tensions in context. Then, we present a three-tension framework to guide decision-makers in evaluating and designing AI initiatives across K-12 and higher education. We provide examples of how the framework can be applied to plan responsible AI deployments, and we identify emerging trends, such as curriculum-linked AI agents and educator-informed AI design, along with open research directions. We conclude the chapter with recommendations for educational leaders to proactively engage with the opportunities and challenges of AI, so that this technology can be harnessed to enhance teaching and learning in the decade ahead.
2026-04-29T22:33:36Z
This is a preprint version of an edited book chapter to appear in Mayrath, M., J. Behrens, D. Robinson, (eds) (2026).
Handbook of Generative AI in Education: Integrating Research into Practice, Springer
Jason Fournier (Imagine Learning), Kacper Łodzikowski (Adam Mickiewicz University, Poznań, Poland)

http://arxiv.org/abs/2504.07756v2
Artificial Intelligence, conceptual metaphors and conceptual engineering: Are AI-based framings of human behaviour and cognition successful?
2026-05-18T20:31:52Z
Understanding human behaviour, neuroscience and psychology using concepts from the domain of AI is increasing in popularity. Given the massive integration of AI technologies into our daily lives, AI-related concepts are being used to compare AI systems with human behaviour, brain functions, and cognitive abilities like language acquisition. But scientists and philosophers are also increasingly tempted to take the AI-framing of the human conceptual domain as a literal one. This paper investigates the epistemic and practical success of these 'AI-framings': What does it mean to apply the conceptual constellation of AI to the human conceptual domain? We consider and compare two possible answers: either these examples are conceptual metaphors, or they are attempts at conceptual engineering. Firstly, we argue that when viewed as conceptual metaphors, the AI-framed descriptions risk committing the 'map-territory fallacy'. Secondly, we argue the comparisons also contain a misleading 'double metaphor' because of the metaphorical connection between human psychology and computation at the conceptual foundation of computation. But we also argue that there is a possible semantic catch to the AI-framing, which is captured by the conceptual engineering view. This is that the AI-framings point towards avenues for forms of conceptual engineering. If the challenges of conceptual ethics and reductionism are overcome, some AI-framings might enrich our epistemic and practical lives.
So, at its worst - as implicit conceptual metaphor - the AI-framing leads us completely astray; at its best, it prompts us to reflect anew on how the boundaries of our current concepts serve us and how they could be improved.
2025-04-10T13:55:32Z
Warmhold Jan Thomas Mollema, Thomas Wachter

http://arxiv.org/abs/2605.19045v1
Beyond Nutrition Labels: How Analogical Reasoning Shapes Synthetic Media Disclosure Design
2026-05-18T19:10:22Z
As synthetic media proliferates, AI policymakers and practitioners have increasingly turned to disclosures (signals describing how media has been created or modified by AI) to help audiences evaluate media credibility. While there is a growing body of research on user interpretations, the upstream decision-making processes that affect users remain underexplored. This study therefore examines how AI policymakers and practitioners design synthetic media disclosures under complex sociotechnical constraints. Drawing on 23 expert interviews and 13 case studies from organizations participating in the Partnership on AI's Synthetic Media Framework, analysis identifies key disclosure goals, including process transparency and harm reduction, and two central tensions that emerge when pursuing those goals: normativity versus neutrality and proactivity versus precision. Findings highlight the role of analogical reasoning, from nutrition labels to Prop 65 warnings, in managing, but not resolving tensions. Ultimately, this study emphasizes the need for scholarship focused on AI transparency decision-makers and their use of analogical reasoning to support audiences encountering media in the AI age.
2026-05-18T19:10:22Z
18 pages, 3 tables
Claire R. Leibowicz