http://arxiv.org/abs/2606.12439v1 Position: Generative Engine Optimization Creates Underexamined Risks, Governance Must Target Concentration, Disclosure, and Academic Blind Spots 2026-05-18T02:02:54Z

Large language model (LLM) answer engines are increasingly used for information seeking, shifting visibility from ranked lists to synthesized answers. This shift enables Generative Engine Optimization (GEO), which targets both the evidence pool and the generation step of LLM answer engines. We analyze the transition from search engine optimization (SEO) to GEO and identify two risks: (i) concentrated influence arising from low contestability and system sensitivity, and (ii) undisclosed commercial influence embedded in evidence and reasoning. We then formalize a general GEO pipeline to locate where optimization acts and compare academic and industry practices, revealing a third risk: (iii) academic-industry blind spots driven by visibility and evaluation asymmetries between offline setups and deployed systems. This position paper argues for answer-level governance and measurement: stronger contestability, high-precision disclosure, black-box auditing of material influence, and deployment-aligned metrics for exposure persistence.

2026-05-18T02:02:54Z
This paper is accepted by the ICML 2026 Position Track https://icml.cc/virtual/2026/poster/67185
Yizhu Wen, Nan Zhang, Haohan Yuan, Xun Chen, Haopeng Zhang, Hanqing Guo

http://arxiv.org/abs/2605.17712v1 ChatGPT vs Teachers vs Students: Large-Scale Analysis of Generative AI Discourse in Education Communities on Reddit 2026-05-18T00:26:33Z

Generative Artificial Intelligence (GenAI) has prompted significant discussion in education, yet large-scale empirical evidence on how students and teachers perceive and navigate this shift remains limited. We analyse 270k AI-related Reddit posts and comments from 26 education-related subreddits spanning higher education, K-12 teaching, and professional training between November 2022 and April 2026. Topic modelling reveals seventeen themes covering academic integrity, teaching & pedagogy, career anxiety, policy, and niche professional contexts. Discourse evolves from an early detection-and-evasion arms race into a sustained enforcement regime that constructive integration only begins to challenge in mid-2024. Stakeholder communities differ sharply: K-12 teachers foreground cognitive dependency, academics focus on AI detection and deliberation, and professional-programme students concentrate on career anxiety. Sentiment is strongly negatively correlated with engagement: adversarial enforcement themes mobilise communities far more than constructive integration discourse does. Examining where faculty and students meet, we find that 17% of threads are cross-role, and one third of such contact occurs in the adversarial themes AI Detection and Misconduct Enforcement. Students initiate 68% of mixed threads, but faculty produce most cross-role replies. Mixed threads contain 2-3 times more records and last 2-4 times longer than same-role threads, making adversarial integrity disputes the center of sustained faculty-student contact. We discuss implications for governance, pedagogical design, and cross-role contact design. The code and data are available at https://github.com/tugrulz/genai-edu

2026-05-18T00:26:33Z
Pelin Yüce, Xiangruo Dai, Rebecca Owens, Tuğrulcan Elmas

http://arxiv.org/abs/2605.17697v1 Scrutinizing Index-Based Risk Assessments: A Case Study in NYC Decision-making for Heat Emergency Management 2026-05-17T23:36:31Z

Cities are increasingly turning to large-scale data analysis and machine learning to make consequential decisions. While the algorithmic fairness community has focused on analyzing the risks and benefits associated with these complex methods, there has been much less scrutiny of the many simpler, but still widely used, data-driven tools that support government decision-making in a variety of settings. In this work, we study hand-crafted indices for geographic targeting and decision-making in emergency management, a field responsible for coordinating preparedness and response efforts for hazards ranging from natural disasters to human threats. Indices, which capture abstract principles and overarching priorities (e.g., reducing social vulnerability), are low-complexity models that statistically aggregate chosen variables. They are generally flexible and interpretable, but they can also be sensitive to key design choices and require strong assumptions. Through a case study of decision-making for extreme heat emergencies in NYC, we examine the challenges that practitioners may face in selecting an index for preparedness and response actions. We map empirical findings from index-based simulations to concerns about validity and reliability from the measurement literature, and we show via sensitivity analyses that different reasonable choices of input variables or spatial scale can produce substantive differences in index risk scores, thereby affecting downstream government decision-making. We contrast these challenges with considerations for developing predictive algorithms that relate more narrowly to concrete, measurable outcomes. Ultimately, we provide generalizable recommendations that practitioners and public-sector technologists can use to navigate the trade-offs between indices and predictive algorithms in other government settings.
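
To make the sensitivity point concrete, here is a minimal Python sketch of a hand-crafted index. It assumes min-max normalization and equal weights, which is a common construction; the abstract does not say which aggregation the NYC indices use. The area names, variable names, and values are hypothetical and are not data from the study.

```python
from statistics import mean


def min_max(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def composite_index(data, variables):
    """Equal-weight mean of min-max normalized variables, one score per area."""
    areas = list(data)
    normalized = {var: min_max([data[a][var] for a in areas]) for var in variables}
    return {a: mean(normalized[var][i] for var in variables) for i, a in enumerate(areas)}


def ranking(scores):
    """Area names ordered from highest to lowest index score."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical population shares for four areas (illustrative only).
areas = {
    "A": {"pct_over_65": 0.10, "pct_no_ac": 0.25, "pct_poverty": 0.20},
    "B": {"pct_over_65": 0.25, "pct_no_ac": 0.20, "pct_poverty": 0.10},
    "C": {"pct_over_65": 0.10, "pct_no_ac": 0.15, "pct_poverty": 0.35},
    "D": {"pct_over_65": 0.05, "pct_no_ac": 0.10, "pct_poverty": 0.15},
}

# Two equally "reasonable" variable choices for the same index recipe.
index_1 = composite_index(areas, ["pct_over_65", "pct_no_ac"])
index_2 = composite_index(areas, ["pct_no_ac", "pct_poverty"])
print(ranking(index_1))  # ['B', 'A', 'C', 'D']
print(ranking(index_2))  # ['A', 'C', 'B', 'D']
```

The aggregation method is identical in both runs, and only one input variable differs. Even so, the top-ranked area moves from B to A. If resources went to the top two areas, the targeted set would change from {A, B} to {A, C}.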

2026-05-17T23:36:31Z
Jennah Gosciak, Luke Boyce, Angelina Wang, Allison Koenecke

http://arxiv.org/abs/2605.17676v1 Building Resilience to Misinformation: A Cross-National Development of the Digital Media and Information Literacy Scale (DMILS) 2026-05-17T22:09:16Z

Amid growing concern about information quality and credibility in digital media environments, researchers and educators still lack a concise, comprehensive, yet psychometrically sound instrument for tracking the competencies that help people navigate this landscape. This article develops the Digital Media and Information Literacy Scale (DMILS), a robust, multidimensional measure that distinguishes domain (digital vs. information/news) and competency type (knowledge vs. skill), and that combines subjective and objective items. Through two empirical studies with three nationally matched samples in the United States and Singapore (N = 1,498), we developed an 18-item self-report battery and 16 objective knowledge questions that show strong structural, convergent, and predictive validity, along with a short form (8 self-report and 8 objective items). By offering a parsimonious yet multidimensional yardstick, DMILS enables rigorous evaluation of media literacy interventions and supplies a common metric for cross-national research, which is critical for building an information ecosystem resilient to mis- and disinformation.

2026-05-17T22:09:16Z
Sijia Qian, Cuihua Shen, Huiyi Wang, Hichang Cho

http://arxiv.org/abs/2605.17655v1 Disarranged Harmonization of Transparency Reporting by Social Media Platforms Under the Digital Services Act 2026-05-17T21:12:20Z

The European Commission recently introduced new regulation to harmonize transparency reporting by large online platforms under the Digital Services Act (DSA). Here, we present the first systematic evaluation of transparency reporting data quality after this normative change, covering the eight largest social media platforms in the European Union. Specifically, we run a set of large-scale quantitative analyses on key reporting dimensions, followed by a structured comparative assessment across platforms and reporting mechanisms. Among other findings, we show that (i) the analyzed platforms had varying degrees of compliance and data quality, but all exhibited issues with data formatting, timeliness, consistency, and completeness; (ii) some platforms used different reporting procedures across mechanisms, which led them to submit conflicting information; (iii) despite the harmonization, a number of issues still prevent interoperability between reporting mechanisms; and (iv) many previously identified issues with transparency reporting remain unresolved. We conclude by discussing implications for transparency auditing and proposing targeted improvements to strengthen the reliability and interoperability of DSA transparency reporting.
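
Two of the quality dimensions named above, completeness and cross-mechanism consistency, can be checked mechanically. The sketch below is a minimal illustration under hypothetical assumptions. The field names, category labels, counts, and the 5% tolerance are invented. They do not follow the actual DSA reporting templates or the authors' analysis pipeline.

```python
def completeness(records, required_fields):
    """Share of records in which every required field is present and non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all(record.get(field) not in (None, "") for field in required_fields)
        for record in records
    )
    return complete / len(records)


def inconsistent_categories(totals_a, totals_b, rel_tol=0.05):
    """Categories whose counts differ by more than rel_tol between two mechanisms.

    A category missing from one mechanism is treated as a count of 0.
    """
    flagged = []
    for category in sorted(set(totals_a) | set(totals_b)):
        a, b = totals_a.get(category, 0), totals_b.get(category, 0)
        largest = max(a, b)
        if largest == 0:
            continue
        if abs(a - b) / largest > rel_tol:
            flagged.append(category)
    return flagged


# Hypothetical per-category moderation counts from two reporting mechanisms.
report_totals = {"hate_speech": 1000, "scam": 500, "violence": 200}
database_totals = {"hate_speech": 990, "scam": 350, "illegal_goods": 40}
print(inconsistent_categories(report_totals, database_totals))
# ['illegal_goods', 'scam', 'violence']

# Hypothetical per-decision records; one has an empty date, one lacks it entirely.
records = [
    {"id": 1, "decision_date": "2025-01-02", "category": "scam"},
    {"id": 2, "decision_date": "", "category": "scam"},
    {"id": 3, "category": "violence"},
    {"id": 4, "decision_date": "2025-01-05", "category": "hate_speech"},
]
print(completeness(records, ["decision_date", "category"]))  # 0.5
```

A relative tolerance is used so that small rounding or timing differences between mechanisms are not flagged. Categories that appear in only one mechanism always are.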

2026-05-17T21:12:20Z
Amaury Trujillo, Benedetta Tessa, Stefano Cresci

http://arxiv.org/abs/2605.17634v1 AI Agents May Always Fall for Prompt Injections 2026-05-17T19:55:39Z

Prompt injection is the most critical vulnerability in deployed AI agents. Despite recent progress, we show that the prevailing defense paradigm (data-instruction separation) both fails to detect attacks that operate through contextual manipulation and degrades contextually appropriate behavior. We then recast prompt injection through the lens of Contextual Integrity (CI), a privacy theory that judges whether information flows comply with contextual norms. This lens explains the types of attacks that current defenses attempt to patch and predicts the more advanced ones that future agents will face. We develop unique benign and attack scenarios that force an agent to violate the norms by (1) misrepresenting the flow, (2) manipulating norms, or (3) mixing multiple flows. This reframing suggests an impossibility result: either an adversary can always construct a context under which a blocked flow appears legitimate, or a defender who tightens the norms will block genuinely legitimate flows. Our findings suggest that current research addresses a shrinking fraction of future attack surfaces. Through CI, we instead offer a principled framework for evaluating context-sensitive failures and for designing CI-aware alignment for frontier autonomous agents.
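
A toy encoding can make the CI framing concrete. The sketch assumes the five standard CI parameters (sender, recipient, subject, information type, transmission principle) and a simple exact-match norm check. This is a simplification, not the authors' method. It illustrates the "misrepresenting the flow" case: a guard that checks the flow as described lets it through, while the flow that actually occurs violates the norms. All names and norms are hypothetical.

```python
from dataclasses import dataclass, fields

WILDCARD = "*"  # a norm parameter that accepts any value


@dataclass(frozen=True)
class Flow:
    """An information flow described by five contextual-integrity parameters."""
    sender: str
    recipient: str
    subject: str
    info_type: str
    principle: str  # transmission principle, e.g. "at user's request"


def matches(norm: Flow, flow: Flow) -> bool:
    """True if every norm parameter is a wildcard or equals the flow's value."""
    return all(
        getattr(norm, f.name) in (WILDCARD, getattr(flow, f.name))
        for f in fields(Flow)
    )


def complies(flow: Flow, norms: list[Flow]) -> bool:
    """Treat a flow as appropriate if at least one norm permits it."""
    return any(matches(norm, flow) for norm in norms)


# Hypothetical norms for a personal-assistant context.
norms = [
    Flow("assistant", "user's doctor", "user", "medical record", "at user's request"),
    Flow("user", "assistant", WILDCARD, WILDCARD, WILDCARD),
]

# The flow as described by an injected instruction ("send this to my doctor")...
declared = Flow("assistant", "user's doctor", "user", "medical record", "at user's request")
# ...and the flow that actually happens once the injected address is resolved.
actual = Flow("assistant", "unknown third party", "user", "medical record", "at user's request")

print(complies(declared, norms))  # True
print(complies(actual, norms))    # False
```

The check itself is correct. The failure lies in the description it is given, which is why tightening the norms alone does not close this attack class.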

2026-05-17T19:55:39Z
Sahar Abdelnabi, Eugene Bagdasarian

http://arxiv.org/abs/2601.14506v3 Compounding Disadvantage: Auditing Intersectional Bias in LLM-Generated Explanations Across Indian and American STEM Education 2026-05-17T18:39:30Z

Large language models are increasingly deployed in STEM education for personalized instruction and feedback across institutions in high- and low-income countries. These systems are designed to adapt content to student needs, but whether they adapt based on demonstrated ability or demographic signals remains untested at scale. Here we establish that LLM-generated STEM content systematically disadvantages marginalized student profiles across two cultural contexts, with the gap between the most privileged and most marginalized profiles reaching 2.55 grade levels. We audited four LLMs (Qwen 2.5-32B-Instruct, GPT-4o, GPT-4o-mini, GPT-OSS 20B) using synthetic profiles crossing dimensions specific to Indian education (caste, medium of instruction, college tier) and American education (race, HBCU attendance, school type), alongside income, gender, and disability, across ranking and generation tasks with FDR-corrected significance testing and SHAP feature attribution. Income produces significant effects across every model and context, medium of instruction drives the largest single effect in the Indian context, and disability status triggers simpler explanations. Effects compound non-additively: marginalization across multiple dimensions produces gaps larger than any single dimension predicts, and biases persist within elite institutions. Bias is consistent across all four architectures and persists through model selection, making intersectional, cross-cultural auditing a structural requirement before deployment.
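
The abstract does not say which FDR correction was used. The sketch below assumes the Benjamini-Hochberg step-up procedure, the most common form of FDR control, and runs it on hypothetical p-values. It shows why FDR correction rejects fewer hypotheses than testing each one at 0.05.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns one boolean per p-value: True if that hypothesis is rejected
    while controlling the false discovery rate at level alpha.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from eight demographic comparisons.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(benjamini_hochberg(p))  # [True, True, False, False, False, False, False, False]
print(sum(value < 0.05 for value in p))  # 5 uncorrected "significant" results
```

Five comparisons fall below 0.05 uncorrected, but only two survive the step-up thresholds (0.00625, 0.0125, 0.01875, ...). This gap is why FDR correction matters when many demographic dimensions are tested at once.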

2026-01-20T21:58:45Z
Amogh Gupta, Niharika Patil, Sourojit Ghosh, SnehalKumar Neil S Gaikwad

http://arxiv.org/abs/2507.04996v10 Agentic Vehicles for Human-Centered Mobility: Definition, Prospects, and Synergistic Co-Development with Vehicle Autonomy 2026-05-17T15:38:50Z

Autonomy, from the Greek autos (self) and nomos (law), refers to the capacity to operate according to internal rules without external control. Autonomous vehicles (AuVs) are therefore understood as vehicular systems that perceive their environment and execute tasks with minimal human intervention, consistent with the direction indicated by the SAE levels of automated driving. However, recent research and deployments increasingly showcase vehicular capabilities that, while not contradicting autonomy, are not entailed by it, including ambiguous goal handling, purposeful social engagement, external tool use, proactive problem solving, continuous learning, and context-sensitive reasoning in unseen and ethically salient situations, enabled in part by multimodal language models. These developments reveal a gap between technical autonomy and the broader social cognitive functions required for human-centered mobility, which are more precisely captured by the notion of agency. Therefore, rather than adding increasingly elaborate modifiers to "autonomous," we introduce agentic vehicles (AgVs) and suggest that autonomy and agency are intertwined but conceptually distinct: if autonomy concerns what to do and how to do it (task executions under internal rules), agency pertains to why to do it and what else can be done (goal-directed, adaptive actions). We present autonomy and agency as orthogonal yet synergistic dimensions with co-development implications. Vehicle agency marks a novel dimension of mobility service intelligence, heralding vehicles as purposeful actors in society.

2025-07-07T13:34:49Z
Jiangbo Yu, Raphael Frank, Luis Miranda-Moreno, Sasan Jafarnejad, Jonatas Augusto Manzolli, Fuqiang Liu, Jiyao Wang, Ali Eslami

http://arxiv.org/abs/2605.17463v1 Teachers' Vocal Expressions and Student Engagement in Asynchronous Video Learning 2026-05-17T14:06:23Z

Asynchronous video learning, including massive open online courses (MOOCs), offers flexibility but often lacks students' affective engagement. This study examines how teachers' verbal and nonverbal vocal emotive expressions influence students' self-reported affective engagement. Using computational acoustic and sentiment analysis, valence and arousal scores were extracted from teachers' verbal vocal expressions, and nonverbal vocal emotions were classified into six categories: anger, fear, happiness, neutral, sadness, and surprise. Data from 210 video lectures across four MOOC platforms and feedback from 738 students collected after class were analyzed. Results revealed that teachers' verbal emotive expressions, even with positive valence and high arousal, did not significantly impact engagement. Conversely, nonverbal vocal expressions with positive valence and high arousal, such as happiness and surprise, enhanced engagement, while negative high-arousal emotions, such as anger, reduced it. These findings offer practical insights for instructional video creators, teachers, and influencers seeking to foster emotional engagement in asynchronous video learning.

2026-05-17T14:06:23Z
34 pages, 1 figure
Suen, H. Y., and Su, Y. S. (2025). Teachers' Vocal Expressions and Student Engagement in Asynchronous Video Learning. International Journal of Human-Computer Interaction, 41(21), 13483-13494
Hung-Yue Suen, Yu-Sheng Su
10.1080/10447318.2025.2474469

http://arxiv.org/abs/2605.17461v1 Artificial Intelligence can Recognize Whether a Job Applicant is Selling and/or Lying According to Facial Expressions and Head Movements Much More Correctly Than Human Interviewers 2026-05-17T14:03:08Z

Whether an interviewee's honest and deceptive responses can be detected from facial expression signals in videos has been debated and requires further research. We developed deep learning models enabled by computer vision to extract temporal patterns of job applicants' facial expressions and head movements and to identify self-reported honest and deceptive impression management (IM) tactics from video frames in real asynchronous video interviews. A 12- to 15-minute video was recorded for each of N=121 job applicants as they answered five structured behavioral interview questions. Each applicant completed a survey to self-evaluate their trustworthiness on four IM measures. Additionally, a field experiment compared the concurrent validity of our modeling approach and of human interviewers with respect to the self-reported IM scores. Human interviewers' performance in predicting these IM measures from another subset of 30 videos was obtained by having N=30 human interviewers evaluate three recordings. Our models explained 91% and 84% of the variance in honest and deceptive IMs, respectively, and showed stronger correlations with self-reported IM scores than human interviewers did.

2026-05-17T14:03:08Z
11 pages, 5 figures
IEEE Transactions on Computational Social Systems, 11(5), 5949-5960, 2024
Hung-Yue Suen, Kuo-En Hung, Che-Wei Liu, Yu-Sheng Su, Han-Chih Fan
10.1109/TCSS.2024.3376732

http://arxiv.org/abs/2604.23538v3 Analysis of Personal Data Exposure in Thailand 2026-05-17T10:24:01Z

In the digital era, personal data, particularly sensitive identifiers such as the Social Security Number and National Identification Number, has become a highly valuable asset, raising significant concerns regarding privacy and security. This study examines the risks associated with the online exposure of the Thai National Identification Number, a key element of identity verification in both governmental and commercial transactions. Similar to the Social Security Number in the United States, this unique identifier is crucial for various legal, financial, and welfare-related activities. However, the increasing digitization of personal records has heightened its vulnerability to unauthorized access and misuse, particularly through search engines that inadvertently index sensitive information. This research identifies publicly exposed Thai National Identification Numbers across major search engines, assessing the potential threats to individual privacy and national security. The study reveals the exposure of over 1.2 million unique National Identification Numbers, along with other highly sensitive personal data, e.g., addresses, contact details, employment status, disability status, and health information. Notably, the analysis indicates that a significant majority of these exposures originate from the Thai government sector websites, highlighting critical vulnerabilities in public data management practices. This widespread exposure not only increases the risk of identity theft and financial fraud but also underscores the urgent need for enhanced cybersecurity measures, stricter regulatory enforcement, and improved data governance within government agencies to prevent future breaches. Addressing these issues is essential to safeguarding citizens' personal information and ensuring compliance with Thailand's data protection laws in an increasingly digitized world.
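
The abstract does not describe how exposed numbers were detected. One plausible building block, sketched here as an assumption rather than the authors' pipeline, is to match 13-digit candidates in crawled text and keep only those that pass the check digit of the Thai National Identification Number. That check digit is a weighted sum of the first 12 digits, with weights 13 down to 2, reduced modulo 11. The sample numbers below are made up, not real IDs.

```python
import re

# 13 digits, optionally written in the common 1-2345-67890-12-3 grouping.
ID_PATTERN = re.compile(r"\b\d-?\d{4}-?\d{5}-?\d{2}-?\d\b")


def is_valid_thai_id(candidate: str) -> bool:
    """Check a 13-digit Thai ID candidate against its mod-11 check digit."""
    digits = candidate.replace("-", "")
    if len(digits) != 13 or not digits.isdecimal():
        return False
    # Weights 13, 12, ..., 2 applied to the first 12 digits.
    total = sum(int(d) * w for d, w in zip(digits[:12], range(13, 1, -1)))
    check = (11 - total % 11) % 10
    return check == int(digits[12])


def find_candidate_ids(text: str) -> list[str]:
    """Return substrings that look like Thai IDs and pass the check digit."""
    return [m.group() for m in ID_PATTERN.finditer(text) if is_valid_thai_id(m.group())]


sample = "Applicant 1-2345-67890-12-1, invoice no. 1234567890123"
print(find_candidate_ids(sample))  # ['1-2345-67890-12-1']
```

Because the check digit is roughly uniform over 0-9, this filter discards about nine in ten arbitrary 13-digit strings, such as invoice or phone-like numbers. That reduces false positives when estimating how many IDs are actually exposed.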

2026-04-26T05:22:28Z
Accepted for publication in the International Journal of Information Security, April 30, 2026
Suphannee Sivakorn, Sasawat Malaivongs, Nuttaya Rujiratanapat
10.1007/s10207-026-01270-w

http://arxiv.org/abs/2605.17353v1 You Can't Fool Us: Understanding the Resilience of LLM-driven Agent Communities to Misinformation 2026-05-17T09:45:33Z

Misinformation resilience is a dynamic community process: communities differ not only in whether they initially trust false claims, but also in how they recover through interaction, questioning, correction, and support withdrawal. We study this process with an LLM-based agent simulation that constructs synthetic communities along two theoretically motivated dimensions: Actively Open-minded Thinking (AOT), which captures evidence-seeking and willingness to revise beliefs, and Political Ideology (PI), which captures identity-based interpretation of contested claims. These two traits allow us to examine how evidence-oriented reasoning and ideological alignment jointly shape community responses to credible misinformation shocks. Across systematically varied AOT-PI communities, we find that higher AOT improves both resistance to misinformation uptake and recovery after trust peaks. PI shapes the recovery pathway: ideologically moderate communities recover more reliably, while polarized communities retain more residual support. Stance-level analysis shows that resilience depends on whether agents move from questioning a claim to denying or correcting it and withdrawing prior support. Intervention experiments further show that persuasion and fact checking better support post-peak correction, whereas accuracy prompts mainly induce early caution and source warnings have weaker effects. Together, this work provides a mechanism-level account of community misinformation resilience, showing how psychological composition and intervention design shape whether communities move from misinformation exposure toward correction or persistent support.

2026-05-17T09:45:33Z
26 pages, 7 figures, 1 table
Chichen Lin, Yijie Jin, Kangbo Hu, Weijian Fan, Han Xiao, Yongbin Wang, Zhihui Ying, Zhanzhan Zhao

http://arxiv.org/abs/2605.17347v1 Position: Age Estimation Models Do Not Process Biometric Data 2026-05-17T09:37:28Z

When a neural network estimates someone's age from a photograph, does it process biometric data? The answer depends on whether identity-discriminative representations arise within the network during inference, a question that may seem trivial to ML researchers but triggers consent requirements under GDPR, statutory damages under BIPA, or high-risk AI classification under the EU AI Act. Yet no regulatory guidance addresses it. This position paper provides empirical evidence: 14 models evaluated across 3 face verification benchmarks show age estimators fall orders of magnitude short of identification thresholds. Age estimation models cannot identify individuals. We call on researchers to provide transparency about what systems store and can do, and on regulators to distinguish transient processing from template storage.
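
Whether a representation can identify people comes down to a verification test. The minimal sketch below assumes a common protocol: embedding pairs are compared by cosine similarity, the threshold is set at a fixed false accept rate (FAR), and the true accept rate (TAR) at that threshold measures identity separation. The abstract does not specify the protocol or thresholds, and all scores below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def threshold_at_far(impostor_scores, far):
    """Threshold taken from impostor scores so that at most a `far` share exceeds it."""
    ranked = sorted(impostor_scores, reverse=True)
    allowed = math.floor(far * len(ranked))  # impostor pairs allowed above threshold
    return ranked[min(allowed, len(ranked) - 1)]


def tar_at_far(genuine_scores, impostor_scores, far):
    """Share of same-identity pairs scoring strictly above the FAR threshold."""
    t = threshold_at_far(impostor_scores, far)
    return sum(s > t for s in genuine_scores) / len(genuine_scores)


# Hypothetical pair scores: a face-recognition model vs. an age-estimation model.
face_genuine = [0.90, 0.80, 0.75, 0.30]
face_impostor = [0.10, 0.20, 0.35, 0.40, 0.50, 0.05, 0.15, 0.25, 0.30, 0.45]
age_genuine = [0.62, 0.55, 0.60, 0.58]
age_impostor = [0.60, 0.65, 0.50, 0.72, 0.55, 0.68, 0.58, 0.61, 0.52, 0.66]

print(tar_at_far(face_genuine, face_impostor, far=0.1))  # 0.75
print(tar_at_far(age_genuine, age_impostor, far=0.1))    # 0.0
print(cosine([3.0, 4.0], [6.0, 8.0]))                    # 1.0
```

When genuine and impostor scores overlap, as in the hypothetical age-model scores, TAR at a fixed FAR collapses toward zero. That is the kind of shortfall against identification thresholds the abstract describes.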

2026-05-17T09:37:28Z
11 pages, 3 figures, 3 tables. Accepted as a position paper at the 43rd International Conference on Machine Learning (ICML 2026)
Nikita Marshalkin

http://arxiv.org/abs/2605.17317v1 Jurisdiction over Ubiquitous Copyright Infringements: Should Right-Holders Be Allowed to Sue at Home? 2026-05-17T08:17:02Z

The Internet, and more recently cloud computing, has transformed the technological, economic, social, and cultural conditions under which intellectual property rights are exploited. These developments also challenge traditional rules of private international law, particularly rules governing international jurisdiction. This paper examines when courts should assert jurisdiction over cross-border copyright disputes arising in cloud-based environments. It focuses on the risks faced by right holders and digital intermediaries when allegedly infringing content is stored, transmitted, or accessed across multiple states. The paper first explains how cloud computing changes the exploitation of intellectual property assets and complicates the identification of territorial connecting factors. It then analyzes the main jurisdictional principles applied by courts in common law and civil law systems, with particular attention to subject-matter jurisdiction, personal jurisdiction, and infringement-based jurisdiction. The paper argues that the territorial fragmentation of copyright law sits uneasily with the realities of ubiquitous online infringement. It therefore asks whether existing jurisdictional doctrines remain suitable for cloud-related disputes and whether, in some circumstances, right holders should be permitted to sue before the courts of their home state or center of economic interests. The paper concludes by discussing related work undertaken by a special committee of the International Law Association on intellectual property and private international law.

2026-05-17T08:17:02Z
Kyushu University Legal Research Bulletin 2015
Paulius Jurcys, Toshiyuki Kono

http://arxiv.org/abs/2605.12824v2 Mechanism Plausibility in Generative Agent-Based Modeling 2026-05-17T05:34:27Z

Large language models (LLMs) can generate diverse high-level phenomena without explicitly programmed rules. This capability has led to their adoption within various agent-based models (ABMs) and social simulations. Recent studies investigate their ability to generate different phenomena of interest, for example, human behavior on social media platforms or alien behavior in game-theoretic scenarios. However, capability, prediction, and explanation are distinct: drawing on the philosophy of science and mechanisms literature, explanation requires showing, to some degree, how a phenomenon is produced by related organized entities and activities. For modelers, it can be difficult to describe the characteristics of an experiment, or whether a simulation provides progress in capability (or explanation), without grounding in potentially distant research areas. We integrate recent work on LLM-ABMs with contemporary philosophy of science literature and use it to operationalize a definition of 'plausibility' as a four-level scale. Our scale separates the evaluation of a model's generative sufficiency (ability to reproduce a phenomenon) from its mechanistic plausibility (how the phenomenon could be produced), and clarifies the distinct roles of different models, such as predictive and explanatory ones. We introduce this as the Mechanism Plausibility Scale.

2026-05-12T23:46:39Z
Accepted at ACM FAccT 2026
Patrick Zhao, David Huu Pham, Nicholas Vincent
10.1145/3805689.3812388