https://arxiv.org/api/pPJwZrpHFtvJ5zZFlJ8wqwJ3aEA 2026-06-19T06:44:03Z 28997 570 15 http://arxiv.org/abs/2605.25416v1 The Traffickers' Pitch: Detecting Deceptive Recruitment in Online Job Boards 2026-05-25T04:32:22Z

While substantial efforts in anti-trafficking research and practice have focused on identifying and assisting victims after exploitation occurs, comparatively less attention has been paid to preventing victimization at the recruitment stage. Although some platforms offer preventive tools, such as background checks triggered by in-person meeting detection, these measures primarily protect potential victims rather than directly limiting traffickers' recruitment activities. In this paper, we propose a computational framework to identify human trafficking recruiters through their linguistic features and to characterize their online recruitment patterns. We introduce a network-driven labeling method to construct large-scale ground truth for trafficking-at-risk job advertisements. Our results reveal significant linguistic differences between safe and risky advertisements and demonstrate that language models and embedding representations behave distinctly across these linguistic spaces. Building on these insights, we propose a multi-model ensemble classifier to improve the detection of trafficking-at-risk job ads. Finally, we analyze the geographic, gender, industry, and contact-method preferences of trafficking recruiters, revealing systematic patterns in recruitment strategies.

2026-05-25T04:32:22Z Siyi Zhou Peiran Qiu Tanishq Salkar Leonardo Blas Urrutia Dacheng Shen Deyang Hsu Eun Cheol Choi Emilio Ferrara http://arxiv.org/abs/2605.25415v1 LLM-as-a-Reviewer: Benchmarking Their Ability, Divergence, and Prompt Injection Resistance as Paper Reviewers 2026-05-25T04:32:13Z

Large language models (LLMs) are increasingly used in academic peer review, yet their reliability, alignment with human judgment, and robustness to adversarial attacks remain poorly understood. We present a systematic benchmark of LLM-as-a-Reviewer on 898 papers stratified from NeurIPS and ICLR, evaluating 12 LLMs along three axes: rating calibration, divergence from human reviewers, and resistance to prompt injection embedded via an invisible font-mapping attack. We find that LLMs systematically overrate weaker submissions and diverge from humans in topical emphasis, under-flagging Clarity and over-flagging Reproducibility, while producing reviews two to three times longer with lower lexical diversity and a more standardized vocabulary. Prompt injection remains highly effective. Simple hidden instructions can promote low-scoring papers to acceptance-level ratings in a substantial fraction of cases, with effectiveness varying sharply across model families. While LLMs offer utility in structuring evaluations, their integration into peer review requires safeguards against both intrinsic biases and adversarial risks.

2026-05-25T04:32:13Z Lingyao Li Junjie Xiong Changjia Zhu Runlong Yu Chen Chen Junyu Wang Renkai Ma Zhicong Lu http://arxiv.org/abs/2605.25372v1 Routed Closure: Rethinking Value Capture in Decentralized Ecosystems 2026-05-25T02:57:11Z

A decentralized ecosystem can capture value and still fail to fund the actors who keep it running. Users may pay fees, tokens may appreciate, issuers may earn revenue, and protocols may burn value, but none of these facts by itself shows that authors, miners, validators, suppliers, storage providers, or other critical participants are actually compensated. This paper argues that traditional value-capture analysis often assumes a centralized pool: once value is captured, it can be reallocated through budgets, contracts, payroll, or managerial discretion. Decentralized ecosystems do not have this default pool. They require routed closure: captured value must pass through a verifiable route to a specified critical incentive recipient, and it must be sufficient relative to that recipient's reward requirement. We formalize this distinction through Route-Admissible Value and operationalize it with the External Value Routing Closure protocol. A contrast set including YouTube, Steem/Steemit, Bitcoin, Ethereum, Aave, Filecoin, USDC, and XRP shows why revenue, fees, burns, token prices, or market capitalization should not be mistaken for sustainable incentive funding.

2026-05-25T02:57:11Z Xubin Luo http://arxiv.org/abs/2605.25358v1 AI-Associated Lexical Shifts Across 34 Languages: Cross-Lingual Convergence and Diachronic Uptake in News Writing 2026-05-25T02:24:46Z

AI-associated lexical shifts have been documented mainly in Scientific English. We extend this work to 34 languages in the WMT News Crawl corpus, refining a split-halves continuation diagnostic that compares GPT-4.1 continuations with matched human gold-standard text. For each language, we derive ranked AI-overused lemmas using log prevalence ratios. We find substantial cross-lingual semantic convergence: semantically related concepts recur across typologically diverse languages, with 'emphasize'-type verbs appearing in 24 of 34 languages. Embedding-based and manual analyses support this pattern. We also examine diachronic uptake in news writing before and after ChatGPT's release. Tracking each language's top 20 AI-overused items, we find prevalence increases in 26 of 34 languages from 2020-2021 to 2023-2024, with a mean change of +15.1%, whilst matched baseline words show no comparable increase (-4.5%). In 10 languages with longer historical coverage, longitudinal analyses show post-2022 increases that exceed the modest shifts observed in earlier periods, though with smaller effect sizes than in Scientific English. We validate our approach extensively, including across seeds, model variants, data sizes, model families, and more. Our findings are consistent with the view that AI-associated lexical preferences extend beyond English and may exert cross-lingual homogenising pressure on global language use.
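
As a rough illustration of the ranking step, the sketch below scores each lemma by a log prevalence ratio between model continuations and matched human text. The abstract does not define prevalence or any smoothing, so document-level prevalence, the additive `smoothing` term, and the function name `log_prevalence_ratios` are all assumptions made for this example, not the authors' procedure.

```python
import math
from collections import Counter


def log_prevalence_ratios(ai_docs, human_docs, smoothing=0.5):
    """Rank lemmas by log(prevalence in AI text / prevalence in human text).

    Assumption: prevalence is the share of documents containing the lemma,
    with additive smoothing so lemmas absent from one side stay finite.
    Each document is a list of lemmas.
    """
    # Count each lemma at most once per document.
    ai_counts = Counter(lemma for doc in ai_docs for lemma in set(doc))
    human_counts = Counter(lemma for doc in human_docs for lemma in set(doc))

    ratios = {}
    for lemma in set(ai_counts) | set(human_counts):
        p_ai = (ai_counts[lemma] + smoothing) / (len(ai_docs) + 2 * smoothing)
        p_human = (human_counts[lemma] + smoothing) / (len(human_docs) + 2 * smoothing)
        ratios[lemma] = math.log(p_ai / p_human)

    # Most AI-overused lemmas first.
    return sorted(ratios.items(), key=lambda pair: pair[1], reverse=True)


# Toy data, invented purely for illustration.
ai_docs = [["emphasize", "the"], ["emphasize", "role"], ["the", "role"]]
human_docs = [["the", "role"], ["the", "say"], ["say", "role"]]
ranked = log_prevalence_ratios(ai_docs, human_docs)
```

A positive score marks a lemma that is more prevalent in the model continuations, a score near zero marks one used about equally on both sides, and taking the first 20 entries of `ranked` would mirror the "top 20 AI-overused items" tracked per language.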

2026-05-25T02:24:46Z 19 pages (9-page main body, plus references and appendices), 3 figures; ACL ARR reviewed, committed to EMNLP 2026 Thomas Stephan Juzek http://arxiv.org/abs/2605.25273v1 LLM-as-a-Judge in Healthcare: A Scoping Analysis of Applications, Methods, and Human Alignment 2026-05-24T21:59:32Z

Large language models (LLMs) are increasingly deployed across healthcare applications, including clinical documentation, diagnostic reasoning, medicine recommendation, and medical education. Their outputs are largely unstructured clinical text, which is difficult to reliably evaluate at scale. LLM-as-a-Judge, in which an LLM evaluates another system's output against task-specific criteria, offers a scalable alternative and is increasingly used in clinical evaluation, yet its validity in healthcare remains underexamined. Existing reviews focus on general-purpose LLM evaluation or on risk frameworks, rather than systematically characterizing how LLM-as-a-Judge is applied in healthcare and how well its judgments align with human experts. We therefore conduct a PRISMA-guided comprehensive review of LLM-as-a-Judge applications in healthcare, searching five databases for studies published between January 2023 and February 2026. After screening 541 records, 134 studies meet the eligibility criteria and are coded by health scenario, judge configuration, technical approach, and validation design. LLM-as-a-Judge is concentrated in clinical decision support, clinical natural language processing (NLP), medical knowledge and question answering (QA), and medical communication. OpenAI models are the most frequently used judges, and prompt engineering appears in nearly all studies, with ensemble, multi-agent, and retrieval-augmented designs as common extensions. Among studies reporting human validation, LLM judges often show moderate to strong alignment with expert judgments, although reliability varies substantially across tasks. Overall, this review positions LLM-as-a-Judge as a promising framework for scalable healthcare AI evaluation, while emphasizing that its clinical value depends on model design and rigorous validation.

2026-05-24T21:59:32Z Lingyao Li Deyi Li Chen Chen Renkai Ma Runlong Yu Mingquan Lin Rui Yin Lizhou Fan Cathy Shyr Siyuan Ma Mei Liu Steven Bethard http://arxiv.org/abs/2605.25272v1 AI Cartography: Mapping the Latent Landscape of AI Benchmark Ecosystems 2026-05-24T21:59:08Z

While aggregate leaderboard scores drive AI development, they contain substantial measurement noise whose sources and magnitudes remain unquantified, making it unclear when rankings reflect genuine capability differences versus evaluation artifacts. We introduce a framework for measuring the latent landscape in AI benchmark ecosystems. Applying Confirmatory Factor Analysis (CFA) and Generalizability Theory to 4,000+ models from the Open LLM Leaderboard, we decompose sources of ranking variance and establish: (1) structures assumed in current reporting practice underestimate the strength of relationships between benchmarks; (2) evidence of local dependence among leaderboard items, undermining uses of benchmarks as measurement instruments under current scoring systems; (3) contributor metadata explains more rank-relevant variance ($\approx 9\%$) than architecture or deployment categories in this context; (4) a manifest-score "scaling law" slope has low reliability ($R_\beta=0.53$); by contrast, the latent general-factor size slope is highly stable across ecosystem controls ($R_g=0.97$). We are able to provide unique insights into benchmark dynamics, such as which benchmarks are a function of LLM size and which can be oppositely impacted by post-training practices. We provide actionable diagnostics to determine how benchmark rankings can be trusted and how benchmark design can be improved.

2026-05-24T21:59:08Z Michael Hardy Anka Reuel Lijin Zhang Jodi M. Casabianca Sang Truong Yash Dave Hansol Lee Benjamin Domingue Sanmi Koyejo http://arxiv.org/abs/2605.25258v1 First, do no harm: Breaking suicidogenic echo chambers in media recommendation 2026-05-24T21:21:02Z

Recommender systems generally optimise user engagement, but this approach is dangerous in mental health contexts. When vulnerable users show signs of suicidal ideation, standard algorithms often trap them in echo chambers of harmful content, worsening their psychological state. In response, we introduce RankAid, a re-ranking method that prioritises clinical safety alongside predictive relevance. It works as an add-on layer to existing models: it penalises risky items and boosts therapeutic content depending on the user's current level of vulnerability. We evaluated this approach using the MovieLens 1M dataset, where items were semantically annotated for clinical risk and therapeutic value using large language models. Our simulations show that our algorithm successfully blocks the recommendation of harmful content during crisis peaks, actively reshaping the feed to support emotional de-escalation. Furthermore, this safety intervention only causes a controlled, acceptable drop in standard accuracy metrics like NDCG. By using asymmetric hyperparameters, RankAid also gives system administrators the flexibility to tune the severity of the intervention based on specific clinical guidelines.
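
The abstract does not give RankAid's scoring rule, so the following is only a minimal sketch of what a vulnerability-dependent re-ranking layer of this kind might look like. The linear form, the function name `rerank`, and the weights `alpha` (risk penalty) and `beta` (therapeutic boost) are hypothetical; they are set to different values here only to mimic the idea of asymmetric hyperparameters.

```python
def rerank(items, vulnerability, alpha=2.0, beta=0.5):
    """Re-rank base-model output with a safety adjustment (hypothetical rule).

    items: list of (item_id, relevance, risk, therapeutic), each score in [0, 1].
    vulnerability: current user vulnerability in [0, 1]; 0 leaves the
    base ranking unchanged.
    alpha, beta: assumed asymmetric weights for the risk penalty and the
    therapeutic boost.
    """
    def adjusted_score(item):
        _, relevance, risk, therapeutic = item
        return (relevance
                - alpha * vulnerability * risk
                + beta * vulnerability * therapeutic)

    ordered = sorted(items, key=adjusted_score, reverse=True)
    return [item[0] for item in ordered]


# Toy catalogue, invented for illustration.
catalogue = [
    ("dark", 0.9, 0.8, 0.0),
    ("neutral", 0.7, 0.1, 0.1),
    ("uplifting", 0.6, 0.0, 0.9),
]
calm_feed = rerank(catalogue, vulnerability=0.0)
crisis_feed = rerank(catalogue, vulnerability=1.0)
```

With zero vulnerability the feed follows base relevance; at maximum vulnerability the high-risk item drops to the bottom and the therapeutic item rises to the top, which is the qualitative behaviour the abstract describes for crisis peaks.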

2026-05-24T21:21:02Z 10 pages, 5 figures. Research on safety-aware recommender systems and algorithmic ethics Alberto Díaz-Álvarez Raúl Lara-Cabrera Fernando Ortega-Requena Víctor Ramos-Osuna http://arxiv.org/abs/2605.25196v1 Beyond Killer Robots: General AI Attitudes and Public Support for Military AI in Nine Countries 2026-05-24T17:55:58Z

AI-enabled military systems are a fixture of modern military conflict. Applications vary from autonomous drones for surveillance and attack to AI-supported target selection. The importance of AI for modern conflict also shows in public disputes between governments and technology companies over the conditions for military access to frontier AI. Both military uses and government attempts at enabling and steering them happen against a backdrop of public opinion, yet we still know little about how people think about military AI. Drawing on a preregistered survey of 9,000 respondents in nine countries, including China, Germany, and the United States, we examine whether support for military AI is shaped primarily by general attitudes toward AI, principled opposition to lethal autonomy, or foreign-policy and geopolitical orientations. Across six military AI scenarios that vary in lethality and human control, respondents who view AI as beneficial are substantially more supportive of military AI. Hawkish respondents are also more supportive. By contrast, principled opposition to lethal autonomy is not broadly associated with the full index but is related to the application of fully autonomous lethal force. Contrary to our expectation, perceived AI risks are positively associated with support. Cross-national differences are moderate and broadly consistent with geopolitical context. Overall, public opinion toward military AI appears conditionally permissive. Publics are not categorically opposed to various military uses of AI. Instead, unease is concentrated around fully autonomous lethal force.

2026-05-24T17:55:58Z Andreas Jungherr Antonia Schlude Adrian Rauchfleisch http://arxiv.org/abs/2411.00934v2 The Meme Is the Message: Generative Memesis and AI Visuals in the 2024 USA Presidential Elections 2026-05-24T13:54:59Z

Visual content on social media has become increasingly influential in shaping political discourse and civic engagement, but it also limits participation due to the increased cost of multimedia production. In tandem, the growth of generative AI provides novel ways for citizens to participate in politics by lowering these costs. Drawing on a dataset of 239,526 Instagram images, we analyze the effects of synthetic images during the 2024 United States presidential election, using a multimodal workflow combining computer vision, large language models, and facial affect analysis. Results show that meme format is a stronger predictor of engagement than AI-generated content alone. However, AI-generated memes yield a significant interaction effect, suggesting synergistic increases in engagement when synthetic imagery is integrated with memes through human curation. We also characterize how users curate images. Partisans use AI in different ways: Democrat-leaning users tend to use it for in-group support, whereas Republican-leaning users more often employ it for out-group attacks. Users generally select happier synthetic faces compared to real photographs. We define generative memesis as a mode of communication in which memes are no longer shared person-to-person, but mediated by AI through customized visuals. We discuss how generative AI may empower civic participation, the bifurcation of content production and curation, and its implications for the history of novel technologies and participatory culture.

2024-11-01T17:35:05Z Ho-Chun Herbert Chang Benjamin Shaman Yung-chun Chen Mingyue Zha Sean Noh Chiyu Wei Tracy Weener Maya Magee http://arxiv.org/abs/2605.25055v1 Building Digital Societies as Ecosystems: How Recognition and Repeat Relationships Sustain Cross-Community Work in Open Source 2026-05-24T13:04:24Z

We measure cross-boundary collaboration in an open-source software (OSS) ecosystem by reconstructing the bipartite contributor-repository graph of 464 cybersecurity projects and 11,372 contributors active over October 2001-May 2022 (Rawsec Cybersecurity Inventory). Louvain community detection identifies 163 non-singleton communities; per-community contributor count scales superlinearly with repository count (n_contributors ~ n_repos^1.4), and community formation follows a logistic trajectory saturating around 2018. Three patterns support a recognition/repeat-relationship account of cross-boundary work. First, cross-community work concentrates in a thin carrier layer: only nine canonical humans span seven or more communities at the commit level, authoring 14% of 4,015 inter-community merged pull requests; the top 50 cross-community contributors produce 54%. Second, boundary friction is a recognition cost, not a fixed boundary property: inter-community pull-request acceptance rises from 42% at breadth k=1 to 87% at k=5-9, with median latency compressing from 147 h to 49 h. Third, community survival is cohort-structured: per-cohort residualisation hazard rises an order of magnitude between pre-2010 and 2018 cohorts, and external community reach predicts survival mainly through size, leaving late cohorts under-served despite a stable carrier layer. The corpus predates mainstream LLM coding assistants; this baseline of carrier-layer thinness, friction gradient, and cohort hazard informs debates on social coding as a template for digital societies and on what AI-mediated OSS ecosystems should not optimise away.
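
A scaling relation such as n_contributors ~ n_repos^1.4 is commonly estimated as the slope of a straight-line fit in log-log space. The sketch below shows that generic approach on synthetic data; the abstract does not say how the exponent was fitted, so ordinary least squares on logarithms and the function name `fit_power_law_exponent` are assumptions for illustration.

```python
import math


def fit_power_law_exponent(x_values, y_values):
    """Return the exponent b in y ~ a * x**b.

    Computed as the ordinary least-squares slope of log(y) on log(x).
    All inputs must be strictly positive.
    """
    log_x = [math.log(x) for x in x_values]
    log_y = [math.log(y) for y in y_values]
    mean_x = sum(log_x) / len(log_x)
    mean_y = sum(log_y) / len(log_y)
    covariance = sum((lx - mean_x) * (ly - mean_y)
                     for lx, ly in zip(log_x, log_y))
    variance = sum((lx - mean_x) ** 2 for lx in log_x)
    return covariance / variance


# Synthetic communities generated with an exponent of 1.4 (not real data).
repo_counts = [1, 2, 4, 8, 16]
contributor_counts = [3 * r ** 1.4 for r in repo_counts]
exponent = fit_power_law_exponent(repo_counts, contributor_counts)
```

An exponent above 1 is what "superlinear" means here: a community with twice as many repositories would be expected to have more than twice as many contributors.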

2026-05-24T13:04:24Z 52 pages (main text + supplementary material), 5 main figures, 13 supplementary figures, 2 main tables. Submitted to EPJ Data Science. Data and code: https://doi.org/10.17605/OSF.IO/5RWEK Lucia Gomez Tejeiro Thibaut Chataing Julian Jang-Jaccard Alain Mermoud Thomas Maillart http://arxiv.org/abs/2605.02566v2 AI-Augmented Science and the New Institutional Scarcities 2026-05-24T12:21:17Z

Competent-looking judgment, including selecting, ranking, attributing, and certifying, is now produced at scale at marginal cost approaching zero, inverting the dominant economics-of-AI reading that treats judgment as the scarce complement to cheap prediction. Scientific institutions are among those most exposed, because manufacturing legitimate judgment is their primary product rather than one input among many, so they do not merely adapt to AI; they compete with it for the same functional role. Four complements then become scarce and load-bearing for AI-augmented science: verified signal, legitimacy, authentic provenance, and integration capacity (the community's tolerance for delegated cognition). Of these four, integration capacity is the least developed for scientific institutions and the most binding: no improvement in AI tooling can buy it. The frontier for AI-augmented science is not acceleration; it is the redesign of the certifying infrastructure around these new scarcities.

2026-05-04T13:16:08Z 7 pages, 15 references, 0 figures. Companion of arXiv:2604.22966 Lauri Lovén http://arxiv.org/abs/2602.04360v2 Counterfactual Explanations for Hypergraph Neural Networks 2026-05-24T07:34:37Z

Hypergraph neural networks (HGNNs) effectively model higher-order interactions in many real-world systems but remain difficult to interpret, limiting their deployment in high-stakes settings. We introduce CF-HyperGNNExplainer, a counterfactual explanation method for HGNNs that identifies the minimal structural changes required to alter a model's prediction. The method generates counterfactual hypergraphs using actionable edits limited to removing node-hyperedge incidences or deleting hyperedges, producing concise and structurally meaningful explanations. Extensive experiments on hypergraph benchmark datasets show that CF-HyperGNNExplainer generates valid and concise counterfactuals, highlighting the higher-order relations most critical to HGNN decisions.

2026-02-04T09:34:03Z Fabiano Veglianti Lorenzo Antonelli Gabriele Tolomei http://arxiv.org/abs/2605.24842v1 Translators as Invisible Teachers of AI: Copyright, Translation Memory, and the Political Economy of Linguistic Data 2026-05-24T03:21:41Z

This paper examines how the labour of translators has been transformed into foundational data capital for the age of artificial intelligence (AI). Translation memories (TM) and parallel corpora preserve a one-to-one correspondence between source and target text and therefore constitute extraordinarily valuable supervised training data for machine translation. The development of statistical machine translation (SMT), neural machine translation (NMT), the Transformer architecture, and multilingual large language models (LLMs) cannot be disentangled from the accumulation of such translation data. And yet, translators' renditions have been bought as deliverables under contract, segmented as technical objects, and processed as "information analysis" data under copyright law -- losing their moral, creative, and economic attribution to the translators who produced them. The paper develops two concepts to capture this process. The first is appropriation without consumption: a mode of use in which works are not read, viewed, or listened to, but only mined for statistical features -- a use that is legitimated under Article 30-4 of the Japanese Copyright Act. The second is the invisible teacherisation of translators: the process by which translators, through the construction of translation memories, post-editing, and quality assessment, have functioned as teachers of AI without recognition as such. Drawing on the data supply chain that runs from translators through language service providers (LSPs) and platforms to model developers, on a comparative reading of Japanese, European, and United States legal frameworks, on the distinction between open and proprietary AI models, and on the premium status that human-generated data has acquired in the era of model collapse, the paper asks what translators are actually afraid of, and points toward concrete directions for redistributive design.

2026-05-24T03:21:41Z 13 pages; comments welcome Masaru Yamada http://arxiv.org/abs/2605.24837v1 Generative AI as a Design Variable: An Evidence-Centered Framework for Principled Governance in STEM Assessment 2026-05-24T03:06:34Z

Generative Artificial Intelligence (GenAI) presents a governance challenge for STEM assessment. Unrestricted GenAI access enables task outsourcing that undermines the validity of traditional assessments; blanket prohibitions are difficult to enforce, may push use underground, and do little to prepare students for workplaces where GenAI-supported workflows are increasingly common. This paper addresses this dilemma by proposing a framework grounded in Evidence-Centered Design (ECD) that treats GenAI as a design variable within the assessment argument rather than an external threat to it. The framework analyzes how GenAI reshapes the student model, evidence model, and task model, and uses this analysis to articulate three principled governance stances. Restrict is warranted when GenAI would contaminate the inferential link between student work products and targeted unaided proficiency. Scaffold is warranted when bounded GenAI support can support peripheral demands without revealing the target construct, preserving inferential interpretability. Require is warranted when the target construct is disciplinary AI interaction competency and tasks can be designed to elicit process artifacts, including prompts, critiques, and revisions, that make student reasoning observable, scorable, and distinguishable from AI-generated output. This framework specifies when to restrict, scaffold, or require GenAI use in STEM assessment. We present two task designs deployed in an introductory physics course and demonstrate that disciplinary AI interaction competencies are observable in student response artifacts and can be scored using defensible rubrics grounded in student data and expert knowledge. By situating GenAI governance within validity arguments, the framework offers actionable guidance for preserving learning integrity while supporting authentic preparation for AI-enabled professional environments.

2026-05-24T03:06:34Z Yizhu Gao Zhongzhou Chen Min Li Xiaoming Zhai http://arxiv.org/abs/2512.11815v3 Video Deepfake Abuse: How Company Choices Predictably Shape Misuse Patterns 2026-05-23T23:07:49Z

In 2022, AI image generators crossed a threshold, enabling much more efficient and dynamic production of photorealistic deepfake images than before. This enabled opportunities for creative and positive uses of these models. However, it also enabled unprecedented opportunities for the low-effort creation of AI-generated non-consensual intimate imagery (AIG-NCII), including AI-generated child sexual abuse material (AIG-CSAM). Empirically, these harms were principally enabled by a small number of models that were trained on web data with pornographic content, released with open weights, and insufficiently safeguarded. In this paper, we observe ways in which the same patterns are emerging with video generation models in 2025. Specifically, we analyze how a small number of open-weight AI video generation models have become the dominant tools for photorealistic AIG-NCII video generation. We then analyze the literature on model safeguards and conclude that (1) developers who openly release the weights of capable video generation models without appropriate data curation and/or post-training safeguards foreseeably contribute to mitigatable downstream harm, and (2) model distribution platforms that do not proactively moderate individual misuse or models designed for AIG-NCII foreseeably amplify this harm. While there are no perfect defenses against AIG-NCII and AIG-CSAM from open-weight AI models, we argue that risk management by model developers and distributors, informed by emerging safeguard techniques, will substantially affect the future ease of creating AIG-NCII and AIG-CSAM with generative AI video tools.

2025-11-26T18:59:43Z Max Kamachee Stephen Casper Michelle L. Ding Rui-Jie Yew Anka Reuel Stella Biderman Dylan Hadfield-Menell