http://arxiv.org/abs/2606.06126v2 Deterring Searches for Child Sexual Abuse Material on Google Search and Promoting Help-Seeking 2026-06-05T13:07:45Z

Google Search deploys a "Onebox" feature at the top of the results page when users conduct searches for Child Sexual Abuse Material (CSAM). This study evaluates the impact of a strategic shift in this feature, comparing a revised intervention, focused on repercussions and therapeutic resources, to a previous iteration that focused on reporting. Using a difference-in-differences analysis of internal Google Search log data, we found the new messaging resulted in a 3.8 percentage point reduction, compared to the status quo, in subsequent CSAM-related queries within the same Search session. We found an average click-through rate of 0.73% on any of the hyperlinked buttons to help-providing resources. Together, this research presents convergent evidence that a subset of individuals can be deterred from ongoing CSAM-seeking and redirected to therapeutic services.
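
For readers unfamiliar with the estimator named above, a minimal Python sketch of a two-group, two-period difference-in-differences calculation follows. The function name and all four input rates are hypothetical placeholders chosen for illustration; they are not figures from the study, and the study's actual model specification is not described in the abstract.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-group, two-period difference-in-differences estimate.

    Subtracts the change in the control group from the change in the
    treated group, so any trend shared by both groups cancels out.
    """
    return (treated_post - treated_pre) - (control_post - control_pre)


# Hypothetical follow-up query rates (assumed values, NOT data from the study).
estimate = diff_in_diff(
    treated_pre=0.40,   # treated group, before the new messaging
    treated_post=0.35,  # treated group, after the new messaging
    control_pre=0.40,   # status-quo group, same "before" window
    control_post=0.39,  # status-quo group, same "after" window
)
print(f"Estimated effect: {estimate * 100:.1f} percentage points")
```

A negative estimate in this sketch corresponds to a reduction in follow-up queries attributable to the new messaging, under the usual parallel-trends assumption.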

2026-06-04T13:13:30Z Rebecca Umbach Griffin Hunt John Buckley Joel Scanlan Caoilte Ó Ciardha Ethel Quayle Ainslie Heasman Maximilian von Heyden Elizabeth Letourneau Donald Findlater Tegan Insoll Richard Wortley Chad Steel Abhishek Roy http://arxiv.org/abs/2602.22041v2 Using Feasible Action-Space Reduction by Groups to fill Causal Responsibility Gaps in Spatial Interactions 2026-06-05T12:22:34Z

Heralding the advent of autonomous vehicles and mobile robots that interact with humans, responsibility in spatial interaction is burgeoning as a research topic. Even though metrics of responsibility tailored to spatial interactions have been proposed, they are mostly focused on the responsibility of individual agents. Metrics of causal responsibility focusing on individuals fail in cases of causal overdeterminism, when many actors simultaneously cause an outcome. To fill the gaps in causal responsibility left by individual-focused metrics, we formulate a metric for the causal responsibility of groups. To identify assertive agents that are causally responsible for the trajectory of an affected agent, we further formalise the types of assertive influences and propose a tiering algorithm for systematically identifying assertive agents. Finally, we use scenario-based simulations to illustrate the benefits of considering groups and how the emergence of group effects varies with interaction dynamics and the proximity of agents.

2026-02-25T15:48:52Z Presented at COINE workshop collocated with AAMAS 2026 Ashwin George Vassil Guenov Arkady Zgonnikov David A. Abbink Luciano Cavalcante Siebert http://arxiv.org/abs/2505.17739v2 Feasible Action Space Reduction for Quantifying Causal Responsibility in Continuous Spatial Interactions 2026-06-05T10:59:58Z

Understanding the causal influence of one agent on another agent is crucial for safely deploying artificially intelligent systems such as automated vehicles and mobile robots into human-inhabited environments. Existing models of causal responsibility deal with simplified abstractions of scenarios with discrete actions, thus limiting real-world use when understanding responsibility in spatial interactions. Based on the assumption that spatially interacting agents are embedded in a scene and must follow an action at each instant, Feasible Action-Space Reduction (FeAR) was proposed as a metric for causal responsibility in a grid-world setting with discrete actions. Since real-world interactions involve continuous action spaces, this paper proposes a formulation of the FeAR metric for measuring causal responsibility in space-continuous interactions. We illustrate the utility of the metric in prototypical space-sharing conflicts, and showcase its applications for analysing backward-looking responsibility and in estimating forward-looking responsibility to guide agent decision making. Our results highlight the potential of the FeAR metric for designing and engineering artificial agents, as well as for assessing the responsibility of agents around humans.

2025-05-23T11:02:44Z In review Ashwin George Luciano Cavalcante Siebert David A. Abbink Arkady Zgonnikov http://arxiv.org/abs/2606.07069v1 mmPISA-bench: Do LLMs Reason Equally Well Across 43 Languages? 2026-06-05T09:09:03Z

We introduce mmPISA-bench, a compact high-quality multilingual reasoning benchmark derived from the OECD Programme for International Student Assessment (PISA). The benchmark consists of 25 multiple-choice questions that require reasoning in order to be answered correctly. Each question is provided in official human translations to 43 languages and complemented with machine-translated counterparts (i.e., 2,150 data points in total). We evaluate two mainstream proprietary LLMs across languages, reasoning effort levels, and translation types in terms of their ability to answer the questions correctly. Our results show that modern LLMs can reason effectively across all evaluated languages and achieve accuracy comparable to human test-takers, with some performance variations across covered languages. We further find that machine-translated questions do not degrade accuracy relative to official human translations, which suggests that high-quality machine translation (synthetic data) might often be adequate for large-scale multilingual reasoning evaluations where official translations are not available. Finally, we analyze token usage and related inference cost and find that LLM usage in some languages is simultaneously more expensive and less accurate.
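
The stated total of 2,150 data points can be checked with a short calculation. The sketch below assumes that each of the 25 questions appears once per language in each of two translation types (official human and machine-translated); that factor of two is our reading of the abstract, not a breakdown the abstract gives explicitly.

```python
questions = 25          # multiple-choice questions in the benchmark
languages = 43          # languages with official human translations
translation_types = 2   # assumption: official + machine-translated versions

total_data_points = questions * languages * translation_types
print(total_data_points)
```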

2026-06-05T09:09:03Z Yerzhan Sapenov Jaromir Savelka http://arxiv.org/abs/2605.23783v2 Benchmarking LLMs for Community Governance Simulation with Life-history Narratives 2026-06-05T09:06:54Z

Effective community governance hinges on understanding what specific residents think and need. Recent work has used large language models (LLMs) to simulate human respondents, offering a scalable, reproducible way to study human attitudes and behaviors at low cost. However, these studies typically prompt the model with just a few demographic variables (age, gender, income), simulating only general role types. This is insufficient for community governance, where decisions depend on the views of specific residents. We bridge this gap with an integrated research framework covering dataset, benchmark, algorithm, and system. The dataset comprises approximately 1.2 million characters of first-person narrative collected through two-hour semi-structured interviews with each of 92 residents in an urban community, organized around nine community-governance domains. The benchmark probes 18 mainstream LLMs across four prompting strategies and shows that adding rich life-history profiles meaningfully raises fidelity above the no-profile baseline, but this gain comes with more input tokens per call from the longer prompts they require. The algorithm, curriculum-LoRA, is a parameter-efficient personalization framework that, by closing this fidelity-cost gap, matches the strongest baseline's fidelity at roughly 10x lower per-call cost and Pareto-dominates every configuration tested. The system integrates curriculum-LoRA into a closed-loop policy-evaluation pipeline. Together, these results bring individual-level LLM-based resident simulation within reach of resource-constrained local administrations, enabling community-governance decisions to be systematically pre-evaluated in silico before real-world deployment.

2026-05-22T15:48:49Z Xu Chen Yuanzi Li Lei Wang Nan Lu Yang Wang Anding Wang Lei Shi Xiaoxing Fu Ji-Rong Wen http://arxiv.org/abs/2603.04982v3 Training for Technology: Adoption and Productive Use of Generative AI in Legal Analysis 2026-06-05T08:57:56Z

Can targeted user training unlock the productive potential of generative artificial intelligence in professional settings? We study this question using a randomized experiment in which 164 law students completed an issue-spotting examination under one of three conditions: no GenAI access, optional access to a large language model (LLM), or LLM access with a brief training intervention. Untrained LLM access proved counterproductive: relative to participants without any LLM access, untrained users wrote significantly shorter answers, committed more case misstatements, and scored marginally lower, though most differences fall short of conventional significance. Training reversed this pattern. Trained participants adopted the LLM at higher rates (41% vs. 26%; p = 0.044), scored 0.27 grade points higher than untrained users (roughly one fine grade; p = 0.027), and stated applicable rules more accurately (p = 0.014). Principal stratification analysis suggests training operates primarily through adoption rather than effectiveness: the adoption lower bound (1.06) exceeds the effectiveness upper bound (0.42) at strict mean dominance, though confidence intervals are wide. More broadly, these findings challenge the view that GenAI primarily benefits lower-skilled workers: without training, higher-ability practitioners opt out while lower-ability users adopt the tool but use it unproductively. Realizing GenAI's productivity gains requires investment in both access and instruction.

2026-03-05T09:23:30Z Benjamin M. Chen Hong Bao http://arxiv.org/abs/2508.20146v3 What Causes COVID-19 Fear? General Drivers of Fear During a Health Crisis 2026-06-05T08:27:26Z

The COVID-19 pandemic triggered not only a global health crisis but also an infodemic, where exposure to heterogeneous information sources influenced public emotional responses. In this work, we investigate the determinants of self-reported fear of infection using data from the Delphi US CTIS survey. In particular, we analyze how demographic variables, epidemiological conditions, and exposure to different information sources shape fear levels. We introduce a Probabilistic Causal Model to estimate causal relationship strengths, identifying the variables that most strongly influence fear. Our results indicate that exposure to information sources accounts for a greater proportion of the variance in fear than demographic and epidemiological variables do. We further compute the Average Treatment Effect to quantify the impact of different information sources on fear. After causal adjustment, institutional and expert-driven sources are associated with increased fear levels, whereas politicians, religious leaders, and alternative information channels are associated with reduced fear. These findings highlight both the central role of the information ecosystem in shaping emotional responses during public health crises and the value of causal inference approaches for studying behavioral responses to pandemics.

2025-08-27T11:31:56Z Daniele Baccega Paolo Castagno Antonio Fernández Anta Juan Marcos Ramirez Matteo Sereno http://arxiv.org/abs/2606.06417v2 Warning Message Content Increases Help Seeking in a Large-Scale Dark Web CSAM Intervention 2026-06-05T07:51:49Z

Warning messages have been used to disrupt individuals seeking online child sexual abuse material (CSAM) and promote engagement with support services, yet large-scale field evidence on message content remains limited, particularly in high anonymity environments. This study reports a field experiment on Ahmia.fi, a Tor search engine, examining how warning message content influences behavior. Across a 140-day period, almost 20 million searches were observed, with over 3 million searches containing known CSAM-related terms that triggered a warning linking to an anonymous self-help program. Users were exposed to warning messages varying in thematic content and framing, or a neutral message. Across a randomized comparison, a campaign-wide analysis, and interrupted time series models, message content consistently influenced engagement with help resources. All active messages increased click-through rates to help resources relative to the neutral condition, with a harm-focused message producing the strongest effects. At the platform level, click-through rates increased from 8.73% before the intervention to 15.67% during the campaign. These findings highlight the importance of message content in shaping responses to warning interventions, supporting an approach in which messaging is refined and adapted to increase engagement with support resources.
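
To put the platform-level figures in context, the sketch below derives the absolute and relative change from the two click-through rates reported above. Only the two input percentages come from the abstract; the derived quantities and variable names are ours and are not reported by the authors.

```python
ctr_before = 8.73   # platform-level click-through rate before the intervention (%)
ctr_during = 15.67  # platform-level click-through rate during the campaign (%)

# Absolute change, in percentage points.
absolute_change = ctr_during - ctr_before

# Relative change, as a fraction of the pre-intervention rate.
relative_change = absolute_change / ctr_before

print(f"Absolute change: {absolute_change:.2f} percentage points")
print(f"Relative change: {relative_change:.1%}")
```

Reporting both quantities avoids ambiguity: a change in percentage points and a relative percentage increase describe the same shift on different scales.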

2026-06-04T17:21:23Z Caoilte Ó Ciardha Joel Scanlan Tegan Insoll Juha Nurmi Nina Vaaranen-Valkonen http://arxiv.org/abs/2604.07732v2 Twitch Third-Party Developers' Support Seeking and Provision Practices on Discord 2026-06-05T05:42:03Z

Third-party developers (TPDs) often turn to online communities for support when they can't get immediate responses from the platform. Twitch, as a leading live streaming platform, attracted many TPDs and formed an online support community on Discord. This study explores TPDs' support practices using a mixed-methods approach (topic modeling to identify topics related to support seeking and provision, followed by an in-depth qualitative analysis of these topics) and finds that: (1) TPDs' support-seeking practices around social, technical, and policy matters are highly dependent on Twitch, and this dependence acts as a form of platform labor; (2) TPDs need to switch between Discord and Twitch for support seeking and provision, exacerbating TPDs' platform labor; (3) TPDs' flexible role practices reflect the community's flourishing on Discord but require roles to bridge the two platforms and transfer informal support seeking to possible formal support from Twitch. We propose implications for effectively managing support seeking and provision between formal and informal spaces to improve the development of TPDs. We also contribute to community support practice and to platform ecology work in CSCW.

2026-04-09T02:29:15Z Accepted by ACM CSCW 2026 Jie Cai He Zhang Yueyan Liu John M. Carroll Chun Yu 10.1145/3817021 http://arxiv.org/abs/2606.00603v2 Toward Agentic Governance: What Shapes LLM-Agent Intervention in Public Forums? 2026-06-05T04:30:43Z

LLM agents are increasingly used in moderation-relevant public forum workflows, where their choices to answer, acknowledge, repair, or decline are routinely challenged by users, platforms, and regulators. The same agent often returns different responses on identical content, so any defense based on the agent's behavior cannot be reliably reproduced. The variation is structural. Four deployment choices typically invisible to the operator each shift the agent's response rate, and their combinations can produce substantially different interventions on the same forum posts. The four choices are (1) which model version is currently served, which can change between calls without notice; (2) the model's weight-release status (open-weight, with weights publicly downloadable, vs. closed-weight, with weights held by the provider); (3) which provider serves the request; and (4) which system-prompt policy is in force. Across LLMs spanning both open-weight and closed-weight families, we find that the previously reported tendency to decline more on visible than hidden challenges aligns with the open/closed weight boundary in our panel more than with access surface. Every closed-weight cell declines more on visible challenges; every open-weight cell reverses this or shows no gap. Auditable forum-agent governance requires awareness of all four choices, not just the model name, since each independently shifts behavior.

2026-05-30T08:01:00Z Luyang Zhang Yi-Yun Chu Ramayya Krishnan http://arxiv.org/abs/2606.06895v1 Blockchain Infrastructure for Intelligent Cyber-Physical-Social Systems: Post-Quantum Security, Interoperability, and Trustworthy Data Economies in the Era of Embodied AI 2026-06-05T04:27:34Z

The deployment of embodied artificial intelligence via world-model-based robotics presents a transformative opportunity for blockchain infrastructure, establishing urgent demand for trustworthy data provenance, cross-organizational governance, and incentive-compatible sharing across decentralized ecosystems. Simultaneously, quantum computing advances recognized by the 2025 Nobel Prize in Physics and the Turing Award threaten the cryptographic primitives securing these data economies, creating an interdependent imperative: long-lived verification for embodied AI depends on crypto-agile architectures capable of withstanding quantum adversaries. This tutorial examines blockchain as the coordination layer bridging this dual transition, from financial substrate to foundational Cyber-Physical-Social Systems infrastructure that simultaneously secures against quantum cryptanalysis and enables scalable, trustworthy data economies. The session opens with an immersive AWS Braket demonstration engaging participants with superconducting, trapped-ion, and neutral-atom hardware to assess cryptographic threat timelines and witness ECDSA-to-post-quantum signature transitions. Five integrated modules progress from embodied AI and world-model requirements through quantum hardware reality and evidence-based security migration, to scalable cross-shard architectures via BrokerChain protocols, trustworthy data economies implementing Croissant metadata standards and robotic learning provenance, and industry ecosystem integration for multi-modal cloud deployment. By bridging quantum hardware realities with embodied AI data requirements, this tutorial charts blockchain as unified infrastructure for next-generation decentralized intelligent environments, providing open-source frameworks and roadmaps for architecting quantum-resistant, interoperable, and data-trustworthy systems.

2026-06-05T04:27:34Z Song Guo Huawei Huang Dongping Liu Aoyu Zhang Luyao Zhang http://arxiv.org/abs/2606.06851v1 Toward a Metaphysics of Learning Analytics: Ontological Positioning of Data, Inference, and Normativity 2026-06-05T02:51:14Z

The Learning Analytics (LA) community has undergone rapid development over the 15 years since the first LAK conference was held. However, while epistemological and ethical debates regarding the philosophical foundations of LA have been vigorous, metaphysical discussions have been sparse, signifying a lack of effort to derive the identity of LA from its internal principles. In this paper, we attempt to establish a metaphysics of LA by addressing the ontological question of "What is LA?" We do so by tracing back to LA's own definitions and principles to derive an answer from within LA itself. Specifically, we address what kind of existence the data LA operates on constitutes, identify eight agents including learners as ontological prerequisites, and clarify, via the is/ought problem, that LA does not derive norms from data. In particular, this system reveals that a class of LA practices, here termed norm-embedded LA, conflates LA's purpose with its operations, creating an ontological tension with the first principle. We also discuss connections with related fields and the limitations of this system. The metaphysics outlined here is not imposed from outside LA, but surfaces what LA itself has always implicitly presupposed.

2026-06-05T02:51:14Z 25 pages, 1 figure Kensuke Takii http://arxiv.org/abs/2606.06830v1 Learning Fair Demand Models 2026-06-05T02:12:07Z

Data-driven pricing is increasingly prevalent in sectors such as airlines, lending, insurance, and retail. By learning demand models from customer features and setting prices accordingly, these systems may generate discriminatory outcomes that raise fairness concerns. This leads to fundamental questions: how and where should systems incorporate fairness considerations in the pricing pipeline, and how does this ultimately affect societal outcomes? To answer these, we study a stylized model where a seller has a two-stage decision pipeline comprising linear demand model estimation followed by price optimization. The seller considers fairness notions in training loss, price, and demand, under both parity-wise and Rawlsian perspectives. We show that equalizing training loss across consumer groups leads to multiple solutions, which in turn can result in undesirable outcomes despite being a standard approach in fair machine learning. Focusing instead on fairness applied directly to prices or demand, we compare two strategies that enforce fairness in either the demand estimation stage or the price optimization stage. For parity-wise fairness, we characterize when each strategy yields higher social welfare under small fairness levels. We show that when market sizes and prices in the dataset are similar, imposing price fairness in the estimation stage is more beneficial to consumers, whereas imposing demand fairness in the optimization stage yields better consumer outcomes. For Rawlsian fairness, the two strategies coincide exactly. Lastly, we extend our model to alternate demand functions and conduct a case study using real-world vaccine pricing data.

2026-06-05T02:12:07Z Adam N. Elmachtoub Hyemi Kim Jonathan Y. Tan http://arxiv.org/abs/2606.06784v1 What Your Posts Reveal: A Benchmark and Agentic Framework for User-Level Privacy Leakage on Social Media 2026-06-05T00:02:47Z

Public social media posts can reveal private information through weak cues scattered across text, images, or metadata. Such leakage is often cumulative and cross-post: cues that appear harmless in isolation may jointly expose a user's home, workplace, or routine. However, current research lacks a unified benchmark for user-level multimodal privacy leakage and an evaluation metric that captures exposure severity beyond binary accuracy. To address these gaps, we propose SopriBench, a synthetic benchmark guided by leakage patterns abstracted from a private reference corpus of Rednote and Instagram accounts, covering 50 user profiles and 1,569 images with attributes, contextual sensitivity, granularity, leakage type, inference difficulty, and supporting evidence. We further introduce the Privacy Exposure Score (PES), which weights value granularity by contextual sensitivity. Inspired by abductive reasoning, we introduce Argus, a training-free agentic framework for cumulative leakage inference. Argus forms hypotheses from accumulated evidence, verifies supporting evidence, and aggregates cross-post cues into privacy profiles, achieving 0.55 PES, a 25% improvement over the strongest baseline, with the largest gain on cross-post leakage.

2026-06-05T00:02:47Z Zifan Peng Yini Huang Aiwen Lu Qiming Ye Peixian Zhang Jingyi Zheng Yule Liu Xuechao Wang Xinlei He Jiaheng Wei http://arxiv.org/abs/2606.06694v1 The Geography of Algorithmic Judgment: LLM Intermediaries, Place Identity, and Racial Steering in Housing Search 2026-06-04T20:17:58Z

Large language models (LLMs) are rapidly assuming an intermediary role in housing search through the integration of listing platforms within conversational interfaces, mediating access to information, search, and recommendations within urban settings. We expand on prior work on racial steering in LLMs by conducting a behavioral audit of seven open-weight and closed-source LLMs across four U.S. cities, testing location recommendations across three iterative prompting conditions that progressively add lifestyle preference context and reflect fair housing paired-testing methodologies. We find that steering is an emergent behavior of the model's interpretive license rather than primarily a static property. Steering results from the interaction of a user's identity, preference articulation, and the spatial logic that a model has internalized about learned representations of place, preference, and opportunity in a given city, and how different types of users relate to it. While steering was present, it was not uniform in direction or magnitude across evaluated conditions. Preference-conditioned testing often increased or reconfigured the number of models that exhibited steering behaviors relative to baseline conditions, suggesting that LLMs may interpret what the same housing preference means differently depending on the racial identity of the user. Our findings also demonstrate that the city is not a neutral testing unit for LLM evaluation in place-based sectors, and results from one local market cannot be assumed to generalize to another. Local and domain expertise will be required in the housing sector to ensure that legal and institutional commitments to fair housing are not undermined while adopting AI tools that mediate spatial access.

2026-06-04T20:17:58Z 13 pages with supplemental tables and figures, AIES '26 Submission Hana Samad Trung Lam Christoph Mügge-Durum Michael Akinwumi