https://arxiv.org/api/Gi7f57VQ/RxNx4lkc71N7Y74CfY2026-06-18T08:35:04Z2898327015http://arxiv.org/abs/2606.06126v2Deterring Searches for Child Sexual Abuse Material on Google Search and Promoting Help-Seeking2026-06-05T13:07:45ZGoogle Search deploys a "Onebox" feature at the top of the results page when users conduct searches for Child Sexual Abuse Material. This study evaluates the impact of a strategic shift in this feature, comparing a revised intervention, focused on repercussions and therapeutic resources, to a previous iteration that focused on reporting. Using a difference-in-differences analysis of internal Google Search logs data, we found the new messaging resulted in a 3.8 percentage point reduction as compared to the status quo in subsequent CSAM-related queries within the same Search session. We found an average click through rate of 0.73% on any of the hyperlinked buttons to help-providing resources. Together, this research presents convergent evidence that a subset of individuals can be deterred from ongoing CSAM-seeking and redirected to therapeutic services.2026-06-04T13:13:30ZRebecca UmbachGriffin HuntJohn BuckleyJoel ScanlanCaoilte Ó CiardhaEthel QuayleAinslie HeasmanMaximilian von HeydenElizabeth LetourneauDonald FindlaterTegan InsollRichard WortleyChad SteelAbhishek Royhttp://arxiv.org/abs/2602.22041v2Using Feasible Action-Space Reduction by Groups to fill Causal Responsibility Gaps in Spatial Interactions2026-06-05T12:22:34ZHeralding the advent of autonomous vehicles and mobile robots that interact with humans, responsibility in spatial interaction is burgeoning as a research topic. Even though metrics of responsibility tailored to spatial interactions have been proposed, they are mostly focused on the responsibility of individual agents. Metrics of causal responsibility focusing on individuals fail in cases of causal overdeterminism - when many actors simultaneously cause an outcome. To fill the gaps in causal responsibility left by individual-focused metrics, we formulate a metric for the causal responsibility of groups. To identify assertive agents that are causally responsible for the trajectory of an affected agent, we further formalise the types of assertive influences and propose a tiering algorithm for systematically identifying assertive agents. Finally, we use scenario-based simulations to illustrate the benefits of considering groups and how the emergence of group effects vary with interaction dynamics and the proximity of agents.2026-02-25T15:48:52ZPresented at COINE workshop collocated with AAMAS 2026Ashwin GeorgeVassil GuenovArkady ZgonnikovDavid A. AbbinkLuciano Cavalcante Sieberthttp://arxiv.org/abs/2505.17739v2Feasible Action Space Reduction for Quantifying Causal Responsibility in Continuous Spatial Interactions2026-06-05T10:59:58ZUnderstanding the causal influence of one agent on another agent is crucial for safely deploying artificially intelligent systems such as automated vehicles and mobile robots into human-inhabited environments. Existing models of causal responsibility deal with simplified abstractions of scenarios with discrete actions, thus, limiting real-world use when understanding responsibility in spatial interactions. Based on the assumption that spatially interacting agents are embedded in a scene and must follow an action at each instant, Feasible Action-Space Reduction (FeAR) was proposed as a metric for causal responsibility in a grid-world setting with discrete actions.Since real-world interactions involve continuous action spaces, this paper proposes a formulation of the FeAR metric for measuring causal responsibility in space-continuous interactions. We illustrate the utility of the metric in prototypical space-sharing conflicts, and showcase its applications for analysing backward-looking responsibility and in estimating forward-looking responsibility to guide agent decision making. Our results highlight the potential of the FeAR metric for designing and engineering artificial agents, as well as for assessing the responsibility of agents around humans.2025-05-23T11:02:44ZIn reviewAshwin GeorgeLuciano Cavalcante SiebertDavid A. AbbinkArkady Zgonnikovhttp://arxiv.org/abs/2606.07069v1mmPISA-bench: Do LLMs Reason Equally Well Across 43 Languages?2026-06-05T09:09:03ZWe introduce mmPISA-bench, a compact high-quality multilingual reasoning benchmark derived from the OECD Programme for International Student Assessment (PISA). The benchmark consists of 25 multiple-choice questions that require reasoning in order to be answered correctly. Each question is provided in official human translations to 43 languages and complemented with machine-translated counterparts (i.e., 2,150 data points in total). We evaluate two mainstream proprietary LLMs across languages, reasoning effort levels, and translation types in terms of their ability to answer the questions correctly. Our results show that modern LLMs can reason effectively across all evaluated languages, achieve accuracy comparable to human test-takers, with some performance variations across covered languages. We further find that machine-translated questions do not degrade accuracy relative to official human translations which suggests that high-quality machine translation (synthetic data) might often be adequate for large-scale multilingual reasoning evaluations where official translations are not available. Finally, we analyze token usage and related inference cost and find that LLMs usage in some languages is simultaneously more expensive and less accurate.2026-06-05T09:09:03ZYerzhan SapenovJaromir Savelkahttp://arxiv.org/abs/2605.23783v2Benchmarking LLMs for Community Governance Simulation with Life-history Narratives2026-06-05T09:06:54ZEffective community governance hinges on understanding what specific residents think and need. Recent work has used large language models (LLMs) to simulate human respondents, offering a scalable, reproducible way to study human attitudes and behaviors at low cost. However, these studies typically prompt the model with just a few demographic variables (age, gender, income), simulating only general role types. This is insufficient for community governance, where decisions depend on the views of specific residents. We bridge this gap with an integrated research framework covering dataset, benchmark, algorithm, and system. The dataset comprises approximately 1.2 million characters of first-person narrative collected through two-hour semi-structured interviews with each of 92 residents in an urban community, organized around nine community-governance domains. The benchmark probes 18 mainstream LLMs across four prompting strategies and shows that adding rich life-history profiles meaningfully raises fidelity above the no-profile baseline, but this gain comes with more input tokens per call from the longer prompts they require. The algorithm, curriculum-LoRA, is a parameter-efficient personalization framework that, by closing this fidelity-cost gap, matches the strongest baseline's fidelity at roughly 10x lower per-call cost and Pareto-dominates every configuration tested. The system integrates curriculum-LoRA into a closed-loop policy-evaluation pipeline. Together, these results bring individual-level LLM-based resident simulation within reach of resource-constrained local administrations, enabling community-governance decisions to be systematically pre-evaluated in silico before real-world deployment.2026-05-22T15:48:49ZXu ChenYuanzi LiLei WangNan LuYang WangAnding WangLei ShiXiaoxing FuJi-Rong Wenhttp://arxiv.org/abs/2603.04982v3Training for Technology: Adoption and Productive Use of Generative AI in Legal Analysis2026-06-05T08:57:56ZCan targeted user training unlock the productive potential of generative artificial intelligence in professional settings? We study this question using a randomized experiment in which 164 law students completed an issue-spotting examination under one of three conditions: no GenAI access, optional access to a large language model (LLM), or LLM access with a brief training intervention.
Untrained LLM access proved counterproductive: relative to participants without any LLM access, untrained users wrote significantly shorter answers, committed more case misstatements, and scored marginally lower, though most differences fall short of conventional significance. Training reversed this pattern. Trained participants adopted the LLM at higher rates (41% vs. 26%; p = 0.044), scored 0.27 grade points higher than untrained users--roughly one fine grade--(p = 0.027), and stated applicable rules more accurately (p = 0.014).
Principal stratification analysis suggests training operates primarily through adoption rather than effectiveness--the adoption lower bound (1.06) exceeds the effectiveness upper bound (0.42) at strict mean dominance--though confidence intervals are wide.
More broadly, these findings challenge the view that GenAI primarily benefits lower-skilled workers: without training, higher-ability practitioners opt out while lower-ability users adopt but unproductively. Realizing GenAI's productivity gains requires investment in both access and instruction.2026-03-05T09:23:30ZBenjamin M. ChenHong Baohttp://arxiv.org/abs/2508.20146v3What Causes COVID-19 Fear? General Drivers of Fear During a Health Crisis2026-06-05T08:27:26ZThe COVID-19 pandemic triggered not only a global health crisis but also an infodemic, where exposure to heterogeneous information sources influenced public emotional responses. In this work, we investigate the determinants of self-reported fear of infection using data from the Delphi US CTIS survey. In particular, we analyze how demographic variables, epidemiological conditions, and exposure to different information sources shape fear levels. We introduce a Probabilistic Causal Model to estimate causal relationship strengths, identifying the variables that most strongly influence fear. Our results indicate that exposure to information sources accounts for a greater proportion of the variance in fear than demographic and epidemiological variables do. We further compute the Average Treatment Effect to quantify the impact of different information sources on fear. After causal adjustment, institutional and expert-driven sources are associated with increased fear levels, whereas politicians, religious leaders, and alternative information channels are associated with reduced fear. These findings highlight both the central role of the information ecosystem in shaping emotional responses during public health crises and the value of causal inference approaches for studying behavioral responses to pandemics.2025-08-27T11:31:56ZDaniele BaccegaPaolo CastagnoAntonio Fernández AntaJuan Marcos RamirezMatteo Serenohttp://arxiv.org/abs/2606.06417v2Warning Message Content Increases Help Seeking in a Large-Scale Dark Web CSAM Intervention2026-06-05T07:51:49ZWarning messages have been used to disrupt individuals seeking online child sexual abuse material (CSAM) and promote engagement with support services, yet large-scale field evidence on message content remains limited, particularly in high anonymity environments. This study reports a field experiment on Ahmia.fi, a Tor search engine, examining how warning message content influences behavior. Across a 140-day period, almost 20 million searches were observed, with over 3 million searches containing known CSAM-related terms that triggered a warning linking to an anonymous self-help program. Users were exposed to warning messages varying in thematic content and framing, or a neutral message. Across a randomized comparison, a campaign-wide analysis, and interrupted time series models, message content consistently influenced engagement with help resources. All active messages increased click-through rates to help resources relative to the neutral condition, with a harm-focused message producing the strongest effects. At the platform level, click-through rates increased from 8.73% before the intervention to 15.67% during the campaign. These findings highlight the importance of message content in shaping responses to warning interventions, supporting an approach in which messaging is refined and adapted to increase engagement with support resources.2026-06-04T17:21:23ZCaoilte Ó CiardhaJoel ScanlanTegan InsollJuha NurmiNina Vaaranen-Valkonenhttp://arxiv.org/abs/2604.07732v2Twitch Third-Party Developers' Support Seeking and Provision Practices on Discord2026-06-05T05:42:03ZThird-party developers (TPDs) often turn to online communities for support when they can't get immediate responses from the platform. Twitch, as a leading live streaming platform, attracted many TPDs and formed an online support community on Discord. This study explores TPDs' support practices via mixed method (a topic modeling to identify topics related to support seeking and provision first and a follow-up in-depth qualitative analysis with these topics) and found that: (1) TPDs' support-seeking practices around social, technical, and policy matters are highly dependent on Twitch, and this dependence acts as a form of platform labor; (2) TPDs need to switch between Discord and Twitch regarding seeking and provision, exacerbating TPDs' platform labor; (3) TPDs' flexible role practices reflect the community's flourishing on Discord but require roles to bridge the two platforms and transfer informal support seeking to possible formal support from Twitch. We propose implications for effectively managing support seeking and provision between formal and informal spaces to improve the development of TPDs. We also contribute to community support practice and to platform ecology work in CSCW.2026-04-09T02:29:15ZAccepted by ACM CSCW 2026Jie CaiHe ZhangYueyan LiuJohn M. CarrollChun Yu10.1145/3817021http://arxiv.org/abs/2606.00603v2Toward Agentic Governance: What Shapes LLM-Agent Intervention in Public Forums?2026-06-05T04:30:43ZLLM agents are increasingly used in moderation-relevant public forum workflows, where their choices to answer, acknowledge, repair, or decline are routinely challenged by users, platforms, and regulators. The same agent often returns different responses on identical content, so any defense based on the agent's behavior cannot be reliably reproduced. The variation is structural. Four deployment choices typically invisible to the operator each shift the agent's response rate, and their combinations can produce substantially different interventions on the same forum posts. The four choices are (1) which model version is currently served, which can change between calls without notice; (2) the model's weight-release status (open-weight, with weights publicly downloadable, vs. closed-weight, with weights held by the provider); (3) which provider serves the request; and (4) which system-prompt policy is in force. Across LLMs spanning both open-weight and closed-weight families, we find that the previously reported tendency to decline more on visible than hidden challenges aligns with the open/closed weight boundary in our panel more than with access surface. Every closed-weight cell declines more on visible challenges; every open-weight cell reverses this or shows no gap. Auditable forum-agent governance requires awareness of all four choices, not just the model name, since each independently shifts behavior.2026-05-30T08:01:00ZLuyang ZhangYi-Yun ChuRamayya Krishnanhttp://arxiv.org/abs/2606.06895v1Blockchain Infrastructure for Intelligent Cyber--Physical--Social Systems:Post-Quantum Security, Interoperability, and Trustworthy Data Economies in the Era of Embodied AI2026-06-05T04:27:34ZThe deployment of embodied artificial intelligence via world-model-based robotics presents a transformative opportunity for blockchain infrastructure, establishing urgent demand for trustworthy data provenance, cross-organizational governance, and incentive-compatible sharing across decentralized ecosystems. Simultaneously, quantum computing advances recognized by the 2025 Nobel Prize in Physics and the Turing Award threaten the cryptographic primitives securing these data economies, creating an interdependent imperative: long-lived verification for embodied AI depends on crypto-agile architectures capable of withstanding quantum adversaries. This tutorial examines blockchain as the coordination layer bridging this dual transition, from financial substrate to foundational Cyber-Physical-Social Systems infrastructure that simultaneously secures against quantum cryptanalysis and enables scalable, trustworthy data economies. The session opens with an immersive AWS Braket demonstration engaging participants with superconducting, trapped-ion, and neutral-atom hardware to assess cryptographic threat timelines and witness ECDSA-to-post-quantum signature transitions. Five integrated modules progress from embodied AI and world-model requirements through quantum hardware reality and evidence-based security migration, to scalable cross-shard architectures via BrokerChain protocols, trustworthy data economies implementing Croissant metadata standards and robotic learning provenance, and industry ecosystem integration for multi-modal cloud deployment. By bridging quantum hardware realities with embodied AI data requirements, this tutorial charts blockchain as unified infrastructure for next-generation decentralized intelligent environments, providing open-source frameworks and roadmaps for architecting quantum-resistant, interoperable, and data-trustworthy systems.2026-06-05T04:27:34ZSong GuoHuawei HuangDongping LiuAoyu ZhangLuyao Zhanghttp://arxiv.org/abs/2606.06851v1Toward a Metaphysics of Learning Analytics: Ontological Positioning of Data, Inference, and Normativity2026-06-05T02:51:14ZThe Learning Analytics (LA) community has undergone rapid development over the 15 years since the first LAK conference was held. However, while epistemological and ethical debates regarding the philosophical foundations of LA have been vigorous, metaphysical discussions have been sparse, signifying a lack of effort to derive the identity of LA from its internal principles. In this paper, we attempt to establish a metaphysics of LA by addressing the ontological question of ``What is LA?'' We do so by tracing back to LA's own definitions and principles to derive an answer from within LA itself. Specifically, we address what kind of existence the data LA operates on constitutes, identify eight agents including learners as ontological prerequisites, and clarify, via the is/ought problem, that LA does not derive norms from data. In particular, this system reveals that a class of LA practices, here termed \textit{norm-embedded LA}, conflates LA's purpose with its operations, creating an ontological tension with the first principle. We also discuss connections with related fields and the limitations of this system. The metaphysics outlined here is not imposed from outside LA, but surfaces what LA itself has always implicitly presupposed.2026-06-05T02:51:14Z25 pages, 1 figuresKensuke Takiihttp://arxiv.org/abs/2606.06830v1Learning Fair Demand Models2026-06-05T02:12:07ZData-driven pricing is increasingly prevalent in sectors such as airlines, lending, insurance, and retail. By learning demand models from customer features and setting prices accordingly, these systems may generate discriminatory outcomes that raise fairness concerns. This leads to fundamental questions - how and where should systems incorporate fairness considerations in the pricing pipeline, and how does it ultimately affect societal outcomes? To answer these, we study a stylized model where a seller has a two-stage decision pipeline comprising linear demand model estimation followed by price optimization. The seller considers fairness notions in training loss, price, and demand, under both parity-wise and Rawlsian perspectives.
We show that equalizing training loss across consumer groups leads to multiple solutions, which in turn can result in undesirable outcomes despite being a standard approach in fair machine learning. Focusing instead on fairness applied directly to prices or demand, we compare two strategies that enforce fairness in either the demand estimation stage or the price optimization stage. For parity-wise fairness, we characterize when each strategy yields higher social welfare under small fairness levels. We show that when market sizes and prices in the dataset are similar, imposing price fairness in the estimation stage is more beneficial to consumers, whereas imposing demand fairness in the optimization stage yields better consumer outcomes. For Rawlsian fairness, the two strategies coincide exactly. Lastly, we extend our model to alternate demand functions and conduct a case study using real-world vaccine pricing data.2026-06-05T02:12:07ZAdam N. ElmachtoubHyemi KimJonathan Y. Tanhttp://arxiv.org/abs/2606.06784v1What Your Posts Reveal: A Benchmark and Agentic Framework for User-Level Privacy Leakage on Social Media2026-06-05T00:02:47ZPublic social media posts can reveal private information through weak cues scattered across text, images, or metadata. Such leakage is often cumulative and cross-post: cues that appear harmless in isolation may jointly expose a user's home, workplace, or routine. However, current research lacks a unified benchmark for user-level multimodal privacy leakage and an evaluation metric that captures exposure severity beyond binary accuracy.
To address these gaps, we propose SopriBench, a synthetic benchmark guided by leakage patterns abstracted from a private reference corpus of Rednote and Instagram accounts, covering 50 user profiles and 1,569 images with attributes, contextual sensitivity, granularity, leakage type, inference difficulty, and supporting evidence. We further introduce the Privacy Exposure Score (PES), which weights value granularity by contextual sensitivity. Inspired by abductive reasoning, we introduce Argus, a training-free agentic framework for cumulative leakage inference. Argus forms hypotheses from accumulated evidence, verifies supporting evidence, and aggregates cross-post cues into privacy profiles, achieving 0.55 PES, a 25% improvement over the strongest baseline, with the largest gain on cross-post leakage.2026-06-05T00:02:47ZZifan PengYini HuangAiwen LuQiming YePeixian ZhangJingyi ZhengYule LiuXuechao WangXinlei HeJiaheng Weihttp://arxiv.org/abs/2606.06694v1The Geography of Algorithmic Judgment: LLM Intermediaries, Place Identity, and Racial Steering in Housing Search2026-06-04T20:17:58ZLarge language models (LLMs) are rapidly assuming an intermediary role in housing search through the integration of listing platforms within conversational interfaces, mediating access to information, search, and recommendations within urban settings. We expand on prior work on racial steering in LLMs by conducting a behavioral audit of seven open-weight and closed-source LLMs across four U.S. cities, testing location recommendations across three iterative prompting conditions that progressively add lifestyle preference context and reflect fair housing paired-testing methodologies. We find that steering is an emergent behavior of the model's interpretive license rather than primarily a static property. Steering results from the interaction of a user's identity, preference articulation, and the spatial logic that a model has internalized about learned representations of place, preference, and opportunity in a given city, and how different types of users relate to it. While steering was present, it was not uniform in direction or magnitude across evaluated conditions. Preference-conditioned testing often increased or reconfigured the number of models that exhibited steering behaviors relative to baseline conditions, suggesting that LLMs may interpret what the same housing preference means differently depending on the racial identity of the user. Our findings also demonstrate that the city is not a neutral testing unit for LLM evaluation in place-based sectors, and results from one local market cannot be assumed to generalize to another. Local and domain expertise will be required in the housing sector to ensure that legal and institutional commitments to fair housing are not undermined while adopting AI tools that mediate spatial access.2026-06-04T20:17:58Z13 pages with supplemental tables and figures, AIES '26 SubmissionHana SamadTrung LamChristoph Mügge-DurumMichael Akinwumi