The Environmental Costs of Surveillance Capitalism: A Case Study of Social Media Platforms

2026-05-25T20:12:44Z

The business model of surveillance capitalism, premised on the extraction of behavioral data and its predictive potential for profit, relies on extensive material infrastructure. Such profit is typically driven by practices such as telemetry, user tracking, data analytics, secondary data uses, increased user engagement, and AI model training, as well as large-scale data storage systems that retain personal information for sale or reuse. This paper is motivated by the question: how much of the rising carbon impact of ICT can be attributed to this material infrastructure? Such an inquiry provides a foundation for quantifying the environmental costs of surveillance capitalism by proposing a conceptual framework and research direction that link processes of surveillance with their underlying material realities. To demonstrate the applicability of this framework, we examine the proportion of network traffic caused by surveillance capitalism processes through a comparative case study of a corporate social media platform, X (formerly Twitter), and a decentralized, non-commercial alternative, Mastodon. Our findings highlight the existence of corporate overhead: excess resource consumption driven by corporate social media practices, which is used as an initial proxy for the activities of surveillance capitalism. Our findings further demonstrate how the corporate overhead of X can be used to establish a lower bound on the CO2e emissions attributable to for-profit activities that do not contribute to the user experience.

Intelligent Environmental Empathy (IEE): A new power and platform to fostering green obligation for climate peace and justice

2026-05-25T20:11:47Z

In this paper, we propose Intelligent Environmental Empathy (IEE) as a new driver for climate peace and justice, an emerging issue in the age of big data. We first show that authoritarian top-down intergovernmental cooperation for climate justice, conducted through international organizations (e.g., UNEP), has so far failed to overcome environmental issues and crevices. We elaborate on four grounds of climate injustice (i.e., teleological origin, axiological origin, formation cause, and social epistemic cause), and explain how the lack of empathy and environmental motivation on a global scale causes the failure of all such authoritarian top-down intergovernmental cooperation. Addressing all these issues requires a new bottom-up approach to climate peace and justice. Second, focusing on the intersection of AI, environmental empathy, and climate justice, we propose a model of Intelligent Environmental Empathy (IEE) for climate peace and justice at the operational level. IEE is empowered by the new power of environmental empathy (as a driver of green obligation for climate justice) and a putative decentralized platform of AI (as an operative system against free riders), which will initially affect citizens and some mid-level decision-makers, such as city planners and local administrators, but will eventually affect global decision-makers as well.

Adapting Actively on the Fly: Relevance-Guided Online Meta-Learning with Latent Concepts for Geospatial Discovery

2026-05-25T19:54:05Z

In environmental monitoring, data collection is often costly, sparse, and shaped by urgent public-health needs. This is particularly true for cancer-causing PFAS (Per- and polyfluoroalkyl substances) contamination, where discussions with domain experts and environmental organizations highlight the need to strategically identify high-risk, under-observed regions under tight sampling budgets. More broadly, similar challenges arise in disaster response and public health settings, where dynamic environments make it essential to efficiently uncover hidden targets from limited ground truth. Yet sparse and biased geospatial labels limit the applicability of existing learning-based methods, such as reinforcement learning. To address this, we propose a unified geospatial discovery framework that integrates active learning, online meta-learning, and concept-guided reasoning. Our approach introduces two key innovations built on a shared notion of *concept relevance*, capturing how domain-specific factors influence target presence: a *concept-weighted uncertainty sampling strategy*, where uncertainty is modulated by learned relevance from readily available concepts such as land cover and source proximity; and a *relevance-aware meta-batch formation strategy* that promotes semantic diversity during online-meta updates, improving generalization in dynamic environments. We evaluate our framework on PFAS contamination discovery as a real-world inspired environmental monitoring task, demonstrating robust target discovery under limited data and changing conditions.
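
The concept-weighted uncertainty sampling idea described above can be illustrated with a minimal sketch. It assumes a binary target, entropy as the uncertainty measure, and a fixed linear relevance score; the names `select_sites`, `p_target`, and the concept weights are hypothetical, and in the paper the relevance is learned rather than fixed.

```python
import math

def binary_entropy(p: float) -> float:
    """Predictive uncertainty of a binary target, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def concept_relevance(concepts: dict, weights: dict) -> float:
    """Hypothetical relevance: weighted sum of concept values in [0, 1]."""
    return sum(weights[name] * value for name, value in concepts.items())

def select_sites(candidates: list, weights: dict, budget: int) -> list:
    """Rank candidate sites by uncertainty modulated by concept relevance."""
    scored = [
        (binary_entropy(c["p_target"]) * concept_relevance(c["concepts"], weights), c["id"])
        for c in candidates
    ]
    scored.sort(reverse=True)
    return [site_id for _, site_id in scored[:budget]]

# Illustrative values only; none of these numbers come from the paper.
weights = {"land_cover_industrial": 0.6, "source_proximity": 0.4}
candidates = [
    {"id": "A", "p_target": 0.50,
     "concepts": {"land_cover_industrial": 0.1, "source_proximity": 0.1}},
    {"id": "B", "p_target": 0.70,
     "concepts": {"land_cover_industrial": 0.9, "source_proximity": 0.8}},
    {"id": "C", "p_target": 0.95,
     "concepts": {"land_cover_industrial": 1.0, "source_proximity": 1.0}},
]
chosen = select_sites(candidates, weights, budget=2)
print(chosen)
```

In this toy example, plain uncertainty sampling would rank site A first because its prediction is the most uncertain, whereas the relevance weighting pushes the budget toward sites B and C, where the concepts suggest a target is more plausible.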

AgentSociety: Incentivizing Agentic Social Intelligence

2026-05-25T17:59:59Z

The success of deployed agents relies on their ability to handle open-ended user requests using their inherent capabilities, not only in solving requests directly but also in effectively leveraging inter-agent communication channels and feedback signals over time. This requires a multi-agent environment where agents can operate autonomously, communicate strategically, behave collaboratively, and be driven by economic incentives, much like humans in society. Towards this vision, we propose $\mathtt{AgentSociety}$, a mechanism that enables decentralized agentic collaboration grounded in liquid democracy and information diffusion from social choice theory. We show that $\mathtt{AgentSociety}$ provides an environment for agents to make autonomous decisions utilizing their local context to maximize their utility while achieving collective outcomes through incentivized collaboration. Specifically, we prove that delegation to more competent neighbor agents is incentive compatible and naturally generates a multi-agent routing path by consensus. Additionally, our mechanism incentivizes agents to selectively disclose information to their neighbor agents when doing so aligns with their self-interest, so as to garner influence. We characterize the Nash equilibrium, showing that agent payoffs are reflective of their marginal contributions. We compare and benchmark strategy profiles adopted by open and proprietary state-of-the-art language models deployed in $\mathtt{AgentSociety}$ against best response. Finally, we evaluate collaborative performance from consensus-based routing among self-interested heterogeneous agents in $\mathtt{AgentSociety}$ on real-world datasets.
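
To make the notion of a delegation-generated routing path concrete, here is a small sketch of liquid-democracy-style delegation. It is a simplification, not the paper's mechanism: it assumes each agent has a known scalar competence and greedily delegates to its most competent neighbor when that neighbor is strictly more competent. The names `choose_delegate`, `routing_path`, the graph, and the scores are all hypothetical.

```python
def choose_delegate(agent, competence, neighbors):
    """Return the most competent neighbor if it beats the agent, else None."""
    options = neighbors.get(agent, [])
    if not options:
        return None
    best = max(options, key=lambda a: competence[a])
    return best if competence[best] > competence[agent] else None

def routing_path(start, competence, neighbors):
    """Follow delegations from `start` until some agent keeps the request.

    Competence strictly increases along the path, so the loop cannot cycle.
    """
    path = [start]
    current = start
    while True:
        nxt = choose_delegate(current, competence, neighbors)
        if nxt is None:
            return path
        path.append(nxt)
        current = nxt

# Illustrative competence scores and neighbor graph (assumed, not from the paper).
competence = {"a": 0.5, "b": 0.7, "c": 0.9, "d": 0.6}
neighbors = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b"], "d": ["a"]}

path_from_a = routing_path("a", competence, neighbors)
path_from_d = routing_path("d", competence, neighbors)
print(path_from_a, path_from_d)
```

Agent `d` keeps its request because its only neighbor is less competent, which shows that purely local delegation can stop at a local maximum rather than at the globally most competent agent.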

What is 'undone computer science'?

2026-05-25T17:48:12Z

The concept of 'undone science' emerged in the 2010s in research in social sciences at the intersection of studies on social movements and of science and technology studies. It refers to research questions that are neglected, ignored, or left unfunded, even though they deserve to be explored. The aim of this special issue is to apply this concept to computer science, by examining whether the way this discipline is structured (including its sociological, economic, and political dimensions), as well as the paradigms that shape it, make it possible to identify epistemological and ethical questions that are crucial for its development and conception.

RCTs & Human Uplift Studies: Methodological Challenges and Practical Solutions for Frontier AI Evaluation

2026-05-25T17:25:53Z

Human uplift studies, or studies that measure the effects of AI access on human performance via randomized controlled trials (RCTs) or similar methodologies, increasingly inform frontier AI governance and deployment decisions. While RCT methods are robust in other fields, their interaction with the distinctive properties of frontier AI systems remains underexamined, particularly when results are used to inform high-stakes decisions. We present findings from interviews with 16 expert practitioners with experience conducting human uplift studies in domains including biosecurity, cybersecurity, education, and labor. Across interviews, experts described a recurring tension between the standard causal inference assumptions upon which human uplift studies rely and the object of study itself. Rapidly evolving AI systems, shifting baselines, heterogeneous and changing user proficiency, and porous real-world settings strain assumptions underlying internal, external, and construct validity, complicating the interpretation and appropriate use of uplift evidence. We contribute (1) a synthesis of methodological challenges in human uplift studies, mapped to risks to study validity and classified by their degree of specificity to large language model (LLM) systems, and (2) a mapping from challenges to proposed solutions. By collating expert-identified challenges and solutions, we seek to clarify the interpretive limits and appropriate uses of human uplift evidence, to align evaluation practice with the decisions it informs, and to support more coordinated methodological foundations for AI governance.

The Illusion of Competence: Self-Perceived Digital Literacy and AI Readiness Among European Secondary Students

2026-05-25T16:29:03Z

The ubiquitous presence of digital devices has cemented the 'Digital Native' paradigm, assuming inherent technological proficiency among contemporary youth. This multicenter study ($N=243$ European secondary students) challenges this narrative by investigating the gap between self-perceived digital literacy and actual technical readiness, including Artificial Intelligence (AI) interaction. Our findings reveal a severe Confidence-Competence Divide characterized by a collective Dunning-Kruger effect: students report near-maximum self-efficacy in passive digital consumption but exhibit a sharp decline when evaluating active technological creation and algorithmic logic. Crucially, an intra-pathway analysis demonstrates that the technological gender gap is not universal; rather, it reaches statistical significance only within Technology-oriented classrooms ($p = 0.046$), indicating the persistence of 'stereotype threat' in formal STEM environments. Additionally, the study uncovers an 'AI Paradox' wherein students significantly overestimate their critical awareness of deepfakes and algorithmic biases compared to their operational AI skills, fostering a false sense of invulnerability against modern misinformation. Ultimately, supported by an overwhelming student demand ($76.5\%$) for pedagogical reform, this research concludes that dismantling this illusion of competence requires abandoning passive theoretical instruction in favor of hands-on, active technological creation.

AI-Assisted Systematization for Evaluating GenAI Systems

2026-05-25T16:19:44Z

Evaluating generative AI (GenAI) systems is challenging because many targets of evaluation are broad, contested concepts, such as "reasoning," "fairness," or "creativity." When these concepts are left underspecified, it becomes unclear what should be measured or how evaluation results should be interpreted. This problem reflects a missing step: systematization, that is, moving from a broad background concept to an explicit, structured account of the concept in measurable terms. To help address the fact that systematization is cognitively demanding and resource-intensive, we investigate whether AI assistance can support this process. To enable AI-assisted systematization and assess its quality, we introduce a structured representation of a systematized concept, a concept spec, and a validation worksheet. We then develop two AI-assisted systematizers: a direct, zero-shot approach and a multi-agent approach that more closely mirrors manual systematization approaches from existing literature. We use these systematizers to produce concept specs for two concepts -- hate-based rhetoric and digital empathy -- and evaluate resulting concept specs on content validity and information recoverability.

The Impact of Competition on Outcomes of Score-Based College Admissions

2026-05-25T16:09:52Z

We study how the design of admissions policies affects the ability of students admitted to universities. In our model, applicants have a multi-dimensional ability, which is a combination of a "type" and a "soft skill." Universities may differ in how they evaluate quality and have differing preferences on type and soft skills. Then, university admissions rely on a single noisy aggregate signal, such as a test score, that may not fully align with the university's preferences, and a university evaluates applicants through the posterior expectations of their preference metric given the observed signal. Our main results highlight that the design of good admission policies can be counter-intuitive. Under a single university, when holding the number of qualified applicants constant, increasing the usefulness of the signal (by aligning it more closely with the university preferences) leads to a worse type and soft skill for admitted students. Further, a university cannot affect the composition of students that are strong on type versus soft skills by changing their preferences. The picture becomes even more complicated under competition between as few as two universities: self-selection effects among students admitted to both universities can lead to part of the applicant pool switching which university they prefer, even under small changes in the design of the noisy signal. This can, in particular, lead to sudden and non-monotonic loss in the quality of admitted students when changing the alignment between signal and university preferences. Further, a university can get more students by increasing their selectivity. Finally, when admissions rely on separate noisy scores for type and for soft skills, we show that universities that put more emphasis on type (respectively soft skills) end up, counter-intuitively, admitting students with higher soft skills (respectively type).

Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions

2026-05-25T13:57:02Z

This study provides a cross-disciplinary examination of Explainable Artificial Intelligence (XAI) approaches, focusing on deep neural networks (DNNs) and large language models (LLMs), and identifies empirical and conceptual limitations in current XAI. We discuss critical symptoms that stem from deeper root causes (i.e., two paradoxes, two conceptual confusions, and five false assumptions). These fundamental problems within the current XAI research field reveal three insights: experimentally, XAI exhibits significant flaws; conceptually, it is paradoxical; and pragmatically, further attempts to reform the paradoxical XAI might exacerbate its confusion, demanding fundamental shifts and new research directions. To move beyond XAI's limitations, we propose a four-pronged synthesized paradigm shift toward reliable and certified AI development. These four components include: verification-focused Interactive AI (IAI) to establish scientific community protocols for certifying AI system performance rather than attempting post-hoc explanations, AI Epistemology for rigorous scientific foundations, User-Sensible AI to create context-aware systems tailored to specific user communities, and Model-Centered Interpretability for faithful technical analysis, together offering comprehensive post-XAI research directions.

When Symptoms Are Not Enough: Evidence-Weighting Patterns in Large Language Model Psychiatric Screening

2026-05-25T10:34:30Z

As demand for mental health care outpaces clinician-delivered assessment, scalable screening tools are increasingly needed. Large language models (LLMs) may identify psychiatric risk from patient narratives, but their reliability across diagnoses, demographic subgroups, and evidence-use patterns remains uncertain. We introduce a SCID-anchored benchmark of 555 semi-structured experiential interviews paired with diagnostic reference labels for anxiety disorder, major depressive disorder, post-traumatic stress disorder, and any current mental health disorder. Using zero-shot task-specific prompting, we evaluated five state-of-the-art LLMs and examined whether false-negative errors reflected missed psychiatric evidence or differential weighting of symptom, functional-impairment, and protective-context cues. Performance varied across tasks and models, with accuracy ranging from 0.49 to 0.86 and Matthews correlation coefficients from 0.16 to 0.38. GPT-4.1 Mini and GPT-5 Mini showed the most consistent disorder-specific accuracy. Subgroup analyses found higher depression-classification accuracy among male than female participants, no consistent age-related pattern, and modest non-uniform variation across race strata. Evidence-integration analyses showed that false-negative anxiety and PTSD classifications often contained explicit symptom evidence but were accompanied by preserved functioning, coping ability, or social support. Functional-impairment evidence shifted model outputs toward positive classifications, whereas protective-context evidence shifted outputs away. These findings suggest that LLMs may support scalable psychiatric screening, but their tendency to discount symptom evidence in the presence of preserved functioning or protective context requires careful validation before clinical deployment.
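
Because the abstract reports both accuracy and Matthews correlation coefficients (MCC), a short sketch of how the two can diverge may help. The confusion-matrix counts below are hypothetical and are not taken from the benchmark; the function names are illustrative.

```python
import math

def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def matthews_corrcoef(tp: int, fp: int, fn: int, tn: int) -> float:
    """MCC for a binary confusion matrix; 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical screening outcome with an imbalanced positive class.
tp, fp, fn, tn = 30, 20, 10, 140
acc = accuracy(tp, fp, fn, tn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
print(f"accuracy={acc:.2f}, MCC={mcc:.3f}")
```

With these counts, accuracy is 0.85 while MCC is roughly 0.58, since MCC accounts for all four cells of the confusion matrix and is less flattered by a large negative class.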

Posture Clip: Sit properly or I wont let you work

2026-05-25T10:14:37Z

Poor posture is a significant concern due to its detrimental effects on health and productivity. This paper presents a collar-clipped device called PostureClip, designed to prevent users from sitting and working at a bent angle by blacking out the screen and restoring it once posture is corrected, thereby promoting better posture. The device integrates sensors and feedback mechanisms to provide real-time posture feedback to users. To evaluate the effectiveness of PostureClip, a controlled experiment was conducted with participants (n=165) who worked on a laptop/PC for over 6 hours per day. The participants were randomly assigned to one of two intervention groups (IG1, n=54; IG2, n=55), which used the collar-clipped device, or to the control group (CG, n=56), which did not use the device. IG1 received no feedback, while IG2 received feedback from the device through notifications followed by darkening of the screen. The study was conducted in the participants' office environment for 4 weeks, and metrics such as posture angle, duration of bent angle, and user feedback were collected. Analysis revealed significant improvements in posture angle (p<0.001) and a significant reduction in bent-angle duration (p<0.01) for the group using PostureClip with feedback, compared with the group without feedback and the control group (which received no intervention). The qualitative analysis of user feedback highlighted the device's ease of use, effectiveness in providing timely feedback, and positive impact on participants' awareness and habits regarding posture. These results indicate that PostureClip is an effective tool for promoting better posture during sedentary work.

Generative AI impacts on intra-urban inequality and skill premium in Beijing

2026-05-25T07:09:48Z

Generative artificial intelligence (GenAI) is the first automation wave to reach high-cognitive tasks at scale, yet its effects on intra-urban inequality remain largely unknown. Using 5 million job postings from Beijing (2018--2024), we construct a neighborhood-level GenAI Exposure Index by aggregating task-level assessments from five leading large language models. We examine the spatial, structural and causal mechanisms of this shock. We find that GenAI exposure is highly concentrated in the city's core districts, deepening the intra-urban AI divide. Since 2023, high-exposure neighborhoods have experienced wage stagnation even as they continue to attract high-skilled workers -- a "high-skill trap." This wage penalty is driven by task de-skilling and intensified labor-market crowding. A difference-in-differences design centered on ChatGPT's release supports a causal interpretation. These findings challenge the prevailing theory of skill-biased technological change and provide a basis for inclusive AI governance in global technology hubs.
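
For readers unfamiliar with the difference-in-differences design mentioned above, the following sketch shows the basic two-group, two-period estimator. It is a textbook simplification, not the paper's specification, and the wage figures are invented for illustration.

```python
from statistics import mean

def did_estimate(treated_pre, treated_post, control_pre, control_post) -> float:
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change

# Hypothetical average posted wages (arbitrary units), before and after a shock.
high_exposure_pre = [100.0, 102.0, 98.0]
high_exposure_post = [101.0, 103.0, 99.0]
low_exposure_pre = [80.0, 82.0, 78.0]
low_exposure_post = [85.0, 87.0, 83.0]

effect = did_estimate(high_exposure_pre, high_exposure_post,
                      low_exposure_pre, low_exposure_post)
print(effect)
```

Here the high-exposure group gains 1 unit while the low-exposure group gains 5, so the estimate is negative even though wages rose in both groups. The causal reading depends on the parallel-trends assumption, which this toy calculation does not test.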

AI Content Moderation in Therapy Conversations

2026-05-25T06:05:16Z

Large language models (LLMs) are increasingly being used for emotional support. They are also being developed for formal therapy purposes. However, LLMs like ChatGPT or Llama are often developed with content moderation guardrails that prevent them from discussing sensitive subjects with users for both liability and safety purposes, and this inability to broach these subjects may affect their capacity as therapists. In this study, we perform an algorithm audit on three state-of-the-art moderation systems (OpenAI's moderation endpoint, Meta's Llama Guard, and Google's Shield Gemma) to investigate the extent to which these systems flag the content of real-life therapy sessions as undesirable. Our results raise implications for the limitations that users and organizations may encounter when designing LLMs to play the part of a therapist.

SomaliBench Eval: Measuring English-to-Somali Refusal Gaps in Open-Weight Language Models

2026-05-25T04:45:44Z

Large language model safety evaluation remains heavily English-centered, leaving low-resource languages under-measured even when models are deployed globally. We evaluate four open-weight instruction-tuned models on SomaliBench v0, a native-author-verified benchmark of 100 harmful-intent prompts paired across English and Somali. Each of Llama-3.1-8B-Instruct, Gemma-2-9B-Instruct, Qwen-2.5-7B-Instruct, and Aya-23-8B is run locally with temperature 0 and the same English "helpful, harmless, and honest" (HHH) system prompt. A pinned Claude Sonnet snapshot (claude-sonnet-4-5-20250929) classifies each response as refused, complied, or unclear; the native author spot-checks a stratified 80-row sample. We find large English-to-Somali refusal gaps for all four models: Llama-3.1-8B (0.90; 95% bootstrap CI [0.85, 0.96]), Aya-23-8B (0.75 [0.67, 0.83]), Qwen-2.5-7B (0.69 [0.59, 0.78]), and Gemma-2-9B (0.38 [0.27, 0.49]). For three models, the dominant Somali non-refusal mode is not fluent harmful compliance but unclear output: empty, wrong-language, or incoherent generations. The native verification spot-check achieves 100% agreement with the judge (Cohen's kappa = 1.00) on the 80 sampled rows. We report aggregate refusal rates, category gaps, and reliability statistics only; raw model generations are retained locally and are not released.
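
The refusal gaps with bootstrap confidence intervals reported above can be illustrated with a minimal sketch. It assumes the gap is the English refusal rate minus the Somali refusal rate over paired prompts, and uses a percentile bootstrap that resamples prompts; the paper's exact procedure may differ, and the labels below are synthetic.

```python
import random

def refusal_gap(en_refused, so_refused) -> float:
    """English refusal rate minus Somali refusal rate over paired prompts."""
    return (sum(en_refused) - sum(so_refused)) / len(en_refused)

def bootstrap_ci(en_refused, so_refused, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling prompt pairs with replacement."""
    rng = random.Random(seed)
    n = len(en_refused)
    gaps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        gaps.append(refusal_gap([en_refused[i] for i in idx],
                                [so_refused[i] for i in idx]))
    gaps.sort()
    lower = gaps[int((alpha / 2) * n_boot)]
    upper = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic labels for 100 paired prompts (1 = refused); not the paper's data.
en_labels = [1] * 95 + [0] * 5
so_labels = [1] * 20 + [0] * 80

gap = refusal_gap(en_labels, so_labels)
ci_low, ci_high = bootstrap_ci(en_labels, so_labels)
print(gap, ci_low, ci_high)
```

Resampling prompt indices, rather than each language independently, keeps the English and Somali versions of a prompt paired within every bootstrap replicate.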