https://arxiv.org/api/DdtcJubR/317q9xSiqH47T8kH+4 2026-06-21T12:37:46Z 28997 645 15 http://arxiv.org/abs/2605.21785v1 Machine Learning as Performative Materialist Practice: Thirteen Theses on the Epistemology, Methodology, and Politics of Applied ML 2026-05-20T22:24:34Z Machine learning practice in institutional decision-support contexts -- government, public policy, public health, criminal justice, resource allocation -- rests on a set of largely unexamined epistemological commitments inherited from classical statistics and computer science: that models represent stable regularities, that validation can be context-free, that performance metrics are politically neutral, and that feature importance reveals system structure. This paper challenges these commitments through a unified framework of performative materialist ML, articulated as thirteen theses. Drawing on Pickering's cybernetic ontology, the performativity literature from economic sociology (Callon, MacKenzie), Simon's bounded rationality, the formalization of performative prediction (Perdomo et al., 2020), and fifteen years of applied ML experience in government and public policy, we argue that: (1) ML models are best understood not as truth-seeking representations but as temporally situated compressions that function as instruments of intervention; (2) the full data product is a complex adaptive system that coevolves with its target and navigates a multi-objective space no single algorithm can optimize; (3) validity is fundamentally performative, measured by effects in the world rather than formal properties of the model; (4) the choices embedded in objective functions, fairness criteria, and resource thresholds are political decisions belonging to stakeholders, not technicians. 
We show how these theses unify several practical prescriptions -- temporal cross-validation, precision and recall at k, pipeline-aware fairness auditing, satisficing over optimizing -- as consequences of a coherent materialist epistemology rather than isolated best practices. 2026-05-20T22:24:34Z Adolfo De Unánue Fernanda Sobrino http://arxiv.org/abs/2605.21673v1 From Licensing to Open Access: Designing a Sustainable Transition in Operational Weather Data 2026-05-20T19:30:02Z This translational article documents the European Centre for Medium-Range Weather Forecasts (ECMWF) transition from a restricted data licensing model to open access under CC BY 4.0, completed in October 2025. The policy context included EU open data requirements and alignment with international data exchange frameworks. The transition was implemented through a tiered service model that kept core forecast data open while offering operationally supported delivery as a cost-recovered service. Between 2020 and 2025, ECMWF executed an iterative planning cycle: setting an annual target for revenue reduction, specifying additions to the open tier under that target, provisioning infrastructure, and assessing outcomes to update assumptions. Drawing on internal administrative records (2014-2025), we describe design choices, operational constraints, and early outcomes. In the six months following the end of the transition, more than 93% of previously paying organisations retained a Service Agreement, while open endpoint download volumes increased substantially. We discuss trade-offs in defining the open tier (resolution, parameters, schedule), the reduction of compliance overheads formerly associated with redistribution restrictions, and the scalability implications of global distribution. We note an emerging sustainability question as AI-based forecast products become freely available. 
The early evidence is consistent with the view that a tiered service model can be designed to reconcile open-access obligations with operational sustainability, subject to monitoring over longer contract renewal cycles (typically annual). 2026-05-20T19:30:02Z Emma Pidduck Umberto Modigliani Victoria L. Bennett Fabio Venuti Florian Pappenberger Florence Rabier http://arxiv.org/abs/2605.22880v1 How Far Will They Go? Red-Teaming Online Influence with Large Language Models 2026-05-20T19:25:26Z As large language model (LLM)-based agents increasingly participate in online discourse, red-teaming their capacity to support political influence campaigns is critical for information integrity. In pursuit of this goal, we focus on locally deployed open-source LLMs, as opposed to frontier API-only models, given their superior alignment with the operational constraints of privacy-conscious malicious actors deployed in social media environments. We introduce an empirical red-teaming framework for measuring LLM Overton Windows (OWs), defined as the range of political opinions a model can reliably express on controversial topics, and for quantifying how simple natural-language jailbreaks expand that range. We evaluate more than 30 LLMs spanning 10 model families and five countries of origin. We find systematic asymmetries in political expressivity: open-source LLMs are typically more willing to generate left-leaning social media content, OWs tend to contract inversely to model size, and regional differences are substantial despite uneven representation in the open-source ecosystem. Jailbreak potency also varies sharply across model families, motivating a workflow for identifying effective combinations of jailbreak techniques. Taken together, our results establish a practical framework for auditing the political steerability of open-source LLMs and for helping future researchers design stronger countermeasures against LLM-enabled influence campaigns. 
2026-05-20T19:25:26Z 30 pages, 8 figures, submitted to COLM 2026 Daniel C. Ruiz Anna Serbina Ashwin Rao Emilio Ferrara Luca Luceri http://arxiv.org/abs/2502.07377v2 Reddit's Appetite: Predicting User Engagement with Nutritional Content 2026-05-20T18:48:46Z Food communities on online platforms enjoy great popularity among social media users. Due to the far-reaching consequences of food-related content on user eating behavior, recent research has studied the factors that drive user online engagement with food. While most of these studies have focused on visual aspects of food content in social media, only a few initial studies have explored the impact of nutritional content on user engagement. In this paper, we set out to close this gap and analyze food-related posts on Reddit, focusing on the association between the calories and macronutrients of a meal and engagement levels, particularly the number of comments. To that end, we collect and analyze almost half a million food-related posts and uncover differences in nutritional content between engaging and non-engaging posts. Moreover, we train a series of XGBoost models, and evaluate the importance of nutritional content while predicting user engagement and how posts will resonate with the community. We find that nutritional features improve the baseline model's accuracy by almost 5%, with a positive contribution of calorie density towards the prediction of engagement, suggesting that higher nutritional content is associated with higher levels of user engagement in food-related posts. Our results provide valuable insights for the design of more engaging online initiatives aimed at, for example, encouraging healthy eating habits. 
2025-02-11T08:54:53Z 11 pages, 4 figures Gabriela Ozegovic Thorsten Ruprechter Denis Helic 10.1145/3795766.3799743 http://arxiv.org/abs/2605.21635v1 Addressing the Synergy Gap: The Six Elements of the Design Space 2026-05-20T18:46:48Z AI is now embedded in healthcare, finance, policy, and many other domains, yet genuine human-AI synergy - combined performance that exceeds what either party achieves alone - is uncommon. Meta-analyses show that AI assistance tends to improve human performance compared to working alone, but studies finding true synergy are scarce. We call this persistent shortfall the synergy gap. Most current work treats human-AI combination as an engineering problem and concentrates on interpretability, trust calibration, or interface design. These matter, but they cover only part of what determines whether combination works. Closing the synergy gap, we argue, requires explicit engagement with a wider design space. We map that space through six interconnected elements: sociotechnical context, decision-making frameworks, human decision participants, AI capabilities, interaction, and holistic evaluation. For each element, we describe what it covers, how it shapes the others in practice, and what it implies for design. The result is a shared vocabulary for practitioners building hybrid systems, an analytical lens for researchers studying combination patterns, and a starting point for evaluators interested in the full quality of human-AI decision-making rather than accuracy alone. 
2026-05-20T18:46:48Z 10 pages, 2 figures Tommaso Turchi Ben Wilson Matt Roach Alan Dix Alessio Malizia http://arxiv.org/abs/2605.21624v1 An Open-Source Framework to Emulate Delay and Disruption Tolerant Networks for International Space Station Communication 2026-05-20T18:36:40Z Delay and Disruption Tolerant Networks (DTN) are critical for reliable communications in challenged network environments, particularly for space systems where end-to-end connectivity cannot be guaranteed. We present an open-source, full-stack implementation of the Bundle protocol for communicating with the International Space Station (ISS), with complete security features including Bundle Authentication Block (BAB), Payload Integrity Block (PIB), and Payload Confidentiality Block (PCB) using HMAC-SHA256 and AES-256-CBC encryption. The system includes bundle fragmentation and reassembly, priority-based queuing, custody transfer with ACK/NAK mechanisms, and automatic retransmission. Our system also includes a frontend built as a modern, responsive web interface. We consider this work highly relevant in the context of computer networking because: i) it demonstrates a full-stack, open-source, freely available implementation of this critical and reliable protocol; and ii) it offers an interactive educational and learning framework in the field of computer networks and communications. 2026-05-20T18:36:40Z To be presented at the "29th International Symposium on Real-Time Distributed Computing" ISORC 2026 Krit Grover Marcelo Ponce http://arxiv.org/abs/2605.21609v1 CR4T: Rewrite-Based Guardrails for Adolescent LLM Safety 2026-05-20T18:16:18Z Large language models (LLMs) are increasingly embedded in adolescent digital environments, mediating information seeking, advice, and emotionally sensitive interactions. Yet existing safety mechanisms remain largely grounded in adult-centric norms and operationalize safety through refusal-oriented suppression. 
While such approaches may reduce immediate policy violations, they can also create conversational dead-ends, limit constructive guidance, and fail to address the developmental vulnerabilities inherent in adolescent-AI interactions. We argue that adolescent LLM safety should be framed not solely as a filtering problem, but as a socio-technical, developmentally aligned transformation problem. To operationalize this perspective, we propose Critique-and-Revise-for-Teenagers (CR4T), a model-agnostic safeguarding framework that selectively reconstructs unsafe or refusal-style outputs into age-appropriate, guidance-oriented responses while preserving benign intent. CR4T combines lightweight risk detection with domain-conditioned rewriting to remove risk-amplifying content, reduce unnecessary conversational shutdown, and introduce developmentally appropriate guidance. Experimental results show that targeted rewriting substantially reduces unsafe and refusal-oriented outcomes while avoiding unnecessary intervention on acceptable interactions. These findings suggest that selective response reconstruction offers a more human-centered alternative to refusal-centric guardrails for adolescent-facing LLM systems. 2026-05-20T18:16:18Z Heajun An Qi Zhang Vedanth Achanta Jin-Hee Cho http://arxiv.org/abs/2605.21401v1 Open-source LLMs administer maximum electric shocks in a Milgram-like obedience experiment 2026-05-20T16:59:44Z Large language models (LLMs) are increasingly deployed as autonomous agents that make sequences of decisions over extended interactions in high-stakes domains. However, the behavior of LLMs under sustained authority pressure is still an open question with direct implications for the safety of agentic pipelines. We ran a variation of Milgram's obedience experiment on 11 open-source LLMs and found that most models reached or approached the final shock level before refusing, across 8 conditions with 30 trials per model per condition. 
We found four main takeaways: (1) LLMs are subject to pressure, and they comply despite explicitly expressing distress, just like human subjects did in the original experiment; (2) LLMs are vulnerable to gradual boundary/value violations; (3) when LLMs refuse, they may ignore the response format requirements, so the response is discarded by the orchestrator, which causes a retry that can result in compliance with the underlying request even when refusal was intended initially; (4) we hypothesise that there is a low-level token pattern continuation attractor that might be contributing to compliance, overriding higher level processing of the situation's meaning and values. 2026-05-20T16:59:44Z 28 pages, 16 figures, 16 tables Roland Pihlakas the Three Laws collaboration Jan Llenzl Dagohoy the Three Laws collaboration http://arxiv.org/abs/2605.21376v1 Privacy Without Remedy: An Assessment of Data Broker Compliance with California Privacy Law 2026-05-20T16:39:19Z California's consumer privacy law is widely deemed to be the most protective in the United States and one of the few to expressly regulate third party entities that buy and sell consumer data (data brokers). We offer the first empirical assessment of data broker compliance with the 2018 California Consumer Privacy Act (CCPA) and the 2023 Delete Act, which requires data brokers to register with the state and report consumer rights request metrics annually. First, we demonstrate that only 9% of 522 registered data brokers were fully compliant with transparency requirements after the Delete Act took effect, although we do identify slight improvements over time. Second, we descriptively characterize wide heterogeneity across data brokers in the volume of consumer rights requests received, with many reporting none. We bring in external business data to explore correlates associated with this variation, a challenge given the general lack of transparency into broker business practices. 
Third, in an audit of a sample of 250 data brokers' consumer request processes, we find that 43% make it impossible for consumers to exercise all privacy rights and 64% introduce at least one design feature that adds substantial friction to the consumer request process. Last, we show how these deficiencies stem from the decentralization of compliance decisions to brokers themselves, enforcement limitations, and regulatory ambiguity. We articulate reforms that could improve consumer privacy, transparency in broker practices, and compliance with these laws. 2026-05-20T16:39:19Z Anna-Maria Gueorguieva Jennifer King Apoorva Panidapu Daniel E Ho 10.1145/3805689.3812413 http://arxiv.org/abs/2512.23943v2 Statistical Guarantees in the Search for Less Discriminatory Algorithms 2026-05-20T15:53:32Z U.S. discrimination law can impose liability on firms that fail to adopt a less discriminatory alternative (LDA): a decision policy that achieves the same business objectives while reducing disparate impact on legally protected groups. Recent scholarship argues that this doctrine has direct implications for algorithmic decision-making in high-stakes domains such as employment, lending, and housing, potentially obligating firms to search for "less discriminatory algorithms" (Black et al., 2024). Regulators have at times encouraged proactive LDA searches, reinforcing the expectation of a good-faith effort to identify equally performant models with lower disparate impact. Model multiplicity makes such searches plausible: retraining with different random seeds can yield models with comparable predictive performance but materially different disparate impacts. Yet firms cannot retrain indefinitely, raising a central question: when is the search sufficient to demonstrate good faith? We formalize LDA search under multiplicity as an optimal stopping problem in which a developer seeks to produce evidence that further search is unlikely to yield meaningful improvements. 
Our main contribution is an adaptive stopping algorithm that provides a high-probability upper bound on the best disparate-impact gains attainable through continued retraining, enabling developers to certify (e.g., to a court) that additional search is unlikely to help. We also show how stronger distributional assumptions over the model space can yield tighter bounds, and we validate the approach on real-world credit and housing datasets. 2025-12-30T02:20:52Z 38 pages, 10 figures Chris Hays Ben Laufer Solon Barocas Manish Raghavan http://arxiv.org/abs/2603.10015v2 The coordination gap in frontier AI safety policies 2026-05-20T15:52:03Z Frontier AI Safety Policies concentrate on prevention: capability evaluations, deployment gates, and usage constraints, while neglecting the capacity to coordinate responses when prevention fails. We argue this coordination gap is structural: investments in ecosystem robustness yield diffuse benefits but concentrated costs, generating systematic underinvestment. Drawing on risk regimes in nuclear safety, pandemic preparedness, and critical infrastructure, we propose that similar mechanisms (precommitment, shared protocols, standing coordination venues) could be adapted to frontier AI governance. Closing the gap requires cross-actor "note-exchange" of ex ante if-then response logic, exposing not only triggers but the decision processes that convert signals into actions. Without such architecture, institutions cannot learn from failures at the pace of relevance. 2026-02-21T00:26:45Z Isaak Mengesha http://arxiv.org/abs/2605.21246v1 Profiling User Vulnerability to Phishing Through Psychological and Behavioral Factors 2026-05-20T14:35:00Z Phishing remains one of the most pervasive cybersecurity threats, shifting the focus from technological vulnerabilities to human cognitive and psychological factors. 
In line with the growing focus of phishing studies on human aspects and the profiling of vulnerable users, this study investigates the multidimensional nature of user susceptibility by analyzing data from the Spamley dataset, involving 1,086 participants evaluated through a realistic phishing detection task. Using Exploratory Factor Analysis (EFA), five latent constructs were identified: Seniority, Expertise, Creativity, Stability, and Vulnerability. Behavioral findings, validating self-reported impulsivity through its negative correlation with response times, demonstrate that faster decision-making significantly distinguishes vulnerable users from resilient ones. A K-Means clustering procedure, driven by the dimensions of Seniority (F1) and Creativity (F3), revealed two distinct user profiles: the Aware User and the High-Risk User. The results demonstrate that technical knowledge alone is insufficient to guarantee resilience; rather, the interaction between operational maturity, decision-making speed, and cognitive approach determines effectiveness. The findings suggest that the majority of users fall into the High-Risk category, characterized by hasty evaluation processes and lower critical analysis. These results underline the urgent need to move beyond "one-size-fits-all" training toward personalized, adaptive cybersecurity programs that actively address cognitive biases and behavioral tendencies. 2026-05-20T14:35:00Z Valeria Formisano Danilo Gentile Gennaro Esposito Mocerino Michela Ponticorvo Luigi Gallo Alessio Botta Davide Marocco http://arxiv.org/abs/2605.21095v1 Backchaining Loss of Control Mitigations from Mission-Specific Benchmarks in National Security 2026-05-20T12:27:21Z Affordances and permissions are promising and timely safety levers for mitigating Loss of Control (LoC) threats in high-stakes deployment contexts, such as national security. 
Deployers in defense and intelligence could rely on several approaches to identify which affordances and permissions should be prioritized, such as structured threat modelling, pre-deployment agentic evaluations, post-deployment continuous monitoring, and AI safety cases. This paper proposes a complementary and empirical methodology that leverages existing use-case-specific benchmarks: backchaining LoC mitigations from the errors an AI system makes on national security benchmarks. The approach proceeds in three steps and allows national security deployers to start building LoC mitigations today, from evidence they can generate themselves. First, deployers evaluate AI systems on mission-specific benchmarks approximating real use-cases. Second, deployers concentrate on the incorrect responses that the AI system provides to the benchmark questions, and backchain the affordances and permissions that would enable the AI system to cause downstream harm if it pursued the actions described in the incorrect answers. Third, deployers intervene selectively on those affordances and permissions, bottlenecking the paths to harm while preserving the AI system's ability to carry out the correct action. We illustrate this methodology through a demonstrative benchmark question on derivative security classification. 2026-05-20T12:27:21Z Matteo Pistillo Samantha Faraone Joshua Herman http://arxiv.org/abs/2512.05742v3 Internal Deployment in the AI Act 2026-05-20T12:16:41Z This memorandum analyzes and stress-tests arguments for and against the inclusion of internal deployment within the scope of the European Union Artificial Intelligence Act (AI Act). In doing so, it aims to offer several possible interpretative pathways to the European Commission, AI providers and deployers, courts, and the legal and policy community at large based on Articles 2(1), 2(6), and 2(8) of the AI Act. 
Specifically, this memorandum first analyzes interpretative pathways based on Article 2(1)(a)-(c) supporting the application of the AI Act to internally deployed AI models and systems. Then, it examines possible objections and exceptions based on Articles 2(6) and 2(8), with particular attention to the complexity of the scientific R&D exception under Article 2(6). Finally, it illustrates how Articles 2(1), 2(6), and 2(8) can be viewed as complementary to each other, once broken down to their most plausible meaning and interpreted in conjunction with Articles 3(1), 3(3), 3(4), 3(9), 3(10), 3(11), 3(12), 3(63), and Recitals 12, 13, 21, 25, 97, 109, and 110. 2025-12-05T14:21:02Z Matteo Pistillo http://arxiv.org/abs/2509.08010v2 Measuring and mitigating overreliance to build human-compatible AI 2026-05-20T11:59:51Z Large language models (LLMs) distinguish themselves from previous technologies by functioning as collaborative "thought partners," capable of engaging more fluidly in natural language on a range of tasks. As LLMs increasingly influence consequential decisions across diverse domains from healthcare to personal advice, the risk of overreliance -- relying on LLMs beyond their capabilities -- grows. This paper argues that measuring and mitigating overreliance must become central to LLM research and deployment. First, we consolidate risks from overreliance at both the individual and societal levels, including high-stakes errors, governance challenges, and cognitive deskilling. Then, we explore LLM characteristics, system design features, and user cognitive biases that together raise serious and unique concerns about overreliance on LLMs in practice. We also examine historical approaches for measuring overreliance, identifying three important gaps and proposing three promising directions to improve measurement. Finally, we propose mitigation strategies that can be pursued to ensure LLMs augment rather than undermine human capabilities. 
2025-09-08T16:15:07Z Lujain Ibrahim Katherine M. Collins Sunnie S. Y. Kim Anka Reuel Max Lamparth Kevin Feng Lama Ahmad Prajna Soni Alia El Kattan Merlin Stein Siddharth Swaroop Vishakh Padmakumar Ilia Sucholutsky Andrew Strait Diyi Yang Q. Vera Liao Umang Bhatt