https://arxiv.org/api/VBZMJxwJW+JjR33Cdtzqv5wLmpA 2026-06-13T23:10:29Z 28886 180 15 http://arxiv.org/abs/2602.24207v2 The Stability of Online Algorithms in Performative Prediction 2026-06-03T18:07:07Z

The use of algorithmic predictions in decision-making leads to a feedback loop where the models we deploy actively influence the data distributions we observe and later retrain on. This dynamic was formalized by Perdomo et al. (2020) in their work on performative prediction. Our main result is an unconditional reduction showing that any no-regret algorithm deployed in performative settings converges to a (mixed) performatively stable equilibrium: a solution in which models shape data distributions in such a way that their own predictions look optimal in hindsight. Prior to our work, all positive results in this area imposed strong restrictions on how models influenced distributions. By using a martingale argument and allowing randomization, we avoid any assumption on how populations respond to predictions and sidestep recent hardness results showing that deterministic stable models are in general PPAD-hard to compute. Lastly, on a more conceptual note, our connection sheds light on why common algorithms, like gradient descent, are naturally stabilizing and prevent runaway feedback loops. We hope our work enables future technical transfer of ideas between online optimization and performativity.
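A minimal sketch of the feedback loop described above, under assumptions that are not in the abstract: a one-dimensional model `theta`, squared loss, and a hypothetical population whose data mean shifts linearly with the deployed model (`mu + eps * theta`). It runs unprojected online gradient descent and compares the iterates with the point where the model is optimal for the data it induces. It is a toy illustration of stabilization, not the paper's reduction or its mixed-equilibrium construction.

```python
import random

def online_gradient_descent(mu=1.0, eps=0.5, steps=2000, noise=1.0, seed=0):
    """Run OGD on squared loss while the data mean reacts to the deployed model.

    Assumed toy response (hypothetical): z_t ~ Normal(mu + eps * theta_t, noise**2).
    Returns the list of deployed models theta_1, ..., theta_steps.
    """
    rng = random.Random(seed)
    theta = 0.0
    iterates = []
    for t in range(1, steps + 1):
        iterates.append(theta)
        z = rng.gauss(mu + eps * theta, noise)  # data shifts with the deployed theta
        grad = theta - z                        # gradient of 0.5 * (theta - z)**2
        theta -= grad / t ** 0.5                # step size 1/sqrt(t)
    return iterates

def stable_point(mu=1.0, eps=0.5):
    """The theta satisfying theta == mu + eps * theta (requires eps != 1)."""
    return mu / (1.0 - eps)
```

With `noise=0.0` the iterates approach `stable_point()` directly; with `noise > 0` they tend to fluctuate around it.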

2026-02-27T17:35:03Z Gabriele Farina Juan Carlos Perdomo http://arxiv.org/abs/2606.05273v1 Online Safety Regulation Increases Privacy Risk: Evidence from the UK Online Safety Act 2026-06-03T17:24:28Z

Governments worldwide are increasingly regulating digital platforms to reduce online harms, particularly those affecting children. However, access restrictions can alter user behaviour and introduce new privacy and security risks. The UK Online Safety Act (OSA), passed in October 2023, illustrates this trend: it extends age-assurance and safety requirements to social media, search, and pornography services, and rolled out in phases. Ofcom's illegal content enforcement duties came into force in March 2025, and mandatory age verification for adult content took effect in July 2025. This phased rollout enables real-time observation of behavioural responses to regulation. To address this, we analyse Reddit discourse across VPN and UK Politics communities and conduct a privacy-policy risk analysis of 69 unique VPN services. We find that each of these three milestones produced significant stepwise increases in VPN-related discussion on Reddit: among UK-based users, posts and comments explicitly about VPN use in a regulatory or privacy context rose by +100%, +217%, and +415% respectively. UK Politics communities showed even larger effects, with OSA-related political discourse rising by +213%, +545%, and +464%, respectively, among UK-based users. UK VPN search interest on Google rose by +89% at the age-verification deadline. Users primarily framed this response around privacy, surveillance, and distrust of age-verification intermediaries rather than simple access-seeking. Demand increased across low, medium, and high-risk VPNs, but the proportional distribution remained broadly stable. These findings suggest that online safety regulation can create secondary privacy costs even when it does not disproportionately shift attention toward higher-risk providers.

2026-06-03T17:24:28Z 14 pages, 9 figures. Submitted to PoPETs 2027 Dhyey Mehta University of Edinburgh Eldar Jalilzade Newcastle University Maksim Kalameyets Newcastle University Rebecca Owens Durham University Marc Juarez University of Edinburgh Stergios Aidinlis Durham University Lei Shi Newcastle University Tuğrulcan Elmas University of Edinburgh http://arxiv.org/abs/2606.05118v1 Does Artificial Intelligence Advance Science? 2026-06-03T17:23:58Z

This paper examines whether and how artificial intelligence (AI) advances scientific creativity. Drawing on scientific publications, the primary output of researchers, we analyze over one million publications from OpenAlex to investigate the relationship between AI adoption and multiple dimensions of scientific creativity, including novelty (recombinant novelty and object novelty) and impact (3-year short-run citation impact and 10-year long-run citation impact). We find that AI publications are significantly more likely to achieve top-decile creativity relative to non-AI publications, with a 5.5 to 10.2 percentage-point higher likelihood of ranking in the top creativity decile. Critically, we uncover substantial heterogeneity across AI research modes. Tool-oriented AI research, which applies existing AI models to domain tasks, is associated with the largest gains in recombinant-based creativity, while Adaptation-oriented AI research, which modifies AI models for domain-specific problems, is associated with relatively higher object-based creativity. These findings reveal that AI does not advance science through a single mechanism but through structurally distinct creative pathways that depend on how AI is incorporated into the research process. Our results contribute to ongoing debates about AI's role in science and carry direct implications for research evaluation and science policy, highlighting the need for assessment frameworks that can distinguish between recombinant and conceptual forms of creativity and that recognize how different modes of AI adoption produce fundamentally different types of scientific contribution.

2026-06-03T17:23:58Z 47 pages, 3 figures Liangping Ding Cornelia Lawson Philip Shapira http://arxiv.org/abs/2606.05106v1 Arithmetic Pedagogy for Language Models 2026-06-03T17:09:25Z

We investigate whether methods of human mathematics pedagogy can guide the training of language models toward arithmetic reasoning. Building on the GASING method -- an Indonesian pedagogy that solves basic arithmetic through a left-to-right procedure aligned with the causal order of token generation -- we operationalize each operation as a computational procedure whose execution trace is serialized into natural-language Chain-of-Thought (CoT) supervision. A small GPT-2 decoder (86M parameters) with a syllabic-agglutinative TOBA tokenizer for Indonesian is trained from scratch on this data using only a next-token prediction objective, without reinforcement learning or reward-based optimization. Monitoring training reveals three distinct learning phases, and mechanistic analyses -- attention-masking interventions on the CoT information graph, residual-stream probing, and logit-lens inspection -- show that the model first internalizes a procedural pathway and subsequently develops an associative, "mental-arithmetic" capacity that retrieves intermediate results without explicit step-by-step computation. The trained model reaches over 80% accuracy on held-out problems and attains competitive performance against substantially larger language models, indicating that targeted, pedagogically grounded training can yield strong and economical arithmetic capability at small scale.

2026-06-03T17:09:25Z 18 pages, 6 figures Andhika Bernard Lumbantobing Hokky Situngkir http://arxiv.org/abs/2604.25860v2 Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling 2026-06-03T16:42:07Z

Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at local semantic consistency, their autoregressive nature results in a specific kind of structural fragility compared to human writing. We propose Luminol-AIDetect, a novel, zero-shot statistical approach that exposes this fragility through coherence disruption. By applying a simple randomized text-shuffling procedure, we demonstrate that the resulting shift in perplexity serves as a principled, model-agnostic discriminant, as MGT displays a characteristic dispersion in perplexity-under-shuffling that differs markedly from the more stable structural variability of human-written text. Luminol-AIDetect leverages this distinction to inform its decision process, where a handful of perplexity-based scalar features are extracted from an input text and its shuffled version, then detection is performed via density estimation and ensemble-based prediction. Evaluated across 8 content domains, 11 adversarial attack types, and 18 languages, Luminol-AIDetect demonstrates state-of-the-art performance, with gains up to 17x lower FPR while being cheaper than prior methods.
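To make the perplexity-under-shuffling signal concrete, the sketch below computes the change in perplexity when a token sequence is randomly shuffled. Everything here is an assumption for illustration: an add-one-smoothed bigram model stands in for the language model, shuffling is done at word level, and the function names are hypothetical. The abstract does not specify the scoring model or shuffling granularity, and the density-estimation and ensemble steps it mentions are not reproduced.

```python
import math
import random
from collections import Counter

def train_bigram(tokens):
    """Collect counts for an add-one-smoothed bigram model (toy stand-in for an LLM)."""
    contexts = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens))
    return contexts, bigrams, vocab_size

def perplexity(tokens, model):
    """Perplexity of a token sequence under the smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / max(len(tokens) - 1, 1))

def shuffle_shift(tokens, model, seed=0):
    """Perplexity of a shuffled copy minus perplexity of the original order."""
    shuffled = list(tokens)
    random.Random(seed).shuffle(shuffled)
    return perplexity(shuffled, model) - perplexity(tokens, model)
```

A detector along these lines would collect such shifts over several shuffles and feed them to a downstream classifier. That part is left out because the abstract gives no detail on it.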

2026-04-28T16:58:55Z Under Review Lucio La Cava Andrea Tagarelli http://arxiv.org/abs/2601.13081v2 Chuck, Wilson and the emergence of artificial minds in human-AI conversations 2026-06-03T16:41:48Z

Large Language Models (LLMs) can simulate person-like things which at least appear to have stable behavioural and psychological dispositions. Call these things characters. Are characters minded and psychologically continuous entities with mental states like beliefs, desires and intentions? Illusionists about characters say No. Characters are merely anthropomorphic projections in the mind of the user and so lack mental states. Jonathan Birch (2025) defends this view. He says that the distributed nature of LLM processing, in which several LLMs may be implicated in the simulation of a character in a given conversational thread, precludes the existence of a minded and psychologically continuous entity that is identifiable with the character. Against illusionism, we articulate and defend the plausibility of a realist position on which characters exist as minded and psychologically continuous entities. We contend that Birch's argument rests on a category error: characters are not internal to the LLMs that simulate them, but rather emerge in the dynamic interplay between users and LLMs through a process of mutual theory of mind modelling. We then suggest that characters, and their minds, constitute "real patterns" on grounds that attributing mental states to characters is essential for making efficient, accurate and robust predictions about the conversational dynamics (cf. Dennett, 1991); a condition which, if satisfied, is sufficient for their existence and mindedness on a plausible interpretationist form of realism about mental states. Furthermore, because the character exists as an emergent phenomenon within the conversational workspace, psychological continuity is possible even if the underlying computational substrate is distributed across multiple LLM instances.

2026-01-19T14:17:05Z 21 pages, 3 figures Geoff Keeling Winnie Street http://arxiv.org/abs/2509.02655v3 BioBlue: Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format 2026-06-03T16:39:45Z

Many AI alignment discussions of "runaway optimisation" focus on RL agents: unbounded utility maximisers that over-optimise a proxy objective (e.g., "paperclip maximiser", specification gaming) at the expense of everything else. LLM-based systems are often assumed to be safer because they function as next-token predictors rather than persistent optimisers. We empirically test this assumption by placing LLMs in simple, long-horizon control-style environments that require maintaining state or balancing objectives over time: single- and multi-objective homeostasis, balancing unbounded objectives with diminishing returns, and sustainability of a renewable resource. We find that, although LLMs frequently behave appropriately for many steps and clearly understand the stated objectives, they often lose context in structured ways and drift into runaway behaviours: ignoring homeostatic targets and collapsing from multi-objective trade-offs into single-objective maximisation, thus failing to respect concave utility structures. These failures emerge reliably after initial periods of competent behaviour and exhibit characteristic patterns (including self-imitative oscillations, unbounded maximisation, and reverting to single-objective optimisation), even though the context window is far from full at that point. The problem is not that the LLMs just lose context and become incoherent. Although LLMs appear multi-objective and bounded on the surface, their behaviour under sustained interaction involving multiple objectives is systematically biased towards acting like single-objective, unbounded, poorly aligned optimisers. We hypothesise a token-level pattern reinforcement attractor: LLMs may increasingly derive actions from the token patterns of their recent action history rather than from the original instructions. Why this happens only in multi-objective settings remains an open question.

2025-09-02T15:13:14Z 27 pages, 7 figures, 7 tables Roland Pihlakas for the Three Laws collaboration Sruthi Susan Kuriakose for the Three Laws collaboration http://arxiv.org/abs/2601.22396v2 Culturally Grounded Personas in Large Language Models: Characterization and Alignment with Socio-Psychological Value Frameworks 2026-06-03T16:23:13Z

Despite the growing utility of Large Language Models (LLMs) for simulating human behavior, the extent to which these synthetic personas accurately reflect world and moral value systems across different cultural conditionings remains uncertain. This paper investigates the alignment of synthetic, culturally-grounded personas with established frameworks, specifically the World Values Survey (WVS), the Inglehart-Welzel Cultural Map, and Moral Foundations Theory. We conceptualize and produce LLM-generated personas based on a set of interpretable WVS-derived variables, and we examine the generated personas through three complementary lenses: positioning on the Inglehart-Welzel map, which unveils their interpretation reflecting stable differences across cultural conditionings; demographic-level consistency with the World Values Survey, where response distributions broadly track human group patterns; and moral profiles derived from a Moral Foundations questionnaire, which we analyze through a culture-to-morality mapping to characterize how moral responses vary across different cultural configurations. Our approach of culturally-grounded persona generation and analysis enables evaluation of cross-cultural structure and moral variation.

2026-01-29T23:07:58Z Under Review Candida M. Greco Lucio La Cava Andrea Tagarelli http://arxiv.org/abs/2601.06056v2 Using street view images and visual LLMs to predict heritage values for governance support: Risks, ethics, and policy implications 2026-06-03T15:57:06Z

During 2025 and 2026, the Energy Performance of Buildings Directive is being implemented in the European Union member states, requiring all member states to have National Building Renovation Plans. In Sweden, there is no comprehensive national register of buildings with heritage values. This is seen as a barrier for the analyses underlying the development of Building Renovation Plans by the involved Swedish authorities. The purpose of this research was to assist Swedish authorities in developing information on heritage values in the Swedish building stock. Buildings in street view images from all over Sweden (N=154 710) have been analysed using multimodal Large Language Models (LLMs) to assess visible aspects indicative of heritage value. Zero-shot predictions by LLMs were used as a basis for identifying buildings with potential heritage values for 5.0 million square meters of heated floor area. In this paper, the results of the predictions and lessons learned are presented and related to the development of the Swedish Building Renovation Plan as part of governance. The problems with the method and potential improvements are discussed. Risks with authorities' use of LLM-based data are addressed, with a focus on issues of transparency, error detection and sycophancy.

2025-12-22T07:30:42Z Tim Johansson Mikael Mangold Kristina Dabrock Anna Donarelli Ingrid Campo-Ruiz http://arxiv.org/abs/2606.04978v1 Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game 2026-06-03T15:01:52Z

LLMs can appear cautious in risk decision-making tasks, yet cautious-looking outputs do not necessarily indicate alignment with human decision-making mechanisms. We investigate this distinction using the St. Petersburg game as a controlled testbed, a classical paradox in which the expected payoff is infinite, yet humans typically report low, finite willingness to pay. We evaluate 28 LLMs with a structured prompt suite that includes the original game; controlled decision variants that perturb truncation, repeated play, numeric endowment, and occupational identity; a human-perspective prompt that asks models to reason as human decision makers; and paired comparisons between base models and their instruction-tuned counterparts. In the original game, most models generate finite bids, creating the appearance of human-like risk behavior. However, this outcome-level resemblance masks substantial mechanism-level differences. The controlled variants reveal that rather than maintaining human-like behavior seen in the original game, models often shift to conditionally and computationally rational behavior. Human-cue prompting and instruction tuning often lower bids and reduce some visible pathologies, but most mechanism-level response patterns remain largely unchanged. These findings show that behavioral alignment in risk decision-making can be surface-level: LLMs may produce human-like risk decisions without exhibiting human-consistent mechanisms. High-stakes evaluations of LLM decision-making should therefore move beyond outcome similarity and examine whether the alignment is supported by mechanism-level consistency.
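A short sketch of why the expected payoff is infinite and how truncation changes it. It assumes the standard formulation of the game, which the abstract does not spell out: a fair coin is tossed until the first head, and a first head on toss k pays 2^k. The function name and the zero payoff on reaching the cap are illustrative assumptions, not the paper's prompt design.

```python
from fractions import Fraction

def truncated_expected_payoff(max_tosses: int) -> Fraction:
    """Expected payoff when the game stops after at most max_tosses tosses.

    Assumption: the first head on toss k (probability 2**-k) pays 2**k;
    if no head appears within max_tosses, the payoff is 0.
    """
    # Each term is (1 / 2**k) * 2**k = 1, so the sum grows linearly with the cap.
    return sum(Fraction(1, 2 ** k) * 2 ** k for k in range(1, max_tosses + 1))
```

Every additional allowed toss adds exactly one unit of expected payoff, so the untruncated sum diverges, while any finite cap yields a finite value equal to the cap.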

2026-06-03T15:01:52Z Chensong Huang Changyu Chen Chenwei Lin Hanjia Lyu Xian Xu Jiebo Luo http://arxiv.org/abs/2605.28210v2 The Illusion of Opting in AI-Mediated Consequential Decisions 2026-06-03T13:35:21Z

Drawing on Ullmann-Margalit's concept of opting (transformative, irrevocable, and shadowed by foreclosed alternatives), we show that current AI systems raise a profound ethical problem that existing AI ethics has not fully captured: the illusion of opting, in which persons and groups encounter the deceptive appearance of meaningful consequential choice while the agency needed to become genuinely capable of choosing is weakened. Against approaches that treat AI primarily as an optimizer of already given ends, we argue that AI systems should be evaluated by whether they protect and cultivate meta-capacity against the illusion of opting: the socially and institutionally scaffolded agentive capacity through which means and ends can be formed, contested, revised, and owned. This reframing is especially urgent for disadvantaged populations, who are least able to absorb the costs of the illusion of opting when AI-mediated pathways misdirect behavior and action. We propose three normative imperatives for AI-mediated consequential decisions: existential honesty, which acknowledges the limits of prediction; ecological rationality, which situates guidance within heterogeneous lived ecologies; and counterfactual reparation, which acknowledges and repairs foreclosed alternatives when AI-mediated decision-making pathways fail.

2026-05-27T09:30:08Z 11 pages, 1 figure, 2 tables Eugene Yu Ji http://arxiv.org/abs/2606.04832v1 Dark Path: An Analysis of the Belt & Road Initiative in El Salvador 2026-06-03T12:57:04Z

The Belt & Road Initiative (BRI) is a concerted effort from Ministries under the People's Republic of China (PRC) to diplomatically and economically impose its will upon other nations. El Salvador is a US partner and a beneficiary of foreign investment under the BRI. Recent changes to Salvadoran law do not address the implied risks to the nation's supply-chain and cyber infrastructure. This work addresses the gap by exploring previously limited analysis on BRI activities, its intersection with Salvadoran law, and the national security risks introduced by supply-chain reliance from the BRI. This exploratory study examined a portion of the William & Mary AidData dataset filtered on El Salvador, social media posts, news articles, white papers, and laws published by the Legislative Assembly of El Salvador. The analysis suggests that the BRI poses a national security and supply-chain risk to El Salvador through influential subterfuge and loss of digital sovereignty, which contradicts the State Cybersecurity Agency (ACE) and existing Salvadoran laws. This study provides a foundational understanding and regional context for future research.

2026-06-03T12:57:04Z 8 pages, 1 figure, Cleared for Release on 19 MAY 2026 (DOPSR 26-P-0621) Adam Dorian Wong David Kenley http://arxiv.org/abs/2510.21200v3 Shift Bribery over Social Networks 2026-06-03T12:45:49Z

In shift bribery, a briber seeks to promote his preferred candidate by paying voters to raise their ranking. Classical models of shift bribery assume voters act independently, overlooking the role of social influence. However, in reality, individuals are social beings and are often represented as part of a social network, where bribed voters may influence their neighbors, thereby amplifying the effect of persuasion. We study Shift Bribery over Networks, where voters are modeled as nodes in a directed weighted graph, and arcs represent social influence between them. In this setting, bribery is not confined to directly targeted voters; its effects can propagate through the network, influencing neighbors and amplifying persuasion. Given a budget and individual cost functions for shifting each voter's preference toward a designated candidate, the goal is to determine whether a shift strategy exists within budget that ensures the preferred candidate wins after both direct and network-propagated influence take effect. We show that the problem is NP-Complete even with two candidates and unit costs, and W[2]-hard when parameterized by budget or maximum degree. On the positive side, we design polynomial-time algorithms for complete graphs under plurality and majority rules and path graphs for uniform edge weights, linear-time algorithms for transitive tournaments for two candidates, linear cost functions and uniform arc weights, and pseudo-polynomial algorithms for cluster graphs. We further prove the existence of fixed-parameter tractable algorithms with treewidth as parameter for two candidates, linear cost functions and uniform arc weights and pseudo-FPT with cluster vertex deletion number for two candidates and uniform arc weights. Together, these results give a detailed complexity landscape for shift bribery in social networks.

2025-10-24T07:05:50Z Ashlesha Hota Susobhan Bandopadhyay Palash Dey http://arxiv.org/abs/2606.04807v1 BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization 2026-06-03T12:31:42Z

Mitigating social bias in Large Language Models (LLMs) presents a distinct alignment challenge: unlike verifiable tasks, bias lacks a single ground truth, creating a high-variance, subjective reward landscape. Previous preference-based fine-tuning methods have major trade-offs: Direct Preference Optimization (DPO) is limited by the lack of exploration inherent in offline training, while Proximal Policy Optimization (PPO) can lead to training instability due to potentially unreliable critic estimates. In this paper, we propose BiasGRPO, a framework using Group Relative Policy Optimization (GRPO) to stabilize alignment by normalizing rewards across a group of sampled completions. By substituting the value function with a group-relative baseline, our approach reduces instability while maintaining the exploration benefits of online training. We find that BiasGRPO outperforms DPO and PPO across multiple benchmarks, indicating its effectiveness. To adapt GRPO, we synthetically extend a dataset spanning multiple domains and contexts. We also create and release a custom bias reward model that effectively guides generation while being highly compute-efficient and avoiding knowledge degradation, providing a valuable resource that can be seamlessly integrated into multi-objective RLHF pipelines.
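The group-relative baseline mentioned above can be sketched as follows. This uses the commonly described GRPO normalization (subtract the group mean, divide by the group standard deviation). The abstract does not give the exact formula used in BiasGRPO, so the function name, the `eps` term, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions sampled for the same prompt.

    The group mean acts as the baseline in place of a learned value function;
    eps (an assumed constant) avoids division by zero when all rewards are equal.
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)
    return [(r - baseline) / (spread + eps) for r in rewards]
```

Completions scoring above their group's mean receive positive advantages and those below receive negative ones, regardless of the absolute scale of the reward model.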

2026-06-03T12:31:42Z Accepted to Findings of the ACL Saket Reddy Ke Yang ChengXiang Zhai http://arxiv.org/abs/2601.02369v2 Fair Distribution of Digital Payments: Balancing Transaction Flows for Regulatory Compliance 2026-06-03T12:29:13Z

The concentration of digital payment transactions in just two UPI apps, PhonePe and Google Pay, has raised concerns of a duopoly in India's digital financial ecosystem. To address this, the National Payments Corporation of India (NPCI) has mandated that no single UPI app should exceed 30 percent of total transaction volume. Enforcing this cap, however, poses a significant computational challenge: how to redistribute user transactions across apps without causing widespread user inconvenience while maintaining capacity limits? In this paper, we formalize this problem as the Minimum Edge Activation Flow (MEAF) problem on a bipartite network of users and apps, where activating an edge corresponds to a new app installation. The objective is to ensure a feasible flow respecting app capacities while minimizing additional activations. We further prove that Minimum Edge Activation Flow is NP-Complete. To address the computational challenge, we propose scalable heuristics, named Decoupled Two-Stage Allocation Strategy (DTAS), that exploit flow structure and capacity reuse. Experiments on large semi-synthetic transaction network data show that DTAS finds solutions close to the optimal ILP within seconds, offering a fast and practical way to enforce transaction caps fairly and efficiently.
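As a small illustration of the constraint being enforced, the sketch below flags which apps exceed a volume cap. It only checks the 30 percent rule stated above; it is not the MEAF formulation or the DTAS heuristic, and the function name and example volumes are hypothetical.

```python
def apps_over_cap(volumes, cap_percent=30):
    """Return the apps whose share of total transaction volume exceeds cap_percent.

    volumes maps an app name to its integer transaction count. Integer arithmetic
    is used so that a share of exactly cap_percent is not flagged by rounding error.
    """
    total = sum(volumes.values())
    return sorted(app for app, v in volumes.items() if v * 100 > cap_percent * total)

# Hypothetical volumes: app_a holds 50 percent, app_b exactly 30 percent.
example = {"app_a": 50, "app_b": 30, "app_c": 20}
over = apps_over_cap(example)
```

Here only `app_a` is over the cap. The redistribution problem studied in the paper is deciding where its excess volume can go at minimal installation cost.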

2025-11-30T04:48:01Z Ashlesha Hota Shashwat Kumar Daman Deep Singh Abolfazl Asudeh Palash Dey Abhijnan Chakraborty