https://arxiv.org/api/JTm9ZTs5uwBNiCpxEF/FbyLvkJQ 2026-06-18T18:34:45Z 28983 405 15 http://arxiv.org/abs/2603.24625v2 From Hype to Collapse: Investigating Rug Pull Scams on Solana 2026-05-31T10:48:32Z

Solana has experienced rapid growth due to its high performance and low transaction costs, but the extremely low barrier to token issuance has also enabled widespread Rug Pulls. Unlike Ethereum-based Rug Pulls, which often rely on malicious smart-contract logic, Solana's unified SPL Token program shifts fraudulent execution toward on-chain behavioral manipulation. However, existing research has not systematically examined these Solana-specific Rug Pull patterns, and no public Solana Rug Pull dataset is available for empirical research. To bridge this gap, we present a large-scale measurement study of Rug Pulls on Solana. We manually verify 68 community-reported incidents and curate a benchmark of 117 confirmed Rug Pull tokens, from which we distill three representative on-chain behavioral patterns: Freeze Authority Abuse, Liquidity Withdrawal, and Pump-and-Dump. Guided by these patterns, we design a behavior-guided candidate identification and human-validation pipeline. We apply this pipeline to 100,063 tokens newly issued on Orca, Raydium, and Meteora during the first half of 2025, identifying 76,469 Rug Pull tokens. A random manual audit of 382 samples estimates a labeling false-positive rate of 0.26%, supporting the reliability of the dataset. We release the resulting dataset and use it to characterize the Solana Rug Pull ecosystem. Our analysis shows that Rug Pulls on Solana exhibit extremely short lifecycles, strong price-driven dynamics, severe economic losses, and highly organized group behaviors. These findings provide new insights into the Solana Rug Pull landscape and support the development of effective on-chain defense mechanisms.
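
The audit figure above can be checked with a short calculation. The sketch below assumes the reported rate is simply the fraction of audited labels judged incorrect; under that assumption, 0.26% of 382 samples is consistent with a single mislabeled token, which is an inference and is not stated in the abstract. The function names are hypothetical, and the Wilson interval is added only to show how much uncertainty an audit of this size leaves.

```python
import math


def audit_error_rate(n_wrong: int, n_audited: int) -> float:
    """Fraction of audited labels judged incorrect."""
    if n_audited <= 0:
        raise ValueError("n_audited must be positive")
    return n_wrong / n_audited


def wilson_interval(n_wrong: int, n_audited: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is roughly 95%)."""
    p = n_wrong / n_audited
    denom = 1 + z**2 / n_audited
    centre = (p + z**2 / (2 * n_audited)) / denom
    half = z * math.sqrt(p * (1 - p) / n_audited + z**2 / (4 * n_audited**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Assumption: one incorrect label among the 382 audited samples.
rate = audit_error_rate(1, 382)
low, high = wilson_interval(1, 382)
print(f"rate = {100 * rate:.2f}%")  # rate = 0.26%
print(f"95% Wilson interval = [{100 * low:.2f}%, {100 * high:.2f}%]")
```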

2026-03-25T02:31:31Z Jiaxin Chen Ziwei Li Zigui Jiang Ruihong He Yantong Zhou Jiajing Wu Zibin Zheng http://arxiv.org/abs/2606.01152v1 ASE-26: a curriculum for agentic software engineering as a discipline 2026-05-31T10:44:53Z

The work of a professional software engineer has begun to consist, increasingly, of directing agents rather than writing code, and the empirical evidence for the shift is now several years deep. Anthropic's Economic Index puts automation at 79 per cent of Claude Code interactions [2]; Handa and colleagues at Anthropic find AI exposure for Computer Programmer tasks at approximately 75 per cent of the role's distinct activities [3]; Brynjolfsson and colleagues at Stanford's Digital Economy Lab report a 13 per cent relative decline in employment for workers aged 22 to 25 in occupations most exposed to AI [4]. The shift is also unfinished, and the academic literature on agentic software engineering converges on the finding that the missing capability is not better models but structured practitioner discipline. This paper presents ASE-26, a comprehensive undergraduate curriculum for agentic software engineering as a discipline, deposited as a citable reference on Zenodo under CC BY-ND 4.0 [12]. The paper sets out the discipline framing the curriculum rests on, the conceptual contributions it makes (most importantly, the evolutionary spiral as the operational form of the co-evolution of intent and build), the twenty-one-module structure that organises the discipline for teaching, the pedagogical commitments that follow from grading work co-produced with an agent, what graduates leave with, and how the discipline as taught is designed to outlast the specific capabilities of today's models. The position the paper takes is that the practitioner skills the industry currently lacks are precisely the skills the discipline names, and that structured undergraduate curricula in agentic software engineering are the principal mechanism by which the gap closes.

2026-05-31T10:44:53Z 12 pages, 20 references. Companion paper to the ASE-26 curriculum deposited on Zenodo at doi:10.5281/zenodo.20468021. Part 1 of a planned series of two pre-prints on the curriculum and its conceptual core Mikael Gorsky http://arxiv.org/abs/2606.01127v1 How Proposal Novelty, Topical Diversity, and Theory-Practice Balance Shape Scholarly Outcomes in Funded Education Research 2026-05-31T10:01:21Z

Education research occupies a distinctive position in public science because it is expected to advance scholarly knowledge while also informing learning, teaching, participation, and workforce development. This study examines how the intellectual characteristics of NSF-funded education proposals are associated with the subsequent academic performance of funded scholars. Linking 8,715 NSF education awards from 1990 to 2020 with 84,519 publications by principal investigators, the analysis focuses on four major NSF education divisions that collectively span undergraduate and graduate levels, formal and informal learning environments, and inclusive educational initiatives. Proposal novelty is measured as semantic distance from prior funded projects within the same division, topical diversity as breadth across latent research themes, and intellectual orientation as theoretical, practical, or balanced. The results show that NSF education funding is consistently associated with higher publication output across divisions. However, this increase is not accompanied by stronger citation performance or higher journal-level visibility; citation and CiteScore estimates are often negative, particularly in later decades. Proposal novelty shows limited and uneven associations with post-award outcomes, whereas topical diversity is more clearly related to publication growth in some divisions but weaker citation-based performance in others. Balanced proposals that integrate theoretical and practical aims display the most favourable overall profile, combining positive publication associations with fewer negative citation-based patterns. These findings highlight the importance of evaluating education research funding through multiple academic outcomes and division-specific research contexts.

2026-05-31T10:01:21Z Yunfeng Gao Yuxuan Xiao Jiaming Zhang Yang Ding http://arxiv.org/abs/2606.13696v1 AGORA: Can Deliberation and Governance Gates Absorb Participation Bias in Transit Planning? 2026-05-31T08:00:37Z

Transit network design depends not only on the optimization algorithm but also on who shows up to the public hearing. Current practice often collects one-directional comments from self-selected attendees, leaving participant mix as an uncontrolled source of outcome variation. We present AGORA, a framework that holds the network, demand, and solver fixed while systematically varying meeting composition through stakeholder agents, structured deliberation, and governance gates. Across two standard benchmark networks at different scales, we find that (i) aggregate outcomes vary little across compositions, but on tail risk and fairness disparity, representative sampling still tends to outperform skewed compositions; (ii) without deliberation, composition produces no variation at all, showing that deliberation is the mechanism through which who attends affects outcomes; and (iii) governance gates compress cross-profile variance without shifting the average outcome on Mandl, but low acceptance on Mumford0 shows thresholds require instance-specific calibration. These findings reframe participation bias from an uncontrollable input to a process-design problem: even without guaranteed representative attendance, well-structured deliberation and governance criteria can substantially reduce how much outcomes depend on who is in the room.

2026-05-31T08:00:37Z Jung-Hoon Cho Cathy Wu http://arxiv.org/abs/2606.07631v1 Trait-space Monitoring for Emergent Misalignment During Supervised Finetuning 2026-05-31T04:28:21Z

Emergent misalignment (EM) occurs when narrow finetuning causes a model to behave dangerously outside the finetuning task. Standard training signals can miss this shift, making reliable detection costly if it depends on repeated behavioral evaluation. We ask whether emergent misalignment can instead be detected from internal representations during finetuning. Using seven alignment-relevant traits encoded as linear directions in activation space, we track representational drift across training checkpoints in four open-source 7-9B LLMs. EM-relevant drift concentrates on a low-dimensional axis that explains 65.5% of the variance, revealing a geometric signature in the studied regime. A low-overhead monitor built on this drift profile detects dangerous checkpoints with 2.2% false negative rate, 2.9% false positive rate, and 0.990 AUROC on held-out perturbation types, outperforming unsupervised PCA and SAE baselines. Stress tests on two 14B models, longer finetuning runs, and misaligned starting points identify key deployment boundaries. These results position trait-space monitoring as a practical complement to behavioral evaluation for EM detection during LoRA-based finetuning, while showing that deployment across substantially different regimes may require recalibration.
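
A minimal sketch of the kind of computation the abstract describes: projecting each checkpoint's activation change onto a set of trait directions, then asking how much of the resulting drift lies along one axis. Everything here is an assumption made for illustration (the synthetic data, the dimensions, the function names, and the use of plain SVD for the explained-variance figure); it is not the authors' monitor, which the abstract reports as outperforming an unsupervised PCA baseline.

```python
import numpy as np


def trait_drift(base_act, ckpt_acts, trait_dirs):
    """Project activation change relative to the base model onto trait directions.

    base_act:   (d,) activation vector of the base model
    ckpt_acts:  (n_ckpt, d) activation vectors, one per checkpoint
    trait_dirs: (n_traits, d) trait directions (normalised to unit length here)
    returns:    (n_ckpt, n_traits) drift coordinates
    """
    unit = trait_dirs / np.linalg.norm(trait_dirs, axis=1, keepdims=True)
    return (ckpt_acts - base_act) @ unit.T


def top_axis_variance_ratio(drift):
    """Share of drift variance explained by the first principal axis."""
    centered = drift - drift.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values**2
    return float(variances[0] / variances.sum())


def synthetic_run(seed=0, d=64, n_traits=7, n_ckpt=20):
    """Hypothetical data: checkpoints drift along one shared direction plus noise."""
    rng = np.random.default_rng(seed)
    trait_dirs = rng.normal(size=(n_traits, d))
    base = rng.normal(size=d)
    unit = trait_dirs / np.linalg.norm(trait_dirs, axis=1, keepdims=True)
    shared = unit.sum(axis=0)
    shared /= np.linalg.norm(shared)
    steps = np.linspace(0.0, 5.0, n_ckpt)[:, None]
    ckpts = base + steps * shared + 0.1 * rng.normal(size=(n_ckpt, d))
    return trait_drift(base, ckpts, trait_dirs)


drift = synthetic_run()
print(drift.shape)
print(f"top-axis variance share: {top_axis_variance_ratio(drift):.3f}")
```

In this toy setting the drift is built to be nearly one-dimensional, so the top-axis share is high by construction; with real checkpoints the share would be an empirical finding.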

2026-05-31T04:28:21Z First version. 45 pages Huy Nghiem Sy-Tuyen Ho Sarah Wiegreffe Hal Daumé http://arxiv.org/abs/2602.15259v2 Knowing Isn't Understanding: Re-grounding Generative Proactivity with Epistemic and Behavioral Insight 2026-05-31T02:09:50Z

Generative AI agents equate understanding with resolving explicit queries, an assumption that confines interaction to what users can articulate. This assumption breaks down when users themselves lack awareness of what is missing, risky, or worth considering. In such conditions, proactivity is not merely an efficiency enhancement, but an epistemic necessity. We refer to this condition as epistemic incompleteness: where progress depends on engaging with unknown unknowns for effective partnership. Existing approaches to proactivity remain narrowly anticipatory, extrapolating from past behavior and presuming that goals are already well defined, thereby failing to support users meaningfully. However, surfacing possibilities beyond a user's current awareness is not inherently beneficial. Unconstrained proactive interventions can misdirect attention, overwhelm users, or introduce harm. Proactive agents, therefore, require behavioral grounding: principled constraints on when, how, and to what extent an agent should intervene. We advance the position that generative proactivity must be grounded both epistemically and behaviorally. Drawing on the philosophy of ignorance and research on proactive behavior, we argue that these theories offer critical guidance for designing agents that can engage responsibly and foster meaningful partnerships.

2026-02-16T23:28:17Z 43rd International Conference on Machine Learning (ICML 2026) Kirandeep Kaur Xingda Lyu Chirag Shah http://arxiv.org/abs/2605.25142v2 Pre-Characterization of Electromagnetic Side-Channel Leakage Using Publicly Available Information: A Case Study on E-Voting Interfaces 2026-05-31T00:56:26Z

In this work, we study the interface of the Brazilian e-Voting Machine (BVM) in the context of electromagnetic side-channel threats commonly referred to as TEMPEST attacks. In a TEMPEST attack against video displays, an eavesdropper uses Software-Defined Radios (SDRs) to recover sensitive information by intercepting electromagnetic emanations generated during video signal transmission. We emulate the BVM using a VGA monitor by leveraging publicly available information disclosed by the electoral authority, including technical specifications, operational rules of the system, and the official BVM interface. Based on this setup, we investigate whether the BVM interface gives rise to a distinctive spectral signature observable through its unintended electromagnetic emissions. Our findings show that design characteristics relevant to a nationwide electoral process -- such as high image contrast, minimal on-screen information, and the prohibition of other electronic devices within the polling station -- result in a simple and highly distinctive spectral signature that can be observed even through a wall in our experiments. Although our experiments do not involve actual BVM hardware, the results raise concerns regarding the system's susceptibility to TEMPEST attacks and highlight the need for further research on protective countermeasures. In this context, our findings may support the design of automatic jammers capable of adaptively targeting compromising frequencies. To the best of our knowledge, this is the first study investigating TEMPEST attacks in the context of an electronic voting system officially adopted by a country.

2026-05-24T15:46:54Z This work was presented in the Show & Tell Technical Demonstration Session of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2026, available at https://2026.ieeeicassp.org/industry_program/#DMOS_530 Leonardo Teodoro Kemuel L. Vieira Saulo Queiroz http://arxiv.org/abs/2606.00873v1 Prompts for Public-Sector LLMs Should Be Governed as Commons 2026-05-30T20:01:53Z

This paper argues that prompts used to deploy large language models (LLMs) in public-sector settings should be treated as governed artefacts rather than private, transient inputs. Prompts encode role instructions, decision framings, and value claims; prompt choice can materially shift outputs even when model weights and input records are held fixed. Existing governance tools, including model and dataset documentation, organisation-level policies, and post-training alignment, rarely make the local prompt collections used in deployment transparent, contestable, or auditable. We propose Prompt Commons: a versioned, community-maintained repository of prompt templates with provenance metadata, licensing, and moderation logs. Using a pilot dataset collected with community partners in a large North American city (443 human prompts; 3,317 after augmentation), we illustrate three governance states (open, curated, veto-enabled) and a negotiation-oriented ensemble method that aggregates stakeholder prompts into compromise recommendations. We close with falsifiable implications and an evaluation agenda for prompt-layer governance.

2026-05-30T20:01:53Z To appear in the Proceedings of the 43rd International Conference on Machine Learning (ICML 2026) Rashid Mushkani http://arxiv.org/abs/2606.07629v1 Large Language Models Should Learn Personalized Rather Than Aggregated Human Preferences 2026-05-30T18:47:52Z

Current approaches to aligning large language models (LLMs) aggregate diverse human preferences into a single reward signal, effectively optimizing for a hypothetical ``average user'' who represents no real person particularly well. This position paper argues that LLMs should learn personalized, individual preferences rather than aggregated ones. We show that aggregation masks critical information about preference diversity, individual values, and contextual dependencies, which is a limitation both theoretically grounded in social choice theory and empirically evident across demographic groups. We analyze the rich structure that human preferences encode, survey technical approaches to personalization, and systematically address counterarguments on scalability, shared standards, and manipulation risk. While personalization introduces genuine safety challenges including filter bubbles, value lock-in, and psychological manipulation, we argue these are manageable through bounded personalization frameworks that preserve universal safety constraints while accommodating legitimate individual variation. We conclude with a concrete research and policy agenda for developing preference-aware models that respect both individual autonomy and collective safety.

2026-05-30T18:47:52Z Accepted to ICML 2026 Cristina Garbacea http://arxiv.org/abs/2606.07628v1 Frankenstein in the Pipeline: Computational Epistemicide in Facial Recognition 2026-05-30T17:52:02Z

While the eugenic roots of computer vision are well-documented in critical technology studies, less attention has been paid to the operational mechanisms through which this violence is enacted at the level of the pipeline. This paper employs Mary Shelley's Frankenstein not as a metaphor for unintended consequences, but as a diagnostic framework for method: disassembly, reconstruction, and the production of a creature whose legitimacy is asserted by the procedure that made it. I argue that embedding-based facial recognition enacts what I call computational epistemicide, an extension of Sueli Carneiro's concept of epistemicide to the computational domain, by destroying the face as a living, relational surface and authorizing a numerical proxy as the privileged site of identity. Across detection/cropping, landmarking, alignment/frontalization, and embedding, the face is progressively narrowed to what can be stabilized as data, producing a canonical face as the condition of legibility and a corresponding form-subject as the condition of recognition. Vectorization completes the Frankensteinian "stitching": the dissected face is reassembled into a fixed-dimensional artifact designed to circulate across databases and institutions. I then show how distance-based similarity and thresholding operationalize a norm of "close enough," making recognition inseparable from standardization and rendering reformist "ethical AI" optimization structurally insufficient. The paper concludes by arguing for abolition as a normative stance: refusing vectorized identity as a legitimate basis for rights and access, and dismantling the institutional impulse to govern human life through dissectible data points.
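
For readers unfamiliar with the mechanism the abstract critiques, the sketch below shows distance-based matching in its simplest form: two fixed-length vectors are compared by cosine similarity, and a threshold decides whether they count as the same identity. The vectors, the threshold values, and the function names are all hypothetical; no particular face recognition system or library is being reproduced.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_a * norm_b)


def is_match(probe, reference, threshold):
    """Declare a match when similarity reaches the chosen threshold."""
    return cosine_similarity(probe, reference) >= threshold


# Toy two-dimensional "embeddings"; real systems use hundreds of dimensions.
probe = [1.0, 0.0]
reference = [1.0, 1.0]

print(is_match(probe, reference, threshold=0.6))  # True
print(is_match(probe, reference, threshold=0.8))  # False
```

The same pair of vectors is a match or a non-match depending only on where the threshold is set, which is the "close enough" decision the abstract refers to.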

2026-05-30T17:52:02Z Accepted to ACM FAccT 2026. Author's version. 17 pages, 2 figures Nina da Hora 10.1145/3805689.3812284 http://arxiv.org/abs/2606.00791v1 Global Patterns in Student Stress and Academic Performance: A Machine Learning Study Using PISA 2022 2026-05-30T16:12:39Z

Machine learning was applied to examine whether stress-related factors influence student performance in a consistent way across the world. The main goal of this project is to confirm or reject the existence of a similar global pattern by generalizing the findings that already exist in this field. We focused on various psychological indicators such as anxiety score, test anxiety, math anxiety, math confidence, wellbeing, and sense of belonging, along with several non-psychological factors for context. Machine learning was chosen due to the extremely large size of the PISA 2022 dataset and its ability to capture complex relationships that simpler methods may overlook. The analysis was conducted across six continents by splitting the dataset into six separate case studies. Feature engineering was performed manually for each region, while the same baseline models were trained to ensure a fair comparison. The results show that the negative effect of stress on performance is present and fairly consistent across all continents. Although some error remains, partly because stress is not the only factor shaping academic outcomes, the overall pattern is clear. Africa stood out as an outlier due to lower average educational and wellbeing levels and a higher proportion of missing data, yet even there the negative relationship remained observable.

2026-05-30T16:12:39Z Ani Ghazanchyan Sachin Kumar http://arxiv.org/abs/2411.19093v6 Seeing SDG 6 from space: local-scale monitoring of piped water and sewage system access across Africa using satellite imagery and self-supervised learning 2026-05-30T15:30:28Z

Access to drinking water and sanitation is essential for health and well-being, yet major disparities remain, especially in data-scarce regions such as Africa. SDG 6 aims for universal access, but current monitoring relies on costly, infrequent, and spatially uneven surveys and censuses with long reporting delays. This study develops a scalable remote-sensing framework to estimate piped water and sewage system access at approximately 2.56 km resolution using Sentinel-2 imagery, Afrobarometer survey responses, 30 m population data, and DINO self-supervised Vision Transformer features. The best model achieves AUROC values of 91.54% for piped water and 93.24% for sewage access. Across 50 African countries, population-weighted estimates strongly align with WHO/UNICEF JMP statistics for piped water ($R^2 = 0.92$) and show meaningful agreement for sewage access ($R^2 = 0.72$). In countries without Afrobarometer coverage, MAEs are 9.5% and 10.7%, with estimates within 15% of JMP values for 121.4 million and 159.7 million people, respectively. A Nigeria case study across 767 Local Government Areas (LGAs) shows that the framework reveals fine-scale environmental inequality. The largest no-access burdens reach 1.155 million people for piped water and 1.452 million for sewage, 7.9 and 8.3 times the median LGA burden, while top-decile no-access thresholds of 0.805 and 0.952 indicate that deprivation is widespread. These findings show that DINO-based satellite models can complement household surveys with low-cost, spatially detailed evidence for SDG 6 monitoring, infrastructure targeting, and environmental equity assessment.
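
The country-level comparison described above depends on two simple quantities: a population-weighted average of grid-cell access estimates, and the mean absolute error against reference statistics. The sketch below illustrates both with made-up numbers; the function names and inputs are hypothetical, and the authors' exact aggregation procedure is not given in the abstract.

```python
import math


def population_weighted_access(access, population):
    """Population-weighted mean of per-cell access fractions (values in [0, 1])."""
    if len(access) != len(population):
        raise ValueError("access and population must have the same length")
    total = sum(population)
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(a * p for a, p in zip(access, population)) / total


def mean_absolute_error(estimates, references):
    """Average absolute gap between estimates and reference values."""
    if len(estimates) != len(references) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(estimates)


# Two hypothetical grid cells: 100 people at 50% access, 300 people at 90% access.
national = population_weighted_access([0.5, 0.9], [100, 300])
print(math.isclose(national, 0.8))  # True

# Hypothetical country-level estimates versus reference statistics.
print(math.isclose(mean_absolute_error([0.8, 0.4], [0.7, 0.5]), 0.1))  # True
```

Weighting by population means a densely populated cell with low access pulls the national figure down more than a sparsely populated one, which is why the estimates are compared with household-based statistics on a per-person basis.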

2024-11-28T12:13:46Z Under Review Othmane Echchabi Aya Lahlou Nizar Talty Josh Malcolm Manto Tongshu Zheng Ka Leung Lam http://arxiv.org/abs/2606.02632v1 Position: Prioritize Identifying Structure, Not Complex Models, for Scientific Discovery 2026-05-30T15:21:58Z

Modern Machine Learning (ML) and Artificial Intelligence (AI) models, especially large language models (LLMs), are increasingly used to generate scientific hypotheses and mechanistic explanations from observational data. This position paper argues that in the high-dimensional proxy regimes where modern ML excels, mechanistic learning is generically underdetermined: many incompatible mechanisms induce essentially the same observational relationships on the support of the data, so predictive success and coherent explanations are insufficient evidence of mechanism discovery. This underdetermination becomes uniquely hazardous with LLMs, which tend to collapse large equivalence classes of explanations into a single fluent narrative. This paper proposes concrete standards for ``mechanistic ML,'' and argues these norms are necessary if LLM-centered workflows are to support science rather than merely simulate it.

2026-05-30T15:21:58Z Will appear as a position paper in ICML Tyler H. McCormick http://arxiv.org/abs/2511.05613v2 Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations 2026-05-30T13:29:29Z

Foundation models are increasingly central to high-stakes AI systems, and governance frameworks now depend on evaluations to assess their risks and capabilities. Although general capability evaluations are widespread, social impact assessments covering bias, fairness, privacy, environmental costs, and labor remain uneven. To characterize this landscape, we conduct the first comprehensive analysis of social impact evaluation reporting, examining 186 first-party release reports and 248 third-party evaluation sources, supplemented by developer interviews. We find a stark division of labor: first-party reporting is sparse, often superficial, and declining in areas like environmental impact and bias, while third-party evaluators provide broader, more rigorous coverage of bias, harmful content, and performance disparities. However, only developers can authoritatively report on data provenance, content moderation labor, costs, and infrastructure, yet interviews reveal these disclosures are deprioritized unless tied to product adoption or compliance. Current practices leave major gaps in assessing societal impacts, underscoring the need for policies that mandate developer transparency, strengthen independent evaluation ecosystems, and create shared infrastructure for aggregating third-party evaluations.

2025-11-06T14:25:32Z Accepted at the Forty-Third International Conference on Machine Learning (ICML), 2026, in Seoul, Korea Anka Reuel Avijit Ghosh Jenny Chim Andrew Tran Yanan Long Jennifer Mickel Usman Gohar Srishti Yadav Pawan Sasanka Ammanamanchi Mowafak Allaham Hossein A. Rahmani Mubashara Akhtar Felix Friedrich Robert Scholz Michael Alexander Riegler Jan Batzner Eliya Habba Arushi Saxena Anastassia Kornilova Kevin Wei Prajna Soni Yohan Mathew Kevin Klyman Jeba Sania Subramanyam Sahoo Olivia Beyer Bruvik Pouya Sadeghi Sujata Goswami Angelina Wang Yacine Jernite Zeerak Talat Stella Biderman Mykel Kochenderfer Sanmi Koyejo Irene Solaiman http://arxiv.org/abs/2606.00655v1 Scaling Behavior of Single LLM-Driven Multi-Agent Systems 2026-05-30T09:57:49Z

The burgeoning field of LLM-based Multi-Agent Systems (MAS) promises to tackle complex tasks through collaborative intelligence, yet fundamental questions regarding their scaling behavior and intrinsic collective dynamics remain underexplored. This paper systematically investigates how the performance of a homogeneous MAS evolves as the number of agents increases, isolating the variable of collaboration from model or knowledge heterogeneity. We propose the Sequential Iterative Multi-Agent System (SIMAS) framework, a minimalist architecture centered on sequential inter-agent communication, to clearly observe scaling effects. Through extensive experiments across diverse tasks and model scales, we establish that MAS performance does not scale monotonically with agent count but follows a pattern of diminishing returns, governed by a trade-off between collaborative synergy and coordination overhead. Our findings reveal that effective MAS requires a sufficiently capable base LLM, that task type critically modulates the optimal agent count, and that collective intelligence is an emergent property contingent on strategic interaction design rather than a guaranteed outcome of agent plurality. The performance degradation stems from coordination overhead rather than merely long-context failure, and the scaling tendency generalizes across interaction architectures like structured debate topologies. This work provides a foundational understanding of MAS scaling laws, offering practical guidance for designing efficient collaborative systems and challenging the prevailing assumption that more agents invariably lead to better performance.

2026-05-30T09:57:49Z Jialing Li Zhouhong Gu Yin Cai Hongwei Feng