https://arxiv.org/api/JTm9ZTs5uwBNiCpxEF/FbyLvkJQ
2026-06-18T18:34:45Z
28983
405
15
http://arxiv.org/abs/2603.24625v2
From Hype to Collapse: Investigating Rug Pull Scams on Solana
2026-05-31T10:48:32Z
Solana has experienced rapid growth due to its high performance and low transaction costs, but the extremely low barrier to token issuance has also enabled widespread Rug Pulls. Unlike Ethereum-based Rug Pulls, which often rely on malicious smart-contract logic, Solana's unified SPL Token program shifts fraudulent execution toward on-chain behavioral manipulation. However, existing research has not systematically examined these Solana-specific Rug Pull patterns, and no public Solana Rug Pull dataset is available for empirical research. To bridge this gap, we present a large-scale measurement study of Rug Pulls on Solana. We manually verify 68 community-reported incidents and curate a benchmark of 117 confirmed Rug Pull tokens, from which we distill three representative on-chain behavioral patterns: Freeze Authority Abuse, Liquidity Withdrawal, and Pump-and-Dump. Guided by these patterns, we design a behavior-guided candidate identification and human-validation pipeline. We apply this pipeline to 100,063 tokens newly issued on Orca, Raydium, and Meteora during the first half of 2025, identifying 76,469 Rug Pull tokens. A random manual audit of 382 samples estimates a labeling false-positive rate of 0.26%, supporting the reliability of the dataset. We release the resulting dataset and use it to characterize the Solana Rug Pull ecosystem. Our analysis shows that Rug Pulls on Solana exhibit extremely short lifecycles, strong price-driven dynamics, severe economic losses, and highly organized group behaviors. These findings provide new insights into the Solana Rug Pull landscape and support the development of effective on-chain defense mechanisms.
2026-03-25T02:31:31Z
Jiaxin Chen
Ziwei Li
Zigui Jiang
Ruihong He
Yantong Zhou
Jiajing Wu
Zibin Zheng
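The headline figures in the Solana Rug Pull abstract above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the token counts and the audit size are taken from the abstract, while the assumption that the reported 0.26% false-positive rate corresponds to exactly one mislabeled token among the 382 audited samples is ours and is not stated in the source.

```python
# Figures quoted in the abstract.
tokens_screened = 100_063
tokens_flagged = 76_469
audit_sample_size = 382

# ASSUMPTION (not stated in the abstract): the audit found one false positive.
assumed_false_positives = 1

# Share of newly issued tokens that the pipeline labeled as Rug Pulls.
flagged_share = tokens_flagged / tokens_screened

# False-positive rate implied by the assumed audit outcome.
false_positive_rate = assumed_false_positives / audit_sample_size

print(f"Share of screened tokens flagged: {flagged_share:.2%}")
print(f"Audit false-positive rate: {false_positive_rate:.2%}")
```

Under that assumption the audit rate rounds to the 0.26% quoted in the abstract, and roughly three quarters of the screened tokens are flagged.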
http://arxiv.org/abs/2606.01152v1
ASE-26: a curriculum for agentic software engineering as a discipline
2026-05-31T10:44:53Z
The work of a professional software engineer has begun to consist, increasingly, of directing agents rather than writing code, and the empirical evidence for the shift is now several years deep. Anthropic's Economic Index puts automation at 79 per cent of Claude Code interactions [2]; Handa and colleagues at Anthropic find AI exposure for Computer Programmer tasks at approximately 75 per cent of the role's distinct activities [3]; Brynjolfsson and colleagues at Stanford's Digital Economy Lab report a 13 per cent relative decline in employment for workers aged 22 to 25 in occupations most exposed to AI [4]. The shift is also unfinished, and the academic literature on agentic software engineering converges on the finding that the missing capability is not better models but structured practitioner discipline. This paper presents ASE-26, a comprehensive undergraduate curriculum for agentic software engineering as a discipline, deposited as a citable reference on Zenodo under CC BY-ND 4.0 [12]. The paper sets out the discipline framing the curriculum rests on, the conceptual contributions it makes (most importantly, the evolutionary spiral as the operational form of the co-evolution of intent and build), the twenty-one-module structure that organises the discipline for teaching, the pedagogical commitments that follow from grading work co-produced with an agent, what graduates leave with, and how the discipline as taught is designed to outlast the specific capabilities of today's models. The position the paper takes is that the practitioner skills the industry currently lacks are precisely the skills the discipline names, and that structured undergraduate curricula in agentic software engineering are the principal mechanism by which the gap closes.
2026-05-31T10:44:53Z
12 pages, 20 references. Companion paper to the ASE-26 curriculum deposited on Zenodo at doi:10.5281/zenodo.20468021. Part 1 of a planned series of two pre-prints on the curriculum and its conceptual core
Mikael Gorsky
http://arxiv.org/abs/2606.01127v1
How Proposal Novelty, Topical Diversity, and Theory-Practice Balance Shape Scholarly Outcomes in Funded Education Research
2026-05-31T10:01:21Z
Education research occupies a distinctive position in public science because it is expected to advance scholarly knowledge while also informing learning, teaching, participation, and workforce development. This study examines how the intellectual characteristics of NSF-funded education proposals are associated with the subsequent academic performance of funded scholars. Linking 8,715 NSF education awards from 1990 to 2020 with 84,519 publications by principal investigators, the analysis focuses on four major NSF education divisions that collectively span undergraduate and graduate levels, formal and informal learning environments, and inclusive educational initiatives. Proposal novelty is measured as semantic distance from prior funded projects within the same division, topical diversity as breadth across latent research themes, and intellectual orientation as theoretical, practical, or balanced. The results show that NSF education funding is consistently associated with higher publication output across divisions. However, this increase is not accompanied by stronger citation performance or higher journal-level visibility; citation and CiteScore estimates are often negative, particularly in later decades. Proposal novelty shows limited and uneven associations with post-award outcomes, whereas topical diversity is more clearly related to publication growth in some divisions but weaker citation-based performance in others. Balanced proposals that integrate theoretical and practical aims display the most favourable overall profile, combining positive publication associations with fewer negative citation-based patterns. These findings highlight the importance of evaluating education research funding through multiple academic outcomes and division-specific research contexts.
2026-05-31T10:01:21Z
Yunfeng Gao
Yuxuan Xiao
Jiaming Zhang
Yang Ding
http://arxiv.org/abs/2606.13696v1
AGORA: Can Deliberation and Governance Gates Absorb Participation Bias in Transit Planning?
2026-05-31T08:00:37Z
Transit network design depends not only on the optimization algorithm but also on who shows up to the public hearing. Current practice often collects one-directional comments from self-selected attendees, leaving participant mix as an uncontrolled source of outcome variation. We present AGORA, a framework that holds the network, demand, and solver fixed while systematically varying meeting composition through stakeholder agents, structured deliberation, and governance gates. Across two standard benchmark networks at different scales, we find that (i) aggregate outcomes vary little across compositions, but on tail risk and fairness disparity, representative sampling still tends to outperform skewed compositions; (ii) without deliberation, composition produces no variation at all, showing that deliberation is the mechanism through which who attends affects outcomes; and (iii) governance gates compress cross-profile variance without shifting the average outcome on Mandl, but low acceptance on Mumford0 shows thresholds require instance-specific calibration. These findings reframe participation bias from an uncontrollable input to a process-design problem: even without guaranteed representative attendance, well-structured deliberation and governance criteria can substantially reduce how much outcomes depend on who is in the room.
2026-05-31T08:00:37Z
Jung-Hoon Cho
Cathy Wu
http://arxiv.org/abs/2606.07631v1
Trait-space Monitoring for Emergent Misalignment During Supervised Finetuning
2026-05-31T04:28:21Z
Emergent misalignment (EM) occurs when narrow finetuning causes a model to behave dangerously outside the finetuning task. Standard training signals can miss this shift, making reliable detection costly if it depends on repeated behavioral evaluation. We ask whether emergent misalignment can instead be detected from internal representations during finetuning. Using seven alignment-relevant traits encoded as linear directions in activation space, we track representational drift across training checkpoints in four open-source 7-9B LLMs. EM-relevant drift concentrates on a low-dimensional axis that explains 65.5% of the variance, revealing a geometric signature in the studied regime. A low-overhead monitor built on this drift profile detects dangerous checkpoints with 2.2% false negative rate, 2.9% false positive rate, and 0.990 AUROC on held-out perturbation types, outperforming unsupervised PCA and SAE baselines. Stress tests on two 14B models, longer finetuning runs, and misaligned starting points identify key deployment boundaries. These results position trait-space monitoring as a practical complement to behavioral evaluation for EM detection during LoRA-based finetuning, while showing that deployment across substantially different regimes may require recalibration.
2026-05-31T04:28:21Z
First version. 45 pages
Huy Nghiem
Sy-Tuyen Ho
Sarah Wiegreffe
Hal Daumé
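The trait-space monitoring abstract above describes tracking drift along traits "encoded as linear directions in activation space". A minimal sketch of that general idea, assuming drift is measured as the projection of the activation shift (checkpoint minus base model) onto each unit-normalized trait direction, might look as follows. The function names, trait names, and toy 3-dimensional vectors are hypothetical, and the paper's actual monitor may be defined differently.

```python
import math


def unit(vec):
    """Return vec scaled to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def trait_drift(base_act, ckpt_act, trait_dirs):
    """Project the activation shift (checkpoint minus base) onto each trait direction.

    base_act, ckpt_act: mean activation vectors (hypothetical inputs).
    trait_dirs: mapping from trait name to a direction vector.
    """
    delta = [c - b for b, c in zip(base_act, ckpt_act)]
    return {
        name: sum(d * u for d, u in zip(delta, unit(direction)))
        for name, direction in trait_dirs.items()
    }


# Toy example: two hypothetical traits in a 3-dimensional activation space.
trait_dirs = {"trait_a": [1.0, 0.0, 0.0], "trait_b": [0.0, 2.0, 0.0]}
base = [0.5, 0.5, 0.5]
ckpt = [0.8, 0.1, 0.5]

drift = trait_drift(base, ckpt, trait_dirs)
for name, value in drift.items():
    print(f"{name}: {value:+.3f}")
```

Computing one such drift vector per checkpoint gives a low-dimensional profile on which a detector could be trained. The choice of layer, token positions, and classifier is left open here because the abstract does not specify it.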
http://arxiv.org/abs/2602.15259v2
Knowing Isn't Understanding: Re-grounding Generative Proactivity with Epistemic and Behavioral Insight
2026-05-31T02:09:50Z
Generative AI agents equate understanding with resolving explicit queries, an assumption that confines interaction to what users can articulate. This assumption breaks down when users themselves lack awareness of what is missing, risky, or worth considering. In such conditions, proactivity is not merely an efficiency enhancement, but an epistemic necessity. We refer to this condition as epistemic incompleteness: where progress depends on engaging with unknown unknowns for effective partnership. Existing approaches to proactivity remain narrowly anticipatory, extrapolating from past behavior and presuming that goals are already well defined, thereby failing to support users meaningfully. However, surfacing possibilities beyond a user's current awareness is not inherently beneficial. Unconstrained proactive interventions can misdirect attention, overwhelm users, or introduce harm. Proactive agents, therefore, require behavioral grounding: principled constraints on when, how, and to what extent an agent should intervene. We advance the position that generative proactivity must be grounded both epistemically and behaviorally. Drawing on the philosophy of ignorance and research on proactive behavior, we argue that these theories offer critical guidance for designing agents that can engage responsibly and foster meaningful partnerships.
2026-02-16T23:28:17Z
43rd International Conference on Machine Learning (ICML 2026)
Kirandeep Kaur
Xingda Lyu
Chirag Shah
http://arxiv.org/abs/2605.25142v2
Pre-Characterization of Electromagnetic Side-Channel Leakage Using Publicly Available Information: A Case Study on E-Voting Interfaces
2026-05-31T00:56:26Z
In this work, we study the interface of the Brazilian e-Voting Machine (BVM) in the context of electromagnetic side-channel threats commonly referred to as TEMPEST attacks. In a TEMPEST attack against video displays, an eavesdropper uses Software-Defined Radios (SDRs) to recover sensitive information by intercepting electromagnetic emanations generated during video signal transmission. We emulate the BVM using a VGA monitor by leveraging publicly available information disclosed by the electoral authority, including technical specifications, operational rules of the system, and the official BVM interface. Based on this setup, we investigate whether the BVM interface gives rise to a distinctive spectral signature observable through its unintended electromagnetic emissions. Our findings show that design characteristics relevant to a nationwide electoral process -- such as high image contrast, minimal on-screen information, and the prohibition of other electronic devices within the polling station -- result in a simple and highly distinctive spectral signature that can be observed even through a wall in our experiments. Although our experiments do not involve actual BVM hardware, the results raise concerns regarding the system's susceptibility to TEMPEST attacks and highlight the need for further research on protective countermeasures. In this context, our findings may support the design of automatic jammers capable of adaptively targeting compromising frequencies. To the best of our knowledge, this is the first study investigating TEMPEST attacks in the context of an electronic voting system officially adopted by a country.
2026-05-24T15:46:54Z
This work was presented in the Show & Tell Technical Demonstration Session of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2026, available at https://2026.ieeeicassp.org/industry_program/#DMOS_530
Leonardo Teodoro
Kemuel L. Vieira
Saulo Queiroz
http://arxiv.org/abs/2606.00873v1
Prompts for Public-Sector LLMs Should Be Governed as Commons
2026-05-30T20:01:53Z
This paper argues that prompts used to deploy large language models (LLMs) in public-sector settings should be treated as governed artefacts rather than private, transient inputs. Prompts encode role instructions, decision framings, and value claims; prompt choice can materially shift outputs even when model weights and input records are held fixed. Existing governance tools, including model and dataset documentation, organisation-level policies, and post-training alignment, rarely make the local prompt collections used in deployment transparent, contestable, or auditable. We propose Prompt Commons: a versioned, community-maintained repository of prompt templates with provenance metadata, licensing, and moderation logs. Using a pilot dataset collected with community partners in a large North American city (443 human prompts; 3,317 after augmentation), we illustrate three governance states (open, curated, veto-enabled) and a negotiation-oriented ensemble method that aggregates stakeholder prompts into compromise recommendations. We close with falsifiable implications and an evaluation agenda for prompt-layer governance.
2026-05-30T20:01:53Z
To appear in the Proceedings of the 43rd International Conference on Machine Learning (ICML 2026)
Rashid Mushkani
http://arxiv.org/abs/2606.07629v1
Large Language Models Should Learn Personalized Rather Than Aggregated Human Preferences
2026-05-30T18:47:52Z
Current approaches to aligning large language models (LLMs) aggregate diverse human preferences into a single reward signal, effectively optimizing for a hypothetical "average user" who represents no real person particularly well. This position paper argues that LLMs should learn personalized, individual preferences rather than aggregated ones. We show that aggregation masks critical information about preference diversity, individual values, and contextual dependencies, which is a limitation both theoretically grounded in social choice theory and empirically evident across demographic groups. We analyze the rich structure that human preferences encode, survey technical approaches to personalization, and systematically address counterarguments on scalability, shared standards, and manipulation risk. While personalization introduces genuine safety challenges including filter bubbles, value lock-in, and psychological manipulation, we argue these are manageable through bounded personalization frameworks that preserve universal safety constraints while accommodating legitimate individual variation. We conclude with a concrete research and policy agenda for developing preference-aware models that respect both individual autonomy and collective safety.
2026-05-30T18:47:52Z
Accepted to ICML 2026
Cristina Garbacea
http://arxiv.org/abs/2606.07628v1
Frankenstein in the Pipeline: Computational Epistemicide in Facial Recognition
2026-05-30T17:52:02Z
While the eugenic roots of computer vision are well-documented in critical technology studies, less attention has been paid to the operational mechanisms through which this violence is enacted at the level of the pipeline. This paper employs Mary Shelley's Frankenstein not as a metaphor for unintended consequences, but as a diagnostic framework for method: disassembly, reconstruction, and the production of a creature whose legitimacy is asserted by the procedure that made it.
I argue that embedding-based facial recognition enacts what I call computational epistemicide, an extension of Sueli Carneiro's concept of epistemicide to the computational domain, by destroying the face as a living, relational surface and authorizing a numerical proxy as the privileged site of identity. Across detection/cropping, landmarking, alignment/frontalization, and embedding, the face is progressively narrowed to what can be stabilized as data, producing a canonical face as the condition of legibility and a corresponding form-subject as the condition of recognition. Vectorization completes the Frankensteinian "stitching": the dissected face is reassembled into a fixed-dimensional artifact designed to circulate across databases and institutions. I then show how distance-based similarity and thresholding operationalize a norm of "close enough," making recognition inseparable from standardization and rendering reformist "ethical AI" optimization structurally insufficient. The paper concludes by arguing for abolition as a normative stance: refusing vectorized identity as a legitimate basis for rights and access, and dismantling the institutional impulse to govern human life through dissectible data points.
2026-05-30T17:52:02Z
Accepted to ACM FAccT 2026. Author's version. 17 pages, 2 figures
Nina da Hora
10.1145/3805689.3812284
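The "distance-based similarity and thresholding" step named in the facial-recognition abstract above can be made concrete with a small sketch. It assumes cosine distance between two fixed-dimensional embedding vectors and a single global threshold. The abstract does not name a metric or threshold, so the metric, function names, vectors, and threshold values below are illustrative choices only.

```python
import math


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def is_match(a, b, threshold):
    """Declare two embeddings the 'same identity' if they are close enough."""
    return cosine_distance(a, b) <= threshold


# Toy 2-dimensional embeddings (real systems use hundreds of dimensions).
enrolled = [1.0, 0.0]
probe = [0.6, 0.8]

distance = cosine_distance(enrolled, probe)
lenient = is_match(enrolled, probe, threshold=0.5)
strict = is_match(enrolled, probe, threshold=0.3)
print(f"distance={distance:.2f} lenient={lenient} strict={strict}")
```

The same pair of vectors is accepted under one threshold and rejected under the other, which illustrates how recognition depends on where the "close enough" line is drawn.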
http://arxiv.org/abs/2606.00791v1
Global Patterns in Student Stress and Academic Performance: A Machine Learning Study Using PISA 2022
2026-05-30T16:12:39Z
Machine learning was applied to examine whether stress-related factors influence student performance in a consistent way across the world. The main goal of this project is to confirm or reject the existence of a similar global pattern by generalizing the findings that already exist in this field. We focused on various psychological indicators such as anxiety score, test anxiety, math anxiety, math confidence, wellbeing, and sense of belonging, along with several non-psychological factors for context. Machine learning was chosen due to the extremely large size of the PISA 2022 dataset and its ability to capture complex relationships that simpler methods may overlook. The analysis was conducted across six continents by splitting the dataset into six separate case studies. Feature engineering was performed manually for each region, while the same baseline models were trained to ensure a fair comparison. The results show that the negative effect of stress on performance is present and fairly consistent across all continents. Although some error remains, partly because stress is not the only factor shaping academic outcomes, the overall pattern is clear. Africa stood out as an outlier due to lower average educational and wellbeing levels and a higher proportion of missing data, yet even there the negative relationship remained observable.
2026-05-30T16:12:39Z
Ani Ghazanchyan
Sachin Kumar
http://arxiv.org/abs/2411.19093v6
Seeing SDG 6 from space: local-scale monitoring of piped water and sewage system access across Africa using satellite imagery and self-supervised learning
2026-05-30T15:30:28Z
Access to drinking water and sanitation is essential for health and well-being, yet major disparities remain, especially in data-scarce regions such as Africa. SDG 6 aims for universal access, but current monitoring relies on costly, infrequent, and spatially uneven surveys and censuses with long reporting delays.
This study develops a scalable remote-sensing framework to estimate piped water and sewage system access at approximately 2.56 km resolution using Sentinel-2 imagery, Afrobarometer survey responses, 30 m population data, and DINO self-supervised Vision Transformer features. The best model achieves AUROC values of 91.54% for piped water and 93.24% for sewage access. Across 50 African countries, population-weighted estimates strongly align with WHO/UNICEF JMP statistics for piped water ($R^2 = 0.92$) and show meaningful agreement for sewage access ($R^2 = 0.72$). In countries without Afrobarometer coverage, MAEs are 9.5% and 10.7%, with estimates within 15% of JMP values for 121.4 million and 159.7 million people, respectively.
A Nigeria case study across 767 Local Government Areas (LGAs) shows that the framework reveals fine-scale environmental inequality. The largest no-access burdens reach 1.155 million people for piped water and 1.452 million for sewage, 7.9 and 8.3 times the median LGA burden, while top-decile no-access thresholds of 0.805 and 0.952 indicate that deprivation is widespread. These findings show that DINO-based satellite models can complement household surveys with low-cost, spatially detailed evidence for SDG 6 monitoring, infrastructure targeting, and environmental equity assessment.
2024-11-28T12:13:46Z
Under Review
Othmane Echchabi
Aya Lahlou
Nizar Talty
Josh Malcolm Manto
Tongshu Zheng
Ka Leung Lam
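The SDG 6 abstract above compares population-weighted access estimates against WHO/UNICEF JMP statistics and reports mean absolute errors (MAEs). A minimal sketch of those two calculations follows, assuming each grid cell carries a population count and a predicted access probability. The function names and all numbers are made up for illustration and are not taken from the paper.

```python
def population_weighted_access(cells):
    """Aggregate per-cell access probabilities into one population-weighted rate.

    cells: list of (population, predicted_access_probability) pairs.
    """
    total_population = sum(pop for pop, _ in cells)
    return sum(pop * prob for pop, prob in cells) / total_population


def mean_absolute_error(estimates, references):
    """Average absolute gap between model estimates and reference statistics."""
    gaps = [abs(e - r) for e, r in zip(estimates, references)]
    return sum(gaps) / len(gaps)


# Hypothetical country with two grid cells: (population, access probability).
cells = [(100, 0.2), (300, 0.6)]
national_estimate = population_weighted_access(cells)

# Hypothetical national estimates for two countries versus reference values.
mae = mean_absolute_error([0.5, 0.7], [0.6, 0.6])

print(f"Population-weighted access: {national_estimate:.2f}")
print(f"MAE against reference: {mae:.2f}")
```

Weighting by population means a densely populated cell with low access pulls the national estimate down more than a sparsely populated one, which an unweighted mean over cells would miss.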
http://arxiv.org/abs/2606.02632v1
Position: Prioritize Identifying Structure, Not Complex Models, for Scientific Discovery
2026-05-30T15:21:58Z
Modern Machine Learning (ML) and Artificial Intelligence (AI) models, especially large language models (LLMs), are increasingly used to generate scientific hypotheses and mechanistic explanations from observational data. This position paper argues that in the high-dimensional proxy regimes where modern ML excels, mechanistic learning is generically underdetermined: many incompatible mechanisms induce essentially the same observational relationships on the support of the data, so predictive success and coherent explanations are insufficient evidence of mechanism discovery. This underdetermination becomes uniquely hazardous with LLMs, which tend to collapse large equivalence classes of explanations into a single fluent narrative. This paper proposes concrete standards for "mechanistic ML," and argues these norms are necessary if LLM-centered workflows are to support science rather than merely simulate it.
2026-05-30T15:21:58Z
Will appear as a position paper in ICML
Tyler H. McCormick
http://arxiv.org/abs/2511.05613v2
Who Evaluates AI's Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations
2026-05-30T13:29:29Z
Foundation models are increasingly central to high-stakes AI systems, and governance frameworks now depend on evaluations to assess their risks and capabilities. Although general capability evaluations are widespread, social impact assessments covering bias, fairness, privacy, environmental costs, and labor remain uneven. To characterize this landscape, we conduct the first comprehensive analysis of social impact evaluation reporting, examining 186 first-party release reports and 248 third-party evaluation sources, supplemented by developer interviews. We find a stark division of labor: first-party reporting is sparse, often superficial, and declining in areas like environmental impact and bias, while third-party evaluators provide broader, more rigorous coverage of bias, harmful content, and performance disparities. However, only developers can authoritatively report on data provenance, content moderation labor, costs, and infrastructure, yet interviews reveal these disclosures are deprioritized unless tied to product adoption or compliance. Current practices leave major gaps in assessing societal impacts, underscoring the need for policies that mandate developer transparency, strengthen independent evaluation ecosystems, and create shared infrastructure for aggregating third-party evaluations.
2025-11-06T14:25:32Z
Accepted at the Forty-Third International Conference on Machine Learning (ICML), 2026, in Seoul, Korea
Anka Reuel
Avijit Ghosh
Jenny Chim
Andrew Tran
Yanan Long
Jennifer Mickel
Usman Gohar
Srishti Yadav
Pawan Sasanka Ammanamanchi
Mowafak Allaham
Hossein A. Rahmani
Mubashara Akhtar
Felix Friedrich
Robert Scholz
Michael Alexander Riegler
Jan Batzner
Eliya Habba
Arushi Saxena
Anastassia Kornilova
Kevin Wei
Prajna Soni
Yohan Mathew
Kevin Klyman
Jeba Sania
Subramanyam Sahoo
Olivia Beyer Bruvik
Pouya Sadeghi
Sujata Goswami
Angelina Wang
Yacine Jernite
Zeerak Talat
Stella Biderman
Mykel Kochenderfer
Sanmi Koyejo
Irene Solaiman
http://arxiv.org/abs/2606.00655v1
Scaling Behavior of Single LLM-Driven Multi-Agent Systems
2026-05-30T09:57:49Z
The burgeoning field of LLM-based Multi-Agent Systems (MAS) promises to tackle complex tasks through collaborative intelligence, yet fundamental questions regarding their scaling behavior and intrinsic collective dynamics remain underexplored. This paper systematically investigates how the performance of a homogeneous MAS evolves as the number of agents increases, isolating the variable of collaboration from model or knowledge heterogeneity. We propose the Sequential Iterative Multi-Agent System (SIMAS) framework, a minimalist architecture centered on sequential inter-agent communication, to clearly observe scaling effects. Through extensive experiments across diverse tasks and model scales, we establish that MAS performance does not scale monotonically with agent count but follows a pattern of diminishing returns, governed by a trade-off between collaborative synergy and coordination overhead. Our findings reveal that effective MAS requires a sufficiently capable base LLM, that task type critically modulates the optimal agent count, and that collective intelligence is an emergent property contingent on strategic interaction design rather than a guaranteed outcome of agent plurality. The performance degradation stems from coordination overhead rather than merely long-context failure, and the scaling tendency generalizes across interaction architectures like structured debate topologies. This work provides a foundational understanding of MAS scaling laws, offering practical guidance for designing efficient collaborative systems and challenging the prevailing assumption that more agents invariably lead to better performance.
2026-05-30T09:57:49Z
Jialing Li
Zhouhong Gu
Yin Cai
Hongwei Feng