http://arxiv.org/abs/2605.15934v1Privacy is Fungibility: Why Endogenous Tokens Are Not Money2026-05-15T13:15:32ZIn this paper, we make a case that endogenous tokens such as cryptoassets are not money. First, we define and classify tokens found on public, permissionless ledgers, contrasting them with privately issued stablecoins and proposed CBDC designs. We then discuss the work of Kahn et al in Money is Privacy on cash versus simplified credit, and we extend their analysis to the situation found on most public, permissionless ledgers. Many public, permissionless ledgers utilize an account-based abstraction for balances, resulting in a default state that maps onto the most harmful models of agent interaction enumerated in Money is Privacy. The conclusion is threefold: that most blockchain economies lack a cash-like primitive; that stablecoins do not intrinsically fulfil this role; and that the reliance of a network on an endogenous token for security exposes holders even of a privacy-preserving asset to the same risk, if that asset relies on the same global ledger state as the endogenous token.2026-05-15T13:15:32Z20 pages, 2 tablesAlex LynhamGeoffrey Goodellhttp://arxiv.org/abs/2506.22440v2From Model Design to Organizational Design: Complexity Redistribution and Trade-Offs in Generative AI2026-05-15T13:05:40ZThis paper introduces the Generality-Accuracy-Simplicity (GAS) framework to analyze how large language models (LLMs) are reshaping organizations and competitive strategy. We argue that viewing AI as a simple reduction in input costs overlooks two critical dynamics: (a) the inherent trade-offs among generality, accuracy, and simplicity, and (b) the redistribution of complexity across stakeholders. 
While LLMs appear to defy the traditional trade-off by offering high generality and accuracy through simple interfaces, this user-facing simplicity masks a significant shift of complexity to infrastructure, compliance, and specialized personnel. The GAS trade-off, therefore, does not disappear but is relocated from the user to the organization, creating new managerial challenges, particularly around accuracy in high-stakes applications. We contend that competitive advantage no longer stems from mere AI adoption, but from mastering this redistributed complexity through the design of abstraction layers, workflow alignment, and complementary expertise. This study advances AI strategy by clarifying how scalable cognition relocates complexity and redefines the conditions for technology integration.2025-06-10T15:22:09ZSharique HasanAlexander OettlSampsa Samilahttp://arxiv.org/abs/2412.05887v3An Overview of Cyber Security Funding for Open Source Software2026-05-15T11:45:48ZMany open source software (OSS) projects need more human resources for maintenance, improvements, and sometimes even their survival. These needs allegedly apply even to vital OSS projects that can be seen as being a part of the world's critical infrastructures. To address this resourcing problem, new funding instruments for OSS projects have been established in recent years. The paper examines two such funding bodies for OSS and the projects they have funded. The focus of both funding bodies is on software security and cyber security in general. Based on qualitative thematic analysis, the results indicate that particularly OSS supply chains, network and cryptography libraries, programming languages, and operating systems and their low-level components have been funded and thus seen as critical in terms of cyber security. In addition to the qualitative results presented, the paper makes a contribution by connecting the research branches of critical infrastructure and sustainability of OSS projects. 
A further contribution is made by connecting the topic examined to recent cyber security regulations. Finally, an important argument is raised that neither cyber security nor project sustainability alone can entirely explain the rationales behind the funding decisions made by the two funding bodies.2024-12-08T10:48:30ZProceedings of the 7th International Workshop on Engineering and Cybersecurity of Critical Systems (EnCyCriS 2026), Rio de Janeiro, ACM, pp. 18-25Jukka RuohonenGaurav ChoudharyAdam Alami10.1145/3786160.3788466http://arxiv.org/abs/2604.20127v2Trajectory-Aware Reliability Modeling of Democratic Systems2026-05-15T10:51:51ZFailures in complex systems often emerge through gradual degradation and the propagation of stress across interacting components rather than through isolated shocks. Democratic systems exhibit similar dynamics, where weakening institutions can trigger cascading deterioration in related institutional structures. Traditional reliability and survival models typically estimate failure risk based on the current system state but do not explicitly capture how degradation propagates through institutional networks over time. This paper introduces a trajectory-aware reliability modeling framework based on Dynamic Causal Neural Autoregression (DCNAR). The framework first estimates a causal interaction structure among institutional indicators and then models their joint temporal evolution to generate forward trajectories of system states. Failure risk is defined as the probability that predicted trajectories cross predefined degradation thresholds within a fixed horizon. Using longitudinal institutional indicators, we compare DCNAR-based trajectory risk models with discrete-time hazard and Cox proportional hazards models. Results show that trajectory-aware modeling consistently outperforms Cox models and improves risk prediction for several propagation-driven institutional failures. 
These findings highlight the importance of modeling dynamic system interactions for reliability analysis and early detection of systemic degradation.2026-04-22T02:47:31ZDmitry ZaytsevValentina KuskovaMichael Coppedgehttp://arxiv.org/abs/2603.18221v2Scalable and Personalized Oral Assessments Using Voice AI2026-05-15T05:48:09ZStudents in our AI/ML course submitted polished, well-argued project analyses. Then, in class discussion, we asked them to walk through a single choice from their own work. Many could not. The writing looked great. The understanding often wasn't. Oral examinations retain an evidentiary link where written work no longer does: a student who can reason aloud, defend a decision under follow-up, and adapt when pushed demonstrates something no submitted document can certify. The obstacle has always been cost. A 25-minute oral reviewed by two graders takes roughly 30 combined instructor and TA hours for 36 students; at 100 the format is untenable. Voice AI and automated grading change the arithmetic. We built Viva, a system that conducts a personalized oral exam, then grades the transcript with a panel of three LLMs that score independently, read each other's assessments, and revise. Across two undergraduate cohorts at NYU Stern (36 students in Fall 2025, 37 in Spring 2026), grading-LLM cost stayed under one dollar per exam within the ElevenLabs subscription covering our voice minutes; for deployments exceeding an equivalent credit pool, budget about a dollar per ten minutes of graded exam time, practical for weekly assignments, not just finals. The system also broke instructively: the agent asked several questions at once, failed to randomize topics across the cohort, and a voice cloned from the professor's came across as harsh, replaced in Spring 2026 with a calm preset. 
These failures, with an earlier finding that a monolithic agent handling both examination and grading proved unreliable, point to five candidate transferable patterns: decompose into single-purpose modules, constrain behavior with code rather than prompts, keep randomization out of the LLM, grade with a multi-model panel whose members disagree, and choose voice characteristics with the same care as question design.2026-03-18T19:09:06ZPanos IpeirotisKonstantinos Rizakoshttp://arxiv.org/abs/2605.15473v1Validated Hypotheses as a Lens for Human-Likeness Evaluation in AI Agents2026-05-14T23:25:02ZWe propose using validated behavioral hypotheses as a lens for evaluating human-likeness in LLM-based agents. Our key idea is simple: If an agent is human-like, a population of such agents should reach the same inferential conclusion as the human population when run through the same experiment. Decades of social science have produced many such validated findings, each anchored to concrete experimental protocols and robustly established through independent replication. This yields an evaluation that is objective, decomposable, and scalable. We operationalize this lens through HumanStudy-Bench, an open platform that turns published human-subject studies into reusable simulation environments and administers the evaluation to configurable agents. It scores agent-human alignment on two metrics: the Probability Alignment Score (PAS) for inferential agreement and the Effect Consistency Score (ECS) for effect-size agreement. We curated an initial suite of 12 studies whose hypotheses are robustly established through independent replication, and evaluated 10 models under 4 agent designs. 
Results show that agent responses polarize between full replication and complete failure; agent design influences alignment more than model scale, but its effect is non-monotonic.2026-05-14T23:25:02ZXuan LiuHaoYang ShangZizhang LiuYuanjun FengGuankai ZhaiYunze XiaoYiwen TuHaojian Jinhttp://arxiv.org/abs/2605.15468v1GreenZ: A Sustainable UX Framework for Complex Digital Systems2026-05-14T23:15:56ZDigital systems have become simultaneously more powerful and more wasteful. Features accumulate that nobody uses. Data is collected that nobody analyzes. AI is deployed at significant energy and water costs for gains that a simpler approach could have achieved. And through all of it, the people who depend on these systems quietly absorb the consequences in cognitive load, lost time, and eroded trust. This paper introduces GreenZ, a three-layer Sustainable UX Framework for complex digital systems. Its three layers are a Philosophy Layer built around ten published principles, an Operational Frameworks Layer comprising five applied systems, and a Tools and Canvases Layer of practical audit instruments and decision models. Two contributions sit at the framework's core: a Digital Waste Taxonomy classifying eight distinct waste types, and an AI Sufficiency Decision Model that asks whether AI should exist in a given flow before any question of how to implement it. GreenZ v1 is theoretically grounded but empirically unvalidated. A practitioner expert review study is underway at the time of submission. The paper presents the framework's architecture, its conceptual foundations, its position relative to existing literature, and an honest account of what remains to be established.2026-05-14T23:15:56Z8 pages, 1 figure, 4 tables. Framework preprint. Expert review study underway. v1Trisha Solankihttp://arxiv.org/abs/2606.12429v1Muse Spark Safety & Preparedness Report2026-05-14T23:12:14ZMuse Spark is the latest large language model developed by Meta. 
In this report, we first present evaluations for catastrophic risk domains under Meta's Advanced AI Scaling Framework, along with the evidence that informed our launch decision. We then discuss additional considerations, such as Muse Spark's broader content safety and behavioral profile, that are relevant to overall safety but fall outside the catastrophic risk domains governed by the Framework. Our preparedness results covering Chemical and Biological, Cybersecurity, and Loss of Control risks assess Muse Spark's deployment within Meta AI as presenting acceptable levels of residual risks under our Advanced AI Scaling Framework. We conducted a broad set of evaluations targeting dual-use and high-risk capabilities across these catastrophic risk domains. Those evaluations identified elevated risks prior to mitigations, with Chemical and Biological capabilities assessed as likely reaching the "high risk" category under the Advanced AI Scaling Framework before safeguards were applied. We have implemented a multi-layered set of mitigations that address the identified risks, and Muse Spark demonstrates state-of-the-art refusal across a range of benchmarks related to hazardous workflows in chemistry and biology. We therefore release Muse Spark as the underlying model of Meta AI.2026-05-14T23:12:14Z159 pages, 57 figuresCristina MenghiniSailPeter NeySailHamza KwisabaSail ZifanSail WangMiles TurpinFelix BinderJean-Christophe TestudAidan BoydNathaniel LiIvan EvtimovKlaudia KrawieckaArman ZharmagambetovJeremy KritzAlexander R. 
FabbriDaniel SongJinpeng MiaoJoonas HjeltMeghna RamaniLeona LanReza AghajaniJoanna BittonMahesh PasupuletiDevin NorderKhalid El-AriniParidhi SinghVítor AlbieroSahana CBRashnil ChaturvediElahe DabirEdoardo DebenedettiJim GustZiwen HanKat HeSean HendryxLifeng JinPolina KirichenkoSandra LefdalKenneth LiAsad LiaqatInna LinDespoina MagkaNeal MangaokarIshita MedirattaZach MillerSmitha MilliNiloofar MireshghallahSaba NazirHung NguyenMaximilian NickelKelvin NiuKerem OktarBhargavi ParanjapeParth PathakMaya PavlovaEmmanuel RamirezDavid RenardyCandace RossYasha SheyninClaudia ShiShivam SinghalEvangelia SpiliopoulouRakshith Sharma SrinivasaJamelle Watson-DanielsSpencer WhitmanAdina WilliamsChen XingAndy ZouTommy MaSiqi DengJames BeldockPrashant RatanchandaniKate PlawiakTaesung LeeRyan VictoryLindsay HundleyRachad AlaoHimaghna BhattacharjeeJianfeng ChiGary FrostPegah GhahremaniNiki HoweYuheng HuangSaeed JahedHannah KorevaarTrang LeZhe LiuJinghong LuoQin LyuNina MehrabiAbraham MontillaChirag NagpalCyrus NikolaidisRajvardhan OakManoj RaviVidya SarmaAman ShankarAlana ShineEric Michael SmithMariana TandonMichael TontchevCaoyu WangZihan WangCorinne WongZheng WuHongyuan ZhanJustin ZhaoZexuan ZhongChengxu ZhuangTristan GoodmanAyaz MinhasHarrison RudolphVictoria JeffriesIngrid DickinsonAlex VaughanLauren DeasonKamalika ChaudhuriJulian MichaelShengjia ZhaoSummer Yuehttp://arxiv.org/abs/2601.21028v2"Unlimited Realm of Exploration and Experimentation": Methods and Motivations of AI-Generated Sexual Content Creators2026-05-14T21:05:40ZAI-generated media is radically changing the way content is both consumed and produced on the internet, and in no place is this potentially more visible than in sexual content. AI-generated sexual content (AIG-SC) is increasingly enabled by an ecosystem of individual AI developers, specialized third-party applications, and foundation model providers. 
AIG-SC raises a number of concerns from older debates about the line between pornography and obscenity to newer debates about fair use and labor displacement (in this case, of sex workers), and has spurred new regulations to curb the spread of non-consensual intimate imagery (NCII) created using the same technology used to create AIG-SC. However, despite the growing prevalence of AIG-SC, little is known about its creators, their motivations, and what types of content they produce. To inform effective governance in this space, we conducted an in-depth study to understand what AIG-SC creators make, along with how and why they make it. Interviews with 28 AIG-SC creators, ranging from hobbyists to entrepreneurs to those who moderate communities of hundreds of thousands of other creators, revealed a wide spectrum of motivations, including sexual exploration, creative expression, technical experimentation, and in a handful of cases, the creation of NCII.2026-01-28T20:43:25ZJaron MinkLucy QinElissa M. Redmileshttp://arxiv.org/abs/2605.15380v1Eskwai for Students: Generative AI Assistant for Legal Education in Ghana2026-05-14T20:10:32ZRecent advances in generative AI have shown their potential to be leveraged for legal education. Yet, work on the development and deployment of such systems for legal education in the Global South is limited. In this work, we developed Eskwai for Students, a generative AI assistant to help law students with their legal education. Eskwai for Students is a retrieval augmented generation (RAG) system that provides answers to a wide range of legal questions for law students grounded in a curated database of over 12K case laws and 1.4K legislation in Ghana. We deployed Eskwai for Students in a longitudinal study of 30 months (2.5 years) used by 3.1K law students in Ghana who made 32K queries. 
We evaluated the helpfulness of our AI, and provided insight into the kinds of queries law students submit to this generative AI tool, which raises some ethical concerns. This work contributes to an understanding of how law students in the Global South are using generative AI for their studies and the ways it could be leveraged responsibly to advance legal education.2026-05-14T20:10:32Z10 pages. Accepted at the 27th International Conference on Artificial Intelligence in Education (AIED 2026)George BoatengPhilemon BaduPatrick Agyeman-BuduSamuel AnsahEvans AtompoyaEvan IgwiloLord BaahFrederick Abu-BonsrahVictor Wumbor-Apin Kumbolhttp://arxiv.org/abs/2605.15376v1Adesua: Development and Feasibility Study of an AI WhatsApp Bot for Science Learning in West Africa2026-05-14T20:04:39ZSub-Saharan Africa faces persistently high student-teacher ratios and shortages of qualified teachers, limiting students' access to personalized learning support and formative assessment. To address this challenge, we present Adesua, a WhatsApp-based AI Teaching Assistant for science education that extends the Kwame for Science platform. Adesua leverages WhatsApp's widespread adoption in Africa to provide accessible, curriculum-aligned learning support for Junior High School (JHS) and Senior High School (SHS) students across West Africa. The system integrates curated textbooks and 33 years of national examination questions with generative AI to enable conversational question answering and automated assessment with feedback via a WhatsApp bot. Students can ask science questions, take timed or untimed multiple-choice tests by topic or exam year, and receive instant grading and detailed explanations of correct and incorrect responses. A 6-month feasibility deployment in 2025 had 56 active users in Ghana, including students and parents. 
Quantitative evaluation showed high perceived usefulness, with a helpfulness score of 93.75% for AI-generated answers, albeit with a small number of ratings (n=16). These preliminary results provide a basis for more extensive future evaluation of a WhatsApp-based AI assistant to assess its potential to offer scalable, low-cost personalized learning support and formative assessment in resource-constrained educational contexts.2026-05-14T20:04:39Z11 pages. Accepted at the 27th International Conference on Artificial Intelligence in Education (AIED 2026)George BoatengEvans AtompoyaPhilemon BaduSamuel JohnSamuel AnsahPatrick Agyeman-BuduVictor Wumbor-Apin Kumbolhttp://arxiv.org/abs/2606.12428v1Mapping AI Programs in the U.S: A Status Report from Early 2026 and an Analysis of AI Majors and Minors2026-05-14T20:01:25ZWe present a report on the status of undergraduate Artificial Intelligence (AI) programs in the United States in Spring 2026. In so doing, we 1) describe our scraping and mapping tools, which dynamically update to track the state of AI education in the U.S., and 2) create a historic record at a time of great upheaval. The tool we developed, available at https://cicmap.ai, detects, scrapes, and displays data from more than 350 undergraduate AI programs--majors, minors, concentrations, and certificates--at 4-year universities. Our tool searched over 560 institutions to locate these programs, a sample that represents 86% of all undergraduate Computer Science (CS) graduates in the U.S. This tool allows prospective students, guidance counselors, administrators, and faculty to easily access AI program requirements and is designed to continually update as new programs emerge. To the best of our knowledge, this survey represents the most comprehensive snapshot of the state of AI programs in the U.S. to date. With this work we offer three important contributions: 1) a record of AI programs in the U.S. 
at a time of great upheaval; 2) a tool to explore AI programs and their requirements; and 3) an analysis of the courses required for 66 AI majors and 87 AI minors. Our analysis of majors and minors shows great variability in the size and the requirements of these degrees, but we note two takeaways. First, not all majors require a general AI course, but if they don't, they do require a Machine Learning (ML) course. Second, while more than a third of majors require an Ethics in AI course, just under a quarter of AI minors do.2026-05-14T20:01:25ZFelix MuznyCarolyn JonesCarter IthierHasnain SikoraHrutika Harshadbhai PatelCarla E. Brodleyhttp://arxiv.org/abs/2511.19115v2AI Consciousness and Existential Risk2026-05-14T19:57:37ZIn AI, the existential risk denotes the hypothetical threat posed by an artificial system that would possess both the capability and the objective, either directly or indirectly, to eradicate humanity. This issue is gaining prominence in scientific debate due to recent technical advancements and increased media coverage. In parallel, AI progress has sparked speculation and studies about the potential emergence of artificial consciousness. The two questions, AI consciousness and existential risk, are sometimes conflated, as if the former entailed the latter. Here, I explain that this view stems from a common confusion between consciousness and intelligence. Yet these two properties are empirically and theoretically distinct. Arguably, while intelligence is a direct predictor of an AI system's existential threat, consciousness is not. There are, however, certain incidental scenarios in which consciousness could influence existential risk, in either direction. Consciousness could be viewed as a means towards AI alignment, thereby lowering existential risk; or, it could be a precondition for reaching certain capabilities or levels of intelligence, and thus positively related to existential risk. 
Recognizing these distinctions can help AI safety researchers and public policymakers focus on the most pressing issues.2025-11-24T13:48:02ZUpdated for clarity and completeness following peer-reviewRufin VanRullenhttp://arxiv.org/abs/2606.12427v1Planning on Paper: Problem Decomposition with Diagrams in Introductory Computing2026-05-14T19:27:02ZBackground and Context. Problem decomposition is a core concern of computing education. It has also become increasingly relevant: in response to GenAI, many CS1 educators are advocating for shifting instructional emphasis away from code writing and towards decomposition and higher-level planning. Currently, there is a lack of knowledge in how novices do decomposition in large, multifunction tasks. Objectives. In this study, we describe how students represent solutions to a decomposition task, and characterize common issues that arise in those representations. Method. In a 50-minute lab, students were given a description of a word game and asked to draw (with pencil and paper) a decomposition diagram for a program that would implement this game. We performed an inductive thematic analysis with negotiated agreement on 55 of the diagrams, coding salient elements (e.g. functions and the relationships between them) and issues that arose. Findings. Students used multiple representational strategies, including hierarchical function calls and sequencing (order of execution). We identified issues in notation (including use of differing, incompatible notations within the same diagram), order of execution, abstraction and reuse, encapsulation, clarity, and problem-specific misunderstandings. Implications. These findings suggest that novice decomposition is shaped by multiple underlying models of program behavior, with tensions between structural and sequence-focused reasoning. 
We discuss implications for decomposition instruction and future work, including clarifying representational constraints and plan tracing as simulation.2026-05-14T19:27:02ZInternational Computing Education Conference (ICER)Annapurna VadapartyDevamardeep HayatpurAdalbert Gerald Soosai RajLeo PorterDaniel Zingarohttp://arxiv.org/abs/2605.15312v1Beyond Performance Disparities: A Three-Level Audit of Representational Harm in CelebA2026-05-14T18:25:17ZLarge-scale facial datasets like CelebA are widely used in computer vision, yet the cultural biases embedded in their labels remain underexplored. Fairness research has distinguished representational from allocational harms, but audits of computer vision datasets have mostly examined categorical labels, leaving open how such harms appear in learned features and model attention. This paper examines CelebA at three levels: dataset structure, learned feature weights, and spatial attention, focusing on how gendered double standards of ageing and beauty are encoded in the data and reproduced in model behaviour. First, hierarchical clustering of 202,599 images shows that the 39 attributes organise into latent trait bundles aligned with cultural archetypes: performative femininity (youth, makeup, adornment) and professional masculinity (ageing, facial hair, formal attire). Female faces, though more often rated attractive overall, incur steep penalties when assigned to ageing or masculine-coded clusters. Second, XGBoost with SHAP analysis reveals gender-specific effects, such as adiposity reducing attractiveness only for females. Third, Grad-CAM finds that predictions for female and younger male subgroups concentrate on mid-face cues, whereas predictions for older males drift toward peripheral cues such as hair and clothing. Older males attain the highest accuracy but the lowest average precision, indicating categorical exclusion of groups outside the dataset's evaluative templates. 
Cultural double standards thus pass from media representation into dataset labels, feature weights, and model attention, producing two representational harms: hyper-scrutiny of women under a narrow evaluative template, and exclusion of older men from the scheme entirely. Fairness metrics focused on performance disparities mask both, underscoring the need to address representational harm in fairness research.2026-05-14T18:25:17Z15 pages, 8 figuresSieun ParkYuanmo He