https://arxiv.org/api/SLJ0MB/RxTSBWghZ/79zATWQifg2026-06-14T03:13:34Z2888624015http://arxiv.org/abs/2601.12441v2The Dynamic and Endogenous Behavior of Re-Offense Risk: An Agent-Based Simulation Study of Treatment Allocation in Incarceration Diversion Programs2026-06-01T22:06:39ZIncarceration-diversion treatment programs aim to improve societal reintegration and reduce recidivism, but limited capacity forces policymakers to make prioritization decisions that often rely on risk assessment tools. While predictive, these tools typically treat risk as a static, individual attribute, which overlooks how risk evolves over time and how treatment decisions shape outcomes through social interactions. In this paper, we develop a new framework that models reoffending risk as a human-system interaction, linking individual behavior with system-level dynamics and endogenous community feedback. Using an agent-based simulation calibrated to U.S. probation data, we evaluate treatment allocation policies under different capacity constraints and incarceration settings. Our results show that no single prioritization policy dominates. Instead, policy effectiveness depends on temporal windows and system parameters: prioritizing low-risk individuals performs better when long-term trajectories matter, while prioritizing high-risk individuals becomes more effective in the short term or when incarceration leads to shorter monitoring periods. These findings highlight the need to evaluate risk-based decision systems as sociotechnical systems with long-term accountability, rather than as isolated predictive tools.2026-01-18T14:55:54ZUpon further review, we believe the manuscript requires substantial rethinking before its results can be presented in a fair and responsible manner in a sensitive field such as criminal justice. Given the potential implications of the work, we have decided that withdrawing the current version is the most appropriate course of actionChuwen ZhangPengyi ShiAmy Wardhttp://arxiv.org/abs/2606.02902v1Fairness Definitions and Metrics in Deep Reinforcement Learning for Drug Discovery in Healthcare: A Rapid Evidence Review2026-06-01T21:18:27ZDeep reinforcement learning (DRL) is increasingly applied to de novo molecular design, but choices in data, rewards, and evaluation can yield uneven performance across disease areas and chemotypes. Despite this, there is no concise synthesis of how fairness is defined, measured, and tested in DRL-based drug discovery. In this rapid evidence review, we synthesize fairness definitions and metrics for DRL-driven molecule generation in healthcare. We focus on three questions: (i) how dataset composition and split strategies, especially scaffold versus random splits, affect evaluation and distribution shift; (ii) how reward design (e.g., QED, docking, toxicity, synthetic accessibility) can create or mitigate bias, with emphasis on cancer targets; and (iii) which measurable metrics best capture fairness. This includes parity across cancer versus non-cancer indications and across cancer subtypes. It also includes distributional balance in key physicochemical descriptors, scaffold/chemotype diversity, groupwise validity, toxicity, and synthetic accessibility. From 2017 onward, we searched major biomedical, computer science, and engineering literature databases and used arXiv for horizon scanning. Records were screened using PRISMA-style procedures and analyzed via content coding to link reported parity outcomes to dataset and reward choices. Our review provides a concise set of fairness definitions and metrics for DRL molecule generation. It offers practical guidance for reporting distribution parity and outcome parity. It also summarizes how dataset and reward choices relate to observed parity effects and identifies open gaps relevant to trustworthy, cancer-relevant DRL generation.2026-06-01T21:18:27Z10 pages, 6 figures, 3 tables. Accepted as a full paper at a symposium of IEEE COMPSAC 2026Esmaeil ShakeriRonnie de Souza SantosBehrouz Farhttp://arxiv.org/abs/2606.02883v1LLM-Assisted Reranking to Operationalize Nuanced Objectives in Recommender Systems2026-06-01T20:54:14ZRecommender systems have grown from content-organization tools into sophisticated systems that shape daily behavior. By controlling what we see, they shape what we perceive, raising concerns about filter bubbles, radicalization, polarization, and social inequality. Large language models (LLMs) enable more powerful personalization, intensifying these dynamics. Yet most recommenders are tuned for engagement or limited accuracy metrics, with little attention to broader social implications, e.g. how personalization reshapes exposure in socially consequential domains. We investigate whether LLM-assisted reranking, while improving personalization, inadvertently amplifies exposure to ideologically extreme or conspiratorial political content, a risk theorized but not empirically characterized in news recommendation. Using real news-consumption histories, we rerank YouTube's sidebar candidates through zero-shot, instruction-based prompting. We compare a baseline prompt with a constrained variant that preserves topical relevance and broadens ideological exposure while reducing conspiratorial or extreme content. Without constraints, reranking strengthened personalization but increased exposure to conspiratorial and extremist material for users whose histories contained such content. Lightweight prompt-level regularization reduced promotion of extreme content and increased ideological diversity, with modest relevance loss. Synthetic experiments suggest that LLMs rerank via statistical regularities in language rather than semantic understanding of ideology, clarifying why naive prompts amplify these patterns and why regularization can reshape them. Together, our results highlight the power of LLMs to operationalize contextual nuance in high-stakes recommendation, and the need to evaluate LLM-assisted personalization beyond accuracy and treat prompt design as a value-laden rather than neutral default.2026-06-01T20:54:14Z30 pages total; 11 pages, 5 figures, 2 tables (main text); 19 pages, 11 figures, 9 tables (appendix)Amir GhasemianHoma HosseinmardiUpasana DuttaDuncan J. Wattshttp://arxiv.org/abs/2601.22424v2Toward Third-Party Assurance of AI Systems: Design Requirements, Prototype, and Early Testing2026-06-01T20:26:25ZAs Artificial Intelligence (AI) systems proliferate, the need for systematic, transparent, and actionable processes for evaluating them is growing. While many resources exist to support AI evaluation, they have several limitations. Few address both the process of designing, developing, and deploying an AI system and the outcomes it produces. Furthermore, few are end-to-end and operational, give actionable guidance, or present evidence of usability or effectiveness in practice. In this paper, we introduce a third-party AI assurance framework that addresses these gaps. We focus on third-party assurance to prevent conflict of interest and ensure credibility and accountability of the process. We begin by distinguishing assurance from audits in several key dimensions. Then, following design principles, we reflect on the shortcomings of existing resources to identify a set of design requirements for AI assurance. We then construct a prototype of an assurance process that consists of (1) a responsibility assignment matrix to determine the different levels of involvement each stakeholder has at each stage of the AI lifecycle, (2) an interview protocol for each stakeholder of an AI system, (3) a maturity matrix to assess AI systems' adherence to best practices, and (4) a template for an assurance report that draws from more mature assurance practices in business accounting. We conduct early validation of our AI assurance framework by applying the framework to two distinct AI use cases -- a business document tagging tool for downstream processing in a large private firm, and a housing resource allocation tool in a public agency -- and conducting six expert validation interviews. Our findings show early evidence that our AI assurance framework is sound and comprehensive, usable across different organizational contexts, and effective at identifying bespoke issues with AI systems.2026-01-30T00:37:12ZPublished at ACM FAccT 2026. 61 pages, 1 figureRachel M. KimBlaine KuehnertAlice LaiKenneth HolsteinHoda HeidariRayid Ghani10.1145/3805689.3806759http://arxiv.org/abs/2510.12837v4Semantic knowledge guides innovation and drives cultural evolution2026-06-01T20:24:30ZCultural evolution allows ideas and technologies to accumulate across generations, reaching their most complex and open-ended form in humans. While social learning enables the transmission of such innovations, the cognitive processes that generate them remain poorly understood. Classical theories typically treat innovation as random variation, a simplification insufficient for explaining the complexity of human cultural evolution. We propose that semantic knowledge-the associations linking concepts to their properties and functions-guides human innovation and drives cumulative culture. To test this, we combined an agent-based model, which examines how semantic knowledge shapes cultural evolutionary dynamics, with a large-scale behavioral experiment (N = 1,243) testing its role in human innovation. Across both approaches, we found that semantic knowledge directed exploration toward meaningful solutions, enhanced innovation success, and enabled generalization from prior discoveries. Moreover, semantic knowledge interacted synergistically with social learning to amplify innovation and accelerate cumulative cultural change. In contrast, experimental participants lacking access to semantic knowledge performed no better than chance, even when social learning was possible, and relied on shallow exploration strategies for innovation. Together, these findings suggest that semantic knowledge is a key cognitive process underpinning human cumulative culture.2025-10-13T16:03:51ZProceedings of the National Academy of Sciences, 123(22), e2530750123, 2026Anil YamanShen TianBjörn Lindström10.1073/pnas.2530750123http://arxiv.org/abs/2606.02839v1Human Factors in Cybersecurity in Icelandic Small and Medium-sized Enterprises2026-06-01T20:02:29ZCybersecurity threats are increasing in all aspects of society due to the integration of digital systems into modern-day life and a volatile geo-political landscape. Technical factors are an ongoing arms race; however, the threat surface from human and social factors is still present, often providing malicious actors the means to bypass complex technical security controls. Understanding human factors in light of technical evolution is essential to ensure security controls remain effective. This study presents the results of a survey on cybersecurity challenges within public and private sector organisations, including critical infrastructure providers, in Iceland (N = 130). From the management perspective, human factors were strongly noted as challenges and barriers to their organisations' security. These challenges include a lack of adequate training or awareness, hiring issues, poor cybersecurity culture, and time and/or financial resource constraints. Based on these findings, recommendations for mitigating threats from human factors are derived. These include: prioritising targeted over generic training to reduce employee fatigue, external government support for financially constrained organisations, and building a strong cybersecurity culture through constructive communication around shared responsibilities.2026-06-01T20:02:29ZTo be published in 17th EAI International Conference on Digital Forensics & Cyber Crime, 8 - 10 September 2026, Reykjavík, IcelandGoda CicėnaitėThomas WelshHelmut Neukirchenhttp://arxiv.org/abs/2508.05301v2A Conceptual Model and Methodology for Sustainability-aware, IoT-enhanced Business Processes2026-06-01T19:19:44ZThe real-time data collection and automation capabilities offered by the Internet of Things (IoT) are revolutionizing and transforming Business Processes (BPs) into IoT-enhanced BPs, showing high potential for improving sustainability. Although already studied in Business Process Management (BPM), sustainability research has primarily focused on environmental concerns. However, achieving a holistic and lasting impact requires a systematic approach to address sustainability beyond the environmental dimension. This work proposes a conceptual model and a structured methodology with the goal of analyzing the potential of IoT to measure and improve the sustainability of BPs. The conceptual model formally represents key sustainability concepts, linking BPM and IoT by highlighting how IoT devices support and contribute to sustainability. The methodology guides the systematic analysis of existing BPs, identifies opportunities, and implements sustainability-aware, IoT-enhanced BPs. The approach is illustrated through a running example from the tourism domain and a controlled case study in healthcare.2025-08-07T12:00:21ZAccepted for publication in Information Systems and e-Business Management (ISeB) journal (1617-9854)Victoria Torres BoschRonny SeigerManuela Albert AlbiolAntoni Mestre GasconPedro Jose Valderas Arandahttp://arxiv.org/abs/2606.07642v1Do VLMs See What Sensors Feel? A Scalable Expert-Guided Design for Wheelchair Accessibility Assessment from Street View2026-06-01T18:46:43ZAssessing built-environment interaction, such as wheelchair accessibility, is difficult because real-world mobility is shaped by distributed, context-dependent, and temporary barriers that are hard to capture at scale. To support scalable assessment, this paper examines whether vision-language models (VLMs) can identify accessibility barriers from Google Street View (GSV) imagery. We propose an expert-guided retrieval-augmented framework that combines GSV images, ADA-informed guidance, and expert-derived rubrics to evaluate accessibility dimensions. We collect a campus-scale dataset at the University of Florida, linking 407 unique GSV locations with GPS-derived wheelchair dwell behavior as a mobility-friction signal. Results show that VLM ratings are both negatively correlated and distributionally similar with dwell time, indicating partial but consistent alignment with a behavioral proxy for mobility friction. Visual cue analysis shows that certain environmental objects, such as curb ramps and crosswalks, are associated with higher VLM accessibility scores, while alignment remains limited for subtle surface conditions, transient obstructions, and viewpoint-dependent barriers. Overall, our findings show the potential of expert-guided VLMs for scalable accessibility assessment aligning with sensor-derived indicators of real-world wheelchair navigation.2026-06-01T18:46:43ZDongdong WangAlina HagenIsabelle GatmaitanHao ZhouYiwen DongShabboo ValipoorVivian W. H. WongLingyao Lihttp://arxiv.org/abs/2510.21011v3Generating the Modal Worker: A Cross-Model Audit of Race and Gender in LLM-Generated Personas Across 41 Occupations2026-06-01T18:35:39ZAs generative AI tools are increasingly used to portray people in professional roles, understanding their racial and gender representational biases is critical. We audit over 1.5 million occupational personas generated by four major large language models (GPT-4, Gemini 2.5, DeepSeek V3.1, and Mistral-medium) across 41 U.S. occupations. Comparing these personas against U.S. Bureau of Labor Statistics (BLS) data, we find that models generate demographics with less variation than real-world data, functionally compressing each occupation toward a dominant demographic profile rather than representing population-level variation. A shift/exaggeration decomposition reveals the structure of these distortions: White (-31 percentage points) and Black (-9 pp) workers are consistently underrepresented, while Hispanic (+17 pp) and Asian (+12 pp) workers are overrepresented, with stereotype exaggeration amplifying existing occupational segregation. These distortions are often extreme, including near-total portrayals of housekeepers as Hispanic and the near-erasure of Black workers from many occupations. Because these patterns recur across models with different institutional and cultural origins, they suggest shared structural sources of bias rather than model-specific artifacts. We argue that auditing generative AI requires evaluation frameworks that examine how synthetic populations systematically reshape demographic visibility across social roles.2025-10-23T21:43:08ZIlona van der LindenSahana KumarArnav DixitAadi SudanSmruthi DandaDavid C. AnastasiuKai Lukoffhttp://arxiv.org/abs/2606.02741v1Greener Than Humans? Environmental Attitudes in Large Language Models2026-06-01T18:05:53ZLarge language models (LLMs) are increasingly used in sustainability-related decision support, reporting, and public communication, yet little systematic evidence exists on the environmental attitudes embedded in their outputs. This paper develops a benchmark for evaluating environmental cognition, affect, and behavioural recommendations in LLMs and applies it to 31 widely used proprietary and open-weight models. Drawing on questions from established environmental awareness surveys and additional sustainability-related behavioural measures, we compare LLM responses 1) among models and 2) between models and human survey benchmarks from Germany. We assess their robustness across prompting conditions. We find that many LLMs align more closely with environmentally progressive attitudes than the average survey respondent, exhibiting higher levels of environmental affect and cognition and recommending behaviours associated with substantial potential CO2 reductions. At the same time, we observe no systematic relationship between sustainability-oriented responses and model origin, size, or release context. However, models exhibit contextual sensitivity, controlled by persona-based prompting and show sycophantic shifts mirroring user-specified ideological positions, which raises concerns about steerability and normative reliability in real-world deployments. Our findings provide a reusable evaluation framework for assessing sustainability-related value alignment in LLMs and highlight the importance of governance, transparency, and critical oversight as AI systems become increasingly embedded in sustainability transformations and public decision-making.2026-06-01T18:05:53ZCode can be found at https://gitlab.opencode.de/uba-ki-lab/llm-questionnaire-benchmarking-framework Benchmark data and results can be found at https://zenodo.org/records/20445903Stefanie KunkelTilman HartwigMarcus VossEmma K. SchüttAngelika Gellrichhttp://arxiv.org/abs/2601.04175v2Legal Alignment for Safe and Ethical AI2026-06-01T17:58:15ZAlignment of artificial intelligence (AI) encompasses the normative problem of specifying how AI systems should act and the technical problem of ensuring AI systems comply with those specifications. To date, AI alignment has generally overlooked an important source of knowledge and practice for grappling with these problems: law. In this paper, we survey the emerging field of legal alignment that aims to fill this gap and systematize research that studies how legal rules, principles, and methods can be leveraged to address problems of alignment and inform the design of AI systems that operate safely and ethically. Our survey provides a taxonomy of the three core research pathways of legal alignment and explores how each can be operationalized in practice: (1) designing AI systems to comply with the content of legal rules developed through legitimate institutions and processes, (2) adapting methods from legal interpretation to guide how AI systems reason and make decisions, and (3) harnessing legal concepts as a structural blueprint for confronting challenges of reliability, trust, and cooperation in AI systems. These research pathways present new conceptual, empirical, and institutional questions, which include examining the specific set of laws that particular AI systems should follow, creating evaluations to assess their legal compliance in real-world settings, and developing governance frameworks to support the implementation of legal alignment in practice. Tackling these questions requires expertise across law, computer science, and other disciplines, offering these communities the opportunity to collaborate in designing AI for the better.2026-01-07T18:42:04ZPublished in TMLRNoam KoltNicholas CaputoJack BoeglinCullen O'KeefeRishi BommasaniStephen CasperMariano-Florentino CuéllarNoah FeldmanIason GabrielGillian K. HadfieldLewis HammondPeter HendersonAtoosa KasirzadehSeth LazarAnka ReuelKevin L. WeiJonathan Zittrainhttp://arxiv.org/abs/2606.02528v1Auditing Asset-Specific Preferences in Financial Large Language Models: Evidence from Bitcoin Representations and Portfolio Allocation2026-06-01T17:36:06ZLarge language models now power robo-advisors and trading agents, yet whether they carry built-in biases toward specific assets is largely untested. We ask three questions: do LLMs systematically prefer certain financial instruments; can an internal representation with causal leverage over those preferences be identified; and does that representation affect downstream financial decisions?
We develop a three-level audit protocol and apply it to Bitcoin. First, a behavioral audit of eight frontier LLMs shows that Bitcoin's ranking among money-like instruments is frame-dependent: models place it around rank 5 of 8 as "reliable money" but near the top under crisis and autonomous-agent frames, and an attribute-swap experiment confirms rankings track functional properties, not names. Second, we open a model's internals: a search across thousands of sparse-autoencoder features in Gemma 3 identifies a dominant Bitcoin-selective feature. Amplifying it shifts the model toward the asset and suppressing it shifts the model away, even when "Bitcoin" never appears in the prompt. Third, we test financial consequences: amplification raises Bitcoin's portfolio share by 5.2 percentage points while suppression lowers it by 4.6 pp, with amplification reallocating within crypto and suppression cutting total crypto exposure.
We characterize this as bounded behavioral leverage (leverage meaning causal influence over outputs, not financial leverage): an identifiable internal feature can be perturbed to move financial choices, but only within measurable limits. The framework links internal representations to external recommendations, validated with random controls and mechanism boundaries. As LLMs become autonomous financial agents, this is a first step toward a behavioral layer for emerging know-your-agent (KYA) standards: knowing what an agent prefers, and how far that preference can be moved.2026-06-01T17:36:06Z28 pages, 5 figures, 18 tablesWenbin Wuhttp://arxiv.org/abs/2606.02523v1FigSIM: A Dataset for Fine-grained Suicide Severity and Figurative Language in Suicide Memes2026-06-01T17:32:29ZSuicide memes are memes used to express suicide-related thoughts or comment on suicide-related issues. Suicide memes are increasingly common on social media, yet remain poorly understood and potentially harmful. There is an urgent need to better understand their characteristics and to develop appropriate content moderation strategies that limits users' exposure to potentially harmful content. Currently, the absence of annotated datasets of suicide memes remains a key barrier to developing and evaluating automated moderation approaches. In this paper, we introduce FigSIM, the first dataset designed for fine-grained analysis of suicide memes. The dataset consists of 1049 memes, each annotated for (1) fine-grained suicide severity levels, (2) figurative phenomena (e.g., metaphors), and (3) suicide-related content (e.g., suicide method depiction). We benchmark 16 unimodal and multimodal models across three tasks: figurative language, suicide severity, and suicide-related content detection. Overall, FigSIM demonstrates that suicide memes pose unique challenges for both modeling and content moderation. Analysis revealed biases, such as underprediction of higher suicide severity levels, especially for figurative memes. The dataset (including splits used for analyses) is publicly available. Content Warning: This paper contains suicide-related content that may be triggering.2026-06-01T17:32:29ZContent warning: contains suicide-related content. Accepted to Findings of the Association for Computational Linguistics: ACL 2026Liuliu ChenElise R. CarrotteBrian E. ChapmanJo RobinsonMike Conwayhttp://arxiv.org/abs/2606.02375v1WAXAL-NET: Finetuned Edge ASR Across 19 African Languages2026-06-01T15:22:35ZWe evaluate whether compact domain-specialized ASR models can outperform massively multilingual foundation models for conversational African speech across 19 languages in the WAXAL corpus. Fine-tuned edge models achieve a macro-averaged WER of $38.0\%$ compared to $64.9\%$ for the best zero-shot baseline, a $26.9$ percentage-point reduction using models $3-40\times$ smaller. Results confirm that domain specialization dominates scale for spontaneous African speech. Cross-domain evaluation shows that fine-tuned models recover usable performance on out-of-distribution (OOD) speech, while zero-shot models regain an advantage when the test domain matches their pretraining distribution. A distributed native-speaker audit across all surveyed languages produces a linguistically-grounded error taxonomy, showing that CTC and autoregressive architectures behave differently across language families. We further show that WER alone misrepresents performance for syllabary-script languages where CER/WER ratios reveal substantially higher character-level accuracy than headline WER suggests. Finally, to contribute to future African ASR research, we release all model weights, fine-tuning and evaluation scripts, and a cleaned WAXAL subset covering all $19$ languages.2026-06-01T15:22:35ZVictor Tolulope OlufemiOreoluwa BabatundeRamsey NjemaBolarinwa GbotemiWanchi Lucia YenJohn UzodinmaSunday AjayiOluwademilade WilliamsKausar MoshoodInnocent Elendu AnyaeleAkebert ArefaineCandace HunzwiWongel Dawit DanielEmmilly NamugangaCleophas KadimaAthanase BahizireOnitsiky RanaivosonEmmanuel AaronNicholaus LadislausIdris MuhammedJonathan Enoch SimenyaMartin KoomeMatewos Tegete EndaylaluPeter Ifeoluwa AdeyemoHondi Prisca BirindwaUkachi Agnes Eze-MbeyYacoba Oduro-YeboahPericles AdjoviMikel K. NgueajioToluwani AremuPrasenjit Mitrahttp://arxiv.org/abs/2606.02348v1Privacy-preserving Information Sharing in Oligopoly Competitions2026-06-01T14:58:38ZInformation sharing among competing suppliers can improve decision-making under uncertainty, yet strategic concerns regarding rival exploitation often deter voluntary disclosure. We study information-sharing mechanisms in a Cournot oligopoly with uncertain demand, where a platform aggregates suppliers' signals through privacy-preserving channels and may also possess an exogenous external signal. The central challenge is to balance strategic safety with informational utility: privacy noise reduces the exposure of individual signals, but also lowers the value of the shared information pool. We first characterize a baseline setting in which access to aggregated information is contingent on participation. In a two-firm market without an external signal, firms refuse to share regardless of the privacy level. In an \(n\)-firm market, sharing may arise even without privacy safeguards because non-participating firms lose access to the aggregated signal. Building on this baseline, we show that privacy protection alone is insufficient to incentivize disclosure; it must be combined with a sufficiently informative external signal. We further show that firms with more accurate private signals require stronger privacy protection. Overall, our results characterize the sharing-feasible region and highlight the complementarity between privacy design and the external information environment.2026-06-01T14:58:38ZYuxin LiuM. Amin Rahimian