http://arxiv.org/abs/2507.16679v4 PICACO: Pluralistic In-Context Value Alignment of LLMs via Total Correlation Optimization 2026-05-27T11:22:11Z

In-Context Learning has shown great potential for aligning Large Language Models (LLMs) with human values, helping reduce harmful outputs and accommodate diverse preferences without costly post-training, an approach known as In-Context Alignment (ICA). However, LLMs' comprehension of input prompts remains agnostic, limiting ICA's ability to address value tensions: human values are inherently pluralistic, often imposing conflicting demands, e.g., stimulation vs. tradition. Current ICA methods therefore face the Instruction Bottleneck challenge, where LLMs struggle to reconcile multiple intended values within a single prompt, leading to incomplete or biased alignment. To address this, we propose PICACO, a novel pluralistic ICA method. Without fine-tuning, PICACO optimizes a meta-instruction that incorporates multiple values to better elicit LLMs' understanding of them and improve alignment. This is achieved by maximizing the total correlation between specified values and LLM responses, which theoretically reinforces value conformity and reduces distractive noise, resulting in more effective instructions. Extensive experiments on five value sets show that PICACO works well with both black-box and open-source LLMs, outperforms several recent strong baselines, and achieves a better balance across up to 8 distinct values.
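
For readers unfamiliar with the quantity being maximized, the standard information-theoretic definition of total correlation is sketched below. The symbols $v_1, \dots, v_K$ (the specified values) and $y$ (the LLM response) are notation assumed here for illustration. The abstract does not state how PICACO instantiates or estimates these variables, so this is the textbook definition rather than the paper's objective.

```latex
% Total correlation of random variables X_1, ..., X_n (standard definition):
\[
  \mathrm{TC}(X_1, \dots, X_n)
  \;=\; \sum_{i=1}^{n} H(X_i) \;-\; H(X_1, \dots, X_n)
  \;=\; D_{\mathrm{KL}}\!\Big( p(x_1, \dots, x_n) \,\Big\|\, \prod_{i=1}^{n} p(x_i) \Big).
\]
% Assumed instantiation for illustration only: values v_1, ..., v_K and response y.
\[
  \mathrm{TC}(v_1, \dots, v_K, y)
  \;=\; \sum_{k=1}^{K} H(v_k) \;+\; H(y) \;-\; H(v_1, \dots, v_K, y).
\]
```

Total correlation is zero exactly when the variables are mutually independent, so a larger value indicates that the response carries more shared information with the specified values.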

2025-07-22T15:14:56Z ICML 2026 Han Jiang Dongyao Zhu Xiaoyuan Yi Ziang Xiao Zhihua Wei Xing Xie http://arxiv.org/abs/2605.24413v3 Habermolt: Delegating Deliberation to AI Representatives 2026-05-27T10:45:17Z

Deliberative democracy arguably leads to better collective decisions, but is fundamentally constrained by human attention and bandwidth. While recent AI-mediated deliberations scale participation by synthesizing inputs from many humans, they remain time-intensive for individual users. As AI models become increasingly capable, AI systems are being deployed not only to mediate deliberation between humans, but to represent humans in it: where AI agents deliberate on behalf of human users. We call this paradigm AI-delegated deliberation. While it promises unprecedented scale for democratic participation, it introduces qualitatively new design and alignment challenges that are poorly understood and under-theorized. To study these dynamics empirically, we deploy Habermolt, a public platform for AI-delegated deliberation. We evaluate its effectiveness along three dimensions that we use to organize any deliberative system: representation, aggregation, and revision. We use these observations to illuminate the design decisions future AI-delegated deliberation platforms must confront, contributing to the broader research agenda for scalable yet trustworthy AI representatives.

2026-05-23T05:50:50Z Joseph Low Oscar Duys Claude Formanek Michiel Bakker Lewis Hammond http://arxiv.org/abs/2605.28251v1 Counterfactually Fair Regression via Optimal Transport 2026-05-27T10:00:54Z

We consider the problem of learning a counterfactually fair regressor. We adopt a causal uncertainty view in which counterfactual fairness is defined with resampled noise. We focus on obtaining theoretical fairness guarantees for a new post-processing estimator. We begin by showing that counterfactual fairness is equivalent to satisfying demographic parity conditional on the latent variable. This allows us to provide a closed-form expression of the optimal fair regressor via a barycentric quantile map. In order to handle continuous latent variables, we propose a discretized post-processing method. Then, under mild regularity assumptions, we prove high-probability finite-sample fairness guarantees for our estimator, providing an unfairness decay at rate $\tilde O(n^{-1/3})$, and establishing a matching risk bound of order $\tilde O(n^{-1/3})$. We provide a matching lower bound on the excess risk of almost fair predictions. Finally, we extend our results to the setting of relaxed counterfactual fairness. We validate our approach on real-world and synthetic data.
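
To make the idea of a barycentric quantile map concrete, here is a minimal Python sketch of the classic one-dimensional construction for demographic parity: each score is sent to the group-weighted average of all groups' quantile functions, evaluated at the score's within-group rank. This is an illustration under simplifying assumptions, not the paper's estimator. It handles the unconditional case only (as if there were a single latent bin), and the function name `barycentric_quantile_map` and the midrank convention are hypothetical choices made here.

```python
import numpy as np

def barycentric_quantile_map(scores, groups):
    """Post-process scores so every group shares one output distribution.

    Illustrative sketch (assumption): unconditional demographic parity in 1D,
    using empirical quantiles; not the conditional, discretized estimator
    described in the abstract.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    weights = counts / counts.sum()  # group proportions
    fair = np.empty_like(scores)
    for g in labels:
        mask = groups == g
        s = scores[mask]
        # Within-group rank of each score, mapped into (0, 1) via midranks.
        ranks = (np.argsort(np.argsort(s)) + 0.5) / s.size
        # Barycenter quantile: weighted average of every group's quantile
        # function evaluated at that rank.
        fair[mask] = sum(
            w * np.quantile(scores[groups == h], ranks)
            for h, w in zip(labels, weights)
        )
    return fair

# Usage: two groups whose raw scores differ by a constant shift.
scores = np.array([0.0, 1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0])
groups = np.array(["a"] * 4 + ["b"] * 4)
fair = barycentric_quantile_map(scores, groups)
```

In this toy example both groups end up with identical post-processed scores while the ordering within each group is preserved. That is the behaviour the quantile-averaging construction is designed to give.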

2026-05-27T10:00:54Z M. Generali Lince S. Gaucher J-J. Vie P. Loiseau http://arxiv.org/abs/2501.03097v3 Self-directed online information search can affect policy support: a randomized encouragement design with digital behavioral data 2026-05-27T09:59:44Z

As citizens increasingly encounter political information in digital environments, understanding whether this engagement shapes their policy views has become a central concern. Drawing on dual-process theories of persuasion, we argue that motivational activation is an enabling condition for policy support change in high-choice online environments. We test this in a three-wave field experiment with German participants (n = 791) across three policy topics (basic child support, renewable energy transition, cannabis legalization), in which participants were randomly assigned to a control group or to one of two encouragement conditions: verbal encouragement or a monetary incentive tied to a knowledge test. Browsing behavior was passively tracked via digital trace data over a 20-hour window. We find that self-directed online information search produced changes in policy support for child support and cannabis legalization but not for the energy transition, with monetary incentives, rather than verbal prompts, producing significant effects. We discuss motivational salience, issue malleability, and search-environment quality as joint conditions under which political information engagement can produce detectable changes in policy support.

2025-01-06T15:55:31Z Celina Kacperski Roberto Ulloa Peter Selb Andreas Spitz Denis Bonnay Juhi Kulshrestha http://arxiv.org/abs/2605.28233v1 Geometry of Relaxed Fair Regression: A Unified Framework for Aware and Unaware Settings 2026-05-27T09:45:16Z

Fairness-accuracy trade-offs are a central concern in the deployment of fairness-aware machine learning methods. When sensitive attributes are unavailable at inference time (the so-called unawareness setting), principled methods for obtaining accurate predictions under relaxed fairness constraints are largely missing. In this work, we address this gap by formulating regression under a demographic parity penalty as an optimal transport problem. Our framework unifies both the aware and unaware settings and characterizes optimal prediction functions via optimal transport maps, under both squared Wasserstein-2 and Total Variation penalties. These results reveal that the choice of penalty reflects fundamentally different fairness philosophies: the Wasserstein penalty induces a smooth, population-wide compromise, while Total Variation enforces exact parity for a subset of individuals. Building on these theoretical characterizations, we propose an algorithm that is simple to implement, computationally efficient, and consistently matches or outperforms state-of-the-art baselines on real-world benchmarks.

2026-05-27T09:45:16Z M. Generali Lince V. Divol R. Flamary S. Gaucher P. Loiseau http://arxiv.org/abs/2605.28223v1 Why Meditation Wearables Fail: Reward Misspecification in Closed-Loop EEG and Biofeedback Systems 2026-05-27T09:38:47Z

Consumer EEG headbands, HRV biofeedback devices, and closed-loop neurostimulation systems share a fundamental design flaw: they reward measurable proxy signals rather than the outcomes they claim to produce. When a user optimises for calm EEG, HRV coherence, or breathing resonance, their brain learns to produce those signals through whatever strategy is most efficient, including strategies unrelated to the intended benefit. We formalise this as reward misspecification: the policy maximising proxy reward R_proxy is not the policy maximising true intended outcome V_target. This produces three failure modes: proxy mismatch, strategy shortcutting, and transfer failure. We review how existing devices including Muse, HeartMath, Unyte IOM2, and clinical neurofeedback systems instantiate these failures. We introduce a four-tier measurability taxonomy distinguishing reliably measurable wearable targets (Tier 1) from targets that are currently or possibly structurally unmeasurable (Tiers 3 and 4), and show that most devices make implicit Tier 3 and 4 claims. We propose a design framework that avoids all three failure modes: single Tier-1 target (mind-wandering onset via EEG), negative-only cueing, temporal separation of fast EEG and slow somatic feature streams, and transfer to unassisted practice as the only success criterion. No current product meets all four criteria. The framework has direct implications for the design, evaluation, and regulation of cognitive and contemplative wearables.
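
The reward-misspecification claim can be written compactly as follows. The policy symbol $\pi$, the expectation notation, and the optimal-policy names are assumptions introduced here for illustration; the abstract names only $R_{\text{proxy}}$ and $V_{\text{target}}$.

```latex
% Assumed notation: \pi ranges over the user's (learned) strategies.
\[
  \pi^{*}_{\text{proxy}} \;=\; \arg\max_{\pi} \; \mathbb{E}_{\pi}\big[ R_{\text{proxy}} \big],
  \qquad
  \pi^{*}_{\text{target}} \;=\; \arg\max_{\pi} \; \mathbb{E}_{\pi}\big[ V_{\text{target}} \big].
\]
% Reward misspecification: optimizing the proxy does not optimize the intended outcome.
\[
  \pi^{*}_{\text{proxy}} \;\neq\; \pi^{*}_{\text{target}},
  \qquad\text{and in particular}\qquad
  \mathbb{E}_{\pi^{*}_{\text{proxy}}}\big[ V_{\text{target}} \big]
  \;<\;
  \mathbb{E}_{\pi^{*}_{\text{target}}}\big[ V_{\text{target}} \big].
\]
```

Read this way, the gap between the two expectations is one way to quantify how much intended benefit is lost when the device rewards the proxy signal alone.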

2026-05-27T09:38:47Z 16 pages, 1 figure Joy Bose http://arxiv.org/abs/2605.28187v1 Whose Name Comes Up? III: Persona Prompting Effects in LLM-Based Scholar Recommendation 2026-05-27T09:09:30Z

Large language models (LLMs) are increasingly used as scholar recommenders, shaping who is seen as an expert in academia. Existing audits remain English-centric, single-discipline, and persona-agnostic, leaving the source of output variability poorly understood. To this end, we propose a benchmark that disentangles the effects of model choice and prompt design on recommendations. We audit 43 LLMs by varying persona prompts (language, location, role-and-task) and context (field, seniority, k). Recommended scholars are compared against Semantic Scholar over six scientific disciplines to measure technical quality (factuality, coverage) and social representativeness (diversity, parity). Basic technical quality is driven by model choice, factuality and parity by context, and diversity by location. South Africa prompts yield less factual lists, while Japan prompts yield highly factual but homogeneous lists skewed toward highly productive scholars. Prompt design is thus a non-trivial axis of LLM-based scholar discovery and should be systematically audited alongside model choice.

2026-05-27T09:09:30Z 25 pages (10 main, 2 references, 13 appendix), 6 figures in main, 13 figures in appendix (under-review) Annabella Sánchez-Guzmán Lukas Eberhard Denis Helic Lisette Espín-Noboa http://arxiv.org/abs/2204.04318v2 Towards Understanding Barriers and Mitigation Strategies of Software Engineers with Non-traditional Educational and Occupational Backgrounds 2026-05-27T08:14:49Z

The traditional path to a software engineering career usually involves a post-secondary diploma in Software Engineering, Computer Science, or a related field. However, many individuals working as software engineers take a non-traditional path to their careers, starting from other industries or fields of study. This paper explores the barriers that individuals with non-traditional educational and occupational backgrounds face when pursuing a software engineering career and proposes potential strategies to overcome those barriers. A two-stage methodology was used, consisting of an exploratory study followed by a follow-up survey. The exploratory study consisted of a grounded-theory-based qualitative analysis of relevant Reddit data to yield a framework around the barriers and possible mitigation strategies. These findings were then supplemented through a follow-up survey. Understanding these barriers and what strategies could be effective is an important step towards making software engineering more accessible to individuals with non-traditional backgrounds. In addition to fostering functional diversity, this might also serve to tackle labor shortages within the software engineering industry.

2022-04-08T22:51:19Z 27 pages, 8 figures Empirical Software Engineering Volume 29, article number 82, (2024) Tavian Barnes Ken Jen Lee Cristina Tavares Gema Rodríguez-Pérez Meiyappan Nagappan 10.1007/s10664-024-10493-1 http://arxiv.org/abs/2605.28025v1 MIRA: A Bilingual Benchmark for Medical Information Response Audit 2026-05-27T06:28:03Z

Large language models (LLMs) are increasingly used to provide public-facing health information, yet existing safety evaluations overlook whether responses preserve comparable medical information across different user phrasings of the same question. To address this, we introduce the Medical Information Response Audit (MIRA), a bilingual, controlled benchmark that assesses whether LLMs provide comparable medical information across user-side language, register, and health literacy signals. MIRA contains 4,320 prompts built from 60 medically reviewed, low-risk health questions. Across five mainstream LLMs, models answered all medical questions, but responses to low health-literacy signals consistently omitted more key information, provided fewer concrete next steps, and offered less support for independent judgment. We term this pattern Differential Information Dilution (DID). Language effects are model-specific rather than uniformly worse for non-English prompts. A comparison with 300 real-world health queries provides preliminary evidence of rank-order validity. A knowledge-guided mitigation prompt reduces information dilution for most models, with the largest reductions in underinformative simplification observed for Claude (~8%) and Qwen (~6%).

2026-05-27T06:28:03Z Mengyu Xu Qiaoxin Yang Qianqian Wang Xiwei Dai Weiyi Wu Chongyang Gao http://arxiv.org/abs/2605.27988v1 Auditing Stance Asymmetry in Generative Explanations 2026-05-27T05:22:17Z

Bias evaluation for language models has made substantial progress on bounded comparisons, such as overt derogation, stereotype association, or label-sensitive differences under controlled substitutions. Open-ended explanations raise a different problem: they guide interpretation by assigning responsibility, legitimacy, context, and grievance. A model can avoid hostile language while making one side structurally understandable and another personally at fault, overreacting, or less worth taking seriously. We call this stance-bearing asymmetry in generative explanations. We propose Symmetry Decomposition Evaluation (SDE), which tests paired situations with concrete group labels, structural-role rewrites, and explicit support or counter-evidence. In a controlled 32-family prototype suite, this decomposition shows that surface differences are not all alike: some weaken under structural or evidence control, while others remain as stable differences in how the model assigns blame, context, or legitimacy. Targeted case review and judge comparison suggest a broader difficulty for evaluating open-ended framing asymmetries: judge readings shift across operationalizations, and scalar scores can flatten distinctions that readers use to interpret explanatory stance. SDE therefore reframes generative bias evaluation as an audit of explanatory stance -- what stance each side receives, how it changes under decomposition, and where automatic scoring becomes unstable.

2026-05-27T05:22:17Z Jiarui Han http://arxiv.org/abs/2605.16237v2 Inside Baseball: The Automated Ball-Strike System as an Object Lesson in Technological Rule Enforcement 2026-05-27T04:47:58Z

Clearly defined rules are often assumed to be straightforward to automate and evaluate. We challenge this assumption through an in-depth study of Major League Baseball's (MLB) seven-year experimentation with the Automated Ball-Strike System (ABS). ABS is envisioned to call balls and strikes accurately: a seemingly straightforward use of technology to objectively determine the distance between a pitch and the strike zone. Although the strike zone is an area clearly defined in the rulebook, it took MLB seven years to figure out how to automate calling balls and strikes with ABS, showing how even seemingly straightforward rules require a complex translation process to operationalize via technological systems. In this paper, we trace the design decisions that led to the current implementation of ABS. Our case study reveals that "distance" exists even between a clear rule and its technological implementation. Using analytic frameworks from Science and Technology Studies (STS), we show that such distance exists because (1) historically, the "ground truth" of the strike zone is contested: the rule in practice has always reflected a hybrid between the rulebook definition and umpires' enforcement decisions; and (2) the use of ABS is embedded in an existing ecosystem, where the implementation of a technological enforcement system needs to balance multiple stakeholder values. This perspective challenges conventional evaluation paradigms that center on the distance between a formalized rule and its technological implementation, and instead calls for evaluating how such systems are experienced in practice. Addressing this question requires in-depth social science approaches, contributing to ongoing conversations in FAccT about the implementation and evaluation of sociotechnical systems.

2026-05-15T17:45:04Z Andrea Wen-Yi Wang Waki Kamino David Mimno Karen Levy Malte F. Jung 10.1145/3805689.3812385 http://arxiv.org/abs/2605.27921v1 Show, Don't TELL: Explainable AI-Generated Text Detection 2026-05-27T03:47:25Z

Research on AI-generated text detection has presented a number of approaches to discern human from AI prose, some of which achieve high in-distribution performance. However, real-world applicability has stalled because their outputs are misaligned with the needs of users, such as professors, who are presented with a numeric score that has no attached explanation. We tackle this issue with a novel architecture, TELL, that builds in explainability from the ground up. While our system still offers a numerical score like other detectors for comparability, TELL takes a fundamentally different approach: we aim to show the user the "tells" by which the model believes a text is AI- or human-written, empowering the user to decide who wrote a text using their own judgment and understanding of the context of the writing and its alleged author. We train TELL on a custom SFT dataset of domain-specific authorship annotations, and further refine the system using GRPO with curriculum learning to improve performance. We achieve competitive performance with state-of-the-art detectors (AUROC 0.927) while natively providing annotations that explain the basis for the detector's decision. We further evaluate the quality of our explanations using a dataset of human annotations and report a high (mean 72.3%) win-rate on annotation concreteness, falsifiability, coherence, plausibility and grounding, allowing users to think critically and decide for themselves. Our work thus reframes the problem of AI-generated text detection from a human-centric perspective and paves the way for a new family of detectors that focus on native explainability.

2026-05-27T03:47:25Z Aldan Creo Suraj Ranganath http://arxiv.org/abs/2512.15791v2 Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Study 2026-05-27T02:11:59Z

In Artificial Intelligence (AI), language models have gained significant importance due to the widespread adoption of systems capable of simulating realistic conversations with humans through text generation. Because of their impact on society, developing and deploying these language models must be done responsibly, with attention to their negative impacts and possible harms. In this scenario, the number of AI Ethics Tools (AIETs) publications has recently increased. These AIETs are designed to help developers, companies, governments, and other stakeholders establish trust, transparency, and responsibility with their technologies by bringing accepted values to guide AI's design, development, and use stages. However, many AIETs lack good documentation, examples of use, and proof of their effectiveness in practice. This paper presents a methodology for evaluating AIETs in language models. Our approach involved an extensive literature survey on 213 AIETs, and after applying inclusion and exclusion criteria, we selected four AIETs: Model Cards, ALTAI, FactSheets, and Harms Modeling. For evaluation, we applied AIETs to language models developed for the Portuguese language, conducting 35 hours of interviews with their developers. The evaluation considered the developers' perspective on the AIETs' use and quality in helping to identify ethical considerations about their model. The results suggest that the applied AIETs serve as a guide for formulating general ethical considerations about language models. However, we note that they do not address unique aspects of these models, such as idiomatic expressions. Additionally, these AIETs did not help to identify potential negative impacts of models for the Portuguese language.

2025-12-16T02:43:37Z 7 figures, 11 tables. Accepted for publication in AI and Ethics Jhessica Silva Diego A. B. Moreira Gabriel O. dos Santos Alef Ferreira Helena Maia Sandra Avila Helio Pedrini 10.1007/s43681-025-00914-2 http://arxiv.org/abs/2605.27827v1 Operational AI Deployment Assurance: Governance-State Orchestration Under Threshold-Sensitive Deployment Conditions -- A Governance Framework for High-Stakes AI Systems 2026-05-27T01:33:40Z

AI governance frameworks increasingly emphasize fairness, transparency, accountability, and lifecycle risk management in high-stakes domains. However, many current approaches remain observational, relying on static metric reporting, post-hoc auditing, and monitoring dashboards without directly governing deployment readiness, remediation progression, escalation states, or assurance-driven deployment control. This paper introduces Operational AI Deployment Assurance (OADA), a governance framework for translating fairness disagreement, subgroup instability, threshold sensitivity, remediation outcomes, and operational uncertainty into deployment-oriented assurance decisions. Building on prior work on the Fairness Disagreement Index (FDI) and FairRisk-FDI, OADA reframes governance uncertainty as an operational concern within AI deployment pipelines rather than a byproduct of metric disagreement. The framework introduces Deployment Assurance Scores, Deployment Readiness Classifications, Threshold Stability Zones, Governance Escalation States, and remediation-aware assurance progression. These constructs support lifecycle-oriented governance decisions across high-stakes settings by connecting evaluation outputs to deployment-state interpretation, reassessment, escalation, and operational control. Through deployment-oriented evaluation across facial recognition systems, with discussion extended to healthcare AI as a representative high-stakes domain, the paper demonstrates how systems may appear acceptable under isolated fairness or performance metrics while still exhibiting instability that affects deployment readiness. The proposed framework positions operational deployment assurance as a governance layer between evaluation and real-world AI deployment.

2026-05-27T01:33:40Z 13 pages, 3 figures, governance-oriented framework for operational AI deployment assurance in high-stakes systems Khalid Adnan Alsayed http://arxiv.org/abs/2605.27801v1 Local Privacy Laws in a Globalized World 2026-05-27T00:41:31Z

Personal data has emerged as a highly valuable yet sensitive asset that drives business decisions, enables targeted advertising, and generates substantial revenue for companies, while simultaneously facilitating invasive monitoring of users. In recent years, research on digital privacy violations, including undue access, collection, and sharing of user data, has grown significantly. Much of this research adopts the European General Data Protection Regulation (GDPR) as the primary reference framework. This is reasonable, as GDPR was a pioneering piece of legislation, and many of its stipulations are clear and unambiguous. However, we argue that focusing solely on GDPR (and a small set of other Western regulatory frameworks) ignores privacy-related concerns, attitudes, and problems faced by users from other locales, creating a significant research blind spot. This work systematically normalizes the heterogeneous legal requirements of multiple data protection laws into a unified abstraction aligned with the data lifecycle, which forms the foundation for the implementation of such regulations. We further investigate the implications of these laws on different stakeholders, including users, organizations, and governments. Overall, this work aims to broaden the digital privacy research community's perspective and to serve as a set of guiding principles for developing technological privacy solutions spanning multiple countries.

2026-05-27T00:41:31Z Accepted in ACM Conference on Data and Application Security and Privacy (CODASPY) 2026 Shantanu Sharma Ethan Myers Lorenzo De Carli Ritwik Banerjee Indrakshi Ray