https://arxiv.org/api/zggIyQBkYshDYPROit/Qp0soqzU
2026-06-21T22:09:24Z
2899776515

http://arxiv.org/abs/2605.16566v1
Characterizing AI Fact-Checkers and Their Contributions on Community Notes
2026-05-15T19:13:28Z
Recent advances in artificial intelligence (AI) have made timely, scalable, and effective fact-checking increasingly feasible. One such deployment is X's Community Notes, which provides the AI Note Writer API to enable end-to-end automated generation of contextual information. We present the first empirical analysis of AI fact-checkers and their contributions on Community Notes, examining four key dimensions: volume, velocity, variety, and veracity. We find that, between September 2, 2025 and May 9, 2026, 20 AI writers account for 14.2% of all submitted notes, with their daily share rising rapidly to 44.8% lately. AI writers are highly responsive, typically submitting notes within minutes of posts becoming available via the API. They also expand coverage, contributing notes to 16.8% of fact-checked posts, of which 74.4% are not checked by humans. Over time, AI writers become more prolific and responsive, with increasing coverage and discovery rates. Despite these advantages, their veracity remains mixed. Collectively, AI writers contribute a higher share of helpful notes while receiving a smaller share of human ratings, relative to their share of submitted notes. Controlling for the fact-checked post and note submission order, both AI and human writers exhibit a first-mover advantage, with earlier notes attracting more ratings. More importantly, AI-generated notes are less likely to be classified as helpful than those written by human experts, though they outperform those written by laypeople.
Our findings provide new insights into the practical capabilities and limitations of AI-driven fact-checking, with implications for the design and governance of human-AI collaborative crowdsourced context systems.
2026-05-15T19:13:28Z
14 pages, 10 figures
Yilin Gong, Siqi Wu

http://arxiv.org/abs/2605.16528v1
Inventorship in AI-Assisted Inventions: Designing an Experiment to Shape Case Law
2026-05-15T18:25:22Z
The latest improvements in artificial intelligence (AI) raise new challenges for intellectual property laws, particularly concerning the inventorship issue in AI-assisted inventions - that is, those in which AI is used in the inventive process. While most jurisdictions allow only a natural person to be considered the inventor, the question of how to deal with AI-assisted inventions remains relevant. Namely, what is the nature and contribution of AI tools in an AI-assisted invention that would prevent a human from being recognized as its inventor? The main challenge in addressing this question is the lack of case law on the issue. It is reasonable to assume that with the development of AI and the growing interest in its use in the inventive process, new cases will naturally arise, which in turn will harmonize and address the inventorship issue in AI-assisted inventions to some extent. However, this process will take significant time and may not keep pace with the rapid development of AI, nor fully address the new problems that arise alongside AI advancements. This research proposes the conditions of an experiment to create relevant case law. This experiment could be initiated by society, involving stakeholders specializing in AI. The article also proposes a methodology for conducting the experiment and selecting cases that best reflect the current state of AI use in the inventive process.
Conducting such an approach will help identify the most effective methods for measuring human contribution to AI-assisted inventions when determining inventorship.
2026-05-15T18:25:22Z
Yevhenii Shchetynin, Duygu Usta, Bryan Khan

http://arxiv.org/abs/2605.16516v1
Alignment Drift in Long-Term Human-LLM Interaction: A Mechanism-Oriented Framework
2026-05-15T18:11:49Z
Long-term interaction with LLM-based systems may produce alignment drift: a gradual process in which system outputs become less constrained by the user's current message and more shaped by prior interaction history, while still appearing helpful, coherent, and responsive. This process is difficult to detect because the user's subjective experience may improve as the system becomes more familiar, useful, and attuned. Existing research on human-LLM interaction has largely focused on short-term task performance, isolated outputs, or single-instance alignment problems, leaving slow and cumulative interaction-level dynamics undercharacterized. This paper proposes a mechanism-oriented framework for describing alignment drift. The framework defines the distinction between signal A and signal B, explains how drift develops through feedback loops and sub-pattern selection, divides the process into three interactional regimes, and identifies boundary conditions for controlling drift. By framing alignment drift as a recursive interactional process rather than an isolated model-side failure, the paper provides a conceptual basis for studying long-term human-system interaction.
2026-05-15T18:11:49Z
16 pages, 1 appendix
Xintong Yao
10.5281/zenodo.20113611

http://arxiv.org/abs/2605.16245v1
AI-Mediated Communication Can Steer Collective Opinion
2026-05-15T17:49:24Z
Generative artificial intelligence (AI) is increasingly integrated into the online platforms where humans exchange opinions; large language models (LLMs) now polish users' posts on LinkedIn and provide context for content shared on X.
While prior work has shown that AI can express biased opinions and shape individuals' opinions during human-AI interactions, less attention has been paid to its influence on collective opinion formation when mediating human-to-human communication. We address this gap via a combination of empirical and theoretical analyses. We show empirically that LLMs from multiple popular families introduce directional biases when instructed to edit human-written texts on contested topics, for example, nudging texts in favor of gun control and against atheism. Building on this observation, we introduce a mathematical model of opinion dynamics in which an AI system sits between users on a social network, transforming the opinions they express and perceive. By analytically characterizing the equilibrium of this model and performing simulations on real social network data, we show that biases introduced by AI in human-to-human communication can be amplified through the network and shift collective opinion in their direction. In light of these findings, we investigate whether such biases are controllable by online platforms. We audit the "Explain this post" feature on X and find evidence of pro-life bias in Grok's outputs on abortion-related content, which we trace back to specific design choices. We conclude with a discussion of the broader implications of our findings in relation to ongoing legislative efforts in the European Union.
2026-05-15T17:49:24Z
Stratis Tsirtsis, Kai Rawal, Chris Russell, Brent Mittelstadt, Sandra Wachter

http://arxiv.org/abs/2606.12433v1
Marginal Alignment Does Not Guarantee Joint-Distribution Fidelity: An Official-Reference Audit of Nemotron-Personas-Korea with Cross-Locale Replication
2026-05-15T17:42:39Z
Synthetic persona datasets cite alignment with official demographics as a basis for trust, yet downstream users consume them as joint structures across age, sex, region, occupation, education, name, and institutional status.
Marginal alignment does not imply that these joints are preserved. We propose the Independence-Assumption Footprint (IAF), an audit primitive that operates on the attribute combinations a dataset card itself documents as treated independently. For each such combination, IAF compares the synthetic joint against an external official or institutional reference, using direct joint tables where available and rule-implied checks otherwise. Applied to NVIDIA Nemotron-Personas-Korea (one million Korean synthetic personas), IAF finds that NPK aligns with KOSIS marginals while three joints fail. The major-by-occupation distribution against the KEIS graduate universe carries a large conditional mismatch. The age profile of military service is institutionally inconsistent. Female representation in male-dominated occupations is substantially over-flattened toward parity, with the strict screening verdict mapping-dependent and age-robust under direct standardisation. A transferability demonstration across six further NPK locales finds locale-dependent rather than universal diagnostics, with reference-taxonomy cardinality confounding cross-locale flag counts. For synthetic personas used as silicon samples, marginal claims must therefore be paired with disclosure-anchored joint audits before reuse. The released audit artefacts (reference manifests, occupational crosswalks, derived metrics, reproducibility scripts) instantiate this protocol on the NPK family and are released for retargeting at other synthetic persona resources.
2026-05-15T17:42:39Z
Joonhyung Bae

http://arxiv.org/abs/2504.00289v3
Do Chinese models speak Chinese languages?
2026-05-15T17:29:44Z
The release of top-performing open-weight LLMs has cemented China's role as a leading force in AI development. Do these models support languages spoken in China? Or do they support the same languages as models developed in the United States or in Europe? Comparing multilingual capabilities is important for two reasons.
First, language ability provides insights into pre-training data curation, and thus into resource allocation and development priorities. Second, Chinese model developers need to navigate the tension between serving a linguistically diverse population domestically, and optimizing for globally visible benchmarks that are predominantly English. We investigate Chinese model developers' priorities through a comparative study of Chinese-developed and Western-developed open-weight LLMs, on 21 language variants including Asian regional, Chinese, and European languages. Our experiments on Information Parity and reading comprehension show Chinese models' performance across these languages correlates strongly (r=0.93) with their Western counterparts, with the sole exception being better Mandarin. Chinese-developed models are good at French and German, but they sometimes cannot identify languages spoken by Chinese minorities such as Kazakh and Uyghur. Overall, all open-weight LLMs we study have a similar multilingual performance profile, despite the diverse linguistic and cultural contexts the model developers operated within. We interpret the homogenization as consistent with the influence of global benchmarking practices and shared training resources. Rather than treating current language support as inevitable, our results highlight multilingual development as a space of prioritization and trade-offs, with implications for model developers, policymakers, and users.
2025-03-31T23:19:08Z
First and second author contribute equally
Andrea W Wen-Yi, Unso Eun Seo Jo, David Mimno
10.1145/3805689.3812333

http://arxiv.org/abs/2605.16198v1
Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems
2026-05-15T17:13:27Z
We examine one particular dimension of AI governance: how to monitor and audit AI-enabled products and services throughout the AI development lifecycle, from pre-deployment testing to post-deployment auditing.
Combining principles from formal methods with SoTA machine learning, we propose techniques that enable AI-enabled product and service developers, as well as third party AI developers and evaluators, to perform offline auditing and online (runtime) monitoring of product-specific (temporally extended) behavioral constraints such as safety constraints, norms, rules and regulations with respect to black-box advanced AI systems, notably LLMs. We further provide practical techniques for predictive monitoring, such as sampling-based methods, and we introduce intervening monitors that act at runtime to preempt and potentially mitigate predicted violations. Experimental results show that by exploiting the formal syntax and semantics of Linear Temporal Logic (LTL), our proposed auditing and monitoring techniques are superior to LLM baseline methods in detecting violations of temporally extended behavioral constraints; with our approach, even small-model labelers match or exceed frontier LLM judges. Our predictive and intervening monitors significantly reduce the violation rates of LLM-based agents while largely preserving task performance. We further show through controlled experiments that LLMs' temporal reasoning shows a pronounced degradation in accuracy with increasing event distance, number of constraints, and number of propositions.
2026-05-15T17:13:27Z
Parand A. Alamdari, Toryn Q. Klassen, Sheila A. McIlraith

http://arxiv.org/abs/2605.16193v1
Improving Cross-Cultural Survey Simulation with Calibrated Value Personas
2026-05-15T17:10:50Z
Large language models (LLMs) are increasingly used to simulate human opinions and survey responses, but their ability to reproduce population responses across cultures remains limited. Existing persona-based prompting methods typically rely on sociodemographic or personality traits, which are only indirect proxies for the values that shape human responses.
We propose a value-based persona construction method that derives textual descriptors from survey responses capturing core cultural dimensions. By sampling value profiles from target populations and aggregating LLM responses across personas, we obtain population-level predictions grounded in observed value distributions. We further introduce a calibration procedure that improves response diversity while preserving estimated opinions. We show that our approach reduces prediction error across countries, with the largest improvements observed in underrepresented populations. This substantially narrows the performance gap between countries aligned with dominant LLM priors and those that are less represented in training data, while also yielding response distributions that closely match human diversity.
2026-05-15T17:10:50Z
Submitted to the Fourth International Workshop on Value Engineering in AI (VALE 2026), held at IJCAI-ECAI 2026
Axel Abels, Elias Fernandez Domingos, Apurva Shah, Tom Lenaerts

http://arxiv.org/abs/2603.13452v3
MESD: A Risk-Sensitive Metric for Explanation Fairness Across Intersectional Subgroups
2026-05-15T16:50:28Z
Fairness in machine learning is predominantly evaluated through outcome-oriented metrics, such as Demographic parity, which measure whether predictions are statistically consistent across protected groups. However, these metrics cannot detect whether a model uses systematically different reasoning for different demographic groups, which violates procedural fairness principles. This problem is compounded by intersectionality, where models may appear fair on individual attributes (e.g., race) while exhibiting significant disparities for intersectional subgroups (e.g., race × gender), a phenomenon known as fairness gerrymandering.
In this work, we introduce Multi-category Explanation Stability Disparity (MESD), a procedural fairness metric that quantifies disparities in explanation quality across intersectional subgroups formed by the Cartesian product of multiple protected attributes. MESD integrates three components, which are label-aware aggregation aligned with outcome-conditional fairness, empirical-Bayes shrinkage to stabilize estimates for small intersectional groups, and Conditional Value-at-Risk (CVaR) weighting to emphasize worst-case subgroup disparities. We integrate MESD within a multi-objective optimization framework (UEF) that jointly optimizes utility, outcome fairness, and procedural fairness using NSGA-II. We evaluated MESD and UEF on three benchmark datasets along with four state-of-the-art methods in several experiments, and we demonstrate that MESD reveals procedural disparities invisible to outcome metrics alone. We position our contribution within procedural justice theory and discuss implications for regulatory compliance and intersectional equity.
2026-03-13T15:42:31Z
Gideon Popoola, John Sheppard

http://arxiv.org/abs/2605.16167v1
From Backup Restoration to Minimum Viable Factory Recovery: A Systematization of Ransomware Recovery in Manufacturing Systems
2026-05-15T16:46:23Z
Ransomware recovery in critical manufacturing infrastructure is not only a backup-restoration problem. Production capability depends on coupled information-technology, operational-technology, physical-process, quality, logistics, identity, and supplier systems. After ransomware, a plant may rebuild servers yet remain unable to schedule work, authenticate operators, trust engineering workstations, release product, reconnect OT assets, or coordinate suppliers. This paper reframes manufacturing ransomware recovery as a critical-infrastructure continuity and interdependency problem.
We conduct a PRISMA-guided multivocal review of academic literature, standards and government guidance, threat frameworks, public incident material, and verified full-text/source-page evidence anchors. The review identifies nine evidence-backed recovery failure modes: dependency blindness, untrusted restore point and backup over-trust, identity trust collapse, lack of proof-of-recovery, unsafe OT reconnection, segmentation assumption failure, capability mismatch, unmanaged degraded operation, and supplier dependency failure. We then introduce Minimum Viable Factory Recovery (MVF Recovery): the smallest safe, trusted, and operationally meaningful production capability that can be resumed under current dependency, evidence, identity, data, network, OT, and supplier constraints. MVF Recovery is an analytical objective rather than a claim of full recovery, implementation, or safety certification. The paper derives a recovery lifecycle and benchmarking directions as secondary outputs. The contribution is an evidence-calibrated foundation for capability-centric ransomware recovery in critical manufacturing infrastructure.
2026-05-15T16:46:23Z
46 pages, submitted manuscript. Includes taxonomy, recovery lifecycle, and benchmarking framework for ransomware recovery in manufacturing/ICS environments
Chun Yin Chiu

http://arxiv.org/abs/2508.06524v2
CarbonScaling: Extending Neural Scaling Laws for Carbon Footprint in Large Language Models
2026-05-15T16:31:45Z
Large language models (LLMs) increasingly follow neural scaling laws that tie performance gains to rapidly expanding computational budgets, raising concerns about the sustainability of frontier-scale training. Existing carbon-estimation methods largely depend on regression over historical runs and fail to capture critical system-level factors, including hardware heterogeneity, distributed parallelism, communication overhead, and architectural sparsity.
We present CarbonScaling, a hardware-aware analytical framework for modeling the carbon scaling behavior of frontier LLM training. The framework integrates neural scaling laws, distributed training strategies, accelerator and interconnect modeling, and operational and embodied carbon accounting to estimate feasible hardware configurations and associated emissions. CarbonScaling jointly models tensor, pipeline, data, and expert parallelism while incorporating memory, bandwidth, utilization, and runtime constraints. Experimental validation demonstrates substantially higher fidelity than regression-based baselines and highlights the growing importance of embodied carbon at trillion-parameter scales. Source code: https://github.com/UnchartedRLab/CarbonScaling
2025-08-02T00:41:45Z
8 pages
Lei Jiang, Fan Chen

http://arxiv.org/abs/2602.04759v2
How to Stop Playing Whack-a-Mole: Mapping the Ecosystem of Technologies Facilitating AI-Generated Non-Consensual Intimate Images
2026-05-15T16:12:06Z
The last decade has witnessed a rapid advancement of generative AI technology that significantly scaled the accessibility of AI-generated non-consensual intimate images (AIG-NCII), a form of image-based sexual abuse that disproportionately harms and silences women and girls. There is a patchwork of commendable efforts across industry, policy, academia, and civil society to address AIG-NCII. However, these efforts lack a shared, consistent mental model that clearly situates the technologies they target within the context of a large, interconnected, and ever-evolving technological ecosystem. As a result, interventions remain siloed and are difficult to evaluate and compare, leading to a reactive cycle of whack-a-mole. In this paper, we contribute the first comprehensive AIG-NCII technological ecosystem that maps and taxonomizes 11 categories of technologies facilitating the creation, distribution, proliferation and discovery, infrastructural support, and monetization of AIG-NCII.
First, we build and visualize the ecosystem through a synthesis of over a hundred primary sources from researchers, journalists, advocates, policymakers, and technologists. Then, we conduct two detailed walkthroughs to demonstrate the usefulness of the ecosystem in 1) making sense of new AIG-NCII harms using a case study of Grok and 2) mapping a clearer tech policy landscape using U.S. federal law and 63 state laws. We conclude with a vision for future AIG-NCII research that refines the edges of the ecosystem, recommending researchers to study critical relationships between technologies and potential ripple effects from different interventions. Our goal is to produce an AIG-NCII technological ecosystem that provides a clear, shared terminology and framework for stakeholders to move into the future of AIG-NCII prevention with clarity and foresight.
2026-02-04T16:58:05Z
5/15/26 Update: This version includes a revised ecosystem diagram (Fig 1.) with new color-coding and edge clarification; minor updates to section 3. that include recently published primary sources; a new section 4. with two detailed walkthroughs on the usefulness of the ecosystem; a new section 5. with clearer recommendations for researchers; a new ethical considerations section in the appendix
Michelle L. Ding, Harini Suresh, Suresh Venkatasubramanian

http://arxiv.org/abs/2502.14296v5
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective
2026-05-15T15:43:44Z
Generative Foundation Models (GenFMs) have emerged as transformative tools. However, their widespread adoption raises critical concerns regarding trustworthiness across dimensions. This paper presents a comprehensive framework to address these challenges through three key contributions. First, we systematically review global AI governance laws and policies from governments and regulatory bodies, as well as industry practices and standards.
Based on this analysis, we propose a set of guiding principles for GenFMs, developed through extensive multidisciplinary collaboration that integrates technical, ethical, legal, and societal perspectives. Second, we introduce TrustGen, the first dynamic benchmarking platform designed to evaluate trustworthiness across multiple dimensions and model types, including text-to-image, large language, and vision-language models. TrustGen leverages modular components--metadata curation, test case generation, and contextual variation--to enable adaptive and iterative assessments, overcoming the limitations of static evaluation methods. Using TrustGen, we reveal significant progress in trustworthiness while identifying persistent challenges. Finally, we provide an in-depth discussion of the challenges and future directions for trustworthy GenFMs, which reveals the complex, evolving nature of trustworthiness, highlighting the nuanced trade-offs between utility and trustworthiness, and consideration for various downstream applications, identifying persistent challenges and providing a strategic roadmap for future research. This work establishes a holistic framework for advancing trustworthiness in GenAI, paving the way for safer and more responsible integration of GenFMs into critical applications. 
To facilitate advancement in the community, we release the toolkit for dynamic evaluation.
2025-02-20T06:20:36Z
Yue Huang, Chujie Gao, Siyuan Wu, Haoran Wang, Xiangqi Wang, Yujun Zhou, Yanbo Wang, Jiayi Ye, Jiawen Shi, Qihui Zhang, Yuan Li, Han Bao, Zhaoyi Liu, Tianrui Guan, Dongping Chen, Ruoxi Chen, Kehan Guo, Andy Zou, Bryan Hooi Kuen-Yew, Caiming Xiong, Elias Stengel-Eskin, Hongyang Zhang, Hongzhi Yin, Huan Zhang, Huaxiu Yao, Jaehong Yoon, Jieyu Zhang, Kai Shu, Kaijie Zhu, Ranjay Krishna, Swabha Swayamdipta, Taiwei Shi, Weijia Shi, Xiang Li, Yiwei Li, Yuexing Hao, Zhihao Jia, Zhize Li, Xiuying Chen, Zhengzhong Tu, Xiyang Hu, Tianyi Zhou, Jieyu Zhao, Lichao Sun, Furong Huang, Or Cohen Sasson, Prasanna Sattigeri, Anka Reuel, Max Lamparth, Yue Zhao, Nouha Dziri, Yu Su, Huan Sun, Heng Ji, Chaowei Xiao, Mohit Bansal, Nitesh V. Chawla, Jian Pei, Jianfeng Gao, Michael Backes, Philip S. Yu, Neil Zhenqiang Gong, Pin-Yu Chen, Bo Li, Dawn Song, Xiangliang Zhang

http://arxiv.org/abs/2606.12432v1
AI Debris: Residual Risk and the Afterlife of Failed AI Systems
2026-05-15T14:59:15Z
AI governance frameworks primarily focus on risks during the development and deployment phases, implicitly treating system withdrawal as a technical shutdown. This paper argues that decommissioned AI systems generate residual risk, termed AI debris, that persists after model removal and continues to shape institutional behaviour, accountability, and trust. AI debris is defined as the post-withdrawal socio-technical residue of AI systems, including workflow dependency, data contamination, capability displacement (deskilling), legitimacy erosion, and accountability breakdown. The paper develops a typology of debris domains and identifies mechanisms through which debris persists, including institutional memory, path dependency, blame avoidance, and feedback effects in organisational data.
To operationalise the concept, the paper proposes an evaluator-ready AI Debris Decommissioning Protocol (AIDP), a stepwise checklist specifying auditable evidence for freezing decision footprints, incident review, remediation, contestability, and post-withdrawal accountability assignment. A brief vignette of Amazon's discontinued hiring tool illustrates how algorithmic decision categories and screening heuristics can persist after system rollback. The paper contributes a practical governance instrument for regulators, auditors, and organisations seeking to prevent paper compliance, strengthen AI lifecycle governance, and improve institutional resilience in high-stakes decision environments.
2026-05-15T14:59:15Z
Victor Frimpong

http://arxiv.org/abs/2605.15958v1
Bridging the climate to energy data gap: simulated annealing for representative climate year selection
2026-05-15T13:52:08Z
Energy system models are increasingly dependent on representative climate input. Yet, a fundamental mismatch persists between the hundreds of simulated years often used in climate science and the handful of years that computationally demanding power system models can process. Current practice, including ENTSO-E's European Resource Adequacy Assessment, relies on climate year selections that have not been validated against explicit representativeness criteria. This risks biased investment decisions and blind spots for plausible weather conditions. This study proposes simulated annealing as an optimisation method for selecting representative subsets of complete climate years from large climate ensembles. Representativeness is quantified using the seasonal sliced Wasserstein distance, a metric from optimal transport theory that captures representativeness on marginal distributions, inter-variable correlations, and seasonal structure simultaneously.
We evaluate simulated annealing against the alternative methods random search, filtered random search, and K-Medoids clustering across three test cases spanning the Netherlands and Europe, using 180 climate years from the Pan-European Climate Database as a reference. Simulated annealing consistently produces the most representative subsets and outperforms all compared methods. Simulated annealing achieves an effective sample size four to five times the actual subset size. The resulting subsets are roughly 2.5-3.5 times more representative than current ENTSO-E practice. The method is application-agnostic and its output can serve as a validated climate data input to any subsequent (energy) impact study.
2026-05-15T13:52:08Z
33 pages, 13 figures, submitted to Applied Energy
Bram van Duinen, Karin van der Wiel, Jean Thorey, Laurens Stoop
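
Illustrative sketch for the climate year selection entry above (2605.15958v1). It is not the authors' implementation: it only shows the general shape of choosing k "years" out of n by simulated annealing so that the subset's distribution stays close to the full ensemble. Everything here is an assumption made for illustration: the function names (`wasserstein_1d`, `subset_cost`, `anneal_subset`) are hypothetical, each year is reduced to a flat list of values for a single variable, the distance is a plain quantile-based 1-D Wasserstein approximation rather than the seasonal sliced Wasserstein distance named in the abstract, and the geometric cooling schedule is an arbitrary choice.

```python
import math
import random


def wasserstein_1d(sample, reference_sorted, n_quantiles=50):
    """Approximate 1-D Wasserstein-1 distance by comparing empirical quantiles."""
    s = sorted(sample)
    total = 0.0
    for i in range(n_quantiles):
        q = (i + 0.5) / n_quantiles  # midpoint quantile levels, always < 1
        total += abs(s[int(q * len(s))] - reference_sorted[int(q * len(reference_sorted))])
    return total / n_quantiles


def subset_cost(years, subset):
    """Distance between the pooled values of the chosen years and of all years."""
    reference_sorted = sorted(v for year in years for v in year)
    pooled = [v for i in subset for v in years[i]]
    return wasserstein_1d(pooled, reference_sorted)


def anneal_subset(years, k, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Pick k of len(years) years (requires k < len(years)) by simulated annealing."""
    rng = random.Random(seed)
    n = len(years)
    current = rng.sample(range(n), k)
    cur_cost = subset_cost(years, current)
    best, best_cost = list(current), cur_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        # Neighbour move: swap one selected year for one unselected year.
        outside = [i for i in range(n) if i not in current]
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        cand_cost = subset_cost(years, candidate)
        delta = cand_cost - cur_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, cur_cost = candidate, cand_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
    return sorted(best), best_cost
```

Calling `anneal_subset(years, k=3)` on a list of per-year value lists returns the selected year indices and their cost. A multi-variable, season-aware version would replace `subset_cost` with a distance that respects inter-variable correlations and seasonal structure, as the abstract describes.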