https://arxiv.org/api/2wPo7ye02gx2IYDO43L+1Nf0du4 2026-06-13T18:33:03Z 6065 450 15 http://arxiv.org/abs/2512.03337v1 Epistemic Substitution: How Grokipedia's AI-Generated Encyclopedia Restructures Authority 2025-12-03T01:05:32Z

A quarter century ago, Wikipedia's decentralized, crowdsourced, and consensus-driven model replaced the centralized, expert-driven, and authority-based standard for encyclopedic knowledge curation. The emergence of generative AI encyclopedias, such as Grokipedia, may present another shift in epistemic evolution. This study investigates whether AI- and human-curated encyclopedias rely on the same foundations of authority. We conducted a multi-scale comparative analysis of the citation networks from 72 matched article pairs, which cite a total of almost 60,000 sources. Using an 8-category epistemic classification, we mapped the "epistemic profiles" of the articles on each platform. Our findings reveal several quantitative and qualitative differences in how knowledge is sourced and encyclopedia claims are epistemologically justified. Grokipedia replaces Wikipedia's heavy reliance on peer-reviewed "Academic & Scholarly" work with a notable increase in "User-generated" and "Civic organization" sources. Comparative network analyses further show that Grokipedia employs very different epistemological profiles when sourcing leisure topics (such as Sports and Entertainment) and more societally sensitive civic topics (such as Politics & Conflicts, Geographical Entities, and General Knowledge & Society). Finally, we find a "scaling-law for AI-generated knowledge sourcing" that shows a linear relationship between article length and citation density, which is distinct from collective human reference sourcing. We conclude that this first implementation of an LLM-based encyclopedia does not merely automate knowledge production but restructures it. Given the notable changes and the important role of encyclopedias, we suggest the continuation and deepening of algorithm audits, such as the one presented here, in order to understand the ongoing epistemological shifts.

2025-12-03T01:05:32Z Aliakbar Mehdizadeh Martin Hilbert http://arxiv.org/abs/2512.02818v1 Designing FAIR Workflows at OLCF: Building Scalable and Reusable Ecosystems for HPC Science 2025-12-02T14:27:32Z

High Performance Computing (HPC) centers provide advanced infrastructure that enables scientific research at extreme scale. These centers operate with hardware configurations, software environments, and security requirements that differ substantially from most users' local systems. As a result, users often develop customized digital artifacts that are tightly coupled to a given HPC center. This practice can lead to significant duplication of effort as multiple users independently create similar solutions to common problems. The FAIR Principles offer a framework to address these challenges. Initially designed to improve data stewardship, the FAIR approach has since been extended to encompass software, workflows, models, and infrastructure. By encouraging the use of rich metadata and community standards, FAIR practices aim to make digital artifacts easier to share and reuse, both within and across scientific domains. Many FAIR initiatives have emerged within individual research communities, often aligned by discipline (e.g. bioinformatics, earth sciences). These communities have made progress in adopting FAIR practices, but their domain-specific nature can lead to silos that limit broader collaboration. Thus, we propose that HPC centers play a more active role in fostering FAIR ecosystems that support research across multiple disciplines. This requires designing infrastructure that enables researchers to discover, share, and reuse computational components more effectively. Here, we build on the architecture of the European Open Science Cloud (EOSC) EOSC-Life FAIR Workflows Collaboratory to propose a model tailored to the needs of HPC. Rather than focusing on entire workflows, we emphasize the importance of making individual workflow components FAIR. This component-based approach better supports the diverse and evolving needs of HPC users while maximizing the long-term value of their work.

2025-12-02T14:27:32Z Sean R. Wilkinson Patrick Widener Sarp Oral Rafael Ferreira da Silva 10.5281/zenodo.17290392 http://arxiv.org/abs/2512.01669v1 Mapping the Landscape of Open Access Dashboards -- A Dataset for Research and Infrastructure Development 2025-12-01T13:39:22Z

As Open Access continues to gain importance in science policy, understanding the proportion of Open Access publications relative to the total research output of research-performing organizations, individual countries, or even globally has become increasingly relevant. In response, dashboards are being developed to capture and communicate progress in this area. To provide an overview of these dashboards and their characteristics, an extensive survey was conducted, resulting in the identification of nearly 60 dashboards. To support a detailed and structured description, a dedicated metadata schema was developed, and the identified dashboards were systematically indexed accordingly. To foster community engagement and ensure ongoing development, a participatory process was launched, allowing interested stakeholders to contribute to the dataset. The dataset is particularly relevant for researchers in Library and Information Science (LIS) and Science and Technology Studies (STS), supporting both empirical analyses of Open Access and the methodological refinement of indicators and policy instruments in the context of Open Science.

2025-12-01T13:39:22Z Sci Data 13, 677 (2026) Johannes Schneider Heinz Pampel 10.1038/s41597-026-07217-z http://arxiv.org/abs/2512.01570v1 OpenDORS: A dataset of openly referenced open research software 2025-12-01T11:45:50Z

In many academic disciplines, software is created during the research process or for a research purpose. The crucial role of software for research is increasingly acknowledged. The application of software engineering to research software has been formalized as research software engineering, to create better software that enables better research. Despite this, large-scale studies of research software and its development are still lacking. To enable such studies, we present a dataset of 134,352 unique open research software projects and 134,154 source code repositories referenced in open access literature. Each dataset record identifies the referencing publication and lists source code repositories of the software project. For 122,425 source code repositories, the dataset provides metadata on latest versions, license information, programming languages and descriptive metadata files. We summarize the distributions of these features in the dataset and describe additional software metadata that extends the dataset in future work. Finally, we suggest examples of research that could use the dataset to develop a better understanding of research software practice in RSE research.

2025-12-01T11:45:50Z 5 pages, 3 figures, 1 table Stephan Druskat Lars Grunske http://arxiv.org/abs/2512.01560v1 Estimating the prevalence of LLM-assisted text in scholarly writing 2025-12-01T11:34:15Z

The use of large language models (LLMs) in scholarly publications has grown dramatically since the launch of ChatGPT in late 2022. This usage is often undisclosed, and it can be challenging for readers and reviewers to identify human-written but LLM-revised or translated text, or predominantly LLM-generated text. Given the known quality and reliability issues connected with LLM-generated text, their potential growth poses an increasing problem for research integrity, and for public trust in research. This study presents a simple and easily reproducible methodology to show the growth in the full text of published papers, across the full range of research, as indexed in the Dimensions database. It uses this to demonstrate that LLM tools are likely to have been involved in the production of more than 10% of all published papers in 2024, based on disproportionate use of specific indicative words, and draws together earlier studies to confirm that this is a plausible overall estimate. It then discusses the implications of this for the integrity of scholarly publishing, highlighting evidence that use of LLMs for text generation is still being concealed or downplayed by authors, and presents an argument that more comprehensive disclosure requirements are urgently required to address this.

2025-12-01T11:34:15Z 19 pages, 3 figures Andrew Gray http://arxiv.org/abs/2512.01330v1 Prompt perturbation and fraction facilitation sometimes strengthen Large Language Model scores 2025-12-01T06:44:48Z

Large Language Models (LLMs) can be tasked with scoring texts according to pre-defined criteria and on a defined scale, but there is no recognised optimal prompting strategy for this. This article focuses on the task of LLMs scoring journal articles for research quality on a four-point scale, testing how user prompt design can enhance this ability. Based primarily on 1.7 million Gemma3 27b queries for 2780 health and life science articles with 58 similar prompts, the results show that improvements can be obtained by (a) testing semantically equivalent prompt variations, (b) averaging scores from semantically equivalent prompts, (c) specifying that fractional scores are allowed, and possibly also (d) not drawing attention to the input being partial. Whilst (a) and (d) suggest that models can be sensitive to how a task is phrased, (b) and (c) suggest that strategies to leverage more of the model's knowledge are helpful, such as by perturbing prompts and facilitating fractions. Perhaps counterintuitively, encouraging incorrect answers (fractions for this task) releases useful information about the model's certainty about its answers. Mixing semantically equivalent prompts also reduces the chance of getting no score for an input. Additional testing showed that the best prompts vary between LLMs, however, and were almost the opposite for ChatGPT 4o-mini, weakly aligned for Llama4 Scout and Magistral, and made little difference to Qwen3 32b and DeepSeek R1 32b. Overall, whilst there is no single best prompt, a good strategy for all models was to average the scores from a range of different semantically equivalent or similar prompts.
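
The averaging strategy in (b) can be illustrated with a minimal sketch. It assumes that each prompt variant returns either a possibly fractional score or nothing at all; the function name and the example scores are hypothetical and are not taken from the article.

```python
from statistics import mean
from typing import Optional, Sequence


def average_score(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Average one article's scores across prompt variants.

    None marks a prompt that produced no usable score; such entries are
    skipped. The result is None only if every prompt failed.
    """
    valid = [s for s in scores if s is not None]
    return mean(valid) if valid else None


# Hypothetical scores for one article from five semantically equivalent
# prompts, on a four-point scale with fractional values allowed.
scores_for_article = [3.0, 2.5, None, 3.0, 3.5]
print(average_score(scores_for_article))  # 3.0
```

Skipping missing scores rather than failing is one way to reflect the observation that mixing prompts reduces the chance of getting no score for an input.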

2025-12-01T06:44:48Z Mike Thelwall http://arxiv.org/abs/2512.00868v1 A Core Ontology for Particle Accelerators: Interoperable Data and Workflows Across Facilities 2025-11-30T12:36:03Z

We propose a small, shared core ontology for particle accelerators that provides a semantic backbone for interoperable data and workflows across facilities. The ontology names key device types, signals, parameters, and regions, and relates them through explicit properties (e.g., hasSetpoint, hasReadback, partOf). Each site contributes a lightweight facility bundle, a profile that maps local conventions into the shared vocabulary plus data slices that instantiate those mappings, without renaming channel addresses or changing existing systems. Using standard W3C technologies, the approach supports both sparse and rich descriptions. We demonstrate the idea on two beamline segments at different laboratories. A single semantic query is expressed once and evaluated against both knowledge bases, returning the locally correct PVs. The ontology thereby enables not only portable workflows but also interoperable data, since measurements and catalogs are annotated with shared semantics rather than facility-specific names. The framework complements, rather than replaces, existing middle layers and lattice/data standards, and it creates a stable foundation for reusable tools and agentic workflows.
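
The idea of writing one query and evaluating it against two facility knowledge bases can be sketched with plain subject-predicate-object triples. This is only an illustration under stated assumptions: the device identifiers, region name, PV strings, and the `type` predicate are hypothetical, and the abstract's approach uses standard W3C technologies rather than the in-memory sets shown here. Only `hasSetpoint`, `hasReadback`, and `partOf` come from the abstract.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Two hypothetical facility bundles that map local names into shared terms.
FACILITY_A: Set[Triple] = {
    ("A:QUAD01", "type", "Quadrupole"),
    ("A:QUAD01", "partOf", "InjectorSegment"),
    ("A:QUAD01", "hasSetpoint", "LAB_A:MQ01:CUR_SP"),
    ("A:QUAD01", "hasReadback", "LAB_A:MQ01:CUR_RB"),
}
FACILITY_B: Set[Triple] = {
    ("B:Q7", "type", "Quadrupole"),
    ("B:Q7", "partOf", "InjectorSegment"),
    ("B:Q7", "hasSetpoint", "b-inj/q7/set"),
    ("B:Q7", "hasReadback", "b-inj/q7/read"),
    ("B:BPM3", "type", "BeamPositionMonitor"),
    ("B:BPM3", "partOf", "InjectorSegment"),
    ("B:BPM3", "hasReadback", "b-inj/bpm3/x"),
}


def setpoints(kb: Set[Triple], device_type: str, region: str) -> List[str]:
    """Return the local setpoint PVs of all devices of a type in a region."""
    of_type = {s for (s, p, o) in kb if p == "type" and o == device_type}
    in_region = {s for (s, p, o) in kb if p == "partOf" and o == region}
    wanted = of_type & in_region
    return sorted(o for (s, p, o) in kb if p == "hasSetpoint" and s in wanted)


# The same query is expressed once and run against both knowledge bases.
for name, kb in (("A", FACILITY_A), ("B", FACILITY_B)):
    print(name, setpoints(kb, "Quadrupole", "InjectorSegment"))
```

The query never mentions a facility-specific channel name; each knowledge base resolves the shared terms to its own addresses.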

2025-11-30T12:36:03Z 11 pages, 3 figures, glossary Chris Tennant http://arxiv.org/abs/2512.00772v1 SHRAG: A Framework for Combining Human-Inspired Search with RAG 2025-11-30T08:06:47Z

Retrieval-Augmented Generation (RAG) is gaining recognition as one of the key technological axes for next generation information retrieval, owing to its ability to mitigate the hallucination phenomenon in Large Language Models (LLMs) and effectively incorporate up-to-date information. However, specialized expertise is necessary to construct a high-quality retrieval system independently; moreover, RAG demonstrates relatively slower processing speeds compared to conventional pure retrieval systems because it involves both retrieval and generation stages. Accordingly, this study proposes SHRAG, a novel framework designed to facilitate the seamless integration of Information Retrieval and RAG while simultaneously securing precise retrieval performance. SHRAG utilizes a Large Language Model as a Query Strategist to automatically transform unstructured natural language queries into logically structured search queries, subsequently performing Boolean retrieval to emulate the search process of an expert human searcher. Furthermore, it incorporates multilingual query expansion and a multilingual embedding model, enabling it to perform efficient cross-lingual question answering within the multilingual dataset environment of the ScienceON Challenge. Experimental results demonstrate that the proposed method, combining logical retrieval capabilities and generative reasoning, can significantly enhance the accuracy and reliability of RAG systems. Furthermore, SHRAG moves beyond conventional document-centric retrieval methods, presenting the potential for a new search paradigm capable of providing direct and reliable responses to queries.
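
The Boolean retrieval step can be sketched over a small inverted index. This is a minimal illustration, not SHRAG's implementation: the nested-tuple query format, the function names, and the three documents are assumptions, and the structured query is written by hand here where the abstract describes an LLM Query Strategist producing it.

```python
from typing import Dict, Set, Tuple, Union

# A query is a single term, or a tuple ("AND" | "OR", subquery, subquery, ...).
Query = Union[str, Tuple]


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercase whitespace token to the ids of documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(doc_id)
    return index


def evaluate(query: Query, index: Dict[str, Set[int]]) -> Set[int]:
    """Recursively evaluate a Boolean query against the inverted index."""
    if isinstance(query, str):
        return set(index.get(query.lower(), set()))
    op, *subqueries = query
    results = [evaluate(q, index) for q in subqueries]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical mini-corpus and a hand-written structured query.
docs = {
    1: "retrieval augmented generation reduces hallucination",
    2: "boolean retrieval with an inverted index",
    3: "multilingual embedding models for question answering",
}
index = build_index(docs)
query = ("AND", "retrieval", ("OR", "boolean", "hallucination"))
print(sorted(evaluate(query, index)))  # [1, 2]
```

In a full pipeline the matching documents would then be passed to the generation stage; that part is omitted here.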

2025-11-30T08:06:47Z 10 pages, 4 figures, 1 table, 1 algorithm, 3 prompts Hyunseok Ryu Wonjune Shin Hyun Park http://arxiv.org/abs/2512.05135v1 Analysis of Inter-Testamental References Reveal Five Groups of Books in the Christian Bible 2025-11-29T04:45:55Z

The Bible is packed with references from start to finish. This study aims to analyze a specific branch of these references: citations. While there are several types of references, both explicit and implicit, this study focuses on the types of references that can be detected with a simple algorithmic string comparison, or an n-gram string comparison. Words were compared by their Strong's Concordance numbers so they could be compared without conjugation or declension. We searched through the Greek Old Testament (Septuagint) and Greek New Testament manuscripts for direct quotations from the former in the latter. Our analysis of these references leads us to believe Old Testament books cluster into three groups of common use, and that New Testament books cluster into two groups of common use. We analyze these clusters to show explicitly how they differ, and discover that New Testament books reference vastly different portions of the Old Testament.
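
An n-gram comparison over concordance numbers can be sketched as follows. This is a minimal illustration under stated assumptions: the integer sequences are placeholders rather than real Strong's numbers, the choice of n = 3 is arbitrary, and the function names are hypothetical.

```python
from typing import Sequence, Set, Tuple


def ngrams(seq: Sequence[int], n: int) -> Set[Tuple[int, ...]]:
    """Return the set of all contiguous n-grams in a sequence of word ids."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def shared_ngrams(source: Sequence[int], target: Sequence[int],
                  n: int = 3) -> Set[Tuple[int, ...]]:
    """Return the n-grams that occur in both passages."""
    return ngrams(source, n) & ngrams(target, n)


# Placeholder word ids standing in for concordance numbers, so that
# inflected forms of the same word compare as equal.
old_testament_passage = [101, 202, 303, 404, 505]
new_testament_passage = [900, 202, 303, 404, 901]
print(shared_ngrams(old_testament_passage, new_testament_passage))
# {(202, 303, 404)}
```

A shared n-gram is only a candidate quotation; short or very common word sequences would need further filtering.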

2025-11-29T04:45:55Z 15 pages of paper content, 8 pages of data in Appendix Isaac Anderson Wesley Stevick Katrina Koehler http://arxiv.org/abs/2511.23439v1 ML Researchers Support Openness in Peer Review But Are Concerned About Resubmission Bias 2025-11-28T18:35:19Z

Peer-review venues have increasingly adopted open reviewing policies that publicly release anonymized reviews and permit public commenting. Venues have adopted a variety of policies, and there is still ongoing debate about the benefits and drawbacks of decisions. To inform this debate, we surveyed 2,385 reviewers, authors, and other peer-review participants in machine learning to understand their experiences and opinions. Our key findings are: (a) Preferences: Over 80% of respondents support releasing reviews for accepted papers and allowing public comments. However, only 27.1% support releasing rejected manuscripts. (b) Benefits: Respondents cite improved public understanding (75.3%) and reviewer education (57.8%), increased fairness (56.6%), and stronger incentives for high-quality reviews (48.0%). (c) Challenges: The top concern is resubmission bias, where rejection history biases future reviewers (ranked top impact of open reviewing by 41% of respondents, and mentioned in over 50% of free responses). Other challenges include fear of reviewer de-anonymization (33.2%) and potential commenting abuse. (d) AI and open peer review: Participants believe open policies deter "AI slop" submissions (71.9%) and AI-generated reviews (38.9%). Respondents are split regarding peer-review venues generating official AI reviews, with 56.0% opposed and 44.0% supportive. Finally, we use AI to annotate 4,244 reviews from ICLR (fully open) and NeurIPS (partially open). We find that the fully open venue (ICLR) has higher levels of correctness and completeness than the partially open venue (NeurIPS). The effect size is small for correctness and very small for completeness, and both are statistically significant. We also find that there is no statistically significant difference in the level of substantiation. We release the full dataset at https://github.com/justinpayan/OpenReviewAnalysis.

2025-11-28T18:35:19Z 36 pages, 16 figures Vishisht Rao Justin Payan Andrew McCallum Nihar B. Shah http://arxiv.org/abs/2511.22965v1 Research on Diamond Open Access in the Long Shadow of Science Policy 2025-11-28T08:16:29Z

This paper reviews research literature on Diamond Open Access (DOA) journals - sometimes also called Platinum Open Access - that was produced after this journal segment started to become a priority in European research policy around 2020. It contextualizes the current science policy debate, critically examines different understandings of DOA, and reviews studies on the role of such journals in scholarly communication. Most existing research consists of quantitative studies focusing on aspects such as the number of DOA journals, their publication output, the diversity of the landscape in terms of subject areas, languages, publishing entities, indexing in major databases, awareness and perception among scholars, cost analyses, as well as insights into the internal operations of DOA journals. The review shows that research on DOA journals is partly influenced by the science policy discourse in at least two ways: first, through the normativity inherent in that discourse, and second, through the temporality of policy-driven research of practical relevance, which leaves important aspects of the phenomenon understudied. Moreover, research on the DOA journal landscape has implications beyond understanding this particular journal segment, as it also challenges established views of the global system of scholarly communication.

2025-11-28T08:16:29Z Keywords: Diamond Open Access; Platinum Open Access; Open Science; Scholarly Publishing; Open Access Publishing; Science Policy Niels Taubert http://arxiv.org/abs/2511.21843v1 FLAWS: A Benchmark for Error Identification and Localization in Scientific Papers 2025-11-26T19:19:44Z

The identification and localization of errors is a core task in peer review, yet the exponential growth of scientific output has made it increasingly difficult for human reviewers to reliably detect errors given the limited pool of experts. Recent advances in Large Language Models (LLMs) have sparked interest in their potential to support such evaluation tasks, from academic peer review to automated scientific assessment. However, despite the growing use of LLMs in review systems, their capabilities to pinpoint errors remain underexplored. In this work, we introduce Fault Localization Across Writing in Science (FLAWS), an automated benchmark consisting of 713 paper-error pairs designed to evaluate how effectively LLMs detect errors that undermine key claims in research papers. We construct the benchmark by systematically inserting claim-invalidating errors into peer-reviewed papers using LLMs, paired with an automated evaluation metric that measures whether models can identify and localize these errors. Developing such a benchmark presents unique challenges that we overcome: ensuring that the inserted errors are well-defined, challenging, and relevant to the content of the paper, avoiding artifacts that would make identification trivial, and designing a scalable, automated evaluation metric. On the resulting benchmark, we evaluate five frontier LLMs: Claude Sonnet 4.5, DeepSeek Reasoner v3.1, Gemini 2.5 Pro, GPT 5, and Grok 4. Among these, GPT 5 is the top-performing model, achieving 39.1% identification accuracy when k=10, where k is the number of top-ranked error text candidates generated by the LLM.
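
The role of k in the reported accuracy can be illustrated with a small sketch. It assumes that a paper counts as identified when any of the model's top-k candidate texts matches the inserted error, and it uses substring containment as a stand-in matcher; the benchmark's actual automated metric is not specified in the abstract, and the example data below are hypothetical.

```python
from typing import List, Sequence, Tuple


def identified_at_k(candidates: Sequence[str], inserted_error: str, k: int) -> bool:
    """True if any of the top-k ranked candidates contains the inserted error text."""
    return any(inserted_error in candidate for candidate in candidates[:k])


def accuracy_at_k(items: List[Tuple[Sequence[str], str]], k: int) -> float:
    """Fraction of paper-error pairs whose error is found within the top k."""
    hits = sum(identified_at_k(cands, err, k) for cands, err in items)
    return hits / len(items)


# Hypothetical ranked candidates and inserted errors for two papers.
items = [
    (["the sample size was 50", "we use a learning rate of 0.1"],
     "learning rate of 0.1"),
    (["results hold for all n", "the bound is tight"],
     "variance is constant"),
]
print(accuracy_at_k(items, k=1))  # 0.0
print(accuracy_at_k(items, k=2))  # 0.5
```

Accuracy can only stay the same or rise as k grows, since a larger k admits more candidates per paper.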

2025-11-26T19:19:44Z 30 pages, 12 tables, 2 figures Sarina Xi Vishisht Rao Justin Payan Nihar B. Shah http://arxiv.org/abs/2511.21505v1 The Intertwined Rise of Collaboration Scale, Reference Diversity, and Breakthrough Potential in Modern Science: A 40-Year Cross-Disciplinary Study 2025-11-26T15:39:22Z

Over the last four decades, the way knowledge is created in academia has transformed dramatically: research teams have grown larger, scholars draw from ever-wider pools of prior work, and the most influential discoveries increasingly emerge from complex collaborative efforts. Using a massive dataset of over 15 million publications spanning 1970-2010 and covering six major domains (Humanities, Social Sciences, Agricultural Sciences, Medical and Health Sciences, Engineering and Technology, and Natural Sciences), this study tracks how three core features of scientific papers - authorship team size, the breadth and variety of cited sources, and eventual citation impact - have co-evolved over time. We uncover striking differences across disciplines. In every field, papers that build on a broader and more diverse knowledge base consistently attract more citations later on, lending large-scale empirical support to theories that view scientific breakthroughs as outcomes of novel recombination across distant ideas. Bigger teams, on average, generate work with greater ultimate influence, but the gains taper off after a certain scale; very large consortia seldom produce the absolute highest-impact papers. While the Humanities and Social Sciences remain anchored in solo or small-group authorship traditions, the Natural Sciences, Medicine, and Engineering have moved decisively toward big-team mega-science. These patterns illuminate the underlying production technology of discovery, reveal discipline-specific barriers to collaboration and idea integration, and offer evidence-based guidance for research funding agencies, universities, and policymakers seeking to organize scientific work for maximum breakthrough potential.

2025-11-26T15:39:22Z Sarah J. James Marcus A. Rodriguez David P. Miller http://arxiv.org/abs/2601.04368v1 From Paper to Structured JSON: An Agentic Workflow for Compliant BMR Digital Transformation 2025-11-26T14:02:49Z

Pharmaceutical manufacturers generate thousands of batch manufacturing records (BMRs) each year under FDA 21 CFR Part 211 and EU GMP rules. These long documents combine tables, calculations, images, and handwritten notes, and are usually digitized by hand with hours of expert review per record. We present an AI workflow that converts unstructured BMRs into structured JSON using token-based chunking, parallel large language model extraction, and a fixed schema that covers 11 content types while preserving the original Group-Phase-Step hierarchy. The system applies three layers of validation (JSON syntax, structural integrity of classes and references, and pharmaceutical compliance checks aligned with GMP) and reports coverage metrics for text, tables, images, and calculations. On three real BMRs between 15 and 66 pages, it achieves composite confidence scores in the low to high eighties while reducing processing time from hours to minutes on a single GPU. This enables practical, human-in-the-loop BMR digitization at scale and unlocks historical manufacturing data for downstream analysis.
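
The layered validation can be sketched in a few lines. This is a rough illustration only: the field names (`groups`, `phases`, `steps`, `id`, `performed_by`) and the single compliance rule are hypothetical placeholders, not the paper's schema or its GMP checks. Only the three-layer ordering and the Group-Phase-Step hierarchy come from the abstract.

```python
import json
from typing import List


def validate_bmr(raw: str) -> List[str]:
    """Run three validation layers in order and return the problems found."""
    # Layer 1: JSON syntax.
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"syntax: {exc.msg}"]

    # Layer 2: structural integrity of the Group-Phase-Step hierarchy.
    if not isinstance(doc, dict) or not isinstance(doc.get("groups"), list):
        return ["structure: expected an object with a 'groups' list"]

    errors: List[str] = []
    for group in doc["groups"]:
        for phase in group.get("phases", []):
            for step in phase.get("steps", []):
                if "id" not in step:
                    errors.append("structure: step without an id")
                # Layer 3: placeholder compliance rule (hypothetical).
                elif not step.get("performed_by"):
                    errors.append(f"compliance: step {step['id']} lacks performed_by")
    return errors


# Hypothetical record with one complete step and one missing an operator.
record = json.dumps({"groups": [{"phases": [{"steps": [
    {"id": "S1", "performed_by": "operator-1"},
    {"id": "S2"},
]}]}]})
print(validate_bmr(record))  # ['compliance: step S2 lacks performed_by']
```

Stopping at the first failing layer for syntax and top-level structure keeps later checks from running on input they cannot interpret.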

2025-11-26T14:02:49Z Bhavik Agarwal Nidhi Bendre Viktoria Rojkova http://arxiv.org/abs/2511.21176v1 Prevalence and Trends in Global Retractions Explored Through a Topic Lens 2025-11-26T08:40:05Z

Scientific publications form the cornerstone of innovation and have maintained a stable growth trend over the years. However, in recent years, there has been a significant surge in retractions, driven largely by the proliferation of low-quality and fraudulent papers. This study aims to examine retractions and their evolving trends through a topic lens. Our analysis of global retraction data reveals that the number of retractions has remained alarmingly high in recent years, with the growth rate of retracted papers significantly outpacing that of overall global publications. While retractions are observed across various fields, their distribution is not uniform. In disciplines characterized by high retraction rates, certain topics may only encounter minor issues, whereas in fields with lower retraction rates, some topics can experience substantial challenges. Moreover, an unexpected surge in publications has been observed in specific topics that also display abnormally high retraction rates. This study underscores several indicators that can assist the scientific community in pinpointing key fields that require rigorous scrutiny for potential low-quality and fraudulent research. Ultimately, our findings could serve as a benchmark for examining scientific integrity across diverse topics and offer crucial insights for developing tailored governance policies to enhance research integrity in each field.

2025-11-26T08:40:05Z 15 pages, 11 figures Zhengyi Zhou Ying Lou Zhesi Shen Menghui Li