https://arxiv.org/api/MSBtTA9vLhNaKdRv0DvQwDw+hOE2026-03-22T17:38:00Z587010515http://arxiv.org/abs/2601.20806v1How Disciplinary Partnerships Shape Research Landscape in U.S. Library and Information Science Schools2026-01-28T17:50:16ZThis study provides the first comprehensive empirical mapping of how organizational structures and research portfolios co-occur across U.S. Library and Information Science (LIS) schools. Analyzing 14,705 publications from 1,264 faculty members across 44 institutions (2013--2024), we employ computational methods including word embeddings and topic modeling to identify 16 distinct research themes organized into three foundational dimensions: Library and Knowledge Organization (LKO), Human-Centered Technology (HCT), and Computing Systems (CS). Our mixed-method analysis reveals significant differences in research composition across organizational types: Computer-affiliated schools cluster tightly in computationally-intensive research and differ significantly from all other school types, while independent Information schools demonstrate the greatest research diversity. Temporal analysis of LIS schools reveals complex evolutionary dynamics: 51.4% are moving toward HCT, 37.8% toward CS, and 37.8% toward LKO, with many schools simultaneously shifting along multiple dimensions. Contrary to narratives of computational dominance, HCT emerged as LIS's primary growth vector. These patterns challenge assumptions about field fragmentation, revealing structured diversification shaped by but not determined by organizational positioning. 
The study provides empirical foundations for institutional strategic planning, accreditation policy, and understanding LIS's evolving disciplinary identity amid computational transformation.2026-01-28T17:50:16ZJiangen HeWen Louhttp://arxiv.org/abs/2602.03863v1Overcoming Barriers to Computational Reproducibility2026-01-27T16:03:03ZComputational reproducibility, the possibility for independent researchers to exactly reproduce published empirical results, is fundamental to science. Despite its importance, the proportion of research articles aiming for reproducibility remains low and uneven across disciplines. Barriers include a perceived lack of incentives for researchers and journals, practical challenges in preparing reproducible materials, and the absence of harmonised standards of reproducibility processes and requirements by journals. Existing guidance is often highly technical, reaching mainly those already engaged with reproducible research. In this paper, we first synthesize evidence on the benefits of reproducibility for both authors and journals. Drawing on our extensive experience in reproducibility checking at various journals, we then put forward concise, pragmatic guidelines for creating reproducible analyses across disciplines. We further review current reproducibility policies of selected journals, illustrating the substantial heterogeneity in requirements and procedures. Motivated by the latter, we propose conceptual foundations for a harmonised multi-tier system of reproducibility standards that could support transparent, consistent assessment across journals and research communities. 
Our goal as journal (reproducibility) editors and contributors to the MaRDI initiative is to encourage broader adoption of reproducibility practices, in particular by lowering practical barriers for authors and journals.2026-01-27T16:03:03Z21 pages, 1 figureRoman HornungDepartment of Statistics, Ludwig-Maximilians-UniversitätMunich Center for Machine LearningLászló NémethWeierstrass Institute for Applied Analysis and Stochastics, Berlin, GermanyMax Planck Institute for Demographic Research, Rostock, GermanyOleksandr ZadorozhnyDepartment of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, Munich, GermanyTheresa UllmannInstitute of Clinical Biometrics, Center for Medical Data Science, Medical University of Vienna, Vienna, AustriaMichael KammerInstitute of Clinical Biometrics, Center for Medical Data Science, Medical University of Vienna, Vienna, AustriaDivision of Nephrology and Dialysis, Department of Medicine III, Medical University of Vienna, Vienna, AustriaRebecca KillickSchool of Mathematical Sciences, Lancaster University, Lancaster, United KingdomSchool of Mathematical and Statistical Sciences, Clemson University, Clemson, USAChristopher J. 
PaciorekDepartment of Statistics, University of California, Berkeley, USAJulien ChiquetUMR MIA Paris-Saclay, INRAE, AgroParisTech, Université Paris-Saclay, Palaiseau, FranceMoritz HerrmannMunich Center for Machine LearningInstitute for Medical Information Processing, Biometry, and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-UniversitätLucija BatinovícDepartment of Behavioural Sciences and Learning, Linköping University, Linköping, SwedenDepartment of Psychology, Linnaeus University, Växjö, SwedenRickard CarlssonDepartment of Psychology, Linnaeus University, Växjö, SwedenPierre NeuvialInstitut de Mathématiques de ToulouseBoris HejblumSISTM, U1219 Bordeaux Population Health, Université de Bordeaux / INSERM / Inria, Bordeaux, FranceJulia WrobelDepartment of Biostatistics and Bioinformatics, Emory University, Atlanta, USAAnne-Laure BoulesteixInstitute for Medical Information Processing, Biometry, and Epidemiology, Faculty of Medicine, Ludwig-Maximilians-UniversitätMunich Center for Machine LearningKarsten TabelowWeierstrass Institute for Applied Analysis and Stochastics, Berlin, Germanyhttp://arxiv.org/abs/2508.08828v2Recent Advances and Trends in Research Paper Recommender Systems: A Comprehensive Survey2026-01-27T12:26:31ZAs the volume of scientific publications grows exponentially, researchers increasingly face difficulties in locating relevant literature. Research Paper Recommender Systems have become vital tools to mitigate this information overload by delivering personalized suggestions. This survey provides a comprehensive analysis of Research Paper Recommender Systems developed between November 2021 and December 2024, building upon prior reviews in the field. It presents an extensive overview of the techniques and approaches employed, the datasets utilized, the evaluation metrics and procedures applied, and the status of both enduring and emerging challenges observed during the research. 
Unlike prior surveys, this survey goes beyond merely cataloguing techniques and models, providing a thorough examination of how these methods are implemented across different stages of the recommendation process. By furnishing a detailed and structured reference, this work aims to function as a consultative resource for the research community, supporting informed decision-making and guiding future investigations in the advances of effective Research Paper Recommender Systems.2025-08-12T10:36:41ZIratxe PinedoMikel LarrañagaAna Arruartehttp://arxiv.org/abs/2601.19513v1Enhancing Academic Paper Recommendations Using Fine-Grained Knowledge Entities and Multifaceted Document Embeddings2026-01-27T11:55:10ZIn the era of explosive growth in academic literature, the burden of literature review on scholars is increasing. Proactively recommending academic papers that align with scholars' literature needs in the research process has become one of the crucial pathways to enhance research efficiency and stimulate innovative thinking. Current academic paper recommendation systems primarily focus on broad and coarse-grained suggestions based on general topic or field similarities. While these systems effectively identify related literature, they fall short in addressing scholars' more specific and fine-grained needs, such as locating papers that utilize particular research methods, or tackle distinct research tasks within the same topic. To meet the diverse and specific literature needs of scholars in the research process, this paper proposes a novel academic paper recommendation method. This approach embeds multidimensional information by integrating new types of fine-grained knowledge entities, title and abstract of document, and citation data. Recommendations are then generated by calculating the similarity between combined paper vectors. 
The proposed recommendation method was evaluated using the STM-KG dataset, a knowledge graph that incorporates scientific concepts derived from papers across ten distinct domains. The experimental results indicate that our method outperforms baseline models, achieving an average precision of 27.3% among the top 50 recommendations. This represents an improvement of 6.7% over existing approaches.2026-01-27T11:55:10ZScientometrics, 2026Haixu XiHeng ZhangChengzhi Zhanghttp://arxiv.org/abs/2505.18942v6Language Models Should be Used to Surface the Unwritten Code of Science and Society2026-01-26T20:53:05ZThis paper calls on the research community not only to investigate how human biases are inherited by large language models (LLMs) but also to explore how these biases in LLMs can be leveraged to make society's "unwritten code" - such as implicit stereotypes and heuristics - visible and accessible for critique. We introduce a conceptual framework through a case study in science: uncovering hidden rules in peer review - the factors that reviewers care about but rarely state explicitly due to normative scientific expectations. The idea of the framework is to push LLMs to speak out their heuristics through generating self-consistent hypotheses - why one paper appeared stronger in reviewer scoring - among paired papers submitted to 46 academic conferences, while iteratively searching deeper hypotheses from remaining pairs where existing hypotheses cannot explain. We observed that LLMs' normative priors about the internal characteristics of good science extracted from their self-talk, e.g., theoretical rigor, were systematically updated toward posteriors that emphasize storytelling about external connections, such as how the work is positioned and connected within and across literatures. 
Human reviewers tend to explicitly reward aspects that moderately align with LLMs' normative priors (correlation = 0.49) but avoid articulating contextualization and storytelling posteriors in their review comments (correlation = -0.14), despite giving implicit reward to them with positive scores. These patterns are robust across different models and out-of-sample judgments. We discuss the broad applicability of our proposed framework, leveraging LLMs as diagnostic tools to amplify and surface the tacit codes underlying human society, enabling public discussion of revealed values and more precisely targeted responsible AI.2025-05-25T02:28:40ZHonglin BaoSiyang WuJiwoong ChoiYingrong MaoJames A. Evanshttp://arxiv.org/abs/2601.18945v1Large Language Models for Departmental Expert Review Quality Scores2026-01-26T20:38:20ZPresumably, peer reviewers and Large Language Models (LLMs) do very different things when asked to assess research. Still, recent evidence has shown that LLMs have a moderate ability to predict quality scores of published academic journal articles. One untested potential application of LLMs is for internal departmental review, which may be used to support appointment and promotion decisions or to select outputs for national assessments. This study assesses for the first time the extent to which (1) LLM quality scores align with internal departmental quality ratings and (2) LLM reports differ from expert reports. Using a private dataset of 58 published journal articles from the School of Information at the University of Sheffield, together with internal departmental quality ratings and reports, ChatGPT-4o, ChatGPT-4o mini, and Gemini 2.0 Flash scores correlate positively and moderately with internal departmental ratings, whether the input is just title/abstract or the full text. 
Whilst departmental reviews tended to be more specific and to show field-level knowledge, ChatGPT reports tended to be standardised, more general, repetitive, and with unsolicited suggestions for improvement. The results therefore (a) confirm the ability of LLMs to guess the quality scores of published academic research moderately well, (b) confirm that this ability is a guess rather than an evaluation (because it can be made based on title/abstract alone), (c) extend this ability to internal departmental expert review, and (d) show that LLM reports are less insightful than human expert reports for published academic journal articles.2026-01-26T20:38:20ZLiv LangfeldtDag W. AksnesHenrik KarlstrømMike Thelwallhttp://arxiv.org/abs/2601.18724v1HalluCitation Matters: Revealing the Impact of Hallucinated References with 300 Hallucinated Papers in ACL Conferences2026-01-26T17:48:23ZRecently, we have often observed hallucinated citations or references that do not correspond to any existing work in papers under review, preprints, or published papers. Such hallucinated citations pose a serious concern to scientific reliability. When they appear in accepted papers, they may also negatively affect the credibility of conferences. In this study, we refer to hallucinated citations as "HalluCitation" and systematically investigate their prevalence and impact. We analyze all papers published at ACL, NAACL, and EMNLP in 2024 and 2025, including main conference, Findings, and workshop papers. Our analysis reveals that nearly 300 papers contain at least one HalluCitation, most of which were published in 2025. Notably, half of these papers were identified at EMNLP 2025, the most recent conference, indicating that this issue is rapidly increasing. 
Moreover, more than 100 such papers were accepted as main conference and Findings papers at EMNLP 2025, affecting the credibility.2026-01-26T17:48:23ZWork In ProgressYusuke SakaiHidetaka KamigaitoTaro Watanabehttp://arxiv.org/abs/2510.17853v3CiteGuard: Faithful Citation Attribution for LLMs via Retrieval-Augmented Validation2026-01-26T16:50:53ZLarge Language Models (LLMs) have emerged as promising assistants for scientific writing. However, there have been concerns regarding the quality and reliability of the generated text, one of which is the citation accuracy and faithfulness. While most recent work relies on methods such as LLM-as-a-Judge, the reliability of LLM-as-a-Judge alone is also in doubt. In this work, we reframe citation evaluation as a problem of citation attribution alignment, which assesses whether LLM-generated citations match those a human author would include for the same text. We propose CiteGuard, a retrieval-aware agent framework designed to provide more faithful grounding for citation validation. CiteGuard improves the prior baseline by 17%, and achieves up to 68.1% accuracy on the CiteME benchmark, approaching human-level performance (69.7%). It also enables the identification of alternative but valid citations and demonstrates generalization ability for cross-domain citation attribution. Our code is available at https://github.com/KathCYM/CiteGuard.2025-10-15T00:32:26ZYee Man ChoiXuehang GuoYi R. FungQingyun Wanghttp://arxiv.org/abs/2601.18616v1Issues regarding the Indexing of Publication Types and Study Designs2026-01-26T15:56:05ZObjectives. Major research and implementation efforts have been devoted to indexing articles according to the major topics discussed, but much less effort to indexing their publication types and study designs (collectively, PTs). In this Perspective, we discuss how indexing PTs differs from topical MeSH indexing and requires a different approach. Materials and Methods. 
Rather than focus on the technical aspects of machine learning-based indexing models, we emphasize the goals and purposes for which biomedical articles are indexed, and the surprisingly thorny question of how indexing systems should be evaluated. Results. Topical Medical Subject Heading (MeSH) terms are assigned to articles that cover the major topics discussed; when more than one term is applicable, only the most specific term is assigned. In contrast, PTs are assigned to articles that have a given structure or use a particular design. To meet the needs of end users, particularly groups involved in evidence syntheses, PT indexing needs to be comprehensive and employ probabilistic prediction scores. Whereas existing NLM hierarchies place publication types and study design-related terms on separate trees from each other, a unified rubric permits more appropriate retrieval via automatic expansion. Discussion. Automated PT indexing systems should allow users to input article records or full text pdfs and receive scores in real time. This will offer consistent indexing across bibliographic databases, as well as preprints and unpublished manuscripts. Conclusions. Automated PT indexing systems, properly designed and implemented, hold the promise of greatly improving the retrieval of biomedical articles, saving substantial effort when writing evidence syntheses and benefiting other users as well.2026-01-26T15:56:05Z14 pages, no figures or tablesNeil R. SmalheiserJoe D. MenkeArthur W. HoltHalil KilicogluJodi Schneiderhttp://arxiv.org/abs/2601.18271v1Designing large language model prompts to extract scores from messy text: A shared dataset and challenge2026-01-26T08:55:55ZIn some areas of computing, natural language processing and information science, progress is made by sharing datasets and challenging the community to design the best algorithm for an associated task. 
This article introduces a shared dataset of 1446 short texts, each of which describes a research quality score on the UK scale of 1* to 4*. This is a messy collection, with some texts not containing scores and others including invalid scores or strange formats. With this dataset there is also a description of what constitutes a valid score and a "gold standard" of the correct scores for these texts (including missing values). The challenge is to design a prompt for Large Language Models (LLMs) to extract the scores from these texts as accurately as possible. The format for the response should be a number and no other text so there are two aspects to the challenge: ensuring that the LLM returns only a number, and instructing it to deduce the correct number for the text. As part of this, the LLM prompt needs to explain when to return the missing value code, -1, instead of a number when the text does not clearly contain one. The article also provides an example of a simple prompt. The purpose of the challenge is twofold: to get an effective solution to this problem, and to increase understanding of prompt design and LLM capabilities for complex numerical tasks. The initial solution suggested has an accuracy of 72.6%, so the challenge is to beat this.2026-01-26T08:55:55ZTrends in Information Management, 13(2), paper 1 (2025)Mike Thelwallhttp://arxiv.org/abs/2601.18230v1Using LibCal Seats to Better Serve Students2026-01-26T07:34:06ZThis chapter examines the evolution of library services at the University of Liège (ULiège), with a focus on the implementation and assessment of the LibCal Seats booking module. Introduced in September 2020 in response to the COVID-19 pandemic, this system was designed to manage occupancy and maintain social distancing. While initially a temporary measure, the seat booking service remains in use during peak periods. Drawing on survey data from 2022 and 2023, the chapter analyses user perceptions of the system. 
Results indicate strong student appreciation, particularly regarding stress reduction and equitable access to study spaces. Despite overall satisfaction, issues such as unoccupied reserved seats and an unnecessarily complex booking process emerged, leading to targeted improvements. This chapter highlights the importance of responsive, user-centred services in academic libraries. The adoption of the booking system helped address challenges such as overcrowding and "seat hogging," ultimately contributing to a more organised and accessible environment. The case study illustrates how technology can enhance library service delivery, offering insights for institutions seeking to optimise space management. The continued evaluation of the system reflects a broader commitment to adapting services in alignment with user needs and institutional priorities.2026-01-26T07:34:06Z13 pages. To be published in May 2026 in: C. Furno, M. K. Saba, and M. Stöpel (Eds.), Changing Information Services and User Experiences. De Gruyter Saur. (eBook ISBN 9783111336459, hardcover ISBN 9783111335834)François RenavilleFabienne Prosmanshttp://arxiv.org/abs/2601.17431v1The 17% Gap: Quantifying Epistemic Decay in AI-Assisted Survey Papers2026-01-24T12:00:55ZThe adoption of Large Language Models (LLMs) in scientific writing promises efficiency but risks introducing informational entropy. While "hallucinated papers" are a known artifact, the systematic degradation of valid citation chains remains unquantified. We conducted a forensic audit of 50 recent survey papers in Artificial Intelligence (N=5,514 citations) published between September 2024 and January 2026. We utilized a hybrid verification pipeline combining DOI resolution, Crossref metadata analysis, Semantic Scholar queries, and fuzzy text matching to distinguish between formatting errors ("Sloppiness") and verifiable non-existence ("Phantoms"). 
We detect a persistent 17.0% Phantom Rate -- citations that cannot be resolved to any digital object despite aggressive forensic recovery. Diagnostic categorization reveals three distinct failure modes: pure hallucinations (5.1%), hallucinated identifiers with valid titles (16.4%), and parsing-induced matching failures (78.5%). Longitudinal analysis reveals a flat trend (+0.07 pp/month), suggesting that high-entropy citation practices have stabilized as an endemic feature of the field. The scientific citation graph in AI survey literature exhibits "link rot" at scale. This suggests a mechanism where AI tools act as "lazy research assistants," retrieving correct titles but hallucinating metadata, thereby severing the digital chain of custody required for reproducible science.2026-01-24T12:00:55ZH. Kemal İlterhttp://arxiv.org/abs/2603.13232v1Autonomous Editorial Systems and Computational Investigation with Artificial Intelligence2026-01-23T17:51:04ZAutonomous editorial systems represent an emerging class of computational frameworks that transform how large volumes of information are ingested, organized, and analyzed. This work presents a structured, continuously operating editorial architecture that treats news and reports as persistent state rather than transient documents. The system separates editorial organization from investigative analysis, enabling deterministic orchestration of artificial intelligence components across ingestion, enrichment, clustering, verification, and persistence stages.
We introduce a pipeline-based design in which stories evolve over time through incremental updates, automated re-evaluation, and contextual enrichment. The architecture supports scalable real-time processing while maintaining traceability, reproducibility, and editorial oversight. By framing editorial workflows as computational processes, the system enables algorithmic investigation, longitudinal analysis, and automated discovery of trends, inconsistencies, and emerging narratives.
This paper formalizes the architectural principles, data flow, and operational characteristics of autonomous editorial systems and demonstrates how artificial intelligence can be integrated as a controlled, inspectable component rather than an opaque decision-maker. The proposed approach establishes a foundation for future research into machine-assisted journalism, automated investigation, and large-scale information synthesis.2026-01-23T17:51:04ZAhmed Banafeahttp://arxiv.org/abs/2601.17109v1Authority Signals in AI Cited Health Sources: A Framework for Evaluating Source Credibility in ChatGPT Responses2026-01-23T17:44:36ZHealth information seeking has fundamentally changed since the onset of Large Language Models (LLM), with nearly one third of ChatGPT's 800 million users asking health questions weekly. Understanding the sources of those AI generated responses is vital, as health organizations and providers are also investing in digital strategies to organically improve their ranking, reach and visibility in LLM systems like ChatGPT. As AI search optimization strategies are gaining maturity, this study introduces an Authority Signals Framework, organized in four domains that reflect key components to health information seeking, starting with "Who wrote it?" (Author Credentials), followed by "Who published it?" (Institutional Affiliation), "How was it vetted?" (Quality Assurance), and "How does AI find it?" (Digital Authority). This descriptive cross-sectional study randomly selected 100 questions from HealthSearchQA which contains 3,173 consumer health questions curated by Google Research from publicly available search engine suggestions. Those questions were entered into ChatGPT 5.2 Pro to record and code the cited sources through the lens of the Authority Signals Framework's four domains. Descriptive statistics were calculated for all cited sources (n=615), and cross tabulations were conducted to examine distinction among organization types. 
Over 75% of the sources cited in ChatGPT's health generated responses were from established institutional sources, such as Mayo Clinic, Cleveland Clinic, Wikipedia, National Health Service, PubMed with the remaining citations sourced from alternative health information sources that lacked established institutional backing.2026-01-23T17:44:36Z24 pages, 4 figures, 1 table. All research materials available at https://doi.org/10.5281/zenodo.18287499Erin JacquesYork College, CUNYErela DatuoweiTeachers College, Columbia UniversityVincent JonesYork College, CUNYCorey BaschWilliam Paterson UniversityCeleta VanderpoolTeachers College, Columbia UniversityNkechi UdeozoCUNY School of Public HealthGriselda ChapaYork College, CUNYhttp://arxiv.org/abs/2601.15485v1The Rise of Large Language Models and the Direction and Impact of US Federal Research Funding2026-01-21T21:37:08ZFederal research funding shapes the direction, diversity, and impact of the US scientific enterprise. Large language models (LLMs) are rapidly diffusing into scientific practice, holding substantial promise while raising widespread concerns. Despite growing attention to AI use in scientific writing and evaluation, little is known about how the rise of LLMs is reshaping the public funding landscape. Here, we examine LLM involvement at key stages of the federal funding pipeline by combining two complementary data sources: confidential National Science Foundation (NSF) and National Institutes of Health (NIH) proposal submissions from two large US R1 universities, including funded, unfunded, and pending proposals, and the full population of publicly released NSF and NIH awards. We find that LLM use rises sharply beginning in 2023 and exhibits a bimodal distribution, indicating a clear split between minimal and substantive use. 
Across both private submissions and public awards, higher LLM involvement is consistently associated with lower semantic distinctiveness, positioning projects closer to recently funded work within the same agency. The consequences of this shift are agency-dependent. LLM use is positively associated with proposal success and higher subsequent publication output at NIH, whereas no comparable associations are observed at NSF. Notably, the productivity gains at NIH are concentrated in non-hit papers rather than the most highly cited work. Together, these findings provide large-scale evidence that the rise of LLMs is reshaping how scientific ideas are positioned, selected, and translated into publicly funded research, with implications for portfolio governance, research diversity, and the long-run impact of science.2026-01-21T21:37:08Z41 pages, 23 figures, 12 tablesYifan QianZhe WenAlexander C. FurnasYue BaiErzhuo ShaoDashun Wang