https://arxiv.org/api/jnAsU0TgOb54OQmdRZfAZR9RM8w
2026-06-15T00:12:21Z
606585515

http://arxiv.org/abs/2407.00135v2
Quantitative Methods in Research Evaluation Citation Indicators, Altmetrics, and Artificial Intelligence
2025-04-10T20:23:55Z
This book critically analyses the value of citation data, altmetrics, and artificial intelligence to support the research evaluation of articles, scholars, departments, universities, countries, and funders. It introduces and discusses indicators that can support research evaluation and analyses their strengths and weaknesses as well as the generic strengths and weaknesses of the use of indicators for research assessment. The book includes evidence of the comparative value of citations and altmetrics in all broad academic fields primarily through comparisons against article level human expert judgements from the UK Research Excellence Framework 2021. It also discusses the potential applications of traditional artificial intelligence and large language models for research evaluation, with large scale evidence for the former. The book concludes that citation data can be informative and helpful in some research fields for some research evaluation purposes but that indicators are never accurate enough to be described as research quality measures. It also argues that AI may be helpful in limited circumstances for some types of research evaluation.
2024-06-28T12:23:10Z
Book preprint
Mike Thelwall

http://arxiv.org/abs/2504.07828v1
Dynamic disruption index across citation and cited references windows: Recommendations for thresholds in research evaluation
2025-04-10T15:05:01Z
The temporal dimension of citation accumulation poses fundamental challenges for quantitative research evaluations, particularly in assessing disruptive and consolidating research through the disruption index (D).
While prior studies emphasize minimum citation windows (mostly 3-5 years) for reliable citation impact measurements, the time-sensitive nature of D - which quantifies a paper's capacity to eclipse prior knowledge - remains underexplored. This study addresses two critical gaps: (1) determining the temporal thresholds required for publications to meet citation/reference prerequisites, and (2) identifying "optimal" citation windows that balance early predictability and longitudinal validity. By analyzing millions of publications across four fields with varying citation dynamics, we employ some metrics to track D stabilization patterns. Key findings reveal that a 10-year window achieves >80% agreement with final D classifications, while shorter windows (3 years) exhibit instability. Publications with >=30 references stabilize 1-3 years faster, and extreme cases (top/bottom 5% D values) become identifiable within 5 years - enabling early detection of 60-80% of highly disruptive and consolidating works. The findings offer significant implications for scholarly evaluation and science policy, emphasizing the need for careful consideration of citation window length in research assessment (based on D).
2025-04-10T15:05:01Z
Hongkan Chen, Lutz Bornmann, Yi Bu

http://arxiv.org/abs/2504.07726v1
Quantum Machine Learning: Unveiling Trends, Impacts through Bibliometric Analysis
2025-04-10T13:18:48Z
Quantum Machine Learning (QML) is the intersection of two revolutionary fields: quantum computing and machine learning. It promises to unlock unparalleled capabilities in data analysis, model building, and problem-solving by harnessing the unique properties of quantum mechanics. This research endeavors to conduct a comprehensive bibliometric analysis of scientific information pertaining to QML covering the period from 2000 to 2023. An extensive dataset comprising 9493 scholarly works is meticulously examined to unveil notable trends, impact factors, and funding patterns within the domain.
Additionally, the study employs bibliometric mapping techniques to visually illustrate the network relationships among key countries, institutions, authors, patent citations and significant keywords in QML research. The analysis reveals a consistent growth in publications over the examined period. The findings highlight the United States and China as prominent contributors, exhibiting substantial publication and citation metrics. Notably, the study concludes that QML, as a research subject, is currently in a formative stage, characterized by robust scholarly activity and ongoing development.
2025-04-10T13:18:48Z
Riya Bansal, Nikhil Kumar Rajput

http://arxiv.org/abs/2408.01904v2
The Artificial Intelligence Disclosure (AID) Framework: An Introduction
2025-04-09T19:03:37Z
As the use of Generative Artificial Intelligence tools has grown in higher education and research, there have been increasing calls for transparency and granularity around the use and attribution of these tools. Thus far, this need has been met via the recommended inclusion of a note, with little to no guidance on what the note itself should include. This has been identified as a problem for the use of AI in academic and research contexts. This article introduces The Artificial Intelligence Disclosure (AID) Framework, a standard, comprehensive, and detailed framework meant to inform the development and writing of GenAI disclosure for education and research.
2024-08-04T02:18:42Z
5 pages
C&RL News, 85(10), 407-411 (2024)
Kari D. Weaver
10.5860/crln.85.10.407

http://arxiv.org/abs/2504.13905v1
MaRDMO: Future Gateway to FAIR Mathematical Data
2025-04-09T11:05:47Z
Mathematical research data plays a crucial role across scientific disciplines, yet its documentation and dissemination remain challenging due to the lack of standardized research data management practices.
The MaRDMO Plugin addresses these challenges by integrating mathematical models, algorithms, and interdisciplinary workflows into the established framework of the Research Data Management Organiser (RDMO). Built on FAIR principles, MaRDMO enables structured documentation and retrieval of mathematical research data through guided questionnaires. It connects to multiple knowledge graphs, including MathModDB, MathAlgoDB, and the MaRDI Portal. Users can document and search for models, algorithms, and workflows via dynamic selection interfaces that also leverage other sources such as Wikidata. The plugin facilitates the export to the individual MaRDI services, ensuring data quality through automated validation. By embedding mathematical research data management into the widely adopted RDMO platform, MaRDMO represents a significant step toward making mathematical research data more findable, accessible, and reusable.
2025-04-09T11:05:47Z
Marco Reidelbach

http://arxiv.org/abs/2504.05976v1
A Knowledge Base for Arts and Inclusion -- The Dataverse data archival platform as a knowledge base management system enabling multimodal accessibility
2025-04-08T12:33:12Z
Creating an inclusive art environment requires engaging multiple senses for a fully immersive experience. Culture is inherently synesthetic, enriched by all senses within a shared time and space. In an optimal synesthetic setting, people of all abilities can connect meaningfully; when one sense is compromised, other channels can be enhanced to compensate. This is the power of multimodality. Digital technology is increasingly able to capture aspects of multimodality. Documenting the multimodal aspects of cultural practices and products for the long term remains a challenge. Many artistic products from the performing arts tend to be multimodal, and are often immersive, so only a multimodal repository can offer a platform for this work.
To our knowledge there is no single, comprehensive repository with a knowledge base to serve arts and disability. By knowledge base, we mean classifications, taxonomies, or ontologies (in short, knowledge organisation systems). This paper presents innovative ways to develop a knowledge base that captures multimodal features of archived representations of cultural assets and also indicates various ways to interact with them, including machine-readable description. We will demonstrate how back-end and front-end applications, in a combined effort, can support accessible archiving and data management for complex digital objects born out of artistic practices and make them available for wider audiences.
2025-04-08T12:33:12Z
submitted to HCI 2025, session Human-Centred Design for Participation and Inclusion
Moa Johansson, Vyacheslav Tykhonov, Sophia Alexandersson, Kim Ferguson, James Hanlon, Andrea Scharnhorst, Nigel Osborne

http://arxiv.org/abs/2504.05905v1
Rethinking Review Citations: Impact on Scientific Integrity
2025-04-08T11:02:31Z
The proliferation of surveys and review articles in academic journals has impacted citation metrics like impact factor and h-index, skewing evaluations of journal and researcher quality. This work investigates the implications of this trend, focusing on the field of Computer Science, where a notable increase in review publications has led to inflated citation counts and rankings. While reviews serve as valuable literature overviews, they should not overshadow the primary goal of research - to advance scientific knowledge through original contributions. We advocate for prioritizing citations of primary research in journal articles to uphold citation integrity and ensure fair recognition of substantive contributions. This approach preserves the reliability of citation-based metrics and supports genuine scientific advancement.
2025-04-08T11:02:31Z
Jesus S.
Aguilar-Ruiz

http://arxiv.org/abs/2504.05206v1
Content-aware rankings: a new approach to rankings in scholarship
2025-04-07T15:55:18Z
Entity rankings (e.g., institutions, journals) are a core component of academia and related industries. Existing approaches to institutional rankings have relied on a variety of data sources, and approaches to computing outcomes, but remain controversial. One limitation of existing approaches is reliance on scholarly output (e.g., number of publications associated with a given institution during a time period). We propose a new approach to rankings - one that relies not on scholarly output, but rather on the type of citations received (an implementation of the Scite Index). We describe how the necessary data can be gathered, as well as how relevant metrics are computed. To demonstrate the utility of our approach, we present rankings of fields, journals, and institutions, and discuss the various ways Scite's data can be deployed in the context of rankings. Implications, limitations, and future directions are discussed.
2025-04-07T15:55:18Z
Sean C. Rife, Joshua M. Nicholson, Beatriz Bosques, Domenic Rosati, Ashish Uppala, Igor A. Osipov

http://arxiv.org/abs/2504.05361v1
A Comparative Analysis of Modeling Approaches for the Association of FAIR Digital Objects Operations
2025-04-07T10:21:46Z
The concept of FAIR Digital Objects represents a foundational step towards realizing machine-actionable, interoperable data infrastructures across scientific and industrial domains. As digital spaces become increasingly heterogeneous, scalable mechanisms for data processing and interpretability are essential. This paper provides a comparative analysis of various typing mechanisms to associate FAIR Digital Objects with their operations, addressing the pressing need for a structured approach to manage data interactions within the FAIR Digital Objects ecosystem.
By examining three core models -- record typing, profile typing, and attribute typing -- this work evaluates each model's complexity, flexibility, versatility, and interoperability, shedding light on their strengths and limitations. With this assessment, we aim to offer insights for adopting FDO frameworks that enhance data automation and promote the seamless exchange of digital resources across domains.
2025-04-07T10:21:46Z
Nicolas Blumenröhr, Jana Böhm, Philipp Ost, Marco Kulüke, Peter Wittenburg, Christophe Blanchi, Sven Bingert, Ulrich Schwardmann

http://arxiv.org/abs/2504.04677v1
The Disruption Index Measures Displacement Between a Paper and Its Most Cited Reference
2025-04-07T02:04:10Z
Initially developed to capture technical innovation and later adapted to identify scientific breakthroughs, the Disruption Index (D-index) offers the first quantitative framework for analyzing transformative research. Despite its promise, prior studies have struggled to clarify its theoretical foundations, raising concerns about potential bias. Here, we show that - contrary to the common belief that the D-index measures absolute innovation - it captures relative innovation: a paper's ability to displace its most-cited reference. In this way, the D-index reflects scientific progress as the replacement of older answers with newer ones to the same fundamental question - much like light bulbs replacing candles. We support this insight through mathematical analysis, expert surveys, and large-scale bibliometric evidence.
To facilitate replication, validation, and broader use, we release a dataset of D-index values for 49 million journal articles (1800-2024) based on OpenAlex.
2025-04-07T02:04:10Z
Yiling Lin, Linzhuo Li, Lingfei Wu

http://arxiv.org/abs/2504.13894v1
State of the Art on Artificial Intelligence Resources for Interaction Media Design in Digital Cultural Heritage
2025-04-05T19:20:11Z
This paper explores the integration of Artificial Intelligence (AI) in the design of interactive experiences for Cultural Heritage (CH). Previous studies either fail to represent the specificity of CH or mention possible tools without making a clear reference to a structured Interaction Design (IxD) workflow. The study also attempts to overcome one of the major limitations of traditional literature reviews, which may fail to capture proprietary tools whose release is rarely accompanied by academic publications. Besides the analysis of previous research, the study proposes a possible workflow for IxD in CH, subdivided into phases and tasks: for each of them, this paper proposes possible AI-based tools that can support the activity of designers, curators, and CH professionals. The review concludes with a final section outlining future paths for research and development in this domain.
2025-04-05T19:20:11Z
In S. Campana et al. (eds.), Digital Heritage, The Eurographics Association (2025)
Manuele Veggi
10.2312/dh.20253238

http://arxiv.org/abs/2409.13521v2
A Survey on Moral Foundation Theory and Pre-Trained Language Models: Current Advances and Challenges
2025-04-04T11:52:55Z
Moral values have deep roots in early civilizations, codified within norms and laws that regulated societal order and the common good. They play a crucial role in understanding the psychological basis of human behavior and cultural orientation.
The Moral Foundation Theory (MFT) is a well-established framework that identifies the core moral foundations underlying the manner in which different cultures shape individual and social lives. Recent advancements in natural language processing, particularly Pre-trained Language Models (PLMs), have enabled the extraction and analysis of moral dimensions from textual data. This survey presents a comprehensive review of MFT-informed PLMs, providing an analysis of moral tendencies in PLMs and their application in the context of the MFT. We also review relevant datasets and lexicons and discuss trends, limitations, and future directions. By providing a structured overview of the intersection between PLMs and MFT, this work bridges moral psychology insights within the realm of PLMs, paving the way for further research and development in creating morally aware AI systems.
2024-09-20T14:03:06Z
Accepted for publication with AI & Society, March 2025
AI & Society, March 2025
Lorenzo Zangari, Candida M. Greco, Davide Picca, Andrea Tagarelli
10.1007/s00146-025-02225-w

http://arxiv.org/abs/2504.02767v1
How Deep Do Large Language Models Internalize Scientific Literature and Citation Practices?
2025-04-03T17:04:56Z
The spread of scientific knowledge depends on how researchers discover and cite previous work. The adoption of large language models (LLMs) in the scientific research process introduces a new layer to these citation practices. However, it remains unclear to what extent LLMs align with human citation practices, how they perform across domains, and how they may influence citation dynamics. Here, we show that LLMs systematically reinforce the Matthew effect in citations by consistently favoring highly cited papers when generating references. This pattern persists across scientific domains despite significant field-specific variations in existence rates, which refer to the proportion of generated references that match existing records in external bibliometric databases.
Analyzing 274,951 references generated by GPT-4o for 10,000 papers, we find that LLM recommendations diverge from traditional citation patterns by preferring more recent references with shorter titles and fewer authors. Emphasizing their content-level relevance, the generated references are semantically aligned with the content of each paper at levels comparable to the ground truth references and display similar network effects while reducing author self-citations. These findings illustrate how LLMs may reshape citation practices and influence the trajectory of scientific discovery by reflecting and amplifying established trends. As LLMs become more integrated into the scientific research process, it is important to understand their role in shaping how scientific communities discover and build upon prior work.
2025-04-03T17:04:56Z
32 pages, 17 figures
Andres Algaba, Vincent Holst, Floriano Tori, Melika Mobini, Brecht Verbeken, Sylvia Wenmackers, Vincent Ginis

http://arxiv.org/abs/2410.09871v2
A Comparative Study of PDF Parsing Tools Across Diverse Document Categories
2025-04-03T12:09:36Z
PDF is one of the most prominent data formats, making PDF parsing crucial for information extraction and retrieval, particularly with the rise of RAG systems. While various PDF parsing tools exist, their effectiveness across different document types remains understudied, especially beyond academic papers. Our research aims to address this gap by comparing 10 popular PDF parsing tools across 6 document categories using the DocLayNet dataset. These tools include PyPDF, pdfminer-six, PyMuPDF, pdfplumber, pypdfium2, Unstructured, Tabula, Camelot, as well as the deep learning-based tools Nougat and Table Transformer (TATR). We evaluated both text extraction and table detection capabilities. For text extraction, PyMuPDF and pypdfium generally outperformed others, but all parsers struggled with Scientific and Patent documents.
For these challenging categories, learning-based tools like Nougat demonstrated superior performance. In table detection, TATR excelled in the Financial, Patent, Law & Regulations, and Scientific categories. Table detection tool Camelot performed best for tender documents, while PyMuPDF was superior in the Manual category. Our findings highlight the importance of selecting appropriate parsing tools based on document type and specific tasks, providing valuable insights for researchers and practitioners working with diverse document sources.
2024-10-13T15:11:31Z
17 pages, 11 figures, 5 tables
Narayan S. Adhikari, Shradha Agarwal

http://arxiv.org/abs/2409.02592v2
Exploring Citation Diversity in Scholarly Literature: An Entropy-Based Approach
2025-04-02T13:43:53Z
This study explores the citation diversity in scholarly literature, analyzing different patterns of citations observed within different countries and academic disciplines. We examine citation distributions across top institutions within certain countries and find that the higher end of the distribution follows a Power Law or Pareto Law pattern; the scaling exponent of the Pareto Law varies depending on the number of top institutions included in the analysis. By adopting a novel entropy-based diversity measure, our findings reveal that countries with both small and large economies tend to cluster similarly in terms of citation diversity. The composition of countries within each group changes as the number of top institutions considered in the analysis varies. Moreover, we analyze citation diversity among award-winning scientists across six scientific disciplines, finding significant variations. We also explore the evolution of citation diversity over the past century across multiple fields. A gender-based study in several disciplines confirms varying citation diversities among male and female scientists.
Our innovative citation diversity measure stands out as a valuable tool for assessing the unevenness of citation distributions, providing deeper insights that go beyond what traditional citation counts alone can reveal. This comprehensive analysis enhances our understanding of global scientific contributions and fosters a more equitable view of academic achievements.
2024-09-04T10:16:53Z
23 pages, 18 figures, 13 tables, Scientometrics 2025 (Accepted)
Scientometrics 130, 2673-2704 (2025)
Suchismita Banerjee, Abhik Ghosh, Banasri Basu
10.1007/s11192-025-05313-2