http://arxiv.org/abs/2509.02581v1 Charting the Future of Scholarly Knowledge with AI: A Community Perspective 2025-08-27T09:46:50Z

Despite the growing availability of tools designed to support scholarly knowledge extraction and organization, many researchers still rely on manual methods, sometimes due to unfamiliarity with existing technologies or limited access to domain-adapted solutions. Meanwhile, the rapid increase in scholarly publications across disciplines has made it increasingly difficult to stay current, further underscoring the need for scalable, AI-enabled approaches to structuring and synthesizing scholarly knowledge. Various research communities have begun addressing this challenge independently, developing tools and frameworks aimed at building reliable, dynamic, and queryable scholarly knowledge bases. However, limited interaction across these communities has hindered the exchange of methods, models, and best practices, slowing progress toward more integrated solutions. This manuscript identifies ways to foster cross-disciplinary dialogue, identify shared challenges, categorize new collaboration opportunities, and shape future research directions in scholarly knowledge extraction and organization.

2025-08-27T09:46:50Z 39 pages, 3 figures Azanzi Jiomekong Hande Küçük McGinty Keith G. Mills Allard Oelen Enayat Rajabi Harry McElroy Antrea Christou Anmol Saini Janice Anta Zebaze Hannah Kim Anna M. Jacyszyn Sören Auer

http://arxiv.org/abs/2508.19489v1 Interactive Graph Visualization and Teaming Recommendation in an Interdisciplinary Project's Talent Knowledge Graph 2025-08-27T00:25:22Z

Interactive visualization of large scholarly knowledge graphs combined with LLM reasoning shows promise but remains under-explored. We address this gap by developing an interactive visualization system for the Cell Map for AI (CM4AI) Talent Knowledge Graph (28,000 experts and 1,179 biomedical datasets). Our approach integrates WebGL visualization with LLM agents to overcome limitations of traditional tools such as Gephi, particularly for interactive handling of large numbers of nodes. Key functionalities include responsive exploration, filtering, and AI-driven recommendations with justifications. This integration could help users identify potential collaborators and relevant dataset users within the biomedical and AI research communities. The system contributes a novel framework that enhances knowledge graph exploration through intuitive visualization and transparent, LLM-guided recommendations. This adaptable solution extends beyond the CM4AI community to other large knowledge graphs, improving information representation and decision-making. Demo: https://cm4aikg.vercel.app/

2025-08-27T00:25:22Z Short paper presented at the ASIS&T 2025 Annual Meeting Jiawei Xu Juichien Chen Yilin Ye Zhandos Sembay Swathi Thaker Pamela Payne-Foster Jake Chen Ying Ding

http://arxiv.org/abs/2508.18623v1 A Bibliometric Analysis of the Scholarly Impact of Early Subaru Telescope-based Publications 2025-08-26T03:00:58Z

Bibliometric methods provide valuable tools for assessing scientific productivity and impact across disciplines, yet their application in astronomy journals remains relatively limited. This study conducts a bibliometric analysis of Japanese astronomy publications before and after the commissioning of the Subaru Telescope, a major national investment in observational infrastructure. Using data from Scopus and SciVal, we examine peer-reviewed journal articles published between 1996 and 2007 by authors affiliated with Japanese institutions, focusing on field-normalized citation indicators such as the Field-Weighted Citation Impact (FWCI) and the share of publications in the top 10% most cited globally. Subaru Telescope-based publications are identified through cross-referencing with official telescope publication lists and are compared against national and global benchmarks. The results show that Subaru Telescope-based publications, while accounting for less than 10% of Japan's total scholarly output in astronomy, consistently achieved FWCI values above 2.0 and a significantly higher proportion of highly cited papers. This indicates that the Subaru Telescope substantially enhanced Japan's research visibility and impact, especially during its early operational years. This study demonstrates the utility of bibliometric evaluation in capturing the academic return of large-scale research facilities and contributes to broader discussions on research infrastructure in astronomy.
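
The two indicators named above can be made concrete with a short sketch. FWCI is commonly defined as the citations a paper receives divided by the citations expected for comparable publications (same field, publication year, and document type), with a set's value taken as the mean of its papers' values; the expected-citation baselines and the 90th-percentile thresholds used here are assumed inputs, not data from the study.

```python
from statistics import mean


def paper_fwci(citations: int, expected_citations: float) -> float:
    # Citations received divided by the average citations of comparable
    # publications (same field, publication year, document type).
    if expected_citations <= 0:
        raise ValueError("expected_citations must be positive")
    return citations / expected_citations


def set_fwci(papers: list[tuple[int, float]]) -> float:
    # papers: (citations, expected_citations) pairs.
    # Assumption: the set value is the arithmetic mean of per-paper FWCIs.
    return mean(paper_fwci(c, e) for c, e in papers)


def top10_share(papers: list[tuple[int, float]]) -> float:
    # papers: (citations, threshold) pairs, where the threshold is a
    # hypothetical citation count at the 90th percentile for comparable
    # publications worldwide.
    if not papers:
        return 0.0
    return sum(c >= t for c, t in papers) / len(papers)
```

Under this definition an FWCI of 1.0 corresponds to the world average, so values above 2.0 mean roughly twice the citations expected for comparable publications.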

2025-08-26T03:00:58Z 8 pages, 4 figures; Author's Original Version, incorporating minor corrections to footnote formatting, figure sizes, and the Acknowledgements. This article has been accepted for publication in Publications of the Astronomical Society of Japan, published by Oxford University Press. Hideaki Fujiwara 10.1093/pasj/psaf100

http://arxiv.org/abs/2508.18620v1 Investigating Document Type, Language, Publication Year, and Author Count Discrepancies Between OpenAlex and Web of Science 2025-08-26T02:46:47Z

Bibliometrics, whether used for research or research evaluation, relies on large multidisciplinary databases of research outputs and citation indices. The Web of Science (WoS) was the main supporting infrastructure of the field for more than 30 years until several new competitors emerged. OpenAlex, a bibliographic database launched in 2022, has distinguished itself through its openness and extensive coverage. While OpenAlex may reduce or eliminate barriers to accessing bibliometric data, one of the concerns that hinders its broader adoption for research and research evaluation is the quality of its metadata. This study aims to assess metadata quality in OpenAlex and WoS, focusing on document type, publication year, language, and number of authors. By addressing discrepancies and misattributions in metadata, this research seeks to raise awareness of data quality issues that could affect bibliometric research and evaluation outcomes.
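
A minimal sketch of the kind of field-by-field comparison described above, assuming records from both databases have already been matched by DOI and their document types mapped to a shared vocabulary. The `Record` type and its field names are hypothetical and do not reflect the OpenAlex or WoS schemas. For each field, it reports the share of shared records on which the two sources disagree.

```python
from dataclasses import dataclass


# Hypothetical record type; field names are illustrative only.
@dataclass(frozen=True)
class Record:
    doi: str
    doc_type: str   # assumed already mapped to a shared vocabulary
    year: int
    language: str
    n_authors: int


FIELDS = ("doc_type", "year", "language", "n_authors")


def discrepancy_rates(openalex: list[Record], wos: list[Record]) -> dict[str, float]:
    """Share of DOI-matched records whose value differs between sources, per field."""
    # DOIs are case-insensitive, so normalize before matching.
    oa_by_doi = {r.doi.lower(): r for r in openalex}
    wos_by_doi = {r.doi.lower(): r for r in wos}
    shared = oa_by_doi.keys() & wos_by_doi.keys()
    rates = {}
    for field in FIELDS:
        mismatches = sum(
            getattr(oa_by_doi[d], field) != getattr(wos_by_doi[d], field)
            for d in shared
        )
        rates[field] = mismatches / len(shared) if shared else 0.0
    return rates
```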

2025-08-26T02:46:47Z Philippe Mongeon Madelaine Hare Poppy Riddle Summer Wilson Geoff Krause Rebecca Marjoram Rémi Toupin

http://arxiv.org/abs/2508.06522v3 Citation Issues in Wave Mechanics Theory of Microwave Absorption 2025-08-25T23:31:53Z

The wave mechanics theory of microwave absorption challenges the long-standing impedance-matching and quarter-wavelength paradigms by demonstrating that conventional models mistakenly conflate bulk material parameters with thin-film phenomena. Drawing on a corpus of 35 peer-reviewed papers and preprints, the study performs a citation-pattern analysis and a logical audit of established theory. Results reveal a striking asymmetry in scholarly engagement, with only a handful of supportive or neutral citations amid widespread silence, as well as critical logical flaws in impedance matching, notably its inconsistent treatment of penetration, reflection, and absorption from the film. By reframing absorption as a wave-mechanics process governed by interference at parallel interfaces, the wave mechanics framework restores energy-conservation consistency and provides experimentally verified design rules for film thickness, phase response, and broadband performance. The paper further situates the citation neglect within broader issues of peer-review bias and paradigm inertia, illustrating how cargo-cult scientific practices can impede theoretical progress. Recommendations are offered for researchers, editors, and institutions to foster open discourse, rigorously test competing models, and update curricula and design tools accordingly.

2025-08-01T02:47:37Z Yue Liu Ying Liu Michael G. B. Drew

http://arxiv.org/abs/2508.18073v1 Debian in the Research Software Ecosystem: A Bibliometric Analysis 2025-08-25T14:37:50Z

Context: Debian has historically played a role in academic work and scientific projects, with well-known examples including NeuroDebian, Debian Med, Debsources, Debian Science, and Debian GIS, in which the scientific relevance of Debian and its contribution to the Research Software ecosystem are evident. Objective: The objective of this study is to investigate Debian through academic publications, with the aim of classifying articles, mapping research, identifying trends, and finding opportunities. Method: The study is based on a bibliometric analysis that begins with a search for the term "Debian" in the titles, abstracts, or keywords of academic publications in the Scopus database. The analysis calculates co-citation, co-authorship, and word co-occurrence metrics and is guided by a set of research questions and inclusion and exclusion criteria. Results: The study includes a set of articles published across various fields of knowledge, providing a map of the academic publication space about Debian. The study's data will be made available in a public repository, reporting demographic and bibliometric trends, including the most cited articles, active countries and researchers, and popular conferences. Conclusion: The results include a bibliometric and demographic analysis of publications about Debian, shedding light on the intellectual structure of academic research on the topic. They can help researchers gain an overview of existing trends in publications about Debian and identify areas that require more attention from the scientific community.
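
Word co-occurrence, one of the metrics listed above, can be sketched as counting how many publications mention each unordered pair of keywords together. This is a generic illustration, not the study's own pipeline; the input format (one keyword list per publication) is an assumption.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists: list[list[str]]) -> Counter:
    """Count how many publications mention each unordered pair of keywords together."""
    counts: Counter = Counter()
    for keywords in keyword_lists:
        # Case-fold and deduplicate so each pair is counted at most once per publication.
        unique = sorted({k.lower() for k in keywords})
        counts.update(combinations(unique, 2))
    return counts
```

Passing author lists instead of keyword lists yields co-authorship pair counts in the same way, though real author data would first need name disambiguation.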

2025-08-25T14:37:50Z 5 pages; 3 figures; 2 tables; to be published in DebConf25 Academic Track https://www.diverse-team.fr/debconf25-academictrack Joenio Marques da Costa Christina von Flach

http://arxiv.org/abs/2508.17647v1 SurveyGen: Quality-Aware Scientific Survey Generation with Large Language Models 2025-08-25T04:22:23Z

Automatic survey generation has emerged as a key task in scientific document processing. While large language models (LLMs) have shown promise in generating survey texts, the lack of standardized evaluation datasets critically hampers rigorous assessment of their performance against human-written surveys. In this work, we present SurveyGen, a large-scale dataset comprising over 4,200 human-written surveys across diverse scientific domains, along with 242,143 cited references and extensive quality-related metadata for both the surveys and the cited papers. Leveraging this resource, we build QUAL-SG, a novel quality-aware framework for survey generation that enhances the standard Retrieval-Augmented Generation (RAG) pipeline by incorporating quality-aware indicators into literature retrieval to assess and select higher-quality source papers. Using this dataset and framework, we systematically evaluate state-of-the-art LLMs under varying levels of human involvement, from fully automatic generation to human-guided writing. Experimental results and human evaluations show that while semi-automatic pipelines can achieve partially competitive outcomes, fully automatic survey generation still suffers from low citation quality and limited critical analysis.
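
The abstract does not specify how QUAL-SG weights its quality indicators, so the following is only an illustrative sketch of quality-aware reranking in a RAG retrieval step: a linear blend of a retriever's relevance score with a hypothetical quality score built from log-scaled citation counts and an assumed venue indicator. The `Candidate` type, the weight `alpha`, and the 50/50 quality mix are arbitrary choices for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    paper_id: str
    relevance: float    # retriever similarity, assumed normalized to [0, 1]
    citations: int
    venue_score: float  # hypothetical venue-quality indicator in [0, 1]


def quality(c: Candidate, max_log_citations: float) -> float:
    """Hypothetical quality score: equal mix of log-scaled citations and venue score."""
    cite_part = math.log1p(c.citations) / max_log_citations if max_log_citations > 0 else 0.0
    return 0.5 * cite_part + 0.5 * c.venue_score


def rerank(candidates: list[Candidate], k: int, alpha: float = 0.7) -> list[str]:
    """Keep the top-k candidates by alpha * relevance + (1 - alpha) * quality."""
    max_log = max((math.log1p(c.citations) for c in candidates), default=0.0)
    ranked = sorted(
        candidates,
        key=lambda c: alpha * c.relevance + (1 - alpha) * quality(c, max_log),
        reverse=True,
    )
    return [c.paper_id for c in ranked[:k]]
```

Setting `alpha=1.0` recovers plain relevance ranking, which makes the effect of the quality term easy to inspect.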

2025-08-25T04:22:23Z EMNLP 2025 Tong Bao Mir Tafseer Nayeem Davood Rafiei Chengzhi Zhang

http://arxiv.org/abs/2508.16519v1 The Community Index: A More Comprehensive Approach to Assessing Scholarly Impact 2025-08-22T16:41:51Z

The h index is a widely recognized metric for assessing the research impact of scholars, defined as the maximum value h such that the scholar has published h papers each cited at least h times. While it has proven useful for measuring individual scholarly productivity and citation impact, the h index has limitations, such as an inability to account for interdisciplinary collaboration or demographic differences in citation patterns. Moreover, it is sometimes mistakenly treated as a measure of research quality, even though it only reflects how often work has been cited. Although metric-based evaluations of research have grown in importance in some areas of academia, such as medicine, these evaluations fail to consider other important aspects of intellectual work, such as representational and epistemic diversity in research. In this article, we propose a new metric called the c index, or community index, which combines multiple dimensions of scholarly impact. This matters because a plurality of perspectives and lived experiences within author teams can promote epistemological reflection and humility in the creation and validation of scientific knowledge. The c index is a means of accounting for the often global and increasingly interdisciplinary nature of contemporary research, in particular the data that are collected, curated, and analyzed in the process of scientific inquiry. The c index provides a means of quantifying diversity within research teams, and such diversity, being integral to the advancement of scientific excellence, should be actively fostered through formal recognition and valuation. We describe the mathematical foundation of the c index and demonstrate its potential to provide a more comprehensive and multidimensional assessment of scientific contributions and research impact than the h index.
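
The h index definition quoted above translates directly into code: sort a scholar's citation counts in descending order and find the largest rank h at which the paper in position h still has at least h citations. This sketch covers only the h index; the abstract does not give the c index formula, so none is attempted here.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            # Counts only decrease from here on, so no larger h is possible.
            break
    return h
```

For example, `h_index([10, 8, 5, 4, 3])` returns 4: four papers have at least 4 citations each, but the fifth has only 3.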

2025-08-22T16:41:51Z 22 pages, 49 references Arav Kumar Cameron Sabet Alessandro Hammond Amelia Fiske Bhav Jain Deirdre Goode Dharaa Suresha Leo Anthony Celi Lisa Soleymani Lehmann Ned Mccague Rawan Abulibdeh Sameer Pradhan

http://arxiv.org/abs/2508.20120v1 The Power of Data Communities 2025-08-22T14:56:08Z

Datasets, together with active scientific communities prepared to leverage them, can contribute to scientific progress and help make research more equitable. In this study, we found that MIMIC, despite limited funding, provided higher impact per dollar spent through accessible data communities. These findings support the notion that making clinical data available empowers innovation that directly addresses clinical concerns and can set new standards for inclusivity.

2025-08-22T14:56:08Z Lucas McCullum Miguel Angel Armengol de la Hoz Catherine Bielick Daniel K. Ebner Amelia Fiske Jack Gallifant Judy W. Gichoya Rahul Gorijavolu Nura Izath Anna E. Premo Alice Rangel Teixeira Christopher M. Sauer Leo A. Celi

http://arxiv.org/abs/2508.15645v2 Guidelines for the Enhancement of the Corpus and the Verismo Vocabulary 2025-08-22T14:18:58Z

VIVer is a digital lexicography project with historical-literary and historical-linguistic aims that can be considered a case study of a Digital Humanities project. This paper presents the IT choices made to promote the dissemination and enhancement of its results and analyses their issues and advantages for wider adoption beyond VIVer itself, so that they can serve as a model and inspiration for future projects.

2025-08-21T15:23:51Z 12 pages, 2 figures Michael Bassi Giovanni Salucci

http://arxiv.org/abs/2508.16276v1 Implicit reporting standards in bibliometric research: what can reviewers' comments tell us about reporting completeness? 2025-08-22T10:17:12Z

The recent surge in published bibliometric studies has been accompanied by growing variation in how completely these studies report their details, affecting reliability, reproducibility, and robustness. Our study systematises the reporting of bibliometric research using open peer reviews. We examined 182 peer reviews of 85 bibliometric studies published in library and information science (LIS) journals and conference proceedings, and non-LIS journals. We extracted 968 reviewer comments and inductively classified them into 11 broad thematic categories and 68 sub-categories, determining that reviewers largely focus on the completeness and clarity of reporting data, methods, and results. We subsequently derived 49 recommendations for the details authors should report and compared them with the GLOBAL, PRIBA, and BIBLIO reporting guidelines to identify (dis)similarities in content. Our recommendations addressed 60-80% of the guidelines' items, while the guidelines covered 45-65% of our recommendations. Our recommendations provided greater range and specificity, but did not incorporate the functions of guidelines beyond addressing academic content. We argue that peer reviews provide valuable information for the development of future guidelines. Further, our recommendations can be read as the implicit community standards for reporting bibliometric studies and could be used by authors to aid complete and accurate reporting of their manuscripts.
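
The two coverage figures above need not agree, because they are computed in opposite directions over the same set of judged matches, and one recommendation can match several guideline items (or the reverse). A minimal sketch, assuming matches are recorded as (recommendation, guideline item) pairs; the IDs `R1`, `G1`, and so on are hypothetical.

```python
def share_matched(items: list[str], matches: set[tuple[str, str]], side: int) -> float:
    """Fraction of `items` that appear on `side` of at least one judged match pair.

    side=0 reads the recommendation slot, side=1 the guideline-item slot.
    """
    unique = set(items)
    if not unique:
        return 0.0
    matched = {pair[side] for pair in matches}
    return len(unique & matched) / len(unique)
```

With four recommendations, two guideline items, and matches `{("R1", "G1"), ("R2", "G2")}`, every guideline item is covered (1.0), yet only half of the recommendations are (0.5).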

2025-08-22T10:17:12Z Dimity Stephen Alexander Schniedermann Andrey Lovakov Marion Schmidt Matteo Ottaviani Nikita Sorgatz Roberto Cruz Romero Torger Möller Valeria Aman Stephan Stahlschmidt

http://arxiv.org/abs/2508.20118v1 The Importance of the Digital Object Identifier (DOI) in Enhancing the Credibility of Scientific Research: An Analytical Data Study 2025-08-22T07:07:40Z

This study aims to analyze the vital role played by the Digital Object Identifier (DOI) in enhancing the credibility and reliability of scientific research in the digital age. Through an analytical study of DOI usage data obtained from international scientific publishing institutions, it highlights the extent of DOI adoption and its recognition as a global standard for identifying research and academic sources. The results showed that the number of scientific records registered using DOIs exceeded 167 million, with more than 30,000 DOI prefixes distributed across over 150 countries, reflecting the significant growth in its use by academic and research institutions. Additionally, more than 3.2 billion monthly DOI resolutions were recorded, indicating increasing reliance on DOIs for accessing resources. The study also included an analysis of the content types registered with DOIs, showing that scientific articles constituted the majority at 71%, followed by books and conference papers. A notable finding was that 95% of citations linked to DOIs are now openly available, contributing to greater transparency and scientific verifiability. The study concluded that the DOI is not merely an organizational tool but a central element in the structure of modern scientific publishing. It contributes to improving research quality, facilitating verification, and ensuring continued accessibility. The study recommended the broader adoption of DOIs, especially in emerging scientific communities, to achieve greater integration in the global research information infrastructure.
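
A DOI consists of a prefix ("10." followed by a registrant code) and a suffix, separated by the first "/"; counting distinct prefixes, as in the figure above, therefore only requires splitting at that slash. A minimal sketch; the accepted input forms (bare DOI, `https://doi.org/` URL, `doi:` label) are assumptions about the data, and the validation is deliberately loose.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into its prefix ("10." + registrant code) and suffix."""
    doi = doi.strip()
    # Accept common resolver-URL and "doi:" forms.
    for scheme in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(scheme):
            doi = doi[len(scheme):]
            break
    # The prefix ends at the first "/"; the suffix may itself contain "/".
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def distinct_prefixes(dois: list[str]) -> int:
    """Number of distinct DOI prefixes in a collection."""
    return len({split_doi(d)[0] for d in dois})
```

For example, `split_doi("10.1093/pasj/psaf100")` returns `("10.1093", "pasj/psaf100")`.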

2025-08-22T07:07:40Z Ahmed Shaker Alalaq

http://arxiv.org/abs/2508.15916v1 Information Ecosystem Reengineering via Public Sector Knowledge Representation 2025-08-21T18:29:27Z

Information Ecosystem Reengineering (IER), the technological reconditioning of information sources, services, and systems within a complex information ecosystem, is a foundational challenge in the digital transformation of public sector services and smart governance platforms. From a semantic knowledge management perspective, IER becomes especially entangled because it admits a potentially unbounded number of conceptualizations, arising from the multi-level mix of perception, language, and conceptual interlinkage implicit in all agents involved in such an effort. This paper proposes a novel approach, Representation Disentanglement, to disentangle these multiple layers of knowledge representation complexity that hinder effective reengineering decision making. The approach builds on the theoretically grounded and implementationally robust paradigm of ontology-driven conceptual modeling, which has been widely adopted in systems analysis and (re)engineering. We argue that such a framework is essential to achieve explainability, traceability, and semantic transparency in public sector knowledge representation and to support auditable decision workflows in governance ecosystems increasingly driven by Artificial Intelligence (AI) and data-centric architectures.

2025-08-21T18:29:27Z Mayukh Bagchi

http://arxiv.org/abs/2504.19675v2 Annif at SemEval-2025 Task 5: Traditional XMTC augmented by LLMs 2025-08-21T14:43:26Z

This paper presents the Annif system in SemEval-2025 Task 5 (LLMs4Subjects), which focussed on subject indexing using large language models (LLMs). The task required creating subject predictions for bibliographic records from the bilingual TIBKAT database using the GND subject vocabulary. Our approach combines traditional natural language processing and machine learning techniques implemented in the Annif toolkit with innovative LLM-based methods for translation and synthetic data generation, and merges the predictions of monolingual models. The system ranked first in the all-subjects category and second in the tib-core-subjects category in the quantitative evaluation, and fourth in qualitative evaluations. These findings demonstrate the potential of combining traditional XMTC algorithms with modern LLM techniques to improve the accuracy and efficiency of subject indexing in multilingual contexts.
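
The abstract does not say exactly how the monolingual models' predictions were merged; a weighted average of per-subject scores is one simple way to do it, sketched here as a generic illustration rather than the team's actual configuration. The subject IDs (`gnd:A` etc.) and weights are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def merge_predictions(
    predictions: list[dict[str, float]],
    weights: Optional[list[float]] = None,
    limit: int = 5,
) -> list[tuple[str, float]]:
    """Weighted average of per-subject scores from several models.

    A subject that a model did not suggest counts as score 0 for that model.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need one weight per model")
    total = sum(weights)
    merged: dict[str, float] = defaultdict(float)
    for preds, weight in zip(predictions, weights):
        for subject, score in preds.items():
            merged[subject] += weight * score / total
    # Highest score first; ties broken by subject ID for deterministic output.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]
```

Raising one model's weight shifts the merged ranking toward that model's suggestions, which is one knob for favouring the language a record is written in.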

2025-04-28T11:04:23Z 6 pages, 4 figures, published at SemEval-2025 workshop Task 5: LLMs4Subjects: https://aclanthology.org/2025.semeval-1.315/ Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025), 2424--2431 Osma Suominen Juho Inkinen Mona Lehtinen

http://arxiv.org/abs/2508.15556v1 HERITRACE in action: the ParaText project as a case study for semantic data management in Classical Philology 2025-08-21T13:36:16Z

HERITRACE is a semantic data editor designed for cultural heritage institutions, addressing the gap between complex Semantic Web technologies and domain experts' needs. The ParaText Bibliographical Database, a specialized bibliographical database for ancient Greek exegesis, demonstrates HERITRACE's capabilities in Classical Philology. This paper examines how HERITRACE enables non-technical scholars to manage complex semantic data through SHACL-based form generation and validation, while ensuring comprehensive provenance tracking and change management via an adaptation of the OpenCitations Data Model.

2025-08-21T13:36:16Z 10 pages, 4 figures, submitted to Una Europa Cultural Heritage Book Series Francesca Filograsso Arcangelo Massari Camillo Neri Silvio Peroni