http://arxiv.org/abs/2509.25298v2 Trajectories and Comparative Analysis of Global Countries Dominating AI Publications, 2000-2025 2026-03-14T21:03:00Z

This study investigates the shifting global dynamics of Artificial Intelligence (AI) research by analysing the trajectories of countries dominating AI publications between 2000 and 2025. Drawing on the comprehensive OpenAlex datasets and employing fractional counting to avoid double attribution in co-authored work, the research maps the relative shares of AI publications across major global players. The analysis reveals a profound restructuring of the international AI research landscape. The US and the European Union (representing EU27), once the undisputed and established leaders, have experienced a notable decline in relative dominance, with their combined share of publications falling from over 57% in 2000 to less than 25% in 2025. In contrast, China has undergone a dramatic ascent, expanding its global share of AI publications from under 5% in 2000 to nearly 36% by 2025, thereby emerging as the single most dominant contributor. Alongside China, India has also risen substantially, consolidating a multipolar Asian research ecosystem. These empirical findings highlight the strategic implications of concentrated research output, particularly China's capacity to shape the future direction of AI innovation and standard-setting. Beyond publication volume, the study further examines research quality by comparing each country's share of high-impact publications against its overall output, and analyses citation impact trajectories across major players. The findings show that, in addition to leading in volume, China has also recently led in high-impact publications. Such an observation challenges the general assumption that Western powers retain dominance in high-impact AI scholarship.
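
The fractional counting mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each paper carries exactly one unit of credit, split equally across its author affiliations; the abstract does not say whether credit is divided per author, per institution, or per country, so that choice, the function name `fractional_country_shares`, and the sample data are all hypothetical.

```python
from collections import defaultdict


def fractional_country_shares(papers):
    """Return each country's share of total credit under fractional counting.

    `papers` is a list of lists; each inner list holds the country code of
    every author affiliation on one paper (assumed input format).
    """
    credit = defaultdict(float)
    for countries in papers:
        if not countries:
            continue  # skip papers with no usable affiliation
        weight = 1.0 / len(countries)  # one paper = one unit of credit
        for country in countries:
            credit[country] += weight
    total = sum(credit.values())
    return {country: value / total for country, value in credit.items()}


# Hypothetical toy data: three papers, the last two co-authored across countries.
papers = [["CN"], ["US", "CN"], ["US", "DE", "DE", "IN"]]
shares = fractional_country_shares(papers)
```

Because every paper contributes the same total weight, a co-authored paper is never attributed in full to more than one country, which is the double attribution the abstract refers to.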

2025-09-29T16:35:54Z 22 pages, 12 figures, 7 tables Jason Hung http://arxiv.org/abs/2603.19303v1 Agreement Between Large Language Models, Human Reviewers, and Authors in Evaluating STROBE Checklists for Observational Studies in Rheumatology 2026-03-12T19:56:46Z

Introduction: Evaluating compliance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement can be time-consuming and subjective. This study compares STROBE assessments from large language models (LLMs), a human reviewer panel, and the original manuscript authors in observational rheumatology research. Methods: Guided by the GRRAS and DEAL Pathway B frameworks, 17 rheumatology articles were independently assessed. Evaluations used the 22-item STROBE checklist, completed by the authors, a five-person human panel (ranging from junior to senior professionals), and two LLMs (ChatGPT-5.2, Gemini-3Pro). Items were grouped into Methodological Rigor and Presentation and Context domains. Inter-rater reliability was calculated using Gwet's Agreement Coefficient (AC1). Results: Overall agreement across all reviewers was 85.0% (AC1=0.826). Domain stratification showed almost perfect agreement for Presentation and Context (AC1=0.841) and substantial agreement for Methodological Rigor (AC1=0.803). Although LLMs achieved complete agreement (AC1=1.000) with all human reviewers on standard formatting elements, their agreement with human reviewers and authors declined on complex items. For example, regarding the item on loss to follow-up, the agreement between Gemini 3 Pro and the senior reviewer was AC1=-0.252, while the agreement with the authors was only fair. Additionally, ChatGPT-5.2 generally demonstrated higher agreement with human reviewers than Gemini-3Pro on specific methodological items. Conclusion: While LLMs show potential for basic STROBE screening, their lower agreement with human experts on complex methodological items likely reflects a reliance on surface-level information. Currently, these models appear more reliable for standardizing straightforward checks than for replacing expert human judgment in evaluating observational research.
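
For readers unfamiliar with Gwet's AC1, the following is a minimal sketch of the two-rater form of the coefficient. It is not the study's analysis code: the multi-rater computation behind the overall figures is not reproduced here, and the function name, the yes/no categories, and the sample ratings are assumptions made for illustration.

```python
def gwet_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 for two raters scoring the same items.

    AC1 = (pa - pe) / (1 - pe), where pa is the observed agreement and
    pe = sum_k pi_k * (1 - pi_k) / (Q - 1), with pi_k the mean proportion
    of items the two raters assigned to category k and Q the number of
    categories.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty item set")
    if len(categories) < 2:
        raise ValueError("at least two categories are required")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    chance = 0.0
    for category in categories:
        pi_k = (ratings_a.count(category) + ratings_b.count(category)) / (2 * n)
        chance += pi_k * (1 - pi_k)
    chance /= len(categories) - 1
    return (observed - chance) / (1 - chance)


# Hypothetical ratings for one checklist item across ten articles.
rater_1 = ["yes", "yes", "yes", "no", "yes", "no", "yes", "yes", "yes", "yes"]
rater_2 = ["yes", "yes", "yes", "no", "yes", "yes", "yes", "yes", "no", "yes"]
ac1 = gwet_ac1(rater_1, rater_2, categories=("yes", "no"))
```

Passing the category list explicitly keeps the coefficient defined when both raters give every item the same rating; in that case the chance term is zero and AC1 equals 1, consistent with the complete agreement reported for standard formatting elements.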

2026-03-12T19:56:46Z 19 pages, 2 figures, 2 supplementary figures Emre Bilgin Ebru Ozturk Meera Shah Lisa Traboco Rebecca Everitt Ai Lyn Tan Marwan Bukhari Vincenzo Venerito Latika Gupta http://arxiv.org/abs/2603.11933v1 Making Chant Computing Easy: CantusCorpus v1.0 and the PyCantus Library 2026-03-12T13:46:42Z

Digital Gregorian chant scholarship has for decades enjoyed the privilege of a large digital resource cataloguing chant sources: the Cantus ecosystem, with nearly 900,000 chants catalogued across more than 2000 sources. The Cantus Database data model and the Cantus ID mechanism have been adopted by 18 more chant databases, jointly accessible through the Cantus Index interface. However, this data has only been available piecemeal via the individual online user interfaces; computational methods have so far had only a limited opportunity to process these immense resources. To overcome this hurdle, we compiled CantusCorpus v1.0, a dataset that combines everything that was available across the Cantus Index-centered network of databases as of mid-2025, and we have also provided the code for updating the dataset as the databases grow. We then created the lightweight PyCantus library for working with this data. PyCantus decouples the data model from the Cantus codebase and thus allows integration of further chant data sources, which we illustrate by harmonising pilot data from the Corpus Monodicum project. Computational chant research is attractive, and CantusCorpus v1.0 and PyCantus are infrastructures that should make work in this field more transparent, replicable, and accessible to digital humanities practitioners beyond chant scholars themselves.

2026-03-12T13:46:42Z Accepted to TISMIR Special Issue on Digital Musicology Anna Dvořáková Tim Eipert Debra Lacoste Jan Hajič http://arxiv.org/abs/2509.09596v2 How much are LLMs changing the language of academic papers after ChatGPT? A multi-database and full text analysis 2026-03-11T18:35:43Z

This study investigates how Large Language Models (LLMs) are influencing the language of academic papers by tracking 12 LLM-associated terms across six major scholarly databases (Scopus, Web of Science, PubMed, PubMed Central (PMC), Dimensions, and OpenAlex) from 2015 to 2024. Using over 2.4 million PMC open-access publications (2021-July 2025), we also analysed full texts to assess changes in the frequency and co-occurrence of these terms before and after ChatGPT's initial public release. Across databases, delve (+1,500%), underscore (+1,000%), and intricate (+700%) had the largest increases between 2022 and 2024. Growth in LLM-term usage was much higher in STEM fields than in social sciences and arts and humanities. In PMC full texts, the proportion of papers using underscore six or more times increased by over 10,000% from 2022 to 2025, followed by intricate (+5,400%) and meticulous (+2,800%). Nearly half of all 2024 PMC papers using any LLM term also included underscore, compared with only 3%-14% of papers before ChatGPT in 2022. Papers using one LLM term are now much more likely to include other terms. For example, in 2024, underscore strongly correlated with pivotal (0.449) and delve (0.311), compared with very weak associations in 2022 (0.032 and 0.018, respectively). These findings provide the first large-scale evidence based on full-text publications and multiple databases that some LLM-related terms are now being used much more frequently and together. The rapid uptake of LLMs to support scholarly publishing is a welcome development reducing the language barrier to academic publishing for non-English speakers.

2025-09-11T16:35:54Z Kayvan Kousha Mike Thelwall http://arxiv.org/abs/2603.08935v2 PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration 2026-03-11T16:00:39Z

Pathology underpins modern diagnosis and cancer care, yet its most valuable asset, the accumulated experience encoded in millions of narrative reports, remains largely inaccessible. Although institutions are rapidly digitizing pathology workflows, storing data without effective mechanisms for retrieval and reasoning risks transforming archives into a passive data repository, where institutional knowledge exists but cannot meaningfully inform patient care. True progress requires not only digitization, but the ability for pathologists to interrogate prior similar cases in real time while evaluating a new diagnostic dilemma. We present PathoScribe, a unified retrieval-augmented large language model (LLM) framework designed to transform static pathology archives into a searchable, reasoning-enabled living library. PathoScribe enables natural language case exploration, automated cohort construction, clinical question answering, immunohistochemistry (IHC) panel recommendation, and prompt-controlled report transformation within a single architecture. Evaluated on 70,000 multi-institutional surgical pathology reports, PathoScribe achieved perfect Recall@10 for natural language case retrieval and demonstrated high-quality retrieval-grounded reasoning (mean reviewer score 4.56/5). Critically, the system operationalized automated cohort construction from free-text eligibility criteria, assembling research-ready cohorts in minutes (mean 9.2 minutes) with 91.3% agreement to human reviewers and no eligible cases incorrectly excluded, representing orders-of-magnitude reductions in time and cost compared to traditional manual chart review. This work establishes a scalable foundation for converting digital pathology archives from passive storage systems into active clinical intelligence platforms.
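
The Recall@10 figure reported above can be read against the standard definition of recall at a cutoff, sketched below. The abstract does not state how relevance was judged or whether each query had one or several relevant cases, so the function names, report identifiers, and relevance sets here are hypothetical.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant items that appear in the top-k results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("each query needs at least one relevant item")
    hits = len(relevant.intersection(ranked_ids[:k]))
    return hits / len(relevant)


def mean_recall_at_k(runs, k=10):
    """Average recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical retrieval runs: ranked report IDs plus the IDs judged relevant.
runs = [
    (["r7", "r2", "r9"], {"r2"}),
    (["r1", "r4", "r3"], {"r3", "r8"}),
]
score = mean_recall_at_k(runs, k=10)
```

Under this definition a perfect score means that, for every query, all relevant reports were ranked within the first ten results.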

2026-03-09T21:09:24Z Abdul Rehman Akbar Samuel Wales-McGrath Alejadro Levya Lina Gokhale Rajendra Singh Wei Chen Anil Parwani Muhammad Khalid Khan Niazi http://arxiv.org/abs/2603.10876v1 An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously? 2026-03-11T15:24:20Z

Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.

2026-03-11T15:24:20Z 9 pages, 5 figures. Accepted to appear in the Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026) Jennifer D'Souza Sameer Sadruddin Maximilian Kähler Andrea Salfinger Luca Zaccagna Francesca Incitti Lauro Snidaro Osma Suominen http://arxiv.org/abs/2604.16330v1 A Collection of Systematic Reviews in Computer Science 2026-03-11T12:30:05Z

Systematic reviews are the standard method for synthesizing scientific evidence, but their creation requires substantial manual effort, particularly during retrieval and screening. While recent work has explored automating these steps, evaluation resources remain largely confined to the biomedical domain, limiting reproducible experimentation in other domains. This paper introduces SR4CS, a large-scale collection of systematic reviews in computer science, designed to support reproducible research on Boolean query generation, retrieval, and screening. The corpus comprises 1,212 systematic reviews with their original expert-designed Boolean search queries, 104,316 resolved references, and structured methodological metadata. For controlled evaluation, the original Boolean queries are additionally provided in a normalized, approximated form operating over titles and abstracts. To illustrate the intended use of the collection, baseline experiments compare the approximated expert Boolean queries with zero-shot LLM-generated Boolean queries, BM25, and dense retrieval under a unified evaluation setting. The results highlight systematic differences in precision, recall, and ranking behavior across retrieval paradigms and expose limitations of naive zero-shot Boolean generation. SR4CS is released under an open license on Zenodo (https://doi.org/10.5281/zenodo.17163932), together with documentation and code (https://github.com/webis-de/scolia26-sr4cs), to enable reproducible evaluation and future research on scaling systematic review automation.

2026-03-11T12:30:05Z Accepted at SCOLIA26 Workshop Pierre Achkar Tim Gollub Martin Potthast http://arxiv.org/abs/2603.19301v1 Journal Research Data Policies in Materials Science 2026-03-11T11:07:29Z

Open and reproducible research in materials science relies on the availability of data, code, and common metadata standards. Journal research data policies (RDPs) remain a primary mechanism by which publication norms are defined and enforced. We survey RDPs for 171 materials science journals spanning 17 publishers, using an expanded coding framework that captures both data-and-code sharing behavior as well as refereeing standards. We find clear signs of progress in comparison to earlier research on RDPs: nearly all journals provide an RDP, and most mention data availability statements. However, enforceable requirements remain uncommon, public deposition of underlying data is rarely mandatory, and FAIR publication is typically encouraged rather than required. Expectations for research software are substantially less developed than those for data, with limited attention to versioning and persistent identifiers, dependency disclosure, reproducible execution environments, or software quality practices. Aggregating the findings on policy features into an open research data score reveals pronounced heterogeneity across journals. Neither impact factor nor access model reliably predicts policy strength. Double-coding further shows that more complex policies and stricter policies can be more challenging to interpret consistently, and we highlight challenges in consistent RDP encoding across studies. Lastly, we conclude with recommended best practice directions for the future.

2026-03-11T11:07:29Z 15 pages, 4 figures Lukas Hörmann Hemanadhan Myneni Rwayda Kh. S. Al-Hamd Katarina Batalović Silvia Bonfanti Federico Grasselli Saulius Gražulis Bahattin Koç Konstantinos Konstantinou Ivor Lončarić Nataliya Lopanitsyna José Manuel Oliveira Paolo Pegolo Patrícia Ramos Kevin Rossi Sebastian P. Schwaminger Edith Simmen Milica Todorović Markus Stricker Jonathan Schmidt http://arxiv.org/abs/2603.10285v1 Conversational AI-Enhanced Exploration System to Query Large-Scale Digitised Collections of Natural History Museums 2026-03-11T00:07:32Z

Recent digitisation efforts in natural history museums have produced large volumes of collection data, yet their scale and scientific complexity often hinder public access and understanding. Conventional data management tools, such as databases, restrict exploration through keyword-based search or require specialised schema knowledge. This paper presents a system design that uses conversational AI to query nearly 1.7 million digitised specimen records from the life-science collections of the Australian Museum. Designed and developed through a human-centred design process, the system contains an interactive map for visual-spatial exploration and a natural-language conversational agent that retrieves detailed specimen data and answers collection-specific questions. The system leverages function-calling capabilities of contemporary large language models to dynamically retrieve structured data from external APIs, enabling fast, real-time interaction with extensive yet frequently updated datasets. Our work provides a new approach to connecting large museum collections with natural language-based queries and informs future designs of scientific AI agents for natural history museums.

2026-03-11T00:07:32Z 25 pages, 9 figures Yiyuan Wang Andrew Johnston Zoë Sadokierski Rhiannon Stephens Shane T. Ahyong http://arxiv.org/abs/2308.07162v3 Evolution of funding for collaborative health research towards higher-level patient-oriented research. A comparison of the European Union Framework Programmes to the program funding by the United States National Institutes of Health 2026-03-10T21:18:40Z

Public research funding agencies increasingly seek to steer health research toward higher levels of translation and societal relevance. Yet it remains unclear to what extent such policy shifts are effectively implemented and reflected in funded projects and scientific outputs. This study examines how the orientation of health research portfolios has evolved since 2008 within European funding (Framework Programmes FP7 and Horizon 2020 funding for collaborative health research, FP-HR, and ERC Life Sciences grants), in comparison to NIH funding for collaborative research (P01, U01, and UM1). Using large-scale text analysis and supervised classification, we analyze both project descriptions and the associated scientific publications. At the project level, EU FP-HR projects show pronounced shifts toward population-level, diagnostic, and health systems-oriented research, whereas investigator-driven ERC Life Sciences grants and NIH P01 and U01 awards display greater stability with a predominance of basic biomedical research. Publication-level analyses reveal more moderate changes, with basic biomedical research remaining a central component, including in EU FP-HR, indicating partial translation of funding priorities into outputs. By jointly analyzing projects and publications, this study identifies and distinguishes between changes in funder expectations and realized research trajectories, highlighting how strategic funding shapes research portfolios within enduring epistemic and institutional constraints.

2023-08-14T14:17:34Z Quantitative Science Studies, 2026 David Fajardo-Ortiz Bart Thijs Wolfgang Glanzel Karin R. Sipido 10.1162/QSS.a.472 http://arxiv.org/abs/2603.08012v1 Structure-Preserving Graph Contrastive Learning for Mathematical Information Retrieval 2026-03-09T06:36:34Z

This paper introduces Variable Substitution as a domain-specific graph augmentation technique for graph contrastive learning (GCL) in the context of searching for mathematical formulas. Standard GCL augmentation techniques often distort the semantic meaning of mathematical formulas, particularly for small and highly structured graphs. Variable Substitution, on the other hand, preserves the core algebraic relationships and formula structure. To demonstrate the effectiveness of our technique, we apply it to a classic GCL-based retrieval model. Experiments show that this straightforward approach significantly improves retrieval performance compared to generic augmentation strategies. We release the code on GitHub (https://github.com/lazywulf/formula_ret_aug).
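
The idea of a structure-preserving augmentation can be sketched on a toy formula tree. This is an illustration only, not the authors' implementation: the nested-tuple representation, the variable pool, and the helper names are assumptions, and the paper itself operates on formula graphs whose exact encoding the abstract does not describe.

```python
import random

# Hypothetical pool of replacement variable names.
VARIABLE_POOL = ["a", "b", "c", "u", "v", "w", "x", "y", "z"]


def variables_in(expr):
    """Collect variable names; strings are variables, tuples are (operator, *children)."""
    if isinstance(expr, str):
        return {expr}
    if isinstance(expr, tuple):
        found = set()
        for child in expr[1:]:  # expr[0] is the operator symbol, not a variable
            found |= variables_in(child)
        return found
    return set()  # numeric constants carry no variables


def rename(expr, mapping):
    """Apply the variable mapping while leaving operators and constants intact."""
    if isinstance(expr, str):
        return mapping[expr]
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(rename(child, mapping) for child in expr[1:])
    return expr


def substitute_variables(expr, rng, pool=VARIABLE_POOL):
    """Return an augmented view in which every variable is consistently renamed."""
    old_names = sorted(variables_in(expr))
    new_names = rng.sample(pool, len(old_names))  # distinct names, so no two variables merge
    mapping = dict(zip(old_names, new_names))
    return rename(expr, mapping), mapping


# x^2 + 2*x*y as a toy operator tree.
formula = ("+", ("^", "x", 2), ("*", 2, "x", "y"))
view, mapping = substitute_variables(formula, random.Random(0))
```

Each occurrence of a variable receives the same new name and distinct variables stay distinct, so the tree shape, operators, and constants are unchanged. Generic augmentations such as random node or edge dropping do not guarantee this.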

2026-03-09T06:36:34Z Chun-Hsi Ku Hung-Hsuan Chen http://arxiv.org/abs/2603.06839v1 From Job Postings to Curriculum Decisions: Using AI to Generate Workforce Intelligence for MSW Program Planning 2026-03-06T20:02:39Z

Social work programs lack systematic methods to align curricula with employer expectations, typically relying on advisory input and alumni surveys rather than direct analysis of workforce requirements. This paper presents a case study demonstrating how one MSW program used artificial intelligence tools to generate organizational intelligence from job posting data for curriculum planning. Using a locally deployed language model, we classified over 40,000 job postings for MSW relevance and alignment with eight practice specializations, then extracted skills, therapeutic modalities, and technology competencies. Interpersonal Practice dominated the employment landscape, followed by Children, Youth, and Families. Clinical Assessment and Case Management emerged as cross-cutting competencies. Macro-level specializations showed co-occurrence patterns among partially aligned positions that largely disappeared among positions requiring MSW credentials specifically. Trauma-informed care appeared in management and evaluation roles, reflecting its expansion from clinical modality to organizational framework. The methodology demonstrates a transferable approach that other programs can adapt for strategic planning, and the findings illustrate the type of intelligence such analysis can yield. The patterns identified entered faculty deliberation as one input among many, interpreted by stakeholders with contextual knowledge no dataset can fully capture.

2026-03-06T20:02:39Z Barbara S. Hiltz Bryan G. Victor Brian E. Perron http://arxiv.org/abs/2603.06814v1 AI-Assisted Curation of Conference Scholarship: Compiling, Structuring, and Analyzing Two Decades of Presentations at the Society for Social Work and Research 2026-03-06T19:19:29Z

Purpose: This study developed a comprehensive database of presentation abstracts from the Society for Social Work and Research (SSWR) Annual Conference and examined patterns in research methodology, authorship, collaboration, and institutional participation over two decades. Method: Abstract metadata was compiled from the SSWR Confex conference management system for presentations from 2005 to 2026 using web scraping. A small language model (gpt-oss:20b) performed classification and extraction tasks on abstracts, including categorization of methodologies and parsing of author affiliations, with human review at each major stage to ensure accuracy. Results: The database contains 23,793 presentations with 69,924 author records representing 20,779 unique researchers from 4,049 institutions across 93 countries. Annual conference presentations increased from 423 in 2005 to 1,935 in 2026, representing a compound annual growth rate of 7.5%. Quantitative methods predominated (61.1%), followed by qualitative approaches (23.4%), mixed methods (9.1%), and reviews (5.4%). The mean number of authors per presentation increased from 2.22 in 2005 to 3.31 in 2026. International participation grew from 4.5% to 13.5% of author affiliations over the observation period. Discussion: Findings indicate substantial growth in SSWR conference participation, alongside increased collaboration and international engagement. The methodological distribution reveals continued quantitative predominance with growing qualitative representation. This database provides research infrastructure for systematic hypothesis testing about research priorities and disciplinary development over time, enabling analyses that inform both scholarship and conference planning.
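
The reported compound annual growth rate can be checked directly from the two endpoint counts in the abstract. The sketch below assumes the growth period is the 21 years between the 2005 and 2026 conferences; the function name is illustrative.

```python
def compound_annual_growth_rate(start_value, end_value, periods):
    """CAGR = (end / start) ** (1 / periods) - 1."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


# 423 presentations in 2005 and 1,935 in 2026, as stated in the abstract.
rate = compound_annual_growth_rate(423, 1935, 2026 - 2005)
print(f"{rate:.1%}")  # prints 7.5%
```

The result agrees with the 7.5% stated in the abstract when the period is counted as 21 annual steps.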

2026-03-06T19:19:29Z Brian Perron Bryan Victor Zia Qi http://arxiv.org/abs/2603.06436v1 Rethinking Thematic Evolution in Science Mapping: An Integrated Framework for Longitudinal Analysis 2026-03-06T16:16:04Z

Strategic diagrams and co-word analysis are widely employed to examine the conceptual structure of scientific domains and their development over time. Yet a structural inconsistency characterises dominant longitudinal implementations: themes are detected through relational clustering in weighted networks, whereas their inter-temporal connections are commonly inferred from set-theoretic overlap among keywords or core documents. This study introduces a structurally integrated framework in which lineage reconstruction is embedded within the same weighted relational architecture that underpins cross-sectional detection. The approach models thematic continuity through graded document affiliation and a lineage-strength measure that combines directional coverage with centrality-weighted structural relevance, thereby conceptualising evolution as the reconfiguration of relational structures rather than simple lexical persistence. By aligning thematic detection and temporal modelling within a unified relational paradigm, the framework enhances the methodological coherence and interpretive robustness of longitudinal science mapping.

2026-03-06T16:16:04Z Massimo Aria Luca D'Aniello Michelangelo Misuraca Maria Spano http://arxiv.org/abs/2404.01800v3 Sentiment Analysis of Citations in Scientific Articles Using ChatGPT: Identifying Potential Biases and Conflicts of Interest 2026-03-06T09:10:59Z

Scientific articles play a crucial role in advancing knowledge and informing research directions. One key aspect of evaluating scientific articles is the analysis of citations, which provides insights into the impact and reception of the cited works. This article introduces the innovative use of large language models, particularly ChatGPT, for comprehensive sentiment analysis of citations within scientific articles. By leveraging advanced natural language processing (NLP) techniques, ChatGPT can discern the nuanced positivity or negativity of citations, offering insights into the reception and impact of cited works. Furthermore, ChatGPT's capabilities extend to detecting potential biases and conflicts of interest in citations, enhancing the objectivity and reliability of scientific literature evaluation. This study showcases the transformative potential of artificial intelligence (AI)-powered tools in enhancing citation analysis and promoting integrity in scholarly research.

2024-04-02T09:59:49Z Walid Hariri