http://arxiv.org/abs/2509.25298v2
Trajectories and Comparative Analysis of Global Countries Dominating AI Publications, 2000-2025
2026-03-14T21:03:00Z
This study investigates the shifting global dynamics of Artificial Intelligence (AI) research by analysing the trajectories of countries dominating AI publications between 2000 and 2025. Drawing on the comprehensive OpenAlex datasets and employing fractional counting to avoid double attribution in co-authored work, the research maps the relative shares of AI publications across major global players. The analysis reveals a profound restructuring of the international AI research landscape. The US and the European Union (representing EU27), once the undisputed and established leaders, have experienced a notable decline in relative dominance, with their combined share of publications falling from over 57% in 2000 to less than 25% in 2025. In contrast, China has undergone a dramatic ascent, expanding its global share of AI publications from under 5% in 2000 to nearly 36% by 2025, therefore emerging as the single most dominant contributor. Alongside China, India has also risen substantially, consolidating a multipolar Asian research ecosystem. These empirical findings highlight the strategic implications of concentrated research output, particularly China's capacity to shape the future direction of AI innovation and standard-setting. Beyond publication volume, the study further examines research quality by comparing each country's share of high-impact publications against its overall output, and analyses citation impact trajectories across major players. The findings show that in addition to China leading in volume, the country has also recently led in high-impact publications. 
Such an observation challenges the general assumption that Western powers retain dominance in high-impact AI scholarship.
2025-09-29T16:35:54Z
22 pages, 12 figures, 7 tables
Jason Hung

http://arxiv.org/abs/2603.19303v1
Agreement Between Large Language Models, Human Reviewers, and Authors in Evaluating STROBE Checklists for Observational Studies in Rheumatology
2026-03-12T19:56:46Z
Introduction: Evaluating compliance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement can be time-consuming and subjective. This study compares STROBE assessments from large language models (LLMs), a human reviewer panel, and the original manuscript authors in observational rheumatology research. Methods: Guided by the GRRAS and DEAL Pathway B frameworks, 17 rheumatology articles were independently assessed. Evaluations used the 22-item STROBE checklist, completed by the authors, a five-person human panel (ranging from junior to senior professionals), and two LLMs (ChatGPT-5.2, Gemini-3Pro). Items were grouped into Methodological Rigor and Presentation and Context domains. Inter-rater reliability was calculated using Gwet's Agreement Coefficient (AC1). Results: Overall agreement across all reviewers was 85.0% (AC1=0.826). Domain stratification showed almost perfect agreement for Presentation and Context (AC1=0.841) and substantial agreement for Methodological Rigor (AC1=0.803). Although LLMs achieved complete agreement (AC1=1.000) with all human reviewers on standard formatting elements, their agreement with human reviewers and authors declined on complex items. For example, regarding the item on loss to follow-up, the agreement between Gemini 3 Pro and the senior reviewer was AC1=-0.252, while the agreement with the authors was only fair. Additionally, ChatGPT-5.2 generally demonstrated higher agreement with human reviewers than Gemini-3Pro on specific methodological items. 
Conclusion: While LLMs show potential for basic STROBE screening, their lower agreement with human experts on complex methodological items likely reflects a reliance on surface-level information. Currently, these models appear more reliable for standardizing straightforward checks than for replacing expert human judgment in evaluating observational research.
2026-03-12T19:56:46Z
19 pages, 2 figures, 2 supplementary figures
Emre Bilgin, Ebru Ozturk, Meera Shah, Lisa Traboco, Rebecca Everitt, Ai Lyn Tan, Marwan Bukhari, Vincenzo Venerito, Latika Gupta

http://arxiv.org/abs/2603.11933v1
Making Chant Computing Easy: CantusCorpus v1.0 and the PyCantus Library
2026-03-12T13:46:42Z
Digital Gregorian chant scholarship has for decades enjoyed the privilege of a large digital resource cataloguing chant sources: the Cantus ecosystem, with nearly 900,000 chants catalogued across more than 2000 sources. The Cantus Database data model and the Cantus ID mechanism have been adopted by 18 more chant databases, jointly accessible through the Cantus Index interface. However, this data has only been available piecemeal via the individual online user interfaces; computational methods have so far had only a limited opportunity to process these immense resources. To overcome this hurdle, we compiled CantusCorpus v1.0, a dataset that combines everything that was available across the Cantus Index-centered network of databases as of mid-2025, and we have also provided the code for updating the dataset as the databases grow. We then created the lightweight PyCantus library for working with this data. PyCantus decouples the data model from the Cantus codebase and thus allows integration of further chant data sources, which we illustrate with harmonising pilot data from the Corpus Monodicum project. 
Computational chant research is attractive - and CantusCorpus v1.0 and PyCantus are infrastructures that should make work in this field more transparent, replicable, and accessible to digital humanities practitioners beyond chant scholars themselves.
2026-03-12T13:46:42Z
Accepted to TISMIR Special Issue on Digital Musicology
Anna Dvořáková, Tim Eipert, Debra Lacoste, Jan Hajič

http://arxiv.org/abs/2509.09596v2
How much are LLMs changing the language of academic papers after ChatGPT? A multi-database and full text analysis
2026-03-11T18:35:43Z
This study investigates how Large Language Models (LLMs) are influencing the language of academic papers by tracking 12 LLM-associated terms across six major scholarly databases (Scopus, Web of Science, PubMed, PubMed Central (PMC), Dimensions, and OpenAlex) from 2015 to 2024. Using over 2.4 million PMC open-access publications (2021-July 2025), we also analysed full texts to assess changes in the frequency and co-occurrence of these terms before and after ChatGPT's initial public release. Across databases, delve (+1,500%), underscore (+1,000%), and intricate (+700%) had the largest increases between 2022 and 2024. Growth in LLM-term usage was much higher in STEM fields than in social sciences and arts and humanities. In PMC full texts, the proportion of papers using underscore six or more times increased by over 10,000% from 2022 to 2025, followed by intricate (+5,400%) and meticulous (+2,800%). Nearly half of all 2024 PMC papers using any LLM term also included underscore, compared with only 3%-14% of papers before ChatGPT in 2022. Papers using one LLM term are now much more likely to include other terms. For example, in 2024, underscore strongly correlated with pivotal (0.449) and delve (0.311), compared with very weak associations in 2022 (0.032 and 0.018, respectively). 
These findings provide the first large-scale evidence based on full-text publications and multiple databases that some LLM-related terms are now being used much more frequently and together. The rapid uptake of LLMs to support scholarly publishing is a welcome development reducing the language barrier to academic publishing for non-English speakers.
2025-09-11T16:35:54Z
Kayvan Kousha, Mike Thelwall

http://arxiv.org/abs/2603.08935v2
PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration
2026-03-11T16:00:39Z
Pathology underpins modern diagnosis and cancer care, yet its most valuable asset, the accumulated experience encoded in millions of narrative reports, remains largely inaccessible. Although institutions are rapidly digitizing pathology workflows, storing data without effective mechanisms for retrieval and reasoning risks transforming archives into a passive data repository, where institutional knowledge exists but cannot meaningfully inform patient care. True progress requires not only digitization, but the ability for pathologists to interrogate prior similar cases in real time while evaluating a new diagnostic dilemma. We present PathoScribe, a unified retrieval-augmented large language model (LLM) framework designed to transform static pathology archives into a searchable, reasoning-enabled living library. PathoScribe enables natural language case exploration, automated cohort construction, clinical question answering, immunohistochemistry (IHC) panel recommendation, and prompt-controlled report transformation within a single architecture. Evaluated on 70,000 multi-institutional surgical pathology reports, PathoScribe achieved perfect Recall@10 for natural language case retrieval and demonstrated high-quality retrieval-grounded reasoning (mean reviewer score 4.56/5). 
Critically, the system operationalized automated cohort construction from free-text eligibility criteria, assembling research-ready cohorts in minutes (mean 9.2 minutes) with 91.3% agreement to human reviewers and no eligible cases incorrectly excluded, representing orders-of-magnitude reductions in time and cost compared to traditional manual chart review. This work establishes a scalable foundation for converting digital pathology archives from passive storage systems into active clinical intelligence platforms.
2026-03-09T21:09:24Z
Abdul Rehman Akbar, Samuel Wales-McGrath, Alejadro Levya, Lina Gokhale, Rajendra Singh, Wei Chen, Anil Parwani, Muhammad Khalid Khan Niazi

http://arxiv.org/abs/2603.10876v1
An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?
2026-03-11T15:24:20Z
Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.
2026-03-11T15:24:20Z
9 pages, 5 figures. 
Accepted to appear in the Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)
Jennifer D'Souza, Sameer Sadruddin, Maximilian Kähler, Andrea Salfinger, Luca Zaccagna, Francesca Incitti, Lauro Snidaro, Osma Suominen

http://arxiv.org/abs/2604.16330v1
A Collection of Systematic Reviews in Computer Science
2026-03-11T12:30:05Z
Systematic reviews are the standard method for synthesizing scientific evidence, but their creation requires substantial manual effort, particularly during retrieval and screening. While recent work has explored automating these steps, evaluation resources remain largely confined to the biomedical domain, limiting reproducible experimentation in other domains. This paper introduces SR4CS, a large-scale collection of systematic reviews in computer science, designed to support reproducible research on Boolean query generation, retrieval, and screening. The corpus comprises 1,212 systematic reviews with their original expert-designed Boolean search queries, 104,316 resolved references, and structured methodological metadata. For controlled evaluation, the original Boolean queries are additionally provided in a normalized, approximated form operating over titles and abstracts. To illustrate the intended use of the collection, baseline experiments compare the approximated expert Boolean queries with zero-shot LLM-generated Boolean queries, BM25, and dense retrieval under a unified evaluation setting. The results highlight systematic differences in precision, recall, and ranking behavior across retrieval paradigms and expose limitations of naive zero-shot Boolean generation. 
SR4CS is released under an open license on Zenodo (https://doi.org/10.5281/zenodo.17163932), together with documentation and code (https://github.com/webis-de/scolia26-sr4cs), to enable reproducible evaluation and future research on scaling systematic review automation.
2026-03-11T12:30:05Z
Accepted at SCOLIA26 Workshop
Pierre Achkar, Tim Gollub, Martin Potthast

http://arxiv.org/abs/2603.19301v1
Journal Research Data Policies in Materials Science
2026-03-11T11:07:29Z
Open and reproducible research in materials science relies on the availability of data, code, and common metadata standards. Journal research data policies (RDPs) remain a primary mechanism by which publication norms are defined and enforced. We survey RDPs for 171 materials science journals spanning 17 publishers, using an expanded coding framework that captures both data-and-code sharing behavior and refereeing standards. We find clear signs of progress in comparison to earlier research on RDPs: nearly all journals provide an RDP, and most mention data availability statements. However, enforceable requirements remain uncommon, public deposition of underlying data is rarely mandatory, and FAIR publication is typically encouraged rather than required. Expectations for research software are substantially less developed than those for data, with limited attention to versioning and persistent identifiers, dependency disclosure, reproducible execution environments, or software quality practices. Aggregating the findings on policy features into an open research data score reveals pronounced heterogeneity across journals. Neither impact factor nor access model reliably predicts policy strength. Double-coding further shows that more complex policies and stricter policies can be more challenging to interpret consistently, and we highlight challenges in consistent RDP encoding across studies. 
Lastly, we conclude with recommended best practice directions for the future.
2026-03-11T11:07:29Z
15 pages, 4 figures
Lukas Hörmann, Hemanadhan Myneni, Rwayda Kh. S. Al-Hamd, Katarina Batalović, Silvia Bonfanti, Federico Grasselli, Saulius Gražulis, Bahattin Koç, Konstantinos Konstantinou, Ivor Lončarić, Nataliya Lopanitsyna, José Manuel Oliveira, Paolo Pegolo, Patrícia Ramos, Kevin Rossi, Sebastian P. Schwaminger, Edith Simmen, Milica Todorović, Markus Stricker, Jonathan Schmidt

http://arxiv.org/abs/2603.10285v1
Conversational AI-Enhanced Exploration System to Query Large-Scale Digitised Collections of Natural History Museums
2026-03-11T00:07:32Z
Recent digitisation efforts in natural history museums have produced large volumes of collection data, yet their scale and scientific complexity often hinder public access and understanding. Conventional data management tools, such as databases, restrict exploration through keyword-based search or require specialised schema knowledge. This paper presents a system design that uses conversational AI to query nearly 1.7 million digitised specimen records from the life-science collections of the Australian Museum. Designed and developed through a human-centred design process, the system contains an interactive map for visual-spatial exploration and a natural-language conversational agent that retrieves detailed specimen data and answers collection-specific questions. The system leverages function-calling capabilities of contemporary large language models to dynamically retrieve structured data from external APIs, enabling fast, real-time interaction with extensive yet frequently updated datasets. Our work provides a new approach to connecting large museum collections with natural language-based queries and informs future designs of scientific AI agents for natural history museums.
2026-03-11T00:07:32Z
25 pages, 9 figures
Yiyuan Wang, Andrew Johnston, Zoë Sadokierski, Rhiannon Stephens, Shane T. 
Ahyong

http://arxiv.org/abs/2308.07162v3
Evolution of funding for collaborative health research towards higher-level patient-oriented research. A comparison of the European Union Framework Programmes to the program funding by the United States National Institutes of Health
2026-03-10T21:18:40Z
Public research funding agencies increasingly seek to steer health research toward higher levels of translation and societal relevance. Yet it remains unclear to what extent such policy shifts are effectively implemented and reflected in funded projects and scientific outputs. This study examines evolution and changes in the orientation of health research portfolios since 2008 within European funding (Framework Programmes FP7 and Horizon 2020 funding for collaborative health research, FP-HR, and ERC Life Sciences grants), in comparison to NIH funding for collaborative research (P01, U01, and UM1). Using large-scale text analysis and supervised classification, we analyze both project descriptions and the associated scientific publications. At the project level, the EU FP-HR show pronounced shifts toward population-level, diagnostic, and health systems-oriented research, whereas investigator-driven ERC life sciences, NIH P01 and U01, display greater stability with a predominance of basic biomedical research. Publication-level analyses reveal more moderate changes, with basic biomedical research remaining a central component including in EU FP-HR, indicating partial translation of funding priorities into outputs. By jointly analyzing projects and publications, this study identifies and distinguishes between changes in funder expectations and realized research trajectories, highlighting how strategic funding shapes research portfolios within enduring epistemic and institutional constraints.
2023-08-14T14:17:34Z
Quantitative Science Studies, 2026
David Fajardo-Ortiz, Bart Thijs, Wolfgang Glanzel, Karin R. 
Sipido
10.1162/QSS.a.472

http://arxiv.org/abs/2603.08012v1
Structure-Preserving Graph Contrastive Learning for Mathematical Information Retrieval
2026-03-09T06:36:34Z
This paper introduces Variable Substitution as a domain-specific graph augmentation technique for graph contrastive learning (GCL) in the context of searching for mathematical formulas. Standard GCL augmentation techniques often distort the semantic meaning of mathematical formulas, particularly for small and highly structured graphs. Variable Substitution, on the other hand, preserves the core algebraic relationships and formula structure. To demonstrate the effectiveness of our technique, we apply it to a classic GCL-based retrieval model. Experiments show that this straightforward approach significantly improves retrieval performance compared to generic augmentation strategies. We release the code on GitHub (https://github.com/lazywulf/formula_ret_aug).
2026-03-09T06:36:34Z
Chun-Hsi Ku, Hung-Hsuan Chen

http://arxiv.org/abs/2603.06839v1
From Job Postings to Curriculum Decisions: Using AI to Generate Workforce Intelligence for MSW Program Planning
2026-03-06T20:02:39Z
Social work programs lack systematic methods to align curricula with employer expectations, typically relying on advisory input and alumni surveys rather than direct analysis of workforce requirements. This paper presents a case study demonstrating how one MSW program used artificial intelligence tools to generate organizational intelligence from job posting data for curriculum planning. Using a locally deployed language model, we classified over 40,000 job postings for MSW relevance and alignment with eight practice specializations, then extracted skills, therapeutic modalities, and technology competencies. Interpersonal Practice dominated the employment landscape, followed by Children, Youth, and Families. Clinical Assessment and Case Management emerged as cross-cutting competencies. 
Macro-level specializations showed co-occurrence patterns among partially aligned positions that largely disappeared among positions requiring MSW credentials specifically. Trauma-informed care appeared in management and evaluation roles, reflecting its expansion from clinical modality to organizational framework. The methodology demonstrates a transferable approach that other programs can adapt for strategic planning, and the findings illustrate the type of intelligence such analysis can yield. The patterns identified entered faculty deliberation as one input among many, interpreted by stakeholders with contextual knowledge no dataset can fully capture.
2026-03-06T20:02:39Z
Barbara S. Hiltz, Bryan G. Victor, Brian E. Perron

http://arxiv.org/abs/2603.06814v1
AI-Assisted Curation of Conference Scholarship: Compiling, Structuring, and Analyzing Two Decades of Presentations at the Society for Social Work and Research
2026-03-06T19:19:29Z
Purpose: This study developed a comprehensive database of presentation abstracts from the Society for Social Work and Research (SSWR) Annual Conference and examined patterns in research methodology, authorship, collaboration, and institutional participation over two decades.
Method: Abstract metadata was compiled from the SSWR Confex conference management system for presentations from 2005 to 2026 using web scraping. A small language model (gpt-oss:20b) performed classification and extraction tasks on abstracts, including categorization of methodologies and parsing of author affiliations, with human review at each major stage to ensure accuracy.
Results: The database contains 23,793 presentations with 69,924 author records representing 20,779 unique researchers from 4,049 institutions across 93 countries. Annual conference presentations increased from 423 in 2005 to 1,935 in 2026, representing a compound annual growth rate of 7.5%. Quantitative methods predominated (61.1%), followed by qualitative approaches (23.4%), mixed methods (9.1%), and reviews (5.4%). The mean number of authors per presentation increased from 2.22 in 2005 to 3.31 in 2026. International participation grew from 4.5% to 13.5% of author affiliations over the observation period.
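The growth rate reported in the Results paragraph can be checked from its two endpoint counts. The following is a minimal sketch, assuming 21 annual intervals between 2005 and 2026 (the abstract does not state the period count); the function name `cagr` is illustrative and not taken from the paper.

```python
def cagr(start: float, end: float, periods: int) -> float:
    """Compound annual growth rate between two positive values.

    `periods` is the number of yearly intervals separating the two values.
    """
    if start <= 0 or end <= 0 or periods <= 0:
        raise ValueError("start, end, and periods must be positive")
    return (end / start) ** (1 / periods) - 1


# Endpoint counts quoted in the Results paragraph: 423 (2005) and 1,935 (2026).
# Assumption: the interval count is the difference between the two years.
rate = cagr(423, 1935, 2026 - 2005)
print(f"{rate:.1%}")
```

Under that assumption the result rounds to 7.5%, which agrees with the figure given in the abstract.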
Discussion: Findings indicate substantial growth in SSWR conference participation, alongside increased collaboration and international engagement. The methodological distribution reveals continued quantitative predominance with growing qualitative representation. This database provides research infrastructure for systematic hypothesis testing about research priorities and disciplinary development over time, enabling analyses that inform both scholarship and conference planning.
2026-03-06T19:19:29Z
Brian Perron, Bryan Victor, Zia Qi

http://arxiv.org/abs/2603.06436v1
Rethinking Thematic Evolution in Science Mapping: An Integrated Framework for Longitudinal Analysis
2026-03-06T16:16:04Z
Strategic diagrams and co-word analysis are widely employed to examine the conceptual structure of scientific domains and their development over time. Yet a structural inconsistency characterises dominant longitudinal implementations: themes are detected through relational clustering in weighted networks, whereas their inter-temporal connections are commonly inferred from set-theoretic overlap among keywords or core documents. This study introduces a structurally integrated framework in which lineage reconstruction is embedded within the same weighted relational architecture that underpins cross-sectional detection. The approach models thematic continuity through graded document affiliation and a lineage-strength measure that combines directional coverage with centrality-weighted structural relevance, thereby conceptualising evolution as the reconfiguration of relational structures rather than simple lexical persistence. 
By aligning thematic detection and temporal modelling within a unified relational paradigm, the framework enhances the methodological coherence and interpretive robustness of longitudinal science mapping.
2026-03-06T16:16:04Z
Massimo Aria, Luca D'Aniello, Michelangelo Misuraca, Maria Spano

http://arxiv.org/abs/2404.01800v3
Sentiment Analysis of Citations in Scientific Articles Using ChatGPT: Identifying Potential Biases and Conflicts of Interest
2026-03-06T09:10:59Z
Scientific articles play a crucial role in advancing knowledge and informing research directions. One key aspect of evaluating scientific articles is the analysis of citations, which provides insights into the impact and reception of the cited works. This article introduces the innovative use of large language models, particularly ChatGPT, for comprehensive sentiment analysis of citations within scientific articles. By leveraging advanced natural language processing (NLP) techniques, ChatGPT can discern the nuanced positivity or negativity of citations, offering insights into the reception and impact of cited works. Furthermore, ChatGPT's capabilities extend to detecting potential biases and conflicts of interest in citations, enhancing the objectivity and reliability of scientific literature evaluation. This study showcases the transformative potential of artificial intelligence (AI)-powered tools in enhancing citation analysis and promoting integrity in scholarly research.
2024-04-02T09:59:49Z
Walid Hariri