https://arxiv.org/api/IhyAn2QUzBb1zgxwo56wSxd19Y4
2026-03-18T10:16:00Z

http://arxiv.org/abs/2508.08850v2
APCs and citation impact of Gold OA articles authored by Ukrainian scholars before and during Russia's full-scale war against Ukraine (2020-2023)
2026-03-17T17:41:27Z
This study first examines how APC expenditures, authorship patterns, and publishing venues of Ukrainian scholars changed between the pre-war (2020-2021) and wartime (2022-2023) periods. Second, it explores the extent to which APC levels are associated with the field-normalized citation impact (FNCI) of Gold Open Access articles authored by Ukrainian scholars. Statistical analysis revealed a small but significant correlation between APC amounts and citation impact, though the effect size was minimal, suggesting higher APCs did not substantially boost citations. APC waivers offered by major publishers such as Springer and Elsevier since 2022 resulted in only a slight increase in the number of articles authored solely by Ukrainian scholars. Despite these waivers, MDPI and Aluna maintained the largest shares. Between 2020 and 2023, the number of articles authored solely by Ukrainian scholars in foreign journals fell by 25.7 percent, and total APC spending declined by 24.6 percent, from 1.24 million EUR to 0.93 million EUR. Medicine accounted for the largest share of both articles and APC expenditure, with the majority published in Aluna journals.
2025-08-12T11:16:39Z
Myroslava Hladchenko

http://arxiv.org/abs/2603.16816v1
WildDepth: A Multimodal Dataset for 3D Wildlife Perception and Depth Estimation
2026-03-17T17:19:43Z
Depth estimation and 3D reconstruction have been extensively studied as core topics in computer vision. Starting from rigid objects with relatively simple geometric shapes, such as vehicles, the research has expanded to address general objects, including challenging deformable objects, such as humans and animals.
For animals in particular, however, most existing models are trained on datasets that lack metric scale, information that can help validate image-only models. To address this limitation, we present WildDepth, a multimodal dataset and benchmark suite for depth estimation, behavior detection, and 3D reconstruction from diverse categories of animals ranging from domestic to wild environments with synchronized RGB and LiDAR. Experimental results show that the use of multi-modal data improves depth reliability by up to 10% RMSE, while RGB-LiDAR fusion enhances 3D reconstruction fidelity by 12% in Chamfer distance. By releasing WildDepth and its benchmarks, we aim to foster robust multimodal perception systems that generalize across domains.
2026-03-17T17:19:43Z
Muhammad Aamir, Naoya Muramatsu, Sangyun Shin, Matthew Wijers, Jiaxing Jhong, Xinyu Hou, Amir Patel, Andrew Markham

http://arxiv.org/abs/2603.16637v1
Organisational accounts engaged in scholarly communication on Twitter: Patterns of presence, activity and engagement
2026-03-17T15:09:03Z
Organisational accounts are an integral part of the Twitter (now X) ecosystem. This study identified 9,842 research- and policy-related organisational accounts that had tweeted about scholarly publications by linking three global organisational databases (GRID, ROR, and Overton) with two altmetric databases containing Twitter data (Altmetric and the former Crossref Event Data). The resulting openly available dataset was used to examine organisational activity in scholarly communication across three dimensions: social media capital, tweeting activity, and engagement level. The results show that, compared to all Twitter users engaged in scholarly communication, organisational accounts hold a notable advantage in terms of follower bases and the proportion of scholarly tweets.
Their scholarly tweets achieve high visibility through likes and retweets but perform weakly in generating more conversational forms of engagement, such as quotes and replies. Distinct patterns emerge across organisational categories: research facilities, in particular, demonstrate the strongest focus on scholarly tweeting, whereas government accounts are comparatively more successful in eliciting engagement across all metrics, including the more interactive ones. This study contributes both an open dataset of organisational accounts and a methodological framework for their identification, while also highlighting the important roles that organisations play in shaping scholarly discourse on social media.
2026-03-17T15:09:03Z
This is the preprint of a paper accepted for publication in the Journal of Information Science (in press)
Zohreh Zahedi, Yanqing Zhang, Zekun Han, Er-Te Zheng, Zhichao Fang
10.1177/01655515261421164

http://arxiv.org/abs/2506.18616v5
A Formalization of the Ionescu-Tulcea Theorem in Mathlib
2026-03-17T10:11:02Z
We describe the formalization of the Ionescu-Tulcea theorem, showing the existence of a probability measure on the space of trajectories of a Markov chain, in the proof assistant Lean using the integrated library Mathlib. We first present a mathematical proof before exposing the difficulties which arise when trying to formalize it, and how they were overcome. We then build on this work to formalize the construction of the product of an arbitrary family of probability measures.
2025-06-23T13:24:06Z
Etienne Marion
ENS de Lyon

http://arxiv.org/abs/2509.15107v2
Limitations of Public Chest Radiography Datasets for Artificial Intelligence: Label Quality, Domain Shift, Bias and Evaluation Challenges
2026-03-16T17:35:20Z
Artificial intelligence has shown significant promise in chest radiography, where deep learning models can approach radiologist-level diagnostic performance.
Progress has been accelerated by large public datasets such as MIMIC-CXR, ChestX-ray14, PadChest, and CheXpert, which provide hundreds of thousands of labelled images with pathology annotations. However, these datasets also present important limitations. Automated label extraction from radiology reports introduces errors, particularly in handling uncertainty and negation, and radiologist review frequently disagrees with assigned labels. In addition, domain shift and population bias restrict model generalisability, while evaluation practices often overlook clinically meaningful measures. We conduct a systematic analysis of these challenges, focusing on label quality, dataset bias, and domain shift. Our cross-dataset domain shift evaluation across multiple model architectures revealed substantial external performance degradation, with pronounced reductions in AUPRC and F1 scores relative to internal testing. To assess dataset bias, we trained a source-classification model that distinguished datasets with near-perfect accuracy, and performed subgroup analyses showing reduced performance for minority age and sex groups. Finally, expert review by two board-certified radiologists identified significant disagreement with public dataset labels. Our findings highlight important clinical weaknesses of current benchmarks and emphasise the need for clinician-validated datasets and fairer evaluation frameworks.
2025-09-18T16:13:11Z
Amy Rafferty, Ajitha Rajan

http://arxiv.org/abs/2603.15722v1
A Framework and Prototype for a Navigable Map of Datasets in Engineering Design and Systems Engineering
2026-03-16T17:08:20Z
The proliferation of data across the system lifecycle presents both a significant opportunity and a challenge for Engineering Design and Systems Engineering (EDSE). While this ``digital thread'' has the potential to drive innovation, the fragmented and inaccessible nature of existing datasets hinders method validation, limits reproducibility, and slows research progress.
Unlike fields such as computer vision and natural language processing, which benefit from established benchmark ecosystems, engineering design research often relies on small, proprietary, or ad-hoc datasets. This paper addresses this challenge by proposing a systematic framework for a ``Map of Datasets in EDSE.'' The framework is built upon a multi-dimensional taxonomy designed to classify engineering datasets by domain, lifecycle stage, data type, and format, enabling faceted discovery. An architecture for an interactive discovery tool is detailed and demonstrated through a working prototype, employing a knowledge graph data model to capture rich semantic relationships between datasets, tools, and publications. An analysis of the current data landscape reveals underrepresented areas (``data deserts'') in early-stage design and system architecture, as well as relatively well-represented areas (``data oases'') in predictive maintenance and autonomous systems. The paper identifies key challenges in curation and sustainability and proposes mitigation strategies, laying the groundwork for a dynamic, community-driven resource to accelerate data-centric engineering research.
2026-03-16T17:08:20Z
10 pages, 3 figures, Submitted to ASME IDETC 2026-DAC22
H. Sinan Bank, Daniel R. Herber

http://arxiv.org/abs/2603.15416v1
Estimating Absolute Web Crawl Coverage From Longitudinal Set Intersections
2026-03-16T15:28:30Z
Web archives preserve portions of the web, but quantifying their completeness remains challenging. Prior approaches have estimated the coverage of a crawl by either comparing the outcomes of multiple crawlers, or by comparing the results of a single crawl to external ground truth datasets. We propose a method to estimate the absolute coverage of a crawl using only the archive's own longitudinal data, i.e., the data collected by multiple subsequent crawls.
Our key insight is that coverage can be estimated from the empirical URL overlaps between subsequent crawls, which are in turn well described by a simple urn process. The parameters of the urn model can then be inferred from longitudinal crawl data using linear regression. Applied to our focused crawl configuration of the German Academic Web, with 15 semi-annual crawls between 2013 and 2021, we find a coverage of approximately 46 percent of the crawlable URL space for the stable crawl configuration regime. Our method is extremely simple, requires no external ground truth, and generalizes to any longitudinal focused crawl.
2026-03-16T15:28:30Z
Michael Paris, Grigori Paris, Fabian Baumann

http://arxiv.org/abs/2602.03864v2
Have Large Language Models Enhanced the Way Civil & Environmental Engineers Write? A Quantitative Analysis of Scholarly Communication over 25 Years
2026-03-16T15:16:52Z
Large language models (LLMs) have rapidly emerged in civil and environmental engineering (CEE) research, education, and practice as tools for project ideation, execution, and communication. However, it is unknown how prevalent LLM adoption is across CEE scholarship and whether it measurably alters research prose. Inspired by recent analyses of biomedical research, this study uses a vocabulary-based frequency-shift methodology to detect linguistic signals of LLM-assisted writing in a large corpus of CEE literature. A total of 149,452 abstracts published by the American Society of Civil Engineers from 2000 through 2025 are analyzed to quantify deviations from long-term vocabulary trends. Prior to the introduction of LLMs in 2022, CEE publications exhibit long-term trends toward longer abstracts and sentences, greater use of segmenting punctuation, higher required reading levels, and a shift toward active, first-person verb constructions.
Beginning around 2023, however, the frequencies of many stylistic marker words (e.g., enhance) sharply depart from historical trajectories, accompanied by deviations in multiple semantic properties. Abstracts classified as likely LLM-assisted exhibit increased lexical diversity, comma use, and complexity, with reduced passive voice and hedging language, producing prose that is more segmented, complex, and confident. The AI contribution of this study lies in the use of natural language processing to identify population-level linguistic signals of LLM-assisted text, applied to quantify the prevalence of LLM use and its influence on the vocabulary, structure, and tone of engineering scholarly writing. Together, these findings provide the first large-scale, data-driven assessment of how LLMs are beginning to reshape scholarly communication in CEE.
2026-01-28T01:02:55Z
Morgan D. Sanger, Brett W. Maurer

http://arxiv.org/abs/2603.14919v1
Which stylistic features fool ChatGPT research evaluations?
2026-03-16T07:25:46Z
Large Language Models (LLMs) have the potential to be used to support research evaluation and have a moderate capability to estimate the research quality of a journal article from its title and abstract. This paper assesses whether there are language-related factors unrelated to the quality of the research that influence ChatGPT's scores. Using a dataset of 99,277 journal articles submitted to the UK-wide Research Excellence Framework (REF) 2021 assessments, we calculated several readability indicators from abstracts and correlated them with ChatGPT scores and departmental REF scores. From the results, linguistic complexity and length were more strongly associated with ChatGPT research quality scores than with REF expert scores in many subject areas.
Although cause-and-effect was not tested, these results suggest that ChatGPT may be more likely than human experts to reward linguistic complexity, with a potential bias towards longer and less readable abstracts in many fields. The apparent preference of LLMs for complex language is an undesirable feature for practical applications of LLMs for research quality evaluation, unless solutions can be found.
2026-03-16T07:25:46Z
Kayvan Kousha, Mike Thelwall

http://arxiv.org/abs/2603.14565v1
Can Large Language Models Evaluate Grant Proposal Quality? Revisiting the Wennerås and Wold Peer Review Data
2026-03-15T19:34:54Z
Purpose: Despite the importance of peer review for grant funding decisions, academics are often reluctant to conduct it. This can lead to long delays between submission and the final decision as well as the risk of substandard reviews from busy or non-specialist scholars. At least one funder now uses Large Language Models (LLMs) to reduce the reviewing burden but the accuracy of LLMs for scoring grant proposals needs to be assessed. Design/methodology/approach: This article compares scores from a range of medium-sized open-weights LLMs with peer review scores for a well-researched dataset, the Swedish Medical Council's post-doctoral fellowship applications from 1994. Findings: Whilst the LLM scores correlated moderately with each other (mean Spearman correlation: 0.34), they correlated weakly but positively and mostly statistically significantly with the average expert scores (mean Spearman correlation: 0.22). The highest rank correlation between expert scores and LLMs was 0.33 for Gemma 3 27b based on proposal titles and summaries without their main texts, which is about half (56%) of the correlation between reviewers. Research limitations: The small sample size, old funding call and heterogeneous evaluation criteria all undermine the robustness of the analysis.
Practical implications: Despite the ability of LLMs to score grant proposals being quantitatively weaker than that of experts, at least in this special case, they may have a role in application triage or tie-breaking. Originality/value: This is the first assessment of the value of LLM scores for funding proposals.
2026-03-15T19:34:54Z
Ulf Sandström, Mike Thelwall

http://arxiv.org/abs/2507.15500v2
Researcher Population Pyramids: Tracking Demographic and Gender Trajectories Across Countries
2026-03-15T12:49:02Z
The sustainability of the academic ecosystem relies on researcher demographics and gender balance, yet assessing these dynamics in a timely manner for policy is challenging. Here, we propose a researcher population pyramid framework for tracking demographic and gender trajectories across countries using publication data. We provide a timely snapshot of historical and present demographics and gender balance across 58 countries, revealing three contrasting patterns among research systems: Emerging systems (e.g., Arab countries) exhibit high researcher inflows with widening gender gaps in cumulative productivity; Mature systems (e.g., the United States) show modest inflows with narrowing gender gaps; and Rigid systems (e.g., Japan) lag in both. Furthermore, by simulating future scenarios, the framework makes potential trajectories visible. If 2023 demographic patterns persist, Arab countries' systems could resemble mature or even rigid ones by 2050.
Our framework provides a robust diagnostic tool for policymakers worldwide to foster sustainable talent pipelines and gender equality in academia.
2025-07-21T11:05:02Z
22 pages, 6 figures, 1 table, and Supplementary Information
Kazuki Nakajima, Takayuki Mizuno

http://arxiv.org/abs/2509.25298v2
Trajectories and Comparative Analysis of Global Countries Dominating AI Publications, 2000-2025
2026-03-14T21:03:00Z
This study investigates the shifting global dynamics of Artificial Intelligence (AI) research by analysing the trajectories of countries dominating AI publications between 2000 and 2025. Drawing on the comprehensive OpenAlex datasets and employing fractional counting to avoid double attribution in co-authored work, the research maps the relative shares of AI publications across major global players. The analysis reveals a profound restructuring of the international AI research landscape. The US and the European Union (representing EU27), once the undisputed and established leaders, have experienced a notable decline in relative dominance, with their combined share of publications falling from over 57% in 2000 to less than 25% in 2025. In contrast, China has undergone a dramatic ascent, expanding its global share of AI publications from under 5% in 2000 to nearly 36% by 2025, thereby emerging as the single most dominant contributor. Alongside China, India has also risen substantially, consolidating a multipolar Asian research ecosystem. These empirical findings highlight the strategic implications of concentrated research output, particularly China's capacity to shape the future direction of AI innovation and standard-setting. Beyond publication volume, the study further examines research quality by comparing each country's share of high-impact publications against its overall output, and analyses citation impact trajectories across major players.
The findings show that in addition to China leading in volume, the country has also recently led in high-impact publications. Such an observation challenges the general assumption that Western powers retain dominance in high-impact AI scholarship.
2025-09-29T16:35:54Z
22 pages, 12 figures, 7 tables
Jason Hung

http://arxiv.org/abs/2603.11933v1
Making Chant Computing Easy: CantusCorpus v1.0 and the PyCantus Library
2026-03-12T13:46:42Z
Digital Gregorian chant scholarship has for decades enjoyed the privilege of a large digital resource cataloguing chant sources: the Cantus ecosystem, with nearly 900,000 chants catalogued across more than 2000 sources. The Cantus Database data model and the Cantus ID mechanism have been adopted by 18 more chant databases, jointly accessible through the Cantus Index interface. However, this data has only been available piecemeal via the individual online user interfaces; computational methods have so far had only a limited opportunity to process these immense resources. To overcome this hurdle, we compiled CantusCorpus v1.0, a dataset that combines everything that was available across the Cantus Index-centered network of databases as of mid-2025, and we have also provided the code for updating the dataset as the databases grow. We then created the lightweight PyCantus library for working with this data. PyCantus decouples the data model from the Cantus codebase and thus allows integration of further chant data sources, which we illustrate with harmonising pilot data from the Corpus Monodicum project.
Computational chant research is attractive - and CantusCorpus v1.0 and PyCantus are infrastructures that should make work in this field more transparent, replicable, and accessible to digital humanities practitioners beyond chant scholars themselves.
2026-03-12T13:46:42Z
Accepted to TISMIR Special Issue on Digital Musicology
Anna Dvořáková, Tim Eipert, Debra Lacoste, Jan Hajič

http://arxiv.org/abs/2509.09596v2
How much are LLMs changing the language of academic papers after ChatGPT? A multi-database and full text analysis
2026-03-11T18:35:43Z
This study investigates how Large Language Models (LLMs) are influencing the language of academic papers by tracking 12 LLM-associated terms across six major scholarly databases (Scopus, Web of Science, PubMed, PubMed Central (PMC), Dimensions, and OpenAlex) from 2015 to 2024. Using over 2.4 million PMC open-access publications (2021-July 2025), we also analysed full texts to assess changes in the frequency and co-occurrence of these terms before and after ChatGPT's initial public release. Across databases, delve (+1,500%), underscore (+1,000%), and intricate (+700%) had the largest increases between 2022 and 2024. Growth in LLM-term usage was much higher in STEM fields than in social sciences and arts and humanities. In PMC full texts, the proportion of papers using underscore six or more times increased by over 10,000% from 2022 to 2025, followed by intricate (+5,400%) and meticulous (+2,800%). Nearly half of all 2024 PMC papers using any LLM term also included underscore, compared with only 3%-14% of papers before ChatGPT in 2022. Papers using one LLM term are now much more likely to include other terms. For example, in 2024, underscore strongly correlated with pivotal (0.449) and delve (0.311), compared with very weak associations in 2022 (0.032 and 0.018, respectively).
These findings provide the first large-scale evidence based on full-text publications and multiple databases that some LLM-related terms are now being used much more frequently and together. The rapid uptake of LLMs to support scholarly publishing is a welcome development reducing the language barrier to academic publishing for non-English speakers.
2025-09-11T16:35:54Z
Kayvan Kousha, Mike Thelwall

http://arxiv.org/abs/2603.08935v2
PathoScribe: Transforming Pathology Data into a Living Library with a Unified LLM-Driven Framework for Semantic Retrieval and Clinical Integration
2026-03-11T16:00:39Z
Pathology underpins modern diagnosis and cancer care, yet its most valuable asset, the accumulated experience encoded in millions of narrative reports, remains largely inaccessible. Although institutions are rapidly digitizing pathology workflows, storing data without effective mechanisms for retrieval and reasoning risks transforming archives into a passive data repository, where institutional knowledge exists but cannot meaningfully inform patient care. True progress requires not only digitization, but the ability for pathologists to interrogate prior similar cases in real time while evaluating a new diagnostic dilemma. We present PathoScribe, a unified retrieval-augmented large language model (LLM) framework designed to transform static pathology archives into a searchable, reasoning-enabled living library. PathoScribe enables natural language case exploration, automated cohort construction, clinical question answering, immunohistochemistry (IHC) panel recommendation, and prompt-controlled report transformation within a single architecture. Evaluated on 70,000 multi-institutional surgical pathology reports, PathoScribe achieved perfect Recall@10 for natural language case retrieval and demonstrated high-quality retrieval-grounded reasoning (mean reviewer score 4.56/5).
Critically, the system operationalized automated cohort construction from free-text eligibility criteria, assembling research-ready cohorts in minutes (mean 9.2 minutes) with 91.3% agreement with human reviewers and no eligible cases incorrectly excluded, representing orders-of-magnitude reductions in time and cost compared to traditional manual chart review. This work establishes a scalable foundation for converting digital pathology archives from passive storage systems into active clinical intelligence platforms.
2026-03-09T21:09:24Z
Abdul Rehman Akbar, Samuel Wales-McGrath, Alejadro Levya, Lina Gokhale, Rajendra Singh, Wei Chen, Anil Parwani, Muhammad Khalid Khan Niazi