https://arxiv.org/api/zucYYx7BDM8HfHLpEdjFrI3hgL0 2026-06-10T16:31:26Z 6061 270 15 http://arxiv.org/abs/2602.23921v1 CA20108 COST Action: A Methodology for Developing FAIR Micrometeorological Networks 2026-02-27T11:10:09Z This article reports the outcomes of the FAIRNESS COST Action (CA20108), a coordinated European initiative aimed at advancing micrometeorological data toward compliance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. The article presents three core achievements: (i) a structured inventory of urban and rural micrometeorological networks across Europe; (ii) the design and deployment of the FAIR Micrometeorological Portal, providing a digital infrastructure for data discovery, access, and standardized metadata description; and (iii) methodological guidance for quality control, gap detection, and gap filling tailored to the specific characteristics of micrometeorological time series. By providing both technical infrastructure and community-driven standards, the FAIRNESS outputs advance micrometeorological data from isolated datasets into coherent, reusable resources. Beyond technical developments, the FAIRNESS Action systematically addressed gaps in knowledge and skills within the micrometeorological community. A key outcome is the beginner-oriented book Micrometeorological Measurements - An Introduction for Beginners, which provides structured guidance on measurement design, instrumentation, data management, and quality assurance. In parallel, FAIRNESS implemented a comprehensive capacity-building programme, including summer schools, workshops, and short-term scientific missions, targeting both domain-specific competencies and transferable skills such as FAIR data stewardship, interdisciplinary collaboration, and practical problem solving.
Together, these efforts contribute to strengthening the long-term usability of micrometeorological data and fostering a more integrated, FAIR-oriented research culture within the European meteorological community. 2026-02-27T11:10:09Z 25 pages Branislava Lalic Josef Eitzinger Stevan Savić Ana Firanj Sremac Michael Scriney Mark Roantree http://arxiv.org/abs/2603.13271v1 Tracing the Evolution of Word Embedding Techniques in Natural Language Processing 2026-02-27T04:06:19Z This work traces the evolution of word-embedding techniques within the natural language processing (NLP) literature. We collect and analyze 149 research articles spanning the period from 1954 to 2025, providing both a comprehensive methodological review and a data-driven bibliometric analysis of how representation learning has developed over seven decades. Our study covers four major embedding paradigms: statistical representation-based methods (one-hot encoding, bag-of-words, TF-IDF); static word embeddings (Word2Vec, GloVe, FastText); contextual word embeddings (ELMo, BERT, GPT); and sentence/document embeddings. It critically discusses the strengths, limitations, and intellectual lineage connecting each category. Beyond the methodological survey, we conduct a formal era comparison using GPT-3's release as a dividing line, applying seven hypothesis tests to quantify shifts in research focus, collaboration patterns, and institutional involvement. Our analysis reveals a dramatic post-GPT-3 paradigm shift: contextual and sentence-level methods now dominate at 6.4X the odds of the pre-GPT-3 era, mean team sizes have grown significantly (p = 0.018), and 30 entirely new techniques have emerged while 54 pre-GPT-3 methods received no further attention. These findings, combined with evidence of rising industry involvement, provide a quantitative account of how the field's epistemic priorities have been reshaped by the advent of large language models.
2026-02-27T04:06:19Z Minh Anh Nguyen Kuheli Sai Minh Nguyen http://arxiv.org/abs/2601.11542v2 The Credibility Revolution in Political Science 2026-02-26T07:50:51Z How has the credibility revolution shaped political science? We address this question by classifying 91,632 articles published between 2003 and 2023 across 156 political science journals using large language models, focusing on research design, credibility-enhancing practices, and citation patterns. We find that design-based studies -- those leveraging plausibly exogenous variation to justify causal claims -- have become increasingly common and receive a citation premium. In contrast, model-based approaches that rely on strong modeling assumptions have declined. Yet the rise of design-based work is uneven: it is concentrated in top journals and among authors at highly ranked institutions, and it is driven primarily by the growth of survey experiments. Other credibility-enhancing practices that help reduce false positives and false negatives, such as placebo tests and power calculations, remain rare. Taken together, our findings point to substantial but selective change, more consistent with a partial reform than a revolution. 2025-12-02T00:37:31Z Carolina Torreblanca William Dinneen Guy Grossman Yiqing Xu http://arxiv.org/abs/2510.22426v2 Can ChatGPT be a good follower of academic paradigms? Research quality evaluations in conflicting areas of sociology 2026-02-26T07:32:31Z Purpose: It has become increasingly likely that Large Language Models (LLMs) will be used to score the quality of academic publications to support research assessment goals in the future. This may cause problems for fields with competing paradigms since there is a risk that one may be favoured, causing long term harm to the reputation of the other. 
Design/methodology/approach: To test whether this is plausible, this article uses 17 ChatGPTs to evaluate up to 100 journal articles from each of eight pairs of competing sociology paradigms (1490 altogether). Each article was assessed by prompting ChatGPT to take one of five roles: paradigm follower, opponent, antagonistic follower, antagonistic opponent, or neutral. Findings: Articles were scored highest by ChatGPT when it followed the aligning paradigm, and lowest when it was told to devalue it and to follow the opposing paradigm. Broadly similar patterns occurred for most of the paradigm pairs. Follower ChatGPTs displayed only a small amount of favouritism compared to neutral ChatGPTs, but articles evaluated by an opposing paradigm ChatGPT had a substantial disadvantage. Research limitations: The data covers a single field and LLM. Practical implications: The results confirm that LLM instructions for research evaluation should be carefully designed to ensure that they are paradigm-neutral to avoid accidentally resolving conflicts between paradigms on a technicality by devaluing one side's contributions. Originality/value: This is the first demonstration that LLMs can be prompted to show a partiality for academic paradigms. 2025-10-25T20:06:18Z Mike Thelwall Ralph Schroeder Meena Dhanda http://arxiv.org/abs/2602.22529v1 Generative Agents Navigating Digital Libraries 2026-02-26T02:08:39Z In the rapidly evolving field of digital libraries, the development of large language models (LLMs) has opened up new possibilities for simulating user behavior. This innovation addresses the longstanding challenge in digital library research: the scarcity of publicly available datasets on user search patterns due to privacy concerns. In this context, we introduce Agent4DL, a user search behavior simulator specifically designed for digital library environments. 
Agent4DL generates realistic user profiles and dynamic search sessions that closely mimic actual search strategies, including querying, clicking, and stopping behaviors tailored to specific user profiles. Our simulator's accuracy in replicating real user interactions has been validated through comparisons with real user data. Notably, Agent4DL demonstrates competitive performance compared to existing user search simulators such as SimIIR 2.0, particularly in its ability to generate more diverse and context-aware user behaviors. 2026-02-26T02:08:39Z Proceedings of the 26th International Conference on Asia-Pacific Digital Libraries, ICADL 2024 Saber Zerhoudi Michael Granitzer 10.1007/978-981-96-0865-2_14 http://arxiv.org/abs/2505.06721v3 Behind the Byline: A Large-Scale Study of Scientific Author Contributions 2026-02-25T18:32:55Z Understanding how co-authors distribute credit is critical for accurately assessing scholarly collaboration. In this study, we uncover the implicit structures within scientific teamwork by systematically analyzing author contributions across a large corpus of research publications. We introduce a computational framework designed to convert free-text contribution statements into 14 standardized CRediT categories, identifying clear and consistent positional patterns in task assignments. By analyzing over 400,000 scientific articles from prominent sources such as PLOS One and Nature, we extracted and standardized more than 5.6 million author-task assignments corresponding to 1.58 million author mentions. Our analysis reveals substantial disparities in workload distribution. Notably, in small teams with three co-authors, the most engaged contributor performs over three times more tasks than the least engaged, a disparity that grows linearly with team size. This demonstrates a consistent pattern of central and peripheral roles within modern collaborative teams. 
Moreover, our analysis shows distinct positional biases in task allocation: technical responsibilities, such as software development and formal analysis, broadly fall to authors positioned earlier in the author list, whereas managerial tasks, including supervision and funding acquisition, increasingly concentrate among authors positioned toward the end. This gradient underscores a significant division of labor, in which early-listed authors undertake most hands-on activities. In contrast, senior authors mostly assume roles involving leadership and oversight. Our findings highlight the structured and hierarchical organization within scholarly collaborations, providing deeper insights into the specific roles and dynamics that govern academic teamwork. 2025-05-10T18:02:55Z 15 pages (including references and appendix sections) and 8 figures (plus 1 in the appendix section) Itai Assraf Michael Fire http://arxiv.org/abs/2602.21926v1 Bridging Through Absence: How Comeback Researchers Bridge Knowledge Gaps Through Structural Re-emergence 2026-02-25T14:04:03Z Understanding the role of researchers who return to academia after prolonged inactivity, termed "comeback researchers", is crucial for developing inclusive models of scientific careers. This study investigates the structural and semantic behaviors of comeback researchers, focusing on their role in cross-disciplinary knowledge transfer and network reintegration. Using the AMiner citation dataset, we analyze 113,637 early-career researchers and identify 1,425 comeback cases based on a publication gap of three years or longer followed by renewed activity. We find that comeback researchers cite 126% more distinct communities and exhibit 7.6% higher bridging scores than dropouts. They also demonstrate 74% higher gap entropy, reflecting more irregular yet strategically impactful publication trajectories.
Predictive models trained on these bridging- and entropy-based features achieve a 97% ROC-AUC, far outperforming the 54% ROC-AUC of baseline models using traditional metrics like publication count and h-index. Finally, we substantiate these results via a multi-lens validation. These findings highlight the unique contributions of comeback researchers and offer data-driven tools for their early identification and institutional support. 2026-02-25T14:04:03Z Preprint; 25 pages, 14 figures, 7 tables, Submitted to Scientometrics 2025 Somyajit Chakraborty Angshuman Jana Avijit Gayen http://arxiv.org/abs/2602.22276v1 EmpiRE-Compass: A Neuro-Symbolic Dashboard for Sustainable and Dynamic Knowledge Exploration, Synthesis, and Reuse 2026-02-25T09:58:20Z Software engineering (SE) and requirements engineering (RE) face a significant increase in secondary studies, particularly literature reviews (LRs), due to the ever-growing number of scientific publications. Generative artificial intelligence (GenAI) exacerbates this trend by producing LRs rapidly but often at the expense of quality, rigor, and transparency. At the same time, secondary studies often fail to share underlying data and artifacts, limiting replication and reuse. This paper introduces EmpiRE-Compass, a neuro-symbolic dashboard designed to lower barriers for accessing, replicating, and reusing LR data. Its overarching goal is to demonstrate how LRs can become more sustainable by semantically structuring their underlying data in research knowledge graphs (RKGs) and by leveraging large language models (LLMs) for easy and dynamic access, replication, and reuse. Building on two RE use cases, we developed EmpiRE-Compass with a modular system design and workflows for curated and custom competency questions. The dashboard is freely available online, accompanied by a demonstration video. To manage operational costs, a limit of 25 requests per IP address per day applies to the default LLM (GPT-4o mini). 
All source code and documentation are released as an open-source project to foster reuse, adoption, and extension. EmpiRE-Compass provides three core capabilities: (1) Exploratory visual analytics for curated competency questions; (2) Neuro-symbolic synthesis for custom competency questions; and (3) Reusable knowledge with all queries, analyses, and results openly available. By unifying RKGs and LLMs in a neuro-symbolic dashboard, EmpiRE-Compass advances sustainable LRs in RE, SE, and beyond. It lowers technical barriers, fosters transparency and reproducibility, and enables collaborative, continuously updated, and reusable LRs. 2026-02-25T09:58:20Z 7 pages, 1 figure, Accepted at the 32nd International Working Conference on Requirements Engineering: Foundations for Software Quality Oliver Karras Amirreza Alasti Lena John Sushant Aggarwal Yücel Celik http://arxiv.org/abs/2602.19711v1 A Three-stage Neuro-symbolic Recommendation Pipeline for Cultural Heritage Knowledge Graphs 2026-02-23T11:02:13Z The growing volume of digital cultural heritage resources highlights the need for advanced recommendation methods capable of interpreting semantic relationships between heterogeneous data entities. This paper presents a complete methodology for implementing a hybrid recommendation pipeline integrating knowledge-graph embeddings, approximate nearest-neighbour search, and SPARQL-driven semantic filtering. The work is evaluated on the JUHMP (Jagiellonian University Heritage Metadata Portal) knowledge graph developed within the CHExRISH project, which at the time of experimentation contained ≈3.2M RDF triples describing people, events, objects, and historical relations affiliated with the Jagiellonian University (Kraków, PL). We evaluate four embedding families (TransE, ComplEx, ConvE, CompGCN) and perform hyperparameter selection for ComplEx and HNSW. Then, we present and evaluate the final three-stage neuro-symbolic recommender.
Despite sparse and heterogeneous metadata, the approach produces useful and explainable recommendations, which were also confirmed by an expert evaluation. 2026-02-23T11:02:13Z 15 pages, 1 figure; submitted to ICCS 2026 conference Krzysztof Kutt Elżbieta Sroka Oleksandra Ishchuk Luiz do Valle Miranda http://arxiv.org/abs/2602.19698v1 Iconographic Classification and Content-Based Recommendation for Digitized Artworks 2026-02-23T10:44:27Z We present a proof-of-concept system that automates iconographic classification and content-based recommendation of digitized artworks using the Iconclass vocabulary and selected artificial intelligence methods. The prototype implements a four-stage workflow for classification and recommendation, which integrates YOLOv8 object detection with algorithmic mappings to Iconclass codes, rule-based inference for abstract meanings, and three complementary recommenders (hierarchical proximity, IDF-weighted overlap, and Jaccard similarity). Although more engineering is still needed, the evaluation demonstrates the potential of this solution: Iconclass-aware computer vision and recommendation methods can accelerate cataloging and enhance navigation in large heritage repositories. The key insight is to let computer vision propose visible elements and to use symbolic structures (the Iconclass hierarchy) to reach meaning. 2026-02-23T10:44:27Z 14 pages, 7 figures; submitted to ICCS 2026 conference Krzysztof Kutt Maciej Baczyński http://arxiv.org/abs/2602.19197v1 How Ten Publishers Retract Research 2026-02-22T13:58:22Z Retractions are the primary mechanism for correcting the scholarly record, yet publishers differ markedly in how they use them. We present a bibliometric analysis of 46,087 retractions across 10 major publishers using data from the Retraction Watch database (1997-2026), examining retraction rates, reasons, temporal trends, and geographic distributions, among other dimensions.
Normalized retraction rates vary by two orders of magnitude, from Elsevier's 3.97 per 10,000 publications to Hindawi's 320.02. China-affiliated authors account for the largest share of retractions at every publisher. Retraction lags and reason profiles also vary widely across publishers. Among the ten publishers, ACM is an outlier in its retraction profile. ACM's normalized rate is mid-range (5.65), yet 98.3% of its 354 retractions are related to one incident. Seven of the ten most common global retraction reasons (including misconduct, plagiarism, and data concerns) are entirely absent from ACM's record. ACM's first retraction dates to 2020, despite a catalog dating to 1997. ACM self-describes its retraction threshold as "extremely high." We discuss this threshold in relation to the COPE retraction guidelines and the implications of ACM's non-public dark archive of removed works. 2026-02-22T13:58:22Z 43 pages, 7 figures, 13 tables Jonas Oppenlaender http://arxiv.org/abs/2602.19115v1 How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders 2026-02-22T10:12:20Z In recent years, there has been a growing use of generative AI, and large language models (LLMs) in particular, to support both the assessment and generation of scientific work. Although some studies have shown that LLMs can, to a certain extent, evaluate research according to perceived quality, our understanding of the internal mechanisms that enable this capability remains limited. This paper presents the first study that investigates how LLMs encode the concept of scientific quality through relevant monosemantic features extracted using sparse autoencoders. We derive such features under different experimental settings and assess their ability to serve as predictors across three tasks related to research quality: predicting citation count, journal SJR, and journal h-index. 
The results indicate that LLMs encode features associated with multiple dimensions of scientific quality. In particular, we identify four recurring types of features that capture key aspects of how research quality is represented: 1) features reflecting research methodologies; 2) features related to publication type, with literature reviews typically exhibiting higher impact; 3) features associated with high-impact research fields and technologies; and 4) features corresponding to specific scientific jargon. These findings represent an important step toward understanding how LLMs encapsulate concepts related to research quality. 2026-02-22T10:12:20Z Presented at SESAME 2025: Smarter Extraction of ScholArly MEtadata using Knowledge Graphs and Language Models, at JCDL 2025 Michael McCoubrey Angelo Salatino Francesco Osborne Enrico Motta http://arxiv.org/abs/2602.18935v1 Responsible Intelligence in Practice: A Fairness Audit of Open Large Language Models for Library Reference Services 2026-02-21T19:05:03Z As libraries explore large language models (LLMs) as a scalable layer for reference services, a core fairness question follows: can LLM-based services support all patrons fairly, regardless of demographic identity? While LLMs offer great potential for broadening access to information assistance, they may also reproduce societal biases embedded in their training data, potentially undermining libraries' commitments to impartial service. In this chapter, we apply a systematic evaluation approach that combines diagnostic classification to detect systematic differences with linguistic analysis to interpret their sources. Across three widely used open models (Llama-3.1 8B, Gemma-2 9B, and Ministral 8B), we find no compelling evidence of systematic differentiation by race/ethnicity, and only minor evidence of sex-linked differentiation in one model.
We discuss implications for responsible AI adoption in libraries and the importance of ongoing monitoring in aligning LLM-based services with core professional values. 2026-02-21T19:05:03Z Invited chapter for the edited volume Artificial Intelligence and Social Justice Intersections in Library and Information Studies: Challenges and Opportunities (Emerald Group Publishing, in preparation) Haining Wang Jason Clark Angelica Peña http://arxiv.org/abs/2603.19246v1 Speed and impact of team science during urgent societal events 2026-02-20T19:24:10Z Urgent societal events demand scientific responses that are both rapid and impactful. Through an adversarial collaboration, we connected bibliometric databases to evaluate the speed and impact of over 2 million scientific publications in the three years following 48 urgent societal events. A pilot analysis of three cases -- the 2022 release of ChatGPT, the 2019 COVID-19 pandemic, and the 2001 World Trade Center attacks -- yielded unexpected patterns: larger teams were not only more impactful but also quicker to publish. More precisely, increases in team size were associated with (a) initial increases but eventually diminishing returns in academic citations, (b) curvilinear returns in news and policy document citations, and (c) curvilinear returns in terms of how quickly papers were published. In other words, there are points where further increases in team size are either marginally helpful (diminishing returns) or counterproductive (curvilinear returns). To evaluate robustness, we pre-registered a broader test covering 45 additional events spanning two decades. 2026-02-20T19:24:10Z Nicholas A. Coles Joao Francisco Goes Braga Takayanagi Stephen M. Fiore
Lingfei Wu http://arxiv.org/abs/2602.18264v1 A Curated Literature Database for Monitoring More Than 30 Years of Ansys Granta Product Usage 2026-02-20T14:47:27Z Engineering and materials software is increasingly difficult to track in the scholarly and technical literature because publication volume is growing rapidly and software citation practices remain inconsistent. This is particularly true for the Ansys Granta product family, which is used for materials education, materials and process selection, sustainability-driven design, and enterprise materials information management. We present a structured and reproducible framework to consolidate evidence of operational Granta usage and to support quantitative monitoring of adoption patterns, application domains, and technical impact. The framework is implemented as a curated reference database in Ansys Granta MI Enterprise: bibliographic metadata are ingested semi-automatically (e.g., via DOI and citation-file parsing) and complemented by expert curation of usage descriptors (product, context, application domain, and technical depth), with relational links to authors and institutions. Downstream analytics are performed with Python, dashboards, and bibliometric/network visualization tools to enable reproducible querying and reporting. As of September 2025, the database contains more than 1,100 curated records spanning journals, conferences, theses, books, patents, standards, and reports, and supports rapid retrieval of validated case studies, reproducible literature reviews, and technology scouting. Example analyses highlight dominant domains, key institutions, and recurring integrations with CAD/CAE/FEM environments. Overall, the approach converts heterogeneous software-usage evidence into structured, analyzable knowledge to improve the visibility of engineering software impact and to support evidence-based assessment and strategic decision-making. 2026-02-20T14:47:27Z David Mercier
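The Granta database record mentions that bibliographic metadata are ingested semi-automatically, for example via citation-file parsing. The following is a minimal Python sketch of what such a parsing step might look like; it is not the author's implementation. It assumes input in the common RIS tagged format (a two-character tag, two spaces, a hyphen, then the value, with each record opening with TY and closing with ER). The function names parse_ris, first, and to_metadata, the output schema, and the sample record are all hypothetical.

```python
"""Minimal RIS citation-file parser: an illustrative sketch, not the Granta pipeline.

Assumption: each line looks like "XX  - value" (two-character tag, two spaces,
hyphen); a record starts with TY and ends with ER. All names are hypothetical.
"""
import re
from typing import Dict, List, Optional

# Tag (letter + letter/digit), two spaces, hyphen, optional space, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")

Record = Dict[str, List[str]]


def parse_ris(text: str) -> List[Record]:
    """Split RIS text into records mapping each tag to all of its values."""
    records: List[Record] = []
    current: Record = {}
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.rstrip())
        if match is None:
            continue  # this sketch ignores blank lines and untagged continuations
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end-of-record marker
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


def first(record: Record, *tags: str) -> str:
    """Return the first value found under any of the given tags, else ''."""
    for tag in tags:
        if record.get(tag):
            return record[tag][0]
    return ""


def to_metadata(record: Record) -> Dict[str, object]:
    """Map a parsed record onto a small, hypothetical metadata schema."""
    # DOI names are case-insensitive, so lowercasing helps deduplicate records.
    doi: Optional[str] = first(record, "DO").lower() or None
    return {
        "type": first(record, "TY"),
        "title": first(record, "TI", "T1"),
        "authors": record.get("AU", []) + record.get("A1", []),
        "year": first(record, "PY", "Y1")[:4],
        "doi": doi,
    }


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = (
        "TY  - JOUR\n"
        "AU  - Doe, Jane\n"
        "AU  - Roe, Richard\n"
        "TI  - Hypothetical materials-selection case study\n"
        "PY  - 2021\n"
        "DO  - 10.5555/Placeholder\n"
        "ER  - \n"
    )
    for rec in parse_ris(sample):
        print(to_metadata(rec))
```

Records parsed this way would still need the expert curation of usage descriptors (product, context, application domain, and technical depth) that the record describes; the sketch covers only the mechanical ingestion step.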