Feed: https://arxiv.org/api/zucYYx7BDM8HfHLpEdjFrI3hgL0
Feed updated: 2026-06-10T16:31:26Z

ID: http://arxiv.org/abs/2602.23921v1
Title: CA20108 COST Action: A Methodology for Developing FAIR Micrometeorological Networks
Updated: 2026-02-27T11:10:09Z
Abstract: This article reports the outcomes of the FAIRNESS COST Action (CA20108), a coordinated European initiative aimed at advancing micrometeorological data toward compliance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles. The article presents three core achievements: (i) a structured inventory of urban and rural micrometeorological networks across Europe; (ii) the design and deployment of the FAIR Micrometeorological Portal, providing a digital infrastructure for data discovery, access, and standardized metadata description; and (iii) methodological guidance for quality control, gap detection, and gap filling tailored to the specific characteristics of micrometeorological time series. By providing both technical infrastructure and community-driven standards, the FAIRNESS outputs advance micrometeorological data from isolated datasets into coherent, reusable resources. Beyond technical developments, FAIRNESS systematically addressed gaps in knowledge and skills within the micrometeorological community. A key outcome is the beginner-oriented book Micrometeorological Measurements - An Introduction for Beginners, which provides structured guidance on measurement design, instrumentation, data management, and quality assurance. In parallel, FAIRNESS implemented a comprehensive capacity-building programme, including summer schools, workshops, and short-term scientific missions, targeting both domain-specific competencies and transferable skills such as FAIR data stewardship, interdisciplinary collaboration, and practical problem solving.
Together, these efforts contribute to strengthening the long-term usability of micrometeorological data and fostering a more integrated, FAIR-oriented research culture within the European meteorological community.
Published: 2026-02-27T11:10:09Z
Note: 25 pages
Authors: Branislava Lalic, Josef Eitzinger, Stevan Savić, Ana Firanj Sremac, Michael Scriney, Mark Roantree

ID: http://arxiv.org/abs/2603.13271v1
Title: Tracing the Evolution of Word Embedding Techniques in Natural Language Processing
Updated: 2026-02-27T04:06:19Z
Abstract: This work traces the evolution of word-embedding techniques within the natural language processing (NLP) literature. We collect and analyze 149 research articles spanning the period from 1954 to 2025, providing both a comprehensive methodological review and a data-driven bibliometric analysis of how representation learning has developed over seven decades. Our study covers four major embedding paradigms: statistical representation-based methods (one-hot encoding, bag-of-words, TF-IDF), static word embeddings (Word2Vec, GloVe, FastText), contextual word embeddings (ELMo, BERT, GPT), and sentence/document embeddings, critically discussing the strengths, limitations, and intellectual lineage connecting each category. Beyond the methodological survey, we conduct a formal era comparison using GPT-3's release as a dividing line, applying seven hypothesis tests to quantify shifts in research focus, collaboration patterns, and institutional involvement. Our analysis reveals a dramatic post-GPT-3 paradigm shift: contextual and sentence-level methods now dominate at 6.4 times the odds of the pre-GPT-3 era, mean team sizes have grown significantly (p = 0.018), and 30 entirely new techniques have emerged while 54 pre-GPT-3 methods received no further attention.
These findings, combined with evidence of rising industry involvement, provide a quantitative account of how the field's epistemic priorities have been reshaped by the advent of large language models.
Published: 2026-02-27T04:06:19Z
Authors: Minh Anh Nguyen, Kuheli Sai, Minh Nguyen

ID: http://arxiv.org/abs/2601.11542v2
Title: The Credibility Revolution in Political Science
Updated: 2026-02-26T07:50:51Z
Abstract: How has the credibility revolution shaped political science? We address this question by classifying 91,632 articles published between 2003 and 2023 across 156 political science journals using large language models, focusing on research design, credibility-enhancing practices, and citation patterns. We find that design-based studies -- those leveraging plausibly exogenous variation to justify causal claims -- have become increasingly common and receive a citation premium. In contrast, model-based approaches that rely on strong modeling assumptions have declined. Yet the rise of design-based work is uneven: it is concentrated in top journals and among authors at highly ranked institutions, and it is driven primarily by the growth of survey experiments. Other credibility-enhancing practices that help reduce false positives and false negatives, such as placebo tests and power calculations, remain rare. Taken together, our findings point to substantial but selective change, more consistent with a partial reform than a revolution.
Published: 2025-12-02T00:37:31Z
Authors: Carolina Torreblanca, William Dinneen, Guy Grossman, Yiqing Xu

ID: http://arxiv.org/abs/2510.22426v2
Title: Can ChatGPT be a good follower of academic paradigms? Research quality evaluations in conflicting areas of sociology
Updated: 2026-02-26T07:32:31Z
Abstract: Purpose: It has become increasingly likely that Large Language Models (LLMs) will be used to score the quality of academic publications to support research assessment goals in the future. This may cause problems for fields with competing paradigms since there is a risk that one may be favoured, causing long term harm to the reputation of the other.
Design/methodology/approach: To test whether this is plausible, this article uses 17 ChatGPTs to evaluate up to 100 journal articles from each of eight pairs of competing sociology paradigms (1490 altogether). Each article was assessed by prompting ChatGPT to take one of five roles: paradigm follower, opponent, antagonistic follower, antagonistic opponent, or neutral. Findings: Articles were scored highest by ChatGPT when it followed the aligning paradigm, and lowest when it was told to devalue it and to follow the opposing paradigm. Broadly similar patterns occurred for most of the paradigm pairs. Follower ChatGPTs displayed only a small amount of favouritism compared to neutral ChatGPTs, but articles evaluated by an opposing paradigm ChatGPT had a substantial disadvantage. Research limitations: The data covers a single field and LLM. Practical implications: The results confirm that LLM instructions for research evaluation should be carefully designed to ensure that they are paradigm-neutral to avoid accidentally resolving conflicts between paradigms on a technicality by devaluing one side's contributions. Originality/value: This is the first demonstration that LLMs can be prompted to show a partiality for academic paradigms.
Published: 2025-10-25T20:06:18Z
Authors: Mike Thelwall, Ralph Schroeder, Meena Dhanda

ID: http://arxiv.org/abs/2602.22529v1
Title: Generative Agents Navigating Digital Libraries
Updated: 2026-02-26T02:08:39Z
Abstract: In the rapidly evolving field of digital libraries, the development of large language models (LLMs) has opened up new possibilities for simulating user behavior. This innovation addresses the longstanding challenge in digital library research: the scarcity of publicly available datasets on user search patterns due to privacy concerns. In this context, we introduce Agent4DL, a user search behavior simulator specifically designed for digital library environments.
Agent4DL generates realistic user profiles and dynamic search sessions that closely mimic actual search strategies, including querying, clicking, and stopping behaviors tailored to specific user profiles. Our simulator's accuracy in replicating real user interactions has been validated through comparisons with real user data. Notably, Agent4DL demonstrates competitive performance compared to existing user search simulators such as SimIIR 2.0, particularly in its ability to generate more diverse and context-aware user behaviors.
Published: 2026-02-26T02:08:39Z
Note: Proceedings of the 26th International Conference on Asia-Pacific Digital Libraries, ICADL 2024
Authors: Saber Zerhoudi, Michael Granitzer
DOI: 10.1007/978-981-96-0865-2_14

ID: http://arxiv.org/abs/2505.06721v3
Title: Behind the Byline: A Large-Scale Study of Scientific Author Contributions
Updated: 2026-02-25T18:32:55Z
Abstract: Understanding how co-authors distribute credit is critical for accurately assessing scholarly collaboration. In this study, we uncover the implicit structures within scientific teamwork by systematically analyzing author contributions across a large corpus of research publications. We introduce a computational framework designed to convert free-text contribution statements into 14 standardized CRediT categories, identifying clear and consistent positional patterns in task assignments. By analyzing over 400,000 scientific articles from prominent sources such as PLOS One and Nature, we extracted and standardized more than 5.6 million author-task assignments corresponding to 1.58 million author mentions. Our analysis reveals substantial disparities in workload distribution. Notably, in small teams with three co-authors, the most engaged contributor performs over three times more tasks than the least engaged, a disparity that grows linearly with team size. This demonstrates a consistent pattern of central and peripheral roles within modern collaborative teams.
Moreover, our analysis shows distinct positional biases in task allocation: technical responsibilities, such as software development and formal analysis, broadly fall to authors positioned earlier in the author list, whereas managerial tasks, including supervision and funding acquisition, increasingly concentrate among authors positioned toward the end. This gradient underscores a significant division of labor, where early-listed authors mainly undertake most hands-on activities. In contrast, senior authors mostly assume roles involving leadership and oversight. Our findings highlight the structured and hierarchical organization within scholarly collaborations, providing deeper insights into the specific roles and dynamics that govern academic teamwork.
Published: 2025-05-10T18:02:55Z
Note: 15 (include references and appendix sections) and 8 figures (and 1 in the appendix section)
Authors: Itai Assraf, Michael Fire

ID: http://arxiv.org/abs/2602.21926v1
Title: Bridging Through Absence: How Comeback Researchers Bridge Knowledge Gaps Through Structural Re-emergence
Updated: 2026-02-25T14:04:03Z
Abstract: Understanding the role of researchers who return to academia after prolonged inactivity, termed "comeback researchers", is crucial for developing inclusive models of scientific careers. This study investigates the structural and semantic behaviors of comeback researchers, focusing on their role in cross-disciplinary knowledge transfer and network reintegration. Using the AMiner citation dataset, we analyze 113,637 early-career researchers and identify 1,425 comeback cases based on a three-year-or-longer publication gap followed by renewed activity. We find that comeback researchers cite 126% more distinct communities and exhibit 7.6% higher bridging scores compared to dropouts. They also demonstrate 74% higher gap entropy, reflecting more irregular yet strategically impactful publication trajectories.
Predictive models trained on these bridging- and entropy-based features achieve a 97% ROC-AUC, far outperforming the 54% ROC-AUC of baseline models using traditional metrics like publication count and h-index. Finally, we substantiate these results via a multi-lens validation. These findings highlight the unique contributions of comeback researchers and offer data-driven tools for their early identification and institutional support.
Published: 2026-02-25T14:04:03Z
Note: Preprint; 25 pages, 14 figures, 7 tables, Submitted to Scientometrics 2025
Authors: Somyajit Chakraborty, Angshuman Jana, Avijit Gayen

ID: http://arxiv.org/abs/2602.22276v1
Title: EmpiRE-Compass: A Neuro-Symbolic Dashboard for Sustainable and Dynamic Knowledge Exploration, Synthesis, and Reuse
Updated: 2026-02-25T09:58:20Z
Abstract: Software engineering (SE) and requirements engineering (RE) face a significant increase in secondary studies, particularly literature reviews (LRs), due to the ever-growing number of scientific publications. Generative artificial intelligence (GenAI) exacerbates this trend by producing LRs rapidly but often at the expense of quality, rigor, and transparency. At the same time, secondary studies often fail to share underlying data and artifacts, limiting replication and reuse. This paper introduces EmpiRE-Compass, a neuro-symbolic dashboard designed to lower barriers for accessing, replicating, and reusing LR data. Its overarching goal is to demonstrate how LRs can become more sustainable by semantically structuring their underlying data in research knowledge graphs (RKGs) and by leveraging large language models (LLMs) for easy and dynamic access, replication, and reuse. Building on two RE use cases, we developed EmpiRE-Compass with a modular system design and workflows for curated and custom competency questions. The dashboard is freely available online, accompanied by a demonstration video. To manage operational costs, a limit of 25 requests per IP address per day applies to the default LLM (GPT-4o mini).
All source code and documentation are released as an open-source project to foster reuse, adoption, and extension. EmpiRE-Compass provides three core capabilities: (1) Exploratory visual analytics for curated competency questions; (2) Neuro-symbolic synthesis for custom competency questions; and (3) Reusable knowledge with all queries, analyses, and results openly available. By unifying RKGs and LLMs in a neuro-symbolic dashboard, EmpiRE-Compass advances sustainable LRs in RE, SE, and beyond. It lowers technical barriers, fosters transparency and reproducibility, and enables collaborative, continuously updated, and reusable LRs.
Published: 2026-02-25T09:58:20Z
Note: 7 pages, 1 figure, Accepted at 32nd International Working Conference on Requirements Engineering: Foundations for Software Quality
Authors: Oliver Karras, Amirreza Alasti, Lena John, Sushant Aggarwal, Yücel Celik

ID: http://arxiv.org/abs/2602.19711v1
Title: A Three-stage Neuro-symbolic Recommendation Pipeline for Cultural Heritage Knowledge Graphs
Updated: 2026-02-23T11:02:13Z
Abstract: The growing volume of digital cultural heritage resources highlights the need for advanced recommendation methods capable of interpreting semantic relationships between heterogeneous data entities. This paper presents a complete methodology for implementing a hybrid recommendation pipeline integrating knowledge-graph embeddings, approximate nearest-neighbour search, and SPARQL-driven semantic filtering. The work is evaluated on the JUHMP (Jagiellonian University Heritage Metadata Portal) knowledge graph developed within the CHExRISH project, which at the time of experimentation contained approximately 3.2M RDF triples describing people, events, objects, and historical relations affiliated with the Jagiellonian University (Kraków, PL). We evaluate four embedding families (TransE, ComplEx, ConvE, CompGCN) and perform hyperparameter selection for ComplEx and HNSW. Then, we present and evaluate the final three-stage neuro-symbolic recommender.
Despite sparse and heterogeneous metadata, the approach produces useful and explainable recommendations, which was also confirmed by expert evaluation.
Published: 2026-02-23T11:02:13Z
Note: 15 pages, 1 figure; submitted to ICCS 2026 conference
Authors: Krzysztof Kutt, Elżbieta Sroka, Oleksandra Ishchuk, Luiz do Valle Miranda

ID: http://arxiv.org/abs/2602.19698v1
Title: Iconographic Classification and Content-Based Recommendation for Digitized Artworks
Updated: 2026-02-23T10:44:27Z
Abstract: We present a proof-of-concept system that automates iconographic classification and content-based recommendation of digitized artworks using the Iconclass vocabulary and selected artificial intelligence methods. The prototype implements a four-stage workflow for classification and recommendation, which integrates YOLOv8 object detection with algorithmic mappings to Iconclass codes, rule-based inference for abstract meanings, and three complementary recommenders (hierarchical proximity, IDF-weighted overlap, and Jaccard similarity). Although more engineering is still needed, the evaluation demonstrates the potential of this solution: Iconclass-aware computer vision and recommendation methods can accelerate cataloging and enhance navigation in large heritage repositories. The key insight is to let computer vision propose visible elements and to use symbolic structures (Iconclass hierarchy) to reach meaning.
Published: 2026-02-23T10:44:27Z
Note: 14 pages, 7 figures; submitted to ICCS 2026 conference
Authors: Krzysztof Kutt, Maciej Baczyński

ID: http://arxiv.org/abs/2602.19197v1
Title: How Ten Publishers Retract Research
Updated: 2026-02-22T13:58:22Z
Abstract: Retractions are the primary mechanism for correcting the scholarly record, yet publishers differ markedly in how they use them. We present a bibliometric analysis of 46,087 retractions across 10 major publishers using data from the Retraction Watch database (1997-2026), examining retraction rates, reasons, temporal trends, and geographic distributions, among other dimensions.
Normalized retraction rates vary by two orders of magnitude, from Elsevier's 3.97 per 10,000 publications to Hindawi's 320.02. China-affiliated authors account for the largest share of retractions at every publisher. Retraction lags and reason profiles also vary widely across publishers. Among the ten publishers, ACM is an outlier in its retraction profile. ACM's normalized rate is mid-range (5.65), yet 98.3% of its 354 retractions are related to one incident. Seven of the ten most common global retraction reasons (including misconduct, plagiarism, and data concerns) are entirely absent from ACM's record. ACM's first retraction dates to 2020, despite a catalog dating to 1997. ACM self-describes its retraction threshold as "extremely high." We discuss this threshold in relation to the COPE retraction guidelines and the implications of ACM's non-public dark archive of removed works.
Published: 2026-02-22T13:58:22Z
Note: 43 pages, 7 figures, 13 tables
Authors: Jonas Oppenlaender

ID: http://arxiv.org/abs/2602.19115v1
Title: How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders
Updated: 2026-02-22T10:12:20Z
Abstract: In recent years, there has been a growing use of generative AI, and large language models (LLMs) in particular, to support both the assessment and generation of scientific work. Although some studies have shown that LLMs can, to a certain extent, evaluate research according to perceived quality, our understanding of the internal mechanisms that enable this capability remains limited. This paper presents the first study that investigates how LLMs encode the concept of scientific quality through relevant monosemantic features extracted using sparse autoencoders. We derive such features under different experimental settings and assess their ability to serve as predictors across three tasks related to research quality: predicting citation count, journal SJR, and journal h-index.
The results indicate that LLMs encode features associated with multiple dimensions of scientific quality. In particular, we identify four recurring types of features that capture key aspects of how research quality is represented: 1) features reflecting research methodologies; 2) features related to publication type, with literature reviews typically exhibiting higher impact; 3) features associated with high-impact research fields and technologies; and 4) features corresponding to specific scientific jargon. These findings represent an important step toward understanding how LLMs encapsulate concepts related to research quality.
Published: 2026-02-22T10:12:20Z
Note: Presented at SESAME 2025: Smarter Extraction of ScholArly MEtadata using Knowledge Graphs and Language Models, @ JCDL 2025
Authors: Michael McCoubrey, Angelo Salatino, Francesco Osborne, Enrico Motta

ID: http://arxiv.org/abs/2602.18935v1
Title: Responsible Intelligence in Practice: A Fairness Audit of Open Large Language Models for Library Reference Services
Updated: 2026-02-21T19:05:03Z
Abstract: As libraries explore large language models (LLMs) as a scalable layer for reference services, a core fairness question follows: can LLM-based services support all patrons fairly, regardless of demographic identity? While LLMs offer great potential for broadening access to information assistance, they may also reproduce societal biases embedded in their training data, potentially undermining libraries' commitments to impartial service. In this chapter, we apply a systematic evaluation approach that combines diagnostic classification to detect systematic differences with linguistic analysis to interpret their sources. Across three widely used open models (Llama-3.1 8B, Gemma-2 9B, and Ministral 8B), we find no compelling evidence of systematic differentiation by race/ethnicity, and only minor evidence of sex-linked differentiation in one model.
We discuss implications for responsible AI adoption in libraries and the importance of ongoing monitoring in aligning LLM-based services with core professional values.
Published: 2026-02-21T19:05:03Z
Note: Invited chapter for the edited volume Artificial Intelligence and Social Justice Intersections in Library and Information Studies: Challenges and Opportunities (Emerald Group Publishing, in preparation)
Authors: Haining Wang, Jason Clark, Angelica Peña

ID: http://arxiv.org/abs/2603.19246v1
Title: Speed and impact of team science during urgent societal events
Updated: 2026-02-20T19:24:10Z
Abstract: Urgent societal events demand scientific responses that are both rapid and impactful. Through an adversarial collaboration, we connected bibliometric databases to evaluate the speed and impact of over 2 million scientific publications in the three years following 48 urgent societal events. A pilot analysis of three cases -- the 2022 release of ChatGPT, the 2019 COVID-19 pandemic, and the 2001 World Trade Center attacks -- yielded unexpected patterns: larger teams were not only more impactful but also quicker to publish. More precisely, increases in team size were associated with (a) initial increases, but eventual diminishing returns in academic citations, (b) curvilinear returns in news and policy document citations, and (c) curvilinear returns in terms of how quickly papers were published. In other words, there are points where further increases in team sizes are either marginally helpful (diminishing returns) or counterproductive (curvilinear returns). To evaluate robustness, we pre-registered a broader test covering 45 additional events spanning two decades.
Published: 2026-02-20T19:24:10Z
Authors: Nicholas A. Coles, Joao Francisco Goes Braga Takayanagi, Stephen M. Fiore, Lingfei Wu

ID: http://arxiv.org/abs/2602.18264v1
Title: A Curated Literature Database for Monitoring More Than 30 Years of Ansys Granta Product Usage
Updated: 2026-02-20T14:47:27Z
Abstract: Engineering and materials software is increasingly difficult to track in the scholarly and technical literature because publication volume is growing rapidly and software citation practices remain inconsistent. This is particularly true for the Ansys Granta product family, which is used for materials education, materials and process selection, sustainability-driven design, and enterprise materials information management. We present a structured and reproducible framework to consolidate evidence of operational Granta usage and to support quantitative monitoring of adoption patterns, application domains, and technical impact. The framework is implemented as a curated reference database in Ansys Granta MI Enterprise: bibliographic metadata are ingested semi-automatically (e.g., via DOI and citation-file parsing) and complemented by expert curation of usage descriptors (product, context, application domain, and technical depth), with relational links to authors and institutions. Downstream analytics are performed with Python, dashboards, and bibliometric/network visualization tools to enable reproducible querying and reporting. As of September 2025, the database contains more than 1,100 curated records spanning journals, conferences, theses, books, patents, standards, and reports, and supports rapid retrieval of validated case studies, reproducible literature reviews, and technology scouting. Example analyses highlight dominant domains, key institutions, and recurring integrations with CAD/CAE/FEM environments. Overall, the approach converts heterogeneous software-usage evidence into structured, analyzable knowledge to improve visibility of engineering software impact and to support evidence-based assessment and strategic decision-making.
Published: 2026-02-20T14:47:27Z
Authors: David Mercier