https://arxiv.org/api/x+Qv1XbSnCxZrIS0Rrh6UdbSc8A
2026-03-22T16:14:09Z
58709015

http://arxiv.org/abs/2510.21825v2
10 Simple Rules for Improving Your Standardized Fields and Terms
2026-02-04T21:25:03Z
Contextual metadata is the unsung hero of research data. When done right, standardized and structured vocabularies make your data findable, shareable, and reusable. When done wrong, they turn a well-intended effort into data cleanup and curation nightmares. In this paper we tackle the surprisingly tricky process of vocabulary standardization with a mix of practical advice and grounded examples. Drawing from real-world experience in contextual data harmonization, we highlight common challenges (e.g., semantic noise and concept bombs) and provide actionable strategies to address them. Our rules emphasize alignment with Findability, Accessibility, Interoperability, and Reusability (FAIR) principles while remaining adaptable to evolving user and research needs. Whether you are curating datasets, designing a schema, or contributing to a standards body, these rules aim to help you create metadata that is not only technically sound but also meaningful to users.
2025-10-21T23:34:59Z
17 pages, 1 figure. Author Contributions: Conceptualization by EG and RC. Manuscript writing by RC. Revisions and Editing by RC, EG, DD, and WH. Acknowledgements: Charlotte Barclay. Version 2: Added missing word on page 10.
Rhiannon Cameron, Centre for Infectious Disease Genomics and One Health, Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
Emma Griffiths, Centre for Infectious Disease Genomics and One Health, Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
Damion Dooley, Centre for Infectious Disease Genomics and One Health, Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada
William Hsiao, Centre for Infectious Disease Genomics and One Health, Faculty of Health Sciences, Simon Fraser University, Burnaby, BC, Canada

http://arxiv.org/abs/2602.04871v1
Evolving scientific collaboration among EU member states, candidate countries and global partners: 2000-2024
2026-02-04T18:57:19Z
This study explores how EU integration, globalisation, and geopolitical disruptions have influenced scientific collaboration among European countries at different stages of EU membership. Specifically, it distinguishes between the EU-14, the EU-13, which joined the EU in 2004 or later, and EU candidate countries. Using Scopus articles, the study analyses Relative Intensity of Collaboration (RIC) among EU member states, candidate countries and China, Latin America, the UK, the USA and Russia. Findings indicate increasing integration within European groups and with global partners, yet persistent hierarchical structures remain. EU-14 countries form the core of the network, exhibiting stable and cohesive collaboration, including with the UK despite Brexit. EU-13 countries occupy an intermediate position, showing moderate collaboration with EU-14 but stronger collaboration within their own group, with EU candidate countries and Russia. EU candidate countries demonstrate even weaker integration with EU-14, focusing on intra-group ties and links with EU-13 and Russia. 
RIC peaks in 2012 and 2018 for EU-13 and EU candidate countries correspond to Horizon 2020 and Horizon Europe cycles, highlighting the role of EU Framework Programmes. Collaboration with Russia increased following 2014 and only marginally declined after 2022. For EU-14, it exceeds collaboration with the USA. Collaboration with China remains limited due to network and cultural constraints, with similar intensity across all three groups. Overall, funding and policy initiatives are critical for stable international collaboration.
2026-02-04T18:57:19Z
Myroslava Hladchenko

http://arxiv.org/abs/2602.06078v1
Allocate Marginal Reviews to Borderline Papers Using LLM Comparative Ranking
2026-02-04T07:12:42Z
This paper argues that large ML conferences should allocate marginal review capacity primarily to papers near the acceptance boundary, rather than spreading extra reviews via random or affinity-driven heuristics. We propose using LLM-based comparative ranking (via pairwise comparisons and a Bradley-Terry model) to identify a borderline band before human reviewing and to allocate marginal reviewer capacity at assignment time. Concretely, given a venue-specific minimum review target (e.g., 3 or 4), we use this signal to decide which papers receive one additional review (e.g., a 4th or 5th), without conditioning on any human reviews and without using LLM outputs for accept/reject. We provide a simple expected-impact calculation in terms of (i) the overlap between the predicted and true borderline sets ($ρ$) and (ii) the incremental value of an extra review near the boundary ($Δ$), and we provide retrospective proxies to estimate these quantities.
2026-02-04T07:12:42Z
13 pages
Elliot L. Epstein, Rajat Dwaraknath, John Winnicki, Thanawat Sornwanee

http://arxiv.org/abs/2602.07039v1
When Excellence Stops Producing Knowledge: A Practitioner's Observation on Research Funding
2026-02-03T16:21:11Z
After almost four decades of participating in competitive research funding -- as applicant, coordinator, evaluator, and panel member -- I have come to see a structural paradox: many participants recognize that the current system is approaching its functional limits, yet most reform measures intensify rather than alleviate the underlying dynamics. This paper documents how excellence has become decoupled from knowledge production through an increasing coupling to representability under evaluation. The discussion focuses on two domains in which this is particularly visible: competitive basic research funding and large EU consortium projects. Three accelerating trends are examined: the professionalization of proposal writing through specialized consultants, the rise of AI-assisted applications, and an evaluator shortage that forces panels to rely on reviewers increasingly distant from the actual research domains. These observations are offered not as external critique but as an insider account, in the hope that naming a widely experienced but rarely articulated pattern may enable more constructive orientation.
Keywords: Research funding, Excellence, Evaluation, Goodhart's Law, Professionalization, AI-assisted proposals, Peer review crisis
2026-02-03T16:21:11Z
Heimo Müller

http://arxiv.org/abs/2512.23066v3
GLiSE: A Prompt-Driven and ML-Powered Tool for Automated Grey Literature Extraction in Software Engineering
2026-02-02T23:05:34Z
Grey literature is essential to software engineering research as it captures practices and decisions that rarely appear in academic venues. However, collecting and assessing it at scale remains difficult because of its heterogeneous sources, formats, and APIs, which impede reproducible, large-scale synthesis. To address this issue, we present GLiSE, a prompt-driven tool that turns a research topic prompt into platform-specific queries, gathers results from common software-engineering web sources (GitHub, Stack Overflow) and Google Search, and uses embedding-based semantic classifiers to filter and rank results according to their relevance. GLiSE is designed for reproducibility with all settings being configuration-based, and every generated query being accessible. In this paper, (i) we present the GLiSE tool, (ii) provide a curated dataset of software engineering grey-literature search results classified by semantic relevance to their originating search intent, and (iii) conduct an empirical study on the usability of our tool.
2025-12-28T20:20:58Z
Houcine Abdelkader Cherief, Brahim Mahmoudi, Zacharie Chenail-Larcher, Naouel Moha, Quentin Stiévenart, Florent Avellaneda

http://arxiv.org/abs/2405.19872v5
Detection of the papermilling behavior
2026-02-02T13:53:49Z
Based on the analysis of the data obtainable from the Web of Science publication and citation database, typical signs of possible papermilling behavior are described, quantified, and illustrated by examples. A MATLAB function is provided for the analysis of the outputs from the Web of Science. 
A new quantitative indicator -- integrity index, or I-index -- is proposed for use alongside standard bibliographic and scientometric indicators. A case study is presented.
2024-05-30T09:27:34Z
18 pages, 16 figures
Igor Podlubny

http://arxiv.org/abs/2504.05711v2
Automated Archival Descriptions with Federated Intelligence of LLMs
2026-02-02T09:43:35Z
Enforcing archival standards requires specialized expertise, and manually creating metadata descriptions for archival materials is a tedious and error-prone task. This work aims at exploring the potential of agentic AI and large language models (LLMs) in addressing the challenges of implementing a standardized archival description process. To this end, we introduce an agentic AI-driven system for automated generation of high-quality metadata descriptions of archival materials. We develop a federated optimization approach that unites the intelligence of multiple LLMs to construct optimal archival metadata. We also suggest methods to overcome the challenges associated with using LLMs for consistent metadata generation. To evaluate the feasibility and effectiveness of our techniques, we conducted extensive experiments using a real-world dataset of archival materials, which covers a variety of document types and formats. The evaluation results demonstrate the feasibility of our techniques and highlight the superior performance of the federated optimization approach compared to single-model solutions in metadata quality and reliability.
2025-04-08T06:11:05Z
16 pages
Jinghua Groppe, Andreas Marquet, Annabel Walz, Sven Groppe

http://arxiv.org/abs/2601.02395v2
On (Newcomb-)Benford's law: a tale of two papers and of their disproportionate citations. How citation counts can become biased
2026-02-02T08:09:53Z
The first digit (FD) phenomenon, i.e., that the significant digits of numbers in large data are often distributed according to a logarithmically decreasing function, was first reported by S. Newcomb and then, many decades later, independently by F. Benford. After a century of neglect, the last three decades have seen huge growth in the number of relevant publications. However, notwithstanding the rising popularity, the two independent proponents of the phenomenon are not equally acknowledged, an indication of which is the disproportionate number of citations accumulated by Newcomb (1881) and Benford (1938). In the present study we use citation analysis to show that the formalization of the eponym Benford's law, a name questionable in itself for overlooking Newcomb's contribution, by Raimi (1976) had a strong adverse effect on the future citations of Newcomb (1881). Furthermore, we identify the papers published over various decades of the developmental history of the FD phenomenon which later turned out to be amongst the most cited ones in the field. We find that the failure of these prominent papers to reference Newcomb (1881), whether intentional or occasionally out of ignorance, is responsible for its far smaller number of citations in comparison to Benford (1938).
2025-12-26T11:13:45Z
18 pages, 4 figures, 2 tables
Tariq Ahmad Mir, Marcel Ausloos

http://arxiv.org/abs/2602.01686v1
Unmediated AI-Assisted Scholarly Citations
2026-02-02T05:56:27Z
Traditional bibliography databases require users to navigate search forms and manually copy citation data. Language models offer an alternative: a natural-language interface where researchers write text with informal citation fragments, which are automatically resolved to proper references. However, language models are not reliable for scholarly work as they generate fabricated (hallucinated) citations at substantial rates.
We present an architectural approach that combines the natural-language interface of LLM chatbots with the accuracy of direct database access, implemented through the Model Context Protocol. Our system enables language models to search bibliographic databases, perform fuzzy matching, and export verified entries, all through conversational interaction.
A key architectural principle bypasses the language model during final data export: entries are fetched directly from authoritative sources, with timeout protection, to guarantee accuracy. We demonstrate this approach with MCP-DBLP, a server providing access to the DBLP computer science bibliography. The system transforms form-based bibliographic services into conversational assistants that maintain scholarly integrity. This architecture is adaptable to other bibliographic databases and academic data sources.
2026-02-02T05:56:27Z
Open Conference Proceedings, Vol. 8 (2026): The Second Bridge on Artificial Intelligence for Scholarly Communication (AAAI-26)
Stefan Szeider
10.52825/ocp.v8i.3161

http://arxiv.org/abs/2602.00912v1
Assessing and Comparing the Coverage of Publications of Italian Universities in OpenCitations
2026-01-31T21:46:35Z
Recent initiatives advocating responsible, transparent research assessment have intensified the call to use open research information rather than proprietary databases. This study evaluates the coverage and citation representation of publications recorded in the Current Research Information Systems (CRIS), all instances of the IRIS software platform, of six Italian universities within OpenCitations, a community-owned open infrastructure. Using persistent identifiers (DOIs, PMIDs, and ISBNs) specified in the IRIS installations involved, we matched the publications recorded in OpenCitations Meta and extracted the related citation links from the OpenCitations Index. Results show that OpenCitations covers, on average, over 40% of IRIS publications, which is quantitatively comparable to the coverage reported for Scopus and Web of Science in another study. However, gaps persist, particularly for publication types prevalent in the Social Sciences and Humanities, such as monographs and critical editions. 
Overall, the findings demonstrate the growing maturity of OpenCitations and, more broadly, of Open Science infrastructures as viable alternatives as sources of research information, while highlighting areas where further metadata enrichment and interoperability efforts are needed.
2026-01-31T21:46:35Z
Erica Andreose, Ivan Heibi, Silvio Peroni, Leonardo Zilli

http://arxiv.org/abs/2602.00337v1
Smarter AI Through Prompt Engineering: Insights and Case Studies from Data Science Application
2026-01-30T21:40:13Z
The field of prompt engineering is becoming an essential phenomenon in artificial intelligence. It is altering how data scientists interact with large language models (LLMs) for analytics applications. This research paper shares empirical results from different studies on prompt engineering with regard to its methodology, effectiveness, and applications. Through case studies in healthcare, materials science, financial services, and business intelligence, we demonstrate how the use of structured prompting techniques can improve performance on a range of tasks by between 6% and more than 30%. The effectiveness of prompts relies on their complexity, according to our findings. Further, model architecture and optimisation strategy also depend on these factors. We also found promise in advanced frameworks such as chain-of-thought reasoning and automatic optimisers. The proof indicates that prompt engineering allows access to strong AI localisation. Nonetheless, there is plenty of information regarding standardisation, interpretability and the ethical use of AI.
2026-01-30T21:40:13Z
Snehasish Paul, Rohit Kumar, Laxman Das

http://arxiv.org/abs/2601.16993v2
BibAgent: An Agentic Framework for Traceable Miscitation Detection in Scientific Literature
2026-01-30T05:09:04Z
Citations are the bedrock of scientific authority, yet their integrity is compromised by widespread miscitations, ranging from nuanced distortions to fabricated references. 
Systematic citation verification is currently unfeasible; manual review cannot scale to modern publishing volumes, while existing automated tools are restricted by abstract-only analysis or small-scale, domain-specific datasets in part due to the "paywall barrier" of full-text access. We introduce BibAgent, a scalable, end-to-end agentic framework for automated citation verification. BibAgent integrates retrieval, reasoning, and adaptive evidence aggregation, applying distinct strategies for accessible and paywalled sources. For paywalled references, it leverages a novel Evidence Committee mechanism that infers citation validity via downstream citation consensus. To support systematic evaluation, we contribute a 5-category Miscitation Taxonomy and MisciteBench, a massive cross-disciplinary benchmark comprising 6,350 miscitation samples spanning 254 fields. Our results demonstrate that BibAgent outperforms state-of-the-art Large Language Model (LLM) baselines in citation verification accuracy and interpretability, providing scalable, transparent detection of citation misalignments across the scientific literature.
2026-01-12T16:30:45Z
Peiran Li, Fangzhou Lin, Shuo Xing, Xiang Zheng, Xi Hong, Siyuan Yang, Jiashuo Sun, Zhengzhong Tu, Chaoqun Ni

http://arxiv.org/abs/2601.22505v1
Constructing BERT Models: How Team Dynamics and Focus Shape AI Model Impact
2026-01-30T03:31:33Z
The rapid evolution of AI technologies, exemplified by BERT-family models, has transformed scientific research, yet little is known about their production and recognition dynamics in the scientific system. This study investigates the development and impact of BERT-family models, focusing on team size, topic specialization, and citation patterns behind the models. Using a dataset of 4,208 BERT-related papers from the Papers with Code (PWC) dataset, we analyze how the BERT-family models evolve across methodological generations and how the newness of models is correlated with their production and recognition. 
Our findings reveal that newer BERT models are developed by larger, more experienced, and institutionally diverse teams, reflecting the increasing complexity of AI research. Additionally, these models exhibit greater topical specialization, targeting niche applications, which aligns with broader trends in scientific specialization. However, newer models receive fewer citations, particularly over the long term, suggesting a "first-mover advantage," where early models like BERT garner disproportionate recognition. These insights highlight the need for equitable evaluation frameworks that value both foundational and incremental innovations. This study underscores the evolving interplay between collaboration, specialization, and recognition in AI research.
2026-01-30T03:31:33Z
The paper has been accepted by Quantitative Science Studies
Likun Cao, Kai Li

http://arxiv.org/abs/2601.22218v1
What Lies Beneath: A Call for Distribution-based Visual Question & Answer Datasets
2026-01-29T19:00:01Z
Visual Question Answering (VQA) has become an important benchmark for assessing how large multimodal models (LMMs) interpret images. However, most VQA datasets focus on real-world images or simple diagrammatic analysis, with few focused on interpreting complex scientific charts. Indeed, many VQA datasets that analyze charts do not contain the underlying data behind those charts or assume a 1-to-1 correspondence between chart marks and underlying data. In reality, charts are transformations (i.e., analysis, simplification, modification) of data. This distinction introduces a reasoning challenge in VQA that the current datasets do not capture. In this paper, we argue for a dedicated VQA benchmark for scientific charts where there is no 1-to-1 correspondence between chart marks and underlying data. To do so, we survey existing VQA datasets and highlight limitations of the current field. 
We then generate synthetic histogram charts based on ground truth data, and ask both humans and a large reasoning model questions where precise answers depend on access to the underlying data. We release the open-source dataset, including figures, underlying data, distribution parameters used to generate the data, and bounding boxes for all figure marks and text for future research.
2026-01-29T19:00:01Z
Accepted to ACM/IEEE Joint Conference on Digital Libraries JCDL 2025, 4 pages, 2 figures
Jill P. Naiman, Daniel J. Evans, JooYoung Seo

http://arxiv.org/abs/2601.21908v1
The 'Big Three' of Scientific Information: A comparative bibliometric review of Web of Science, Scopus, and OpenAlex
2026-01-29T16:00:42Z
The present comparative study examines the three main multidisciplinary bibliographic databases, Web of Science Core Collection, Scopus, and OpenAlex, with the aim of providing up-to-date evidence on coverage, metadata quality, and functional features to help inform strategic decisions in research assessment. The report is structured into two complementary methodological sections. First, it presents a systematic review of recent scholarly literature that investigates record volume, open-access coverage, linguistic diversity, reference coverage, and metadata quality; this is followed by an original bibliometric analysis of the 2015-2024 period that explores longitudinal distribution, document types, thematic profiles, linguistic differences, and overlap between databases. The text concludes with a ten-point executive summary and five recommendations.
2026-01-29T16:00:42Z
Daniel Torres-Salinas, Wenceslao Arroyo-Machado