http://arxiv.org/abs/2508.00859v1
Title: Author Once, Publish Everywhere: Portable Metadata Authoring with the CEDAR Embeddable Editor
Updated: 2025-07-16T17:31:16Z
Abstract: High-quality, "rich" metadata are essential for making research data findable, interoperable, and reusable. The Center for Expanded Data Annotation and Retrieval (CEDAR) has long addressed this need by providing tools to design machine-actionable metadata templates that encode community standards in a computable form. To make these capabilities more accessible within real-world research workflows, we have developed the CEDAR Embeddable Editor (CEE), a lightweight, interoperable Web Component that brings structured, standards-based metadata authoring directly into third-party platforms. The CEE dynamically renders metadata forms from machine-actionable templates and produces semantically rich metadata in JSON-LD format. It supports ontology-based value selection via the BioPortal ontology repository, and it includes external authority resolution for persistent identifiers such as ORCIDs for individuals and RORs for research organizations. Crucially, the CEE requires no custom user-interface development, allowing deployment across diverse platforms. The CEE has been successfully integrated into generalist scientific data repositories such as Dryad and the Open Science Framework, demonstrating its ability to support discipline-specific metadata creation. By supporting the embedding of metadata authoring within existing research environments, the CEE can facilitate the adoption of community standards and help improve metadata quality across scientific disciplines.
Published: 2025-07-16T17:31:16Z
Authors: Martin J. O'Connor, Marcos Martinez-Romero, Attila L. Egyedi, Mete U. Akdogan, Michael V. Dorf, Mark A.
Musen

http://arxiv.org/abs/2507.11330v2
Title: Automated Novelty Evaluation of Academic Paper: A Collaborative Approach Integrating Human and Large Language Model Knowledge
Updated: 2025-07-16T14:26:34Z
Abstract: Novelty is a crucial criterion in the peer review process for evaluating academic papers. Traditionally, it's judged by experts or measured by unique reference combinations. Both methods have limitations: experts have limited knowledge, and the effectiveness of the combination method is uncertain. Moreover, it's unclear if unique citations truly measure novelty. The large language model (LLM) possesses a wealth of knowledge, while human experts possess judgment abilities that the LLM does not possess. Therefore, our research integrates the knowledge and abilities of LLM and human experts to address the limitations of novelty assessment. One of the most common types of novelty in academic papers is the introduction of new methods. In this paper, we propose leveraging human knowledge and LLM to assist pretrained language models (PLMs, e.g., BERT) in predicting the method novelty of papers. Specifically, we extract sentences related to the novelty of the academic paper from peer review reports and use LLM to summarize the methodology section of the academic paper, which are then used to fine-tune PLMs. In addition, we have designed a text-guided fusion module with novel Sparse-Attention to better integrate human and LLM knowledge. We compared the method we proposed with a large number of baselines.
Extensive experiments demonstrate that our method achieves superior performance.
Published: 2025-07-15T14:03:55Z
Journal reference: Journal of the Association for Information Science and Technology, 2025
Authors: Wenqing Wu, Chengzhi Zhang, Yi Zhao
DOI: 10.1002/asi.70005

http://arxiv.org/abs/2507.03556v2
Title: A Multistakeholder Approach to Value-Driven Co-Design of Recommender System Evaluation Metrics in Digital Archives
Updated: 2025-07-16T07:47:37Z
Abstract: This paper presents the first multistakeholder approach for translating diverse stakeholder values into an evaluation metric setup for Recommender Systems (RecSys) in digital archives. While commercial platforms mainly rely on engagement metrics, cultural heritage domains require frameworks that balance competing priorities among archivists, platform owners, researchers, and other stakeholders. To address this challenge, we conducted high-profile focus groups (5 groups x 5 persons) with upstream, provider, system, consumer, and downstream stakeholders, identifying value priorities across critical dimensions: visibility/representation, expertise adaptation, and transparency/trust. Our analysis shows that stakeholder concerns naturally align with four sequential research funnel stages: discovery, interaction, integration, and impact. The resulting evaluation setup addresses domain-specific challenges including collection representation imbalances, non-linear research patterns, and tensions between specialized expertise and broader accessibility. We propose directions for tailored metrics in each stage of this research journey, such as research path quality for discovery, contextual appropriateness for interaction, metadata-weighted relevance for integration, and cross-stakeholder value alignment for impact assessment.
Our contributions extend beyond digital archives to the broader RecSys community, offering transferable evaluation approaches for domains where value emerges through sustained engagement rather than immediate consumption.
Published: 2025-07-04T13:09:08Z
Comment: Accepted at RecSys 2025
Authors: Florian Atzenhofer-Baumgartner, Georg Vogeler, Dominik Kowald
DOI: 10.1145/3705328.3748026

http://arxiv.org/abs/2504.15038v2
Title: Estimating transformative agreement impact on hybrid open access: A comparative large-scale study using Scopus, Web of Science and open metadata
Updated: 2025-07-15T09:46:05Z
Abstract: This study compares open metadata from hoaddata, an openly available dataset based on Crossref, OpenAlex and the cOAlition S Journal Checker Tool, with proprietary bibliometric databases Scopus and Web of Science to estimate the impact of transformative agreements on hybrid open access publishing. Analysing over 13,000 hybrid journals between 2019 and 2023, the research found substantial growth in open access due to these agreements, although most articles remain paywalled. The results were consistent across all three data sources, showing strong correlations in country-level metrics despite differences in journal coverage and metadata availability. By 2023, transformative agreements enabled the majority of open access in hybrid journals, with particularly high adoption in European countries. The analysis revealed strong alignment between first and corresponding authorship when measuring agreement uptake by publisher and country.
This comparative approach supports the use of open metadata for large-scale hybrid open access studies, while using multiple data sources together provides a more robust understanding of hybrid open access adoption than any single database can offer, overcoming individual limitations in coverage and metadata quality.
Published: 2025-04-21T11:48:39Z
Comment: 28 pages, 6 figures
Journal reference: Scientometrics, 2025
Authors: Najko Jahn
DOI: 10.1007/s11192-025-05390-3

http://arxiv.org/abs/2507.10891v1
Title: Artificial Intelligence and Journalism: A Systematic Bibliometric and Thematic Analysis of Global Research
Updated: 2025-07-15T01:11:39Z
Abstract: Artificial Intelligence (AI) is reshaping journalistic practices across the globe, offering new opportunities while raising ethical, professional, and societal concerns. This study presents a comprehensive systematic review of published articles on AI in journalism from 2010 to 2025. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines, a total of 72 peer-reviewed articles were selected from Scopus and Web of Science databases. The analysis combines bibliometric mapping and qualitative thematic synthesis to identify dominant trends, technologies, geographical distributions, and ethical debates. Additionally, sentiment analysis was performed on article abstracts using the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm to capture evaluative tones across the literature. The findings show a sharp increase in research activity after 2020, with prominent focus areas including automation, misinformation, and ethical governance. While most studies reflect cautious optimism, concerns over bias, transparency, and accountability remain persistent. The review also highlights regional disparities in scholarly contributions, with limited representation from the Global South.
By integrating quantitative and qualitative insights, this study offers a multi-dimensional understanding of how AI is transforming journalism and proposes future research directions for inclusive and responsible innovation.
Published: 2025-07-15T01:11:39Z
Authors: Mohammad Al Masum Molla, Md Manjurul Ahsan

http://arxiv.org/abs/2507.07551v1
Title: ArchiveGPT: A human-centered evaluation of using a vision language model for image cataloguing
Updated: 2025-07-10T08:49:15Z
Abstract: The accelerating growth of photographic collections has outpaced manual cataloguing, motivating the use of vision language models (VLMs) to automate metadata generation. This study examines whether AI-generated catalogue descriptions can approximate human-written quality and how generative AI might integrate into cataloguing workflows in archival and museum collections. A VLM (InternVL2) generated catalogue descriptions for photographic prints on labelled cardboard mounts with archaeological content, evaluated by archive and archaeology experts and non-experts in a human-centered, experimental framework. Participants classified descriptions as AI-generated or expert-written, rated quality, and reported willingness to use and trust in AI tools. Classification performance was above chance level, with both groups underestimating their ability to detect AI-generated descriptions. OCR errors and hallucinations limited perceived quality, yet descriptions rated higher in accuracy and usefulness were harder to classify, suggesting that human review is necessary to ensure the accuracy and quality of catalogue descriptions generated by the out-of-the-box model, particularly in specialized domains like archaeological cataloguing. Experts showed lower willingness to adopt AI tools, emphasizing concerns on preservation responsibility over technical performance.
These findings advocate for a collaborative approach where AI supports draft generation but remains subordinate to human verification, ensuring alignment with curatorial values (e.g., provenance, transparency). The successful integration of this approach depends not only on technical advancements, such as domain-specific fine-tuning, but even more on establishing trust among professionals, which could both be fostered through a transparent and explainable AI pipeline.
Published: 2025-07-10T08:49:15Z
Comment: 56 pages, 7 figures
Authors: Line Abele, Gerrit Anders, Tolgahan Aydın, Jürgen Buder, Helen Fischer, Dominik Kimmel, Markus Huff

http://arxiv.org/abs/2507.08853v1
Title: Clio-X: A Web3 Solution for Privacy-Preserving AI Access to Digital Archives
Updated: 2025-07-09T05:30:38Z
Abstract: As archives turn to artificial intelligence to manage growing volumes of digital records, privacy risks inherent in current AI data practices raise critical concerns about data sovereignty and ethical accountability. This paper explores how privacy-enhancing technologies (PETs) and Web3 architectures can support archives to preserve control over sensitive content while still being able to make it available for access by researchers. We present Clio-X, a decentralized, privacy-first Web3 digital solution designed to embed PETs into archival workflows and support AI-enabled reference and access. Drawing on a user evaluation of a medium-fidelity prototype, the study reveals both interest in the potential of the solution and significant barriers to adoption related to trust, system opacity, economic concerns, and governance. Using Rogers' Diffusion of Innovation theory, we analyze the sociotechnical dimensions of these barriers and propose a path forward centered on participatory design and decentralized governance through a Clio-X Decentralized Autonomous Organization.
By integrating technical safeguards with community-based oversight, Clio-X offers a novel model to ethically deploy AI in cultural heritage contexts.
Published: 2025-07-09T05:30:38Z
Comment: 28 pages, 8 figures
Authors: Victoria L. Lemieux, Rosa Gil, Faith Molosiwa, Qihong Zhou, Binming Li, Roberto Garcia, Luis De La Torre Cubillo, Zehua Wang

http://arxiv.org/abs/2507.04444v1
Title: Data Discovery using LLMs -- A Study of Data User Behaviour
Updated: 2025-07-06T16:09:53Z
Abstract: Data search for scientific research is more complex than a simple web search. The emergence of large language models (LLMs) and their applicability for scientific tasks offers new opportunities for researchers who are looking for data, e.g., to freely express their data needs instead of fitting them into restrictions of data catalogues and portals. However, this also creates uncertainty about whether LLMs are suitable for this task. To answer this question, we conducted a user study with 32 researchers. We qualitatively and quantitatively analysed participants' information interaction behaviour while searching for data using LLMs in two data search tasks, one in which we prompted the LLM to behave as a persona. We found that participants interact with LLMs in natural language, but LLMs remain a tool for them rather than an equal conversational partner. This changes slightly when the LLM is prompted to behave as a persona, but the prompting only affects participants' user experience when they are already experienced in LLM use.
Published: 2025-07-06T16:09:53Z
Comment: Accepted as full paper at TPDL'25
Authors: Christin Katharina Kreutz, Anja Perry, Tanja Friedrich
DOI: 10.1007/978-3-032-05409-8_3

http://arxiv.org/abs/2507.04132v1
Title: An HTR-LLM Workflow for High-Accuracy Transcription and Analysis of Abbreviated Latin Court Hand
Updated: 2025-07-05T19:07:15Z
Abstract: This article presents and validates an ideal, four-stage workflow for the high-accuracy transcription and analysis of challenging medieval legal documents.
The process begins with a specialized Handwritten Text Recognition (HTR) model, itself created using a novel "Clean Ground Truth" curation method where a Large Language Model (LLM) refines the training data. This HTR model provides a robust baseline transcription (Stage 1). In Stage 2, this baseline is fed, along with the original document image, to an LLM for multimodal post-correction, grounding the LLM's analysis and improving accuracy. The corrected, abbreviated text is then expanded into full, scholarly Latin using a prompt-guided LLM (Stage 3). A final LLM pass performs Named-Entity Correction (NEC), regularizing proper nouns and generating plausible alternatives for ambiguous readings (Stage 4). We validate this workflow through detailed case studies, achieving Word Error Rates (WER) in the range of 2-7% against scholarly ground truths. The results demonstrate that this hybrid, multi-stage approach effectively automates the most laborious aspects of transcription while producing a high-quality, analyzable output, representing a powerful and practical solution for the current technological landscape.
Published: 2025-07-05T19:07:15Z
Authors: Joshua D. Isom

http://arxiv.org/abs/2508.00842v1
Title: Algorithmic Evaluation and the Marginalization of Single Authorship in Management Science
Updated: 2025-07-05T18:50:32Z
Abstract: The decline of single authorship in peer-reviewed journals within the current collaboration-oriented knowledge production framework has prompted deeper reflection on the shifting power structures in academic systems. This paper aims to explore the underlying institutional logic and evaluation mechanisms contributing to the marginalization of single-author research in the management field. It further investigates how the discourse of collaborative advantage conceals structural power redistribution and ideological disembedding.
Through an analysis of authorship data from top-tier journals, a critical reading of institutional incentive texts, and an empirical review of authorial configurations, the study, building on the work of Harzing, Wuchty, and Lariviere, constructs a three-dimensional causal chain: collaboration incentives, responsibility dilution, and originality weakening. Findings suggest that single authorship is not explicitly excluded but is gradually sidelined from central publication channels by funding policies, review practices, and performance metrics. Independent thought is thus structurally marginalized within institutionalized collaboration. The paper advocates for a paradigm shift from instrumental rationality to value-based rationality and calls for the restoration of legitimacy and public value for independent research through reforms in evaluation frameworks, journal governance, and research ethics, aiming to safeguard academic diversity and intellectual autonomy.
Published: 2025-07-05T18:50:32Z
Comment: This paper conducts a structural causal analysis of how algorithmic evaluation systems and institutional incentives influence authorship patterns in the field of management science
Authors: Wei Meng

http://arxiv.org/abs/2507.03216v1
Title: Disclosing Generative AI Use in Digital Humanities Research
Updated: 2025-07-03T23:11:45Z
Abstract: This survey study investigates how digital humanists perceive and approach generative AI disclosure in research. The results indicate that while digital humanities scholars acknowledge the importance of disclosing GenAI use, the actual rate of disclosure in research practice remains low. Respondents differ in their views on which activities most require disclosure and on the most appropriate methods for doing so. Most also believe that safeguards for AI disclosure should be established through institutional policies rather than left to individual decisions.
The study's findings will offer empirical guidance to scholars, institutional leaders, funders, and other stakeholders responsible for shaping effective disclosure policies.
Published: 2025-07-03T23:11:45Z
Authors: Rongqian Ma, Xuhan Zhang, Adrian Wisnicki

http://arxiv.org/abs/2406.07016v5
Title: Delving into LLM-assisted writing in biomedical publications through excess vocabulary
Updated: 2025-07-03T08:26:13Z
Abstract: Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists use them for their scholarly writing. But how widespread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: we study vocabulary changes in over 15 million biomedical abstracts from 2010 to 2024 indexed by PubMed, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the Covid pandemic.
Published: 2024-06-11T07:16:34Z
Comment: v5: Reverting to v3
Journal reference: Science Advances, 2 Jul 2025, Vol. 11, No. 27
Authors: Dmitry Kobak, Rita González-Márquez, Emőke-Ágnes Horvát, Jan Lause
DOI: 10.1126/sciadv.adt3813

http://arxiv.org/abs/2507.02139v1
Title: When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search
Updated: 2025-07-02T20:53:51Z
Abstract: Large language models (LLMs) are increasingly used to assign document relevance labels in information retrieval pipelines, especially in domains lacking human-labeled data.
However, different models often disagree on borderline cases, raising concerns about how such disagreement affects downstream retrieval. This study examines labeling disagreement between two open-weight LLMs, LLaMA and Qwen, on a corpus of scholarly abstracts related to Sustainable Development Goals (SDGs) 1, 3, and 7. We isolate disagreement subsets and examine their lexical properties, rank-order behavior, and classification predictability. Our results show that model disagreement is systematic, not random: disagreement cases exhibit consistent lexical patterns, produce divergent top-ranked outputs under shared scoring functions, and are distinguishable with AUCs above 0.74 using simple classifiers. These findings suggest that LLM-based filtering introduces structured variability in document retrieval, even under controlled prompting and shared ranking logic. We propose using classification disagreement as an object of analysis in retrieval evaluation, particularly in policy-relevant or thematic search tasks.
Published: 2025-07-02T20:53:51Z
Comment: Presented at LLM4Eval Workshop, SIGIR 2025 Padova, Italy, July 17, 2025
Authors: William A. Ingram, Bipasha Banerjee, Edward A. Fox

http://arxiv.org/abs/2411.10575v2
Title: Tenure and Research Trajectories
Updated: 2025-07-02T17:22:04Z
Abstract: Tenure is a cornerstone of the US academic system, yet its relationship to faculty research trajectories remains poorly understood. Conceptually, tenure systems may act as a selection mechanism, screening in high-output researchers; a dynamic incentive mechanism, encouraging high output prior to tenure but low output after tenure; and a creative search mechanism, encouraging tenured individuals to undertake high-risk work. Here, we integrate data from seven different sources to trace US tenure-line faculty and their research outputs at an unprecedented scale and scope, covering over 12,000 researchers across 15 disciplines.
Our analysis reveals that faculty publication rates typically increase sharply during the tenure track and peak just before obtaining tenure. Post-tenure trends, however, vary across disciplines: in lab-based fields, such as biology and chemistry, research output typically remains high post-tenure, whereas in non-lab-based fields, such as mathematics and sociology, research output typically declines substantially post-tenure. Turning to creative search, faculty increasingly produce novel, high-risk research after securing tenure. However, this shift toward novelty and risk-taking comes with a decline in impact, with post-tenure research yielding fewer highly cited papers. Comparing outcomes across common career ages but different tenure years or comparing research trajectories in tenure-based and non-tenure-based research settings underscores that breaks in the research trajectories are sharply tied to the individual's tenure year. Overall, these findings provide a new empirical basis for understanding the tenure system, individual research trajectories, and the shape of scientific output.
Published: 2024-11-15T20:50:24Z
Authors: Giorgio Tripodi, Xiang Zheng, Yifan Qian, Dakota Murray, Benjamin F. Jones, Chaoqun Ni, Dashun Wang

http://arxiv.org/abs/2401.14818v6
Title: Developing ChemDFM as a large language foundation model for chemistry
Updated: 2025-07-02T11:25:33Z
Abstract: Artificial intelligence (AI) has played an increasingly important role in chemical research. However, most models currently used in chemistry are specialist models that require training and tuning for specific tasks. A more generic and efficient solution would be an AI model that could address many tasks and support free-form dialogue in the broad field of chemistry. In its utmost form, such a generalist AI chemist could be referred to as Chemical General Intelligence.
Large language models (LLMs) have recently logged tremendous success in the general domain of natural language processing, showing emerging task generalization and free-form dialogue capabilities. However, domain knowledge of chemistry is largely missing when training general-domain LLMs. The lack of such knowledge greatly hinders the performance of generalist LLMs in the field of chemistry. To this end, we develop ChemDFM, a pioneering LLM for chemistry trained on 34B tokens from chemical literature and textbooks, and fine-tuned using 2.7M instructions. As a result, it can understand and reason with chemical knowledge in free-form dialogue. Quantitative evaluations show that ChemDFM significantly surpasses most representative open-source LLMs. It outperforms GPT-4 on a great portion of chemical tasks, despite the substantial size difference. We have open-sourced the inference code, evaluation datasets, and model weights of ChemDFM on Huggingface (https://huggingface.co/OpenDFM/ChemDFM-v1.0-13B).
Published: 2024-01-26T12:45:55Z
Comment: 10 pages, 12 figures, 12 tables. Published on Cell Report Physical Science, DOI: https://doi.org/10.1016/j.xcrp.2025.102523
Journal reference: Cell Rep. Phys. Sci. 6 (2025) 102523
Authors: Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Yi Xia, Bo Chen, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Kai Yu, Xin Chen
DOI: 10.1016/j.xcrp.2025.102523