https://arxiv.org/api/5ssVR+WUXAOzDvYNBeuwppi2K0s
2026-06-10T10:42:34Z
606118015

http://arxiv.org/abs/2604.04562v1
Paper Espresso: From Paper Overload to Research Insight
2026-04-06T09:45:21Z
The accelerating pace of scientific publishing makes it increasingly difficult for researchers to stay current. We present Paper Espresso, an open-source platform that automatically discovers, summarizes, and analyzes trending arXiv papers. The system uses large language models (LLMs) to generate structured summaries with topical labels and keywords, and provides multi-granularity trend analysis at daily, weekly, and monthly scales through LLM-driven topic consolidation. Over 35 months of continuous deployment, Paper Espresso has processed over 13,300 papers and publicly released all structured metadata, revealing rich dynamics in the AI research landscape: a mid-2025 surge in reinforcement learning for LLM reasoning, non-saturating topic emergence (6,673 unique topics), and a positive correlation between topic novelty and community engagement (2.0x median upvotes for the most novel papers). A live demo is available at https://huggingface.co/spaces/Elfsong/Paper_Espresso.
2026-04-06T09:45:21Z
Mingzhe Du, Luu Anh Tuan, Dong Huang, See-kiong Ng

http://arxiv.org/abs/2603.15416v2
Estimating Absolute Web Crawl Coverage From Longitudinal Set Intersections
2026-04-05T23:17:01Z
Web archives preserve portions of the web, but quantifying their completeness remains challenging. Prior approaches have estimated the coverage of a crawl by either comparing the outcomes of multiple crawlers, or by comparing the results of a single crawl to external ground truth datasets. We propose a method to estimate the absolute coverage of a crawl using only the archive's own longitudinal data, i.e., the data collected by multiple subsequent crawls. Our key insight is that coverage can be estimated from the empirical URL overlaps between subsequent crawls, which are in turn well described by a simple urn process.
The parameters of the urn model can then be inferred from longitudinal crawl data using linear regression. Applied to our focused crawl configuration of the German Academic Web, with 15 semi-annual crawls between 2013 and 2021, we find a coverage of approximately 46 percent of the crawlable URL space for the stable crawl configuration regime. Our method is extremely simple, requires no external ground truth, and generalizes to any longitudinal focused crawl.
2026-03-16T15:28:30Z
Michael Paris, Grigori Paris, Fabian Baumann

http://arxiv.org/abs/2604.03776v1
Bridging the Language Gap in Scholarly Data I: Enhancing Author Disambiguation Algorithms for Chinese Names
2026-04-04T15:55:20Z
Disambiguating scholars with identical names is essential for accurate authorship assignment and robust large-scale scientometric research. Existing methods are often designed for Latin-script metadata and perform poorly on Chinese names. In international publications, Chinese names typically appear as Romanized Pinyin, which is highly ambiguous as it can map to multiple distinct characters. Chinese characters, in contrast, reduce but do not eliminate this ambiguity, and are rarely available in international records. To address both challenges, we propose a rule-based disambiguation framework that integrates co-authorship networks, citation networks, author affiliations, and content similarity. We apply this framework to 65,241 physics papers from the China National Knowledge Infrastructure (CNKI), spanning over 70 years of data. On a human-annotated sample of 80 name pairs, our method achieves F1-scores of 0.88 for Pinyin names and 0.89 for character-based names, outperforming two baseline approaches, with improvements driven primarily by higher recall.
The comparable performance across both writing systems shows that our approach is script-agnostic, enabling reliable large-scale scientometric analyses.
2026-04-04T15:55:20Z
Mingrong She, Liuhuaying Yang, Ana Maria Jaramillo, Lisette Espín-Noboa

http://arxiv.org/abs/2604.06236v1
LLMs Have Made Failure Worth Publishing
2026-04-04T13:57:49Z
Scientific publishing systematically filters out negative results. We argue that this long-standing asymmetry has become an urgent problem in the era of large language models, which inherit the positive bias of the literature they are trained on, face an impending shortage of high-quality training data, and are increasingly deployed as both research tools and peer reviewers. We analyze three ways in which LLMs have changed the value of failure data and show that the systematic absence of such data degrades their utility as research tools, training data consumers, and peer reviewers alike. We outline experimental protocols to validate these claims and discuss the structural conditions under which a failure-inclusive publishing culture could emerge.
2026-04-04T13:57:49Z
Sungmin Lee

http://arxiv.org/abs/2604.03553v1
Towards the AI Historian: Agentic Information Extraction from Primary Sources
2026-04-04T02:38:23Z
AI is supporting, accelerating, and automating scientific discovery across a diverse set of fields. However, AI adoption in historical research remains limited due to the lack of solutions designed for historians. In this technical progress report, we introduce the first module of Chronos, an AI Historian under development. This module enables historians to convert image scans of primary sources into data through natural-language interactions.
Rather than imposing a fixed extraction pipeline powered by a vision-language model (VLM), it allows historians to adapt workflows for heterogeneous source corpora, evaluate the performance of AI models on specific tasks, and iteratively refine workflows through natural-language interaction with the Chronos agent. The module is open-source and ready to be used by historical researchers on their own sources.
2026-04-04T02:38:23Z
Lorenz Hufe, Niclas Griesshaber, Gavin Greif, Sebastian Oliver Eck, Philip Torr

http://arxiv.org/abs/2604.03159v1
BibTeX Citation Hallucinations in Scientific Publishing Agents: Evaluation and Mitigation
2026-04-03T16:30:58Z
Large language models with web search are increasingly used in scientific publishing agents, yet they still produce BibTeX entries with pervasive field-level errors. Prior evaluations tested base models without search, which does not reflect current practice. We construct a benchmark of 931 papers across four scientific domains and three citation tiers -- popular, low-citation, and recent post-cutoff -- designed to disentangle parametric memory from search dependence, with version-aware ground truth accounting for multiple citable versions of the same paper. Three search-enabled frontier models (GPT-5, Claude Sonnet-4.6, Gemini-3 Flash) generate BibTeX entries scored on nine fields and a six-way error taxonomy, producing ~23,000 field-level observations. Overall accuracy is 83.6%, but only 50.9% of entries are fully correct; accuracy drops 27.7pp from popular to recent papers, revealing heavy reliance on parametric memory even when search is available. Field-error co-occurrence analysis identifies two failure modes: wholesale entry substitution (identity fields fail together) and isolated field error. We evaluate clibib, an open-source tool for deterministic BibTeX retrieval from the Zotero Translation Server with CrossRef fallback, as a mitigation mechanism.
In a two-stage integration where baseline entries are revised against authoritative records, accuracy rises +8.0pp to 91.5%, fully correct entries rise from 50.9% to 78.3%, and regression rate is only 0.8%. An ablation comparing single-stage and two-stage integration shows that separating search from revision yields larger gains and lower regression (0.8% vs. 4.8%), demonstrating that integration architecture matters independently of model capability. We release the benchmark, error taxonomy, and clibib tool to support evaluation and mitigation of citation hallucinations in LLM-based scientific writing.
2026-04-03T16:30:58Z
37 pages
Delip Rao, Chris Callison-Burch

http://arxiv.org/abs/2509.07801v4
SciNLP: A Domain-Specific Benchmark for Full-Text Scientific Entity and Relation Extraction in NLP
2026-04-03T13:16:07Z
Structured information extraction from scientific literature is crucial for capturing core concepts and emerging trends in specialized fields. While existing datasets aid model development, most focus on specific publication sections due to domain complexity and the high cost of annotating scientific texts. To address this limitation, we introduce SciNLP - a specialized benchmark for full-text entity and relation extraction in the Natural Language Processing (NLP) domain. The dataset comprises 60 manually annotated full-text NLP publications, covering 6,409 entities and 1,648 relations. Compared to existing research, SciNLP is the first dataset providing full-text annotations of entities and their relationships in the NLP domain. To validate the effectiveness of SciNLP, we conducted comparative experiments with similar datasets and evaluated the performance of state-of-the-art supervised models on this dataset. Results reveal varying extraction capabilities of existing models across academic texts of different lengths. Cross-comparisons with existing datasets show that SciNLP achieves significant performance improvements on certain baseline models.
Using models trained on SciNLP, we implemented automatic construction of a fine-grained knowledge graph for the NLP domain. Our KG has an average node degree of 3.3 per entity, indicating rich semantic topological information that enhances downstream applications. The dataset is publicly available at: https://github.com/AKADDC/SciNLP.
2025-09-09T14:41:40Z
EMNLP 2025 Main
Decheng Duan, Yingyi Zhang, Jitong Peng, Chengzhi Zhang

http://arxiv.org/abs/2604.06232v1
What Do Humanities Scholars Need? A User Model for Recommendation in Digital Archives
2026-04-02T21:11:15Z
User models for recommender systems (RecSys) typically assume stable preferences, similarity-based relevance, and session-bounded interactions -- assumptions derived from high-volume consumer contexts. This paper investigates these assumptions for humanities scholars working with digital archives. Following a human-centered design approach, we conducted focus groups and analyzed interview data from 18 researchers. Our analysis identifies four dimensions where scholarly information-seeking diverges from common RecSys user modeling: (1) context volatility -- preferences shift with research tasks and domain expertise; (2) epistemic trust -- relevance depends on verifiable provenance; (3) contrastive seeking -- researchers seek items that challenge their current direction; and (4) strand continuity -- research spans long-term threads rather than discrete sessions. We discuss implications for user modeling and outline how these dimensions relate to collaborative filtering, content-based, and session-based recommendation.
We propose these dimensions as a diagnostic framework applicable beyond archives to similar application domains where typical user modeling assumptions may not hold.
2026-04-02T21:11:15Z
To be presented at the 34th ACM Conference on User Modeling, Adaptation and Personalization (UMAP'26), June 08-11, 2026, Gothenburg, Sweden
Florian Atzenhofer-Baumgartner, Dominik Kowald
10.1145/3774935.3806171

http://arxiv.org/abs/2603.25638v2
Beyond Via: Analysis and Estimation of the Impact of Large Language Models in Academic Papers
2026-04-02T17:45:24Z
Through an analysis of arXiv papers, we report several shifts in word usage that are likely driven by large language models (LLMs) but have not previously received sufficient attention, such as the increased frequency of "beyond" and "via" in titles and the decreased frequency of "the" and "of" in abstracts. Due to the similarities among different LLMs, experiments show that current classifiers struggle to accurately determine which specific model generated a given text in multi-class classification tasks. Meanwhile, variations across LLMs also result in evolving patterns of word usage in academic papers. By adopting a direct and highly interpretable linear approach and accounting for differences between models and prompts, we quantitatively assess these effects and show that real-world LLM usage is heterogeneous and dynamic.
2026-03-26T16:49:00Z
Visualization of word usage patterns in arXiv abstracts: https://llm-impact.github.io/
Mingmeng Geng, Yuhang Dong, Thierry Poibeau

http://arxiv.org/abs/2604.01793v1
Not Just Large: Tall Teams Dominate East Asia's Scientific Production
2026-04-02T09:05:03Z
Purpose: This study compares the hierarchical structure of scientific teams across countries and investigates factors associated with the observed cross-national differences.
Design/methodology/approach: Drawing on 150,817 publications with author contribution statements, we focus on the 15 countries with the largest volume of scientific publications, examine cross-country variations in the proportion of tall teams, and analyze how this proportion correlates with other factors.
Findings: Scientific output from East Asia is dominated by tall teams, which persist after controlling for team size, indicating that this pattern cannot be fully accounted for by the prevalence of larger teams in these countries. Cultural factors, measured by Power Distance, as well as the observed funding patterns of major basic science agencies, are associated with the dominance of tall teams in East Asia.
Research limitations: This study is limited by its reliance on publications with author contribution statements, which may introduce selection bias; its focus on cultural and funding factors, while leaving other institutional contexts unexamined; and its use of a leadership concentration measure that does not capture other dimensions of hierarchy.
Practical implications: Understanding cross-national differences in research team structures and their associated cultural and institutional factors can inform science policy and team management.
Originality/value: This study provides a systematic cross-national comparison of team hierarchy and offers a mechanistic understanding of the dominance of tall teams in East Asia, highlighting associations with cultural and funding factors.
2026-04-02T09:05:03Z
Siyuan Liu, Wenjin Xie, Wenyu Chen, Tao Jia

http://arxiv.org/abs/2604.01729v1
Overton Engage: A Structured Database and Matching System for Academic Policy Engagement Opportunities
2026-04-02T07:49:52Z
Academic policy engagement, the structured processes through which researchers contribute evidence and expertise to public decision-making, is shaped not only by research quality but by the accessibility of engagement opportunities. In practice, these opportunities are fragmented across institutions and platforms, unevenly advertised, and difficult to discover systematically (Parker et al., 2022), limiting both individual participation and comparison. We present Overton Engage (https://app.overton.io/ui/opportunities), a structured database of publicly documented academic policy engagement opportunities, together with a semantic matching system that links opportunities to researchers based on similarity between opportunity descriptions and publication records. We characterise the composition of the database across policy domains, countries, and opportunity types, and present UK-focused analyses comparing engagement opportunity topics with published policy documents. We further demonstrate an illustrative comparison of consultation topics between the UK and Australia, and apply a matching system to assess how closely research produced by UK higher education institutions aligns, topically, with domestic policy opportunities. Our results suggest that publicly documented engagement opportunities are unevenly distributed across policy domains and countries, though this may reflect collection bias.
Matching analyses reveal a positive relationship between institutional publication volume and high-confidence match rates, but also that research specialisation can compensate for lower output volume in specific policy domains. The database itself is freely available and we welcome collaboration from researchers, policymakers, and institutions.
2026-04-02T07:49:52Z
Ceire Wincott, Angel Luis Jaso Tamame, Susan Collard, Euan Adie, Katie Shamash

http://arxiv.org/abs/2605.12514v1
Structural Diversity Drives Disruptive Scientific Innovation
2026-04-02T02:15:38Z
Scientific innovation increasingly depends on collaboration, yet the organizational structure that fosters breakthrough ideas remains poorly understood. Existing metrics - such as team size or compositional diversity - capture readily observable characteristics but not the deeper architecture of collaboration. We introduce Structural Diversity (SD): the extent to which a team bridges multiple distinct knowledge communities within its prior collaboration network. Using a century-scale dataset of 260 million scientific publications (1900-2025) and combining causal inference with a quasi-natural experiment based on a U.S. National Science Foundation policy change in 2012, we show that SD is a powerful and robust predictor of disruptive innovation, outperforming traditional team novelty indicators such as team freshness and edge density. Moreover, SD positively interacts with team size and is able to mitigate the well-known "curse of scale" by transforming scale from a liability into a resource for creative synthesis. We find that one mechanism underlying this effect is Disciplinary Integration (DI): teams with higher SD can more effectively combine heterogeneous knowledge into novel configurations. Our findings position SD as both a new theoretical construct and an actionable design principle for organizing scientific collaboration.
By linking the architecture of team assembly to the dynamics of creative discovery, our work offers a structural explanation for how collective intelligence can be systematically engineered to foster disruptive innovation.
2026-04-02T02:15:38Z
Yichun Peng, Saike He, Peijie Zhang, Kang Zhao, Yi Yang, Ning Zhang, Qingpeng Zhang, Daniel Dajun Zeng, Hao Peng

http://arxiv.org/abs/2604.01186v1
From Validity to Inter-Subjectivity: An Argument for Reliability Signals in Search Environments
2026-04-01T17:34:45Z
Search engines and information platforms are increasingly scrutinized for their role in spreading misinformation. Traditional responses often focus on detecting falsehoods or verifying the ultimate validity of claims. This paper argues that such a validity-centered framing is inadequate for the epistemic challenges of search environments.
2026-04-01T17:34:45Z
4 pages. Extended abstract / conference paper for SEASON 2025 (September 24-25, 2025, Hamburg, Germany). Peer reviewed
Frans van der Sluis

http://arxiv.org/abs/2604.09669v1
Digital hybridity and relics in cultural heritage: using corpus linguistics to inform design in emerging technologies from AI to VR
2026-04-01T17:16:01Z
Hybrid technologies enable the blending of physical and digital elements, creating new ways to experience and interact with the world. Such technologies can transform engagement with relics, both secular and sacred, but they present challenges for capturing faith, belief, and representation responsibly. Given the complexities of digital representation and the ethical challenges inherent in digitising culturally significant objects, a transdisciplinary understanding of these issues is needed. To inform this discussion from a linguistic perspective, we examined the representation of relics in historical and contemporary texts.
Using a corpus linguistic approach to extract modifiers of the word relic in corpora of Early Modern English books and contemporary web-sourced texts from 2021, we examined the multifaceted ways in which relics have been perceived and evaluated over time. Early texts consider relics as both objects of moral and spiritual significance, and tools of religious and political control, while they are more often framed as heritage symbols, reflecting past events, places, and traditions in contemporary texts. We discuss how hybrid, sometimes AI-based technologies can enhance accessibility and engagement, whilst also challenging traditional sensitivities around authenticity and sensory experience, which are integral to the meaning and significance of relics.
2026-04-01T17:16:01Z
This is an (ACM J.5 Arts & Humanities Paper) relating to Hybrid Technologies, Language, AI, VR, Interaction and Experience. 24 pages.
Int J Digit Humanities (2026)
Emma McClaughlin, Glenn McGarry, Alan Chamberlain, Geert De Wilde, Oliver Butler
10.1007/s42803-026-00120-4

http://arxiv.org/abs/2604.01073v1
Narrative Fingerprints: Multi-Scale Author Identification via Novelty Curve Dynamics
2026-04-01T16:07:58Z
We test whether authors have characteristic "fingerprints" in the information-theoretic novelty curves of their published works. Working with two corpora -- Books3 (52,796 books, 759 qualifying authors) and PG-19 (28,439 books, 1,821 qualifying authors) -- we find that authorial voice leaves measurable traces in how novelty unfolds across a text. The signal is multi-scale: at book level, scalar dynamics (mean novelty, speed, volume, circuitousness) identify 43% of authors significantly above chance; at chapter level, SAX motif patterns in sliding windows achieve 30x-above-chance attribution, far exceeding the scalar features that dominate at book level. These signals are complementary, not redundant.
We show that the fingerprint is partly confounded with genre but persists within-genre for approximately one-quarter of authors. Classical authors (Twain, Austen, Kipling) show fingerprints comparable in strength to modern authors, suggesting the phenomenon is not an artifact of contemporary publishing conventions.
2026-04-01T16:07:58Z
12 pages, 6 figures, 4 tables
Fred Zimmerman, Hilmar AI