http://arxiv.org/abs/2603.19244v1
The IJCNN 2025 Review Process
2026-02-19T15:17:17Z
The International Joint Conference on Neural Networks (IJCNN) is the premier international conference in the area of neural networks theory, analysis, and applications. The 2025 edition of the conference comprised 5,526 paper submissions, 7,877 active reviewers, 426 area chairs, 2,152 accepted papers, and more than 2,300 attendees. This represents a growth of about 100% in terms of submissions, 200% in terms of reviewers, and over 50% in terms of attendees as compared to the previous edition. In this paper, we describe several key aspects of the whole review process, including a strategy for ranking the scores provided by the reviewers by evaluating a score index and a calibrated version used experimentally to remove reviewer-specific bias from reviews.
2026-02-19T15:17:17Z
Michele Scarpiniti, Danilo Comminiello

http://arxiv.org/abs/2602.21249v1
Quality of Descriptive Information on Cultural Heritage Objects: Definition and Empirical Evaluation
2026-02-19T10:12:26Z
Effective data processing depends on the quality of the underlying data. However, quality issues, such as inconsistencies and uncertainties, can significantly impede the processing and subsequent use of data. Despite the centrality of data quality to a wide range of computational tasks, there is currently no broadly accepted, domain-independent consensus on the definition of data quality. Existing frameworks primarily define data quality in ways that are tailored to specific domains, data types, or contexts of use. Although quality assessment frameworks exist for specific domains, such as electronic health record data and linked data, corresponding approaches for descriptive information about cultural heritage objects remain underdeveloped.
Moreover, existing quality definitions are often theoretical in nature and lack empirical validation based on real-world data problems. In this paper, we address these limitations by first defining a set of quality dimensions specifically designed to capture the characteristics of descriptive information about cultural heritage objects. Our definition is based on an in-depth analysis of existing dimensions and is illustrated through domain-specific examples. We then evaluate the practical applicability of our proposed quality definition using a curated set of real-world data quality problems from the cultural heritage domain. This empirical evaluation substantiates our definition of data quality, resulting in a comprehensive definition of data quality in this domain.
2026-02-19T10:12:26Z
preprint
Markus Matoni, Arno Kesper, Gabriele Taentzer

http://arxiv.org/abs/2603.00107v1
SciKGDash: The Scientific Knowledge Graph Dashboard for Supporting Knowledge Curation
2026-02-18T10:37:32Z
Research knowledge graphs (RKGs) have emerged as an essential technology for organizing scientific knowledge, but their success depends heavily on the quality of their underlying content. Knowledge curation is a critical task to ensure the quality of (research) knowledge graphs ((R)KGs), with human curation being the gold standard despite its time- and resource-intensive nature. Automated methods, while efficient, lack the precision of human expertise. Hybrid approaches, combining automated processes with human oversight, offer a promising solution to this challenge. Dashboards can act as supportive tools in hybrid curation approaches, offering real-time updates and visual overviews. This paper presents an action research study, conducted in collaboration with the Curation and Community Building (C&CB) team of the Open Research Knowledge Graph (ORKG), to explore the development of a dashboard, called SciKGDash, designed to support knowledge curation of the ORKG.
SciKGDash serves as a minimum viable product (MVP) tailored to the needs of the C&CB team, with potential for adaptation to other (R)KGs. An experiment with 15 participants demonstrated the usability of SciKGDash, with successful completion of 4 out of 5 curation tasks in under 5 minutes. In addition, SciKGDash received a positive user experience rating (UEQ score of 1.93). While the tailored solution proved effective for the ORKG, the research also highlights limitations in applying specific quality metrics across diverse (R)KGs. Future work should focus on identifying common quality metrics and enhancing SciKGDash with user-friendly features for querying customized quality metrics. Overall, knowledge curation in RKGs remains an under-explored field, warranting further research.
2026-02-18T10:37:32Z
2025 ACM/IEEE Joint Conference on Digital Libraries (JCDL), 2025, pp. 187-196
Lena John, Sören Auer, Oliver Karras
10.1109/JCDL67857.2025.00030

http://arxiv.org/abs/2510.22389v2
Can Small and Reasoning Large Language Models Score Journal Articles for Research Quality and Do Averaging and Few-shot Help?
2026-02-17T13:09:07Z
Previous research has shown that journal article quality ratings from the cloud-based Large Language Model (LLM) families ChatGPT and Gemini and the medium-sized open-weights LLM Gemma3 27b correlate moderately with expert research quality scores. This article assesses whether other medium-sized LLMs, smaller LLMs, and reasoning models have similar abilities. This is tested with Gemma3 variants, Llama4 Scout, Qwen3, Magistral Small and DeepSeek R1 on a dataset of 2,780 medical, health and life science papers in 6 fields, with two different gold standards, one novel. Few-shot and score averaging approaches are also evaluated. The results suggest that medium-sized LLMs have similar performance to ChatGPT 4o-mini and Gemini 2.0 Flash, but that 1b parameters may often, and 4b sometimes, be too few. Reasoning models did not have a clear advantage.
Moreover, averaging scores from multiple identical queries seems to be a universally successful strategy, and there is weak evidence that few-shot prompts (four examples) tend to help. Overall, the results show, for the first time, that smaller LLMs >4b have a substantial capability to rate journal articles for research quality, especially if score averaging is used, but that reasoning does not give an advantage for this task; it is therefore not recommended because it is slow. The use of LLMs to support research evaluation is now more credible since multiple variants have a similar ability, including many that can be deployed offline in a secure environment without substantial computing resources.
2025-10-25T18:12:41Z
Thelwall, M. & Mohammadi, E. (2026). Can small and reasoning Large Language Models score journal articles for research quality and do averaging and few-shot help? Scientometrics
Mike Thelwall, Ehsan Mohammadi

http://arxiv.org/abs/2510.01783v2
PreprintToPaper dataset: connecting bioRxiv preprints with journal publications
2026-02-17T12:51:57Z
The PreprintToPaper dataset connects bioRxiv preprints with their corresponding journal publications, enabling large-scale analysis of the preprint-to-publication process. It comprises metadata for 145,517 preprints from two periods, 2016-2018 (pre-pandemic) and 2020-2022 (pandemic), retrieved via the bioRxiv and Crossref APIs. We selected the two periods to capture preprint-publication dynamics before and during the COVID-19 pandemic while avoiding transitional years. Each record includes bibliographic information such as titles, abstracts, authors, institutions, submission dates, licenses, and subject categories, alongside enriched publication metadata including journal names, publication dates, author lists, and further information. In addition to the main dataset, a version-history subset provides all available versions of preprints within the two selected periods, enabling analysis of how preprints evolve over time.
Preprints are categorized into three groups: Published (formally linked to a journal article), Preprint Only (posted on a preprint server), and Gray Zone (potentially published in a journal but unlinked). To enhance reliability, title and author similarity scores were computed, and a human-annotated subset of 299 records was created to evaluate Gray Zone cases. The dataset supports diverse applications, including studies of scholarly communication, open science policies, bibliometric tool development, and natural language processing research on textual changes between preprints and the corresponding journal articles. The dataset is publicly available in CSV format via Zenodo.
2025-10-02T08:21:50Z
13 pages, 3 figures, dataset paper
Scientific Data (2026)
Fidan Badalova, Julian Sienkiewicz, Philipp Mayr
10.1038/s41597-026-06867-3

http://arxiv.org/abs/2602.15413v1
StatCounter: A Longitudinal Study of a Portable Scholarly Metric Display
2026-02-17T08:13:55Z
This study explores a handheld, battery-operated e-ink device displaying Google Scholar citation statistics. The StatCounter places academic metrics into the flow of daily life rather than a desktop context. The work draws on a first-person, longitudinal auto-ethnographic inquiry examining how constant access to scholarly metrics influences motivation, attention, reflection, and emotional responses across work and non-work settings. The ambient proximity and pervasive availability of scholarly metrics invite frequent micro-checks and short reflective pauses, but also introduce moments of second-guessing when numbers drop or stagnate. Carrying the device prompts new narratives about academic identity, including a sense of companionship during travel and periods away from the office. Over time, the presence of the device turns metrics from an occasional reference into an ambient background of scholarly life.
The study contributes insight into how situated, embodied access to academic metrics reshapes their meaning, and frames opportunities for designing tools that engage with scholarly evaluation in reflective ways.
2026-02-17T08:13:55Z
Published in the proceedings of the 10th ACM International Symposium on Pervasive Displays (PerDis '26)
Jonas Oppenlaender
10.1145/3797993.3798009

http://arxiv.org/abs/2602.14755v1
Measuring the relatedness between scientific publications using controlled vocabularies
2026-02-16T13:58:47Z
Measuring the relatedness between scientific publications is essential in many areas of bibliometrics and science policy. Controlled vocabularies provide a promising basis for measuring relatedness and are widely used in combination with Salton's cosine similarity. The latter is problematic because it only considers exact matches between terms. This article introduces two alternative methods - soft cosine and maximum term similarities - that account for the semantic similarity between non-matching terms. The article compares the accuracy of all three methods using the assignment of publications to topics in the TREC 2006 Genomics Track and the assumption that accurate relatedness measures should assign high relatedness scores to publication pairs within the same topic and low scores to pairs from separate topics. Results show that soft cosine is the most accurate method, while the most widely used version of Salton's cosine is markedly less accurate than the other methods tested.
These findings have implications for how controlled vocabularies should be used to measure relatedness.
2026-02-16T13:58:47Z
Currently under review at Scientometrics (16 February 2026)
Emil Dolmer Alnor

http://arxiv.org/abs/2602.14384v1
M-CODE: Materials Categorization via Ontology, Dimensionality and Evolution
2026-02-16T01:18:15Z
The rapid advancement of artificial intelligence in materials science requires data standards and data management practices that can capture the complexity of real-world structures, including surfaces, interfaces, defects, and dimensionality reduction. We present M-CODE - Materials Categorization via Ontology, Dimensionality and Evolution - a compact categorization system that links materials-science-specific terminology to a set of reusable concepts as building blocks and provenance-aware transformations. M-CODE classifies structures by dimensionality, structural complexity (from pristine to compound pristine, defective, and processed), and variants that capture common structure creation and evolution approaches. A practical implementation of the categorization is provided in an open-source codebase that includes JSON schemas, examples, and Python and TypeScript types/interfaces, designed to support reproducible dataset generation, validation, and community contributions.
2026-02-16T01:18:15Z
13 pages, 2 figures, 5 tables
Vsevolod Biryukov, Kamal Choudhary, Timur Bazhirov

http://arxiv.org/abs/2602.14285v1
FMMD: A multimodal open peer review dataset based on F1000Research
2026-02-15T19:36:05Z
Automated scholarly paper review (ASPR) has entered the coexistence phase with traditional peer review, where artificial intelligence (AI) systems are increasingly incorporated into real-world manuscript evaluation. In parallel, research on automated and AI-assisted peer review has proliferated. Despite this momentum, empirical progress remains constrained by several critical limitations in existing datasets.
While reviewers routinely evaluate figures, tables, and complex layouts to assess scientific claims, most existing datasets remain overwhelmingly text-centric. This bias is reinforced by a narrow focus on data from computer science venues. Furthermore, these datasets lack precise alignment between reviewer comments and specific manuscript versions, obscuring the iterative relationship between peer review and manuscript evolution. In response, we introduce FMMD, a multimodal and multidisciplinary open peer review dataset curated from F1000Research. The dataset bridges the current gap by integrating manuscript-level visual and structural data with version-specific reviewer reports and editorial decisions. By providing explicit alignment between reviewer comments and the exact article iteration under review, FMMD enables fine-grained analysis of the peer review lifecycle across diverse scientific domains. FMMD supports tasks such as multimodal issue detection and multimodal review comment generation. It provides a comprehensive empirical resource for the development of peer review research.
2026-02-15T19:36:05Z
Work in progress
Zhenzhen Zhuang, Yuqing Fu, Jing Zhu, Zhangping Zhou, Jialiang Lin

http://arxiv.org/abs/2603.00080v1
From Static Repositories to Agentic Knowledge Webs: ResearchTwin and the S-Index for Federated Human-AI Research Discovery
2026-02-13T22:37:40Z
The exponential growth of scientific literature, datasets, and code repositories has created a discovery bottleneck that impedes knowledge synthesis and reproducibility. Traditional dissemination formats -- static PDFs, siloed code hosting, and fragmented data repositories -- fail to represent the interconnected narrative of modern research, while conventional metrics such as the H-index neglect contributions from reusable code and shared datasets.
We present ResearchTwin, an open-source federated platform that transforms a researcher's scholarly output into a conversational digital twin, with a preliminary evaluation of its deployed prototype. The system uses a Bimodal Glial-Neural Optimization (BGNO) architecture comprising a Multi-Modal Connector Layer, a Glial Layer for caching and rate management, and a Neural Layer implementing Retrieval-Augmented Generation with a provider-agnostic LLM backend. We formalize the S-index, building on our earlier QIC framework, into a composite metric that extends FAIR principles -- via a binary accessibility/licensing gate, field-normalized impact scoring, and geometric collaboration scaling -- to quantify multimodal research impact. A case study comparing two researchers with similar H-indexes but substantially different S-indexes demonstrates that the metric captures dimensions of impact -- particularly dataset and code contributions -- invisible to citation-based measures alone. ResearchTwin exposes an inter-agentic discovery API using Schema.org typed responses and HATEOAS navigation, enabling AI agents to discover cross-lab synergies. A three-tier federated architecture preserves data sovereignty while enabling global discoverability.
2026-02-13T22:37:40Z
15 pages, 1 figure, https://github.com/martinfrasch/ResearchTwin
Martin G. Frasch

http://arxiv.org/abs/2309.04414v3
Scientific productivity as a random walk
2026-02-12T17:33:06Z
The expectation that scientific productivity follows regular patterns over a career underpins many scholarly evaluations. However, recent studies of individual productivity patterns reveal a puzzle: the average number of papers published per year robustly follows the "canonical trajectory" of a rapid rise followed by a gradual decline, yet only about 20% of individual productivity trajectories follow this pattern.
We resolve this puzzle by modeling scientific productivity as a random walk, showing that the canonical pattern can be explained as a decrease in the variance in changes to productivity in the early-to-mid career. By empirically characterizing the variable structure of 2,085 productivity trajectories of computer science faculty at 205 PhD-granting institutions, spanning 29,119 publications over 1980-2016, we (i) discover remarkably simple patterns in both early-career and year-to-year changes to productivity, and (ii) show that a random walk model of productivity both reproduces the canonical trajectory in the average productivity and captures much of the diversity of individual-level trajectories, including the lognormal distribution of cumulative productivity observed by William Shockley in 1957. We confirm that these results generalize across fields by fitting our model to a separate panel of 22,952 faculty across 12 fields from 2011 to 2023. These results highlight the importance of variance in shaping individual scientific productivity, opening up new avenues for characterizing how systemic incentives and opportunities can be directed for aggregate effect.
2023-09-08T16:25:24Z
Sam Zhang, Nicholas LaBerge, Samuel F. Way, Daniel B. Larremore, Aaron Clauset

http://arxiv.org/abs/2602.03828v2
AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations
2026-02-12T16:22:05Z
High-quality scientific illustrations are crucial for effectively communicating complex scientific and technical concepts, yet their manual creation remains a well-recognized bottleneck in both academia and industry. We present FigureBench, the first large-scale benchmark for generating scientific illustrations from long-form scientific texts. It contains 3,300 high-quality scientific text-figure pairs, covering diverse text-to-illustration tasks from scientific papers, surveys, blogs, and textbooks.
Moreover, we propose AutoFigure, the first agentic framework that automatically generates high-quality scientific illustrations based on long-form scientific text. Specifically, before rendering the final result, AutoFigure engages in extensive thinking, recombination, and validation to produce a layout that is both structurally sound and aesthetically refined, outputting a scientific illustration that achieves both structural completeness and aesthetic appeal. Leveraging the high-quality data from FigureBench, we conduct extensive experiments to test the performance of AutoFigure against various baseline methods. The results demonstrate that AutoFigure consistently surpasses all baseline methods, producing publication-ready scientific illustrations. The code, dataset and Hugging Face space are released at https://github.com/ResearAI/AutoFigure.
2026-02-03T18:41:43Z
Accepted at ICLR 2026
Minjun Zhu, Zhen Lin, Yixuan Weng, Panzhong Lu, Qiujie Xie, Yifan Wei, Sifan Liu, Qiyao Sun, Yue Zhang

http://arxiv.org/abs/2506.03527v2
Distinguishing True Influence from Hyperprolificity with Citation Distance
2026-02-12T08:43:46Z
Accurately evaluating scholarly influence is essential for fair academic assessment, yet traditional bibliometric indicators - dominated by publication and citation counts - often favor hyperprolific authors over those with deeper, long-term impact. We propose the x-index, a novel citation-based metric that conceptualizes citation as a process of knowledge diffusion and incorporates citation distance to reflect the structural reach of scholarly work. By weighting citations according to the collaborative proximity between citing and cited authors, the x-index captures both the depth and breadth of influence within evolving academic networks. Empirical analyses show that the x-index significantly improves the rankings of Turing Award recipients while reducing those of hyperprolific authors, better aligning rankings with recognized academic merit.
It also demonstrates superior discriminatory power among early-career researchers and reveals stronger sensitivity to institutional research quality. These results suggest that the x-index offers a more equitable and forward-looking alternative to existing metrics, with practical applications in talent identification, funding decisions, and academic recommendation systems.
2025-06-04T03:19:11Z
Lu Li, Yun Wan, Feng Xiao

http://arxiv.org/abs/2602.03866v3
PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG
2026-02-11T13:18:33Z
Transforming scientific papers into multimodal presentation content is essential for research dissemination but remains labor-intensive. Existing automated solutions typically treat each format as an isolated downstream task, leading to redundant processing and semantic inconsistency. We introduce PaperX, a unified framework that models academic presentation generation as a structural transformation and rendering process. Central to our approach is the Scholar DAG, an intermediate representation that decouples the paper's logical structure from its final presentation syntax. By applying adaptive graph traversal strategies, PaperX generates diverse, high-quality outputs from a single source.
Comprehensive evaluations demonstrate that our framework achieves state-of-the-art performance in content fidelity and aesthetic quality while significantly improving cost efficiency compared to specialized single-task agents.
2026-01-30T18:27:03Z
29 pages, 9 figures, Project website: https://github.com/yutao1024/PaperX
Tao Yu, Minghui Zhang, Zhiqing Cui, Hao Wang, Zhongtian Luo, Shenghua Chai, Junhao Gong, Yuzhao Peng, Yuxuan Zhou, Yujia Yang, Zhenghao Zhang, Haopeng Jin, Xinming Wang, Yufei Xiong, Jiabing Yang, Jiahao Yuan, Hanqing Wang, Hongzhu Yi, Yan Huang, Liang Wang

http://arxiv.org/abs/2603.00069v1
Top performers and top journals: Persistent concentration in scientific publishing
2026-02-10T20:21:55Z
In this research, we analyze the relationship between publishing productivity and access to highly prestigious journals, treating publishing in top journals as a stratification mechanism selecting publishing elites. We study N = 144,314 Polish scientists publishing for 30 years (1992-2021) and their N_art = 433,546 unique research articles published in the period. Using bibliometric data from Scopus, we compare the scientists belonging to the top productivity decile (the upper 10%, termed top performers) and the remaining population of scientists (90%) by discipline and period (five six-year periods). We measure the share of publications in prestigious segments of journals, with particular reference to the 90th-99th percentiles, and we use nonlinear journal prestige-normalized productivity. Our results indicate that access to top journals (defined as the top 10% of journals indexed in Scopus) is powerfully and permanently concentrated in the group of top performers in all disciplines and periods studied. The differences between top performers and the other scientists are primarily of a qualitative nature: they are seen almost exclusively at the top of the journal hierarchy rather than in its bottom or middle segments.
Our logistic regression models indicate the complementarity of quantity and quality: publishing intensity increases the probability of membership in the elite segment of top performers, especially when it is coupled with publishing in prestigious journals. Our results suggest that top journals function as selection gates to academic careers and that they function as durable mechanisms of elite reproduction in science.
2026-02-10T20:21:55Z
36 pages plus supplementary materials
Marek Kwiek, Wojciech Roszka