https://arxiv.org/api/cA/hBuYnW3cYuPMsNrk5tgCPSa4
2026-06-13T19:39:48Z
606546515

http://arxiv.org/abs/2502.01364v2
Meursault as a Data Point
2025-11-26T07:19:57Z
In an era dominated by datafication, the reduction of human experiences to quantifiable metrics raises profound philosophical and ethical questions. This paper explores these issues through the lens of Meursault, the protagonist of Albert Camus' The Stranger, whose emotionally detached existence epitomizes the existential concept of absurdity. Using natural language processing (NLP) techniques, including emotion detection (BERT), sentiment analysis (VADER), and named entity recognition (spaCy), this study quantifies key events and behaviors in Meursault's life. Our analysis reveals the inherent limitations of applying algorithmic models to complex human experiences, particularly those rooted in existential alienation and moral ambiguity. By examining how modern AI tools misinterpret Meursault's actions and emotions, this research underscores the broader ethical dilemmas of reducing nuanced human narratives to data points, challenging the foundational assumptions of our data-driven society. The findings presented in this paper serve as a critique of the increasing reliance on data-driven narratives and advocate for incorporating humanistic values in artificial intelligence.
2025-02-03T13:56:48Z
7 pages, 9 figures, 4 tables
Abhinav Pratap

http://arxiv.org/abs/2511.21771v1
From 'Individual Scientist' to 'Integrated Scientist': The Evolution of Scientific Organizational panels and Their Impact on the Scientific System
2025-11-26T03:10:55Z
This article aims to propose and elucidate the analytical concepts of "individual scientist" and "integrated scientist" to depict the fundamental transformation in the modes of scientific research actors throughout the history of science.
The "individual scientist" represents an early modern scientific research panel characterized by independence, egalitarian collaboration, and personal recognition, while the "integrated scientist" emerged in the context of "big science," marked by hierarchical teams, division of labor, collaboration, and the concentration of recognition on team leaders. Through historical review and case analysis, this article explores the underlying drivers of this transformation and focuses on its challenges and reconstructions concerning the name-based scientific reward system, aiming to provide a reflective perspective for contemporary scientific governance and research evaluation.
2025-11-26T03:10:55Z
5 pages
Zekai Zhang

http://arxiv.org/abs/2511.20862v1
Access models, authorship patterns, and citation impact in Ukrainian scholarly publishing (2020-2023)
2025-11-25T21:18:38Z
This study aimed to explore the relationship between access models, authorship patterns, and citation impact in Ukrainian research output from 2020 to 2023. The focus was on scholars affiliated with the National Academy of Sciences of Ukraine (NASU) and universities. Findings highlight that open access (OA) articles constituted the majority of publications by Ukrainian scholars during this period. This percentage reached 75.4% for NASU and 85.8% for universities. In both cases, the increase was driven by Gold OA and Hybrid Gold OA, the latter benefiting in part from Elsevier's waivers. Diamond OA prevailed for NASU, while Gold OA was dominant for universities. The effects of Russia's full-scale invasion of Ukraine included (1) a decline in the share of articles in foreign journals for both NASU and universities, (2) a decrease in Gold OA in foreign journals and an increase in Gold OA in Ukrainian journals for universities, and (3) a rise in internationally co-authored Gold OA articles in foreign journals for both entities.
Despite waivers for Gold OA provided by major publishers and an increase in Gold OA articles in Elsevier and Springer journals, MDPI and Aluna Publishing House remained the dominant publishers of Gold OA in foreign journals. Hybrid Gold OA, Bronze OA, and Green OA articles in foreign journals had the highest citation impact. The citation impact of Gold OA outperformed Diamond OA. The study confirms the growing dominance of Gold OA, suggesting the need for sustainable OA models that ensure both equity and broad dissemination of research.
2025-11-25T21:18:38Z
Myroslava Hladchenko

http://arxiv.org/abs/2511.20405v1
The external rhythm of an actor in science: New indicators for the science of science
2025-11-25T15:34:45Z
When calculating citation indicators, whether it is the total number of received citations or the average citations per paper, we always face the same problem: papers published in different years have varying citation potential. Hence, strictly speaking, their citations cannot be compared. In a former study, we created a new indicator called the internal rhythm indicator of an actor. The internal rhythm indicator makes it possible to compare the citation performances among different publication years, but it is only valid within the actor-based framework. In this study, we define, create, and explore the external rhythm of an actor, which is also a sequence of ratios of observed citations to expected citations. The essential difference between internal rhythm and external rhythm lies in the way they are created and hence in the point of view taken to study an actor. The former is created based on its own publication-citation matrix, while the latter is based on two publication-citation matrices. One is the same as the former. The other one is a publication-citation matrix of a collective, which includes the actor under study.
The external rhythm of an actor is a citation-based indicator of research that can be used to compare not only the citation performance of an actor with that of the collective the actor is part of, but also to compare two or more actors within the same collective. We further propose a summary average of ratios indicator.
2025-11-25T15:34:45Z
30 pages
Liming Liang, Ronald Rousseau

http://arxiv.org/abs/2510.02143v2
How to Find Fantastic AI Papers: Self-Rankings as a Powerful Predictor of Scientific Impact Beyond Peer Review
2025-11-25T03:01:10Z
Peer review in academic research aims not only to ensure factual correctness but also to identify work of high scientific potential that can shape future research directions. This task is especially critical in fast-moving fields such as artificial intelligence (AI), yet it has become increasingly difficult given the rapid growth of submissions. In this paper, we investigate an underexplored measure for identifying high-impact research: authors' own rankings of their multiple submissions to the same AI conference. Grounded in game-theoretic reasoning, we hypothesize that self-rankings are informative because authors possess unique understanding of their work's conceptual depth and long-term promise. To test this hypothesis, we conducted a large-scale experiment at a leading AI conference, where 1,342 researchers self-ranked their 2,592 submissions by perceived quality. Tracking outcomes over more than a year, we found that papers ranked highest by their authors received twice as many citations as their lowest-ranked counterparts; self-rankings were especially effective at identifying highly cited papers (those with over 150 citations). Moreover, we showed that self-rankings outperformed peer review scores in predicting future citation counts. Our results remained robust after accounting for confounders such as preprint posting time and self-citations.
Together, these findings demonstrate that authors' self-rankings provide a reliable and valuable complement to peer review for identifying and elevating high-impact research in AI.
2025-10-02T15:50:21Z
Buxin Su, Natalie Collina, Garrett Wen, Didong Li, Kyunghyun Cho, Jianqing Fan, Bingxin Zhao, Weijie Su

http://arxiv.org/abs/2511.19538v1
Studying Maps at Scale: A Digital Investigation of Cartography and the Evolution of Figuration
2025-11-24T10:35:37Z
This thesis presents methods and datasets to investigate cartographic heritage on a large scale and from a cultural perspective. Heritage institutions worldwide have digitized more than one million maps, and automated techniques now enable large-scale recognition and extraction of map content. Yet these methods have engaged little with the history of cartography, or the view that maps are semantic-symbolic systems, and cultural objects reflecting political and epistemic expectations. This work leverages a diverse corpus of 771,561 map records and 99,715 digitized images aggregated from 38 digital catalogs. After normalization, the dataset includes 236,925 contributors and spans six centuries, from 1492 to 1948. These data make it possible to chart geographic structures and the global chronology of map publication. The spatial focus of cartography is analyzed in relation to political dynamics, evidencing links between Atlantic maritime charting, the triangular trade, and colonial expansion. Further results document the progression of national, domestic focus and the impact of military conflicts on publication volumes. The research introduces semantic segmentation techniques and object detection models for the generic recognition of land classes and cartographic signs, trained on annotated data and synthetic images. The analysis of land classes shows that maps are designed images whose framing and composition emphasize features through centering and semantic symmetries.
The study of cartographic figuration encodes 63 M signs and 25 M fragments into a latent visual space, revealing figurative shifts such as the replacement of relief hachures by terrain contours and showing that signs tend to form locally consistent systems. Analyses of collaboration and diffusion highlight the role of legitimacy, larger actors, and major cities in the spread of figurative norms and semiotic cultures.
2025-11-24T10:35:37Z
PhD thesis, EPFL. 396 pages, 156 figures
Remi Petitpierre
10.5075/epfl-thesis-11559

http://arxiv.org/abs/2409.08897v2
Ensuring Adherence to Standards in Experiment-Related Metadata Entered Via Spreadsheets
2025-11-23T16:09:38Z
Scientists increasingly recognize the importance of providing rich, standards-adherent metadata to describe their experimental results. Despite the availability of sophisticated tools to assist in the process of data annotation, investigators generally seem to prefer to use spreadsheets when supplying metadata, despite the limitations of spreadsheets in ensuring metadata consistency and compliance with formal specifications. In this paper, we describe an end-to-end approach that supports spreadsheet-based entry of metadata, while ensuring rigorous adherence to community-based metadata standards and providing quality control. Our methods employ several key components, including customizable templates that represent metadata standards and that can inform the spreadsheets that investigators use to author metadata, controlled terminologies and ontologies for defining metadata values that can be accessed directly from a spreadsheet, and an interactive Web-based tool that allows users to rapidly identify and fix errors in their spreadsheet-based metadata. We demonstrate how this approach is being deployed in a biomedical consortium known as HuBMAP to define and collect metadata about a wide range of biological assays.
2024-09-13T15:11:20Z
Sci Data 12, 265 (2025)
Martin J.
O'Connor, Josef Hardi, Marcos Martínez-Romero, Sowmya Somasundaram, Brendan Honick, Stephen A. Fisher, Ajay Pillai, Mark A. Musen
10.1038/s41597-025-04589-6

http://arxiv.org/abs/2511.18408v1
A pipeline for matching bibliographic references with incomplete metadata: experiments with Crossref and OpenCitations
2025-11-23T11:28:15Z
While Crossref makes available more than 1.8 billion bibliographic references from publications for which it provides a DOI, more than 698 million of these references do not specify a DOI, making the creation of a formal citation link between the citing entity and the cited entity problematic. In this article, we propose an analysis of Crossref bibliographic references to show how we can use the unstructured text defining such references and the available (and partial) metadata specified in them to (a) map them to existing entities included in OpenCitations Meta and, then, (b) enable the potential inclusion of additional and valid citation links among these entities. We have defined a precise methodology to address the analysis and run it against a manually defined Gold Standard and a subset of Crossref. While the heuristic-based tool developed has demonstrated strong matching precision and effective metadata integration, its recall limitations highlight the necessity of further enhancements to address metadata inconsistencies and leverage additional sources of citation data.
2025-11-23T11:28:15Z
Matteo Guenci, Ivan Heibi, Chiara Parravicini, Silvio Peroni, Marta Soricetti

http://arxiv.org/abs/2511.21739v1
The Rapid Growth of AI Foundation Model Usage in Science
2025-11-21T19:00:15Z
We present the first large-scale analysis of AI foundation model usage in science - not just citations or keywords. We find that adoption has grown rapidly, at nearly-exponential rates, with the highest uptake in Linguistics, Computer Science, and Engineering. Vision models are the most used foundation models in science, although language models' share is growing. Open-weight models dominate.
As AI builders increase the parameter counts of their models, scientists have followed suit but at a much slower rate: in 2013, the median foundation model built was 7.7x larger than the median one adopted in science; by 2024 this had jumped to 26x. We also present suggestive evidence that scientists' use of these smaller models may be limiting them from getting the full benefits of AI-enabled science, as papers that use larger models appear in higher-impact journals and accrue more citations.
2025-11-21T19:00:15Z
Ana Trišović, Alex Fogelson, Janakan Sivaloganathan, Neil Thompson

http://arxiv.org/abs/2507.04224v3
Fairness Evaluation of Large Language Models in Academic Library Reference Services
2025-11-21T15:33:14Z
As libraries explore large language models (LLMs) for use in virtual reference services, a key question arises: Can LLMs serve all users equitably, regardless of demographics or social status? While they offer great potential for scalable support, LLMs may also reproduce societal biases embedded in their training data, risking the integrity of libraries' commitment to equitable service. To address this concern, we evaluate whether LLMs differentiate responses across user identities by prompting six state-of-the-art LLMs to assist patrons differing in sex, race/ethnicity, and institutional role. We find no evidence of differentiation by race or ethnicity, and only minor evidence of stereotypical bias against women in one model. LLMs demonstrate nuanced accommodation of institutional roles through the use of linguistic choices related to formality, politeness, and domain-specific vocabularies, reflecting professional norms rather than discriminatory treatment.
These findings suggest that current LLMs show a promising degree of readiness to support equitable and contextually appropriate communication in academic library reference services.
2025-07-06T03:28:24Z
Haining Wang, Jason Clark, Yueru Yan, Star Bradley, Ruiyang Chen, Yiqiong Zhang, Hengyi Fu, Zuoyu Tian

http://arxiv.org/abs/2511.17689v1
ARISE: Agentic Rubric-Guided Iterative Survey Engine for Automated Scholarly Paper Generation
2025-11-21T14:14:35Z
The rapid expansion of scholarly literature presents significant challenges in synthesizing comprehensive, high-quality academic surveys. Recent advancements in agentic systems offer considerable promise for automating tasks that traditionally require human expertise, including literature review, synthesis, and iterative refinement. However, existing automated survey-generation solutions often suffer from inadequate quality control, poor formatting, and limited adaptability to iterative feedback, which are core elements intrinsic to scholarly writing.
To address these limitations, we introduce ARISE, an Agentic Rubric-guided Iterative Survey Engine designed for automated generation and continuous refinement of academic survey papers. ARISE employs a modular architecture composed of specialized large language model agents, each mirroring distinct scholarly roles such as topic expansion, citation curation, literature summarization, manuscript drafting, and peer-review-based evaluation. Central to ARISE is a rubric-guided iterative refinement loop in which multiple reviewer agents independently assess manuscript drafts using a structured, behaviorally anchored rubric, systematically enhancing the content through synthesized feedback.
Evaluating ARISE against state-of-the-art automated systems and recent human-written surveys, our experimental results demonstrate superior performance, achieving an average rubric-aligned quality score of 92.48. ARISE consistently surpasses baseline methods across metrics of comprehensiveness, accuracy, formatting, and overall scholarly rigor. All code, evaluation rubrics, and generated outputs are provided openly at https://github.com/ziwang11112/ARISE
2025-11-21T14:14:35Z
20 pages including an appendix, 7 figures and 6 tables
Zi Wang, Xingqiao Wang, Sangah Lee, Xiaowei Xu

http://arxiv.org/abs/2512.03723v1
Innovation by Displacement
2025-11-20T23:42:07Z
New ideas are often thought to arise from recombining existing knowledge. Yet despite rapid publication growth - and expanding opportunities for recombination - scientific breakthroughs remain rare. This gap between productivity and progress challenges recombinant growth theory as the prevailing account of innovation. We argue that the limitation of this theory lies in treating ideas solely as complements, overlooking that breakthroughs often arise when ideas act as substitutes. To test this, we integrate scientist interviews, bibliometric validation, and machine learning analysis of 41 million papers (1965 - 2024). Interviews reveal that breakthroughs are marked not by novelty (Atypicality) alone but by their ability to displace dominant ideas (Disruption). Large-scale analysis confirms that novelty and disruption represent distinct innovation mechanisms: they are negatively correlated across domains, periods, team sizes, and paper versions. Novel papers extend dominant ideas across topics and attract immediate attention; disruptive papers displace them within the same topic and generate lasting influence. Hence, progress slows not from lack of effort but because most research extends rather than overturns ideas.
Applying this perspective reveals distinct roles of theories and methods in scientific change: methods more often drive breakthroughs, whereas theories tend to be novel but rarely disruptive, reinforcing the dominance of established ideas.
2025-11-20T23:42:07Z
40 pages, 15 figures
Linzhuo Li, Yiling Lin, Lingfei Wu

http://arxiv.org/abs/2511.16198v1
SemanticCite: Citation Verification with AI-Powered Full-Text Analysis and Evidence-Based Reasoning
2025-11-20T10:05:21Z
Effective scientific communication depends on accurate citations that validate sources and guide readers to supporting evidence. Yet academic literature faces mounting challenges: semantic citation errors that misrepresent sources, AI-generated hallucinated references, and traditional citation formats that point to entire papers without indicating which sections substantiate specific claims. We introduce SemanticCite, an AI-powered system that verifies citation accuracy through full-text source analysis while providing rich contextual information via detailed reasoning and relevant text snippets. Our approach combines multiple retrieval methods with a four-class classification system (Supported, Partially Supported, Unsupported, Uncertain) that captures nuanced claim-source relationships and enables appropriate remedial actions for different error types. Our experiments show that fine-tuned lightweight language models achieve performance comparable to large commercial systems with significantly lower computational requirements, making large-scale citation verification practically feasible. The system provides transparent, evidence-based explanations that support user understanding and trust. We contribute a comprehensive dataset of over 1,000 citations with detailed alignments, functional classifications, semantic annotations, and bibliometric metadata across eight disciplines, alongside fine-tuned models and the complete verification framework as open-source software.
SemanticCite addresses critical challenges in research integrity through scalable citation verification, streamlined peer review, and quality control for AI-generated content, providing an open-source foundation for maintaining citation accuracy at scale.
2025-11-20T10:05:21Z
21 pages, 4 figures
Sebastian Haan

http://arxiv.org/abs/2511.15872v1
AI-Assisted Writing Is Growing Fastest Among Non-English-Speaking and Less Established Scientists
2025-11-19T21:00:18Z
The dominance of English in global science has long created significant barriers for non-native speakers. The recent emergence of generative artificial intelligence (GenAI) dramatically reduces drafting and revision costs, but, simultaneously, raises a critical question: how is the technology being adopted by the global scientific community, and is it mitigating existing inequities? This study provides the first large-scale empirical evidence by analyzing over two million full-text biomedical publications from PubMed Central from 2021 to 2024, estimating the fraction of AI-generated content using a distribution-based framework. We observe a significant post-ChatGPT surge in AI-assisted writing, with adoption growing fastest in contexts where language barriers are most pronounced: approximately 400% in non-English-speaking countries compared to 183% in English-speaking countries. This adoption is highest among less-established scientists, including those with fewer publications and citations, as well as those in early career stages at lower-ranked institutions. Prior AI research experience also predicted higher adoption. Finally, increased AI usage was associated with a modest increase in productivity, narrowing the publication gap between scientists from English-speaking and non-English-speaking countries with higher levels of AI adoption.
These findings provide large-scale evidence that generative AI is being adopted unevenly, reflecting existing structural disparities while also offering a potential opportunity to mitigate long-standing linguistic inequalities.
2025-11-19T21:00:18Z
Jialin Liu, Yongyuan He, Zhihan Zheng, Yi Bu, Chaoqun Ni

http://arxiv.org/abs/2510.21826v2
The 1979 Iranian Revolution and the Lost Decade of Science: A Counterfactual Scientometric Analysis
2025-11-18T19:57:06Z
This paper presents a comprehensive scientometric analysis of the long-term impact of the 1979 Iranian Revolution on the nation's scientific development. Using Scopus-indexed data from 1960 to 2024, we benchmark Iran's publication trajectory against a carefully selected peer group representing diverse development models: established scientific leaders (Netherlands), stable regional powers (Israel), and high-growth Asian Tigers (South Korea, Taiwan, Singapore), alongside Greece and China. The analysis reveals a stark divergence: in the late 1970s, Iran's scientific output surpassed that of South Korea, China and Taiwan. The revolution, however, precipitated a collapse, followed by a lost decade of stagnation, precisely when its Asian peers began an unprecedented, state-driven ascent. We employ counterfactual models based on pre-revolutionary growth trends to quantify the resulting knowledge deficit. The findings suggest that, in an alternate, stable timeline, Iran's scientific output could have rivaled South Korea's today. We further outline a research agenda to analyze normalized impact metrics, such as FWCI, and collaboration patterns, complementing our findings on publication volume. By contextualizing Iran's unique trajectory, this study contributes to a broader understanding of the divergent recovery patterns exhibited by national scientific systems following profound political shocks, offering insights into the enduring consequences of historical disruptions on the global scientific landscape.
2025-10-22T00:40:35Z
Ehsan Roohi
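
The last abstract above mentions counterfactual models based on pre-revolutionary growth trends but does not say how they are built. As a rough illustration only, the sketch below assumes the simplest such model: a constant-growth (log-linear) trend fitted by least squares to pre-shock publication counts and extrapolated forward, with the deficit taken as projected minus observed output. The function names are hypothetical and every number is synthetic, not Scopus data; the paper's actual method may differ.

```python
import math

def fit_log_linear(years, counts):
    """Least-squares fit of log(count) = intercept + slope * year."""
    n = len(years)
    logs = [math.log(c) for c in counts]
    mean_year = sum(years) / n
    mean_log = sum(logs) / n
    sxx = sum((y - mean_year) ** 2 for y in years)
    sxy = sum((y - mean_year) * (v - mean_log) for y, v in zip(years, logs))
    slope = sxy / sxx
    return mean_log - slope * mean_year, slope

def project(years, intercept, slope):
    """Extrapolate the fitted trend to the given years."""
    return [math.exp(intercept + slope * y) for y in years]

def cumulative_deficit(observed, projected):
    """Sum of (projected - observed) over the post-shock years."""
    return sum(p - o for o, p in zip(observed, projected))

# SYNTHETIC, illustrative values only (assumptions, not real publication data).
pre_years = list(range(1970, 1979))
pre_counts = [100 * 1.10 ** (y - 1970) for y in pre_years]  # steady 10% growth
post_years = [1979, 1980, 1981]
post_observed = [100.0, 100.0, 100.0]  # assumed stagnation after the shock

intercept, slope = fit_log_linear(pre_years, pre_counts)
projected = project(post_years, intercept, slope)
deficit = cumulative_deficit(post_observed, projected)
print(f"fitted annual growth: {math.exp(slope) - 1:.1%}")
print(f"cumulative deficit 1979-1981: {deficit:.1f}")
```

A log-linear fit is used here because a constant percentage growth rate becomes a straight line on a log scale; any real analysis would need to justify the trend window and check how sensitive the projected deficit is to that choice.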