https://arxiv.org/api/quvCB1UdiHwyZbCNahx1p4hHYB82026-06-15T02:43:51Z606688515http://arxiv.org/abs/2503.21423v1Resilience and Volatility in Academic Publishing, The Case of the University of Maribor 2004-20232025-03-27T12:10:22ZThis article investigates the dynamics of academic publishing resilience and volatility at Slovenia's University of Maribor (UM) from 2004 to 2023. This period was marked by significant economic pressures and policy shifts, including changes to higher education legislation and university funding. Using UM's employment data and OpenAlex publication records, the study examines the relationship between employed researcher numbers and unique authors publishing under the UM affiliation. Despite a substantial decrease in researcher employment during the 2009-2013 economic recession and austerity phase, the number of unique authors publishing with UM affiliation surprisingly increased. This growth was driven by factors such as a shift towards project-based funding, contributions from an expanding doctoral student cohort, and increased international collaborations. Analysis of author turnover reveals a notable contrast: high short-term volatility (annual churn rates of ~40-50%) versus significant mid-term stability (5-year churn rates of ~8-10%). Survival analysis confirms this trend, showing high initial attrition among publishing authors but long-term persistence for a core group. Furthermore, co-authorship network analysis indicates the UM research network has become more resilient over time. A critical finding is a fundamental shift in network structure around 2016, transitioning from disassortative to assortative mixing, signaling profound changes in collaboration dynamics. 
The findings carry implications for research policy and university management, highlighting the necessity of balancing short-term performance indicators with the long-term stability and resilience essential for a thriving research community.2025-03-27T12:10:22ZMojca Tancer VerbotenDean Korošakhttp://arxiv.org/abs/2501.05001v240 Years of Interdisciplinary Research: Phases, Origins, and Key Turning Points (1981-2020)2025-03-27T04:47:45ZThis study examines the historical evolution of interdisciplinary research (IDR) over a 40-year period, focusing on its dynamic trends, phases, and key turning points. We apply time series analysis to identify critical years for interdisciplinary citations (CYICs) and categorize IDR into three distinct phases based on these trends: Period I (1981-2002), marked by sporadic and limited interdisciplinary activity; Period II (2003-2016), characterized by the emergence of large-scale IDR led primarily by Medicine, with significant breakthroughs in cloning and medical technology; and Period III (2017-present), where IDR became a widely adopted research paradigm. Our findings indicate that IDR has been predominantly concentrated within the Natural Sciences, with Medicine consistently at the forefront, and highlight increasing contributions from Engineering and Environmental disciplines as a new trend. These insights enhance the understanding of the evolution of IDR, its driving factors, and the shifts in the focus of interdisciplinary collaborations.2025-01-09T06:30:12Z16 pages, 3 figuresGuoyang RongYing ChenFeicheng MaThorsten Kochhttp://arxiv.org/abs/2503.21114v1Measuring and Analyzing Subjective Uncertainty in Scientific Communications2025-03-27T03:12:50ZUncertainty of scientific findings is typically reported through statistical metrics such as $p$-values, confidence intervals, etc. 
The magnitude of this objective uncertainty is reflected in the language used by the authors to report their findings primarily through expressions carrying uncertainty-inducing terms or phrases. This language uncertainty is a subjective concept and is highly dependent on the writing style of the authors. There is evidence that such subjective uncertainty influences the impact of science on public audiences. In this work, we turned our focus to scientists themselves, and measured/analyzed the subjective uncertainty and its impact within scientific communities across different disciplines. We showed that the level of this type of uncertainty varies significantly across different fields, years of publication and geographical locations. We also studied the correlation between subjective uncertainty and several bibliographical metrics, such as number/gender of authors, centrality of the field's community, citation count, etc. The underlying patterns identified in this work are useful in identification and documentation of linguistic norms in scientific communication in different communities/societies.2025-03-27T03:12:50ZComing with Appendix and supplementary materialJamshid SouratiGrace Shaohttp://arxiv.org/abs/2503.19848v1Guarding against artificial intelligence--hallucinated citations: the case for full-text reference deposit2025-03-25T17:12:38ZThe tendency of generative artificial intelligence (AI) systems to "hallucinate" false information is well-known; AI-generated citations to non-existent sources have made their way into the reference lists of peer-reviewed publications. Here, I propose a solution to this problem, taking inspiration from the Transparency and Openness Promotion (TOP) data sharing guidelines, the clash of generative AI with the American judiciary, and the precedent set by submissions of prior art to the United States Patent and Trademark Office. 
Journals should require authors to submit the full text of each cited source along with their manuscripts, thereby preventing authors from citing any material whose full text they cannot produce. This solution requires limited additional work on the part of authors or editors while effectively immunizing journals against hallucinated references.2025-03-25T17:12:38Z3 pagesGlynn A. Guarding against artificial intelligence -- hallucinated citations: The case for full-text reference deposit. Eur Sci Ed. 2025;51:e153973Alex Glynn10.3897/ese.2025.e153973http://arxiv.org/abs/2503.22714v1TRIDIS: A Comprehensive Medieval and Early Modern Corpus for HTR and NER2025-03-25T03:44:11ZThis paper introduces TRIDIS (Tria Digita Scribunt), an open-source corpus of medieval and early modern manuscripts. TRIDIS aggregates multiple legacy collections (all published under open licenses) and incorporates large metadata descriptions. While prior publications referenced some portions of this corpus, here we provide a unified overview with a stronger focus on its constitution. We describe (i) the narrative, chronological, and editorial background of each major sub-corpus, (ii) its semi-diplomatic transcription rules (expansion, normalization, punctuation), (iii) a strategy for challenging out-of-domain test splits driven by outlier detection in a joint embedding space, and (iv) preliminary baseline experiments using TrOCR and MiniCPM2.5 comparing random and outlier-based test partitions. Overall, TRIDIS is designed to stimulate joint robust Handwritten Text Recognition (HTR) and Named Entity Recognition (NER) research across medieval and early modern textual heritage.2025-03-25T03:44:11Z6 pages, 3 figures, 2 tablesSergio Torres Aguilarhttp://arxiv.org/abs/2503.19257v1SCI-IDEA: Context-Aware Scientific Ideation Using Token and Sentence Embeddings2025-03-25T01:37:02ZEvery scientific discovery starts with an idea inspired by prior work, interdisciplinary concepts, and emerging challenges. 
Recent advancements in large language models (LLMs) trained on scientific corpora have driven interest in AI-supported idea generation. However, generating context-aware, high-quality, and innovative ideas remains challenging. We introduce SCI-IDEA, a framework that uses LLM prompting strategies and Aha Moment detection for iterative idea refinement. SCI-IDEA extracts essential facets from research publications, assessing generated ideas on novelty, excitement, feasibility, and effectiveness. Comprehensive experiments validate SCI-IDEA's effectiveness, achieving average scores of 6.84, 6.86, 6.89, and 6.84 (on a 1-10 scale) across novelty, excitement, feasibility, and effectiveness, respectively. Evaluations employed GPT-4o, GPT-4.5, DeepSeek-32B (each under 2-shot prompting), and DeepSeek-70B (3-shot prompting), with token-level embeddings used for Aha Moment detection. Similarly, it achieves scores of 6.87, 6.86, 6.83, and 6.87 using GPT-4o under 5-shot prompting, GPT-4.5 under 3-shot prompting, DeepSeek-32B under zero-shot chain-of-thought prompting, and DeepSeek-70B under 5-shot prompting with sentence-level embeddings. We also address ethical considerations such as intellectual credit, potential misuse, and balancing human creativity with AI-driven ideation. Our results highlight SCI-IDEA's potential to facilitate the structured and flexible exploration of context-aware scientific ideas, supporting innovation while maintaining ethical standards.2025-03-25T01:37:02ZFarhana KeyaGollam RabbyPrasenjit MitraSahar VahdatiSören AuerYaser Jaradehhttp://arxiv.org/abs/2108.13898v3The emojification of sentiment on social media: Collection and analysis of a longitudinal Twitter sentiment dataset2025-03-24T17:29:20ZSocial media, as a means for computer-mediated communication, has been extensively used to study the sentiment expressed by users around events or topics. There is however a gap in the longitudinal study of how sentiment evolved in social media over the years. 
To fill this gap, we develop TM-Senti, a new large-scale, distantly supervised Twitter sentiment dataset with over 184 million tweets and covering a time period of over seven years. We describe and assess our methodology to put together a large-scale, emoticon- and emoji-based labelled sentiment analysis dataset, along with an analysis of the resulting dataset. Our analysis highlights interesting temporal changes, among others in the increasing use of emojis over emoticons. We publicly release the dataset for further research in tasks including sentiment analysis and text classification of tweets. The dataset can be fully rehydrated including tweet metadata and without missing tweets thanks to the archive of tweets publicly available on the Internet Archive, which the dataset is based on.2021-08-31T14:54:46Zcorrected typo in appendixWenjie YinRabab AlkhalifaArkaitz Zubiagahttp://arxiv.org/abs/2503.18506v1Trends in Open Access Academic Outputs of State Agricultural Universities in India: Patterns from OpenAlex2025-03-24T10:02:10ZPurpose: The study examines the Open Access (OA) landscape of Indian state agricultural universities, focusing on OA growth, leading institutions, prolific authors, preferred sources, funding, APC usage, and trending topics. It aims to identify research gaps, guide future research, and support policymakers in developing effective OA policies. Design/methodology/approach: The experiment utilized the OpenAlex database to collect global open access (OA) publications from Indian state agricultural universities over the past ten years (2014-2023). Using the Research Organization Registry ID, 97,536 publications were extracted. Data analysis was performed with OpenRefine, and ArcGIS 10.8 and Microsoft Excel were used for visualization. 
Findings: The global OA research output from state agricultural universities amounted to 65,889 publications across five OA categories: Green OA (7.35%), Diamond OA (6.74%), Gold OA (57.27%), Hybrid OA (9.24%), and Bronze OA (19.41%). Notably, 78.34% of articles were published in 864 low-impact domestic journals. Tamil Nadu Agricultural University produced the most publications in Gold, Diamond, Hybrid, and Bronze OA categories, while Punjab Agricultural University excelled in Green OA and received the highest funding, incurring the most article processing charges (APCs). Collaborative research focusing on agricultural policies, rice water management, soil fertility, and crop productivity had a greater impact. Originality/value: The experiment is the first effort to evaluate the OA global academic research outputs of Indian state agricultural universities. The findings offer institutions, state governments, and funding agencies the opportunity to prioritise open-access publishing to promote sustainable agricultural research. Research limitations/implications: The study is limited to the publications data indexed in the OpenAlex database.2025-03-24T10:02:10ZAbhijit RoyAkhandanand ShuklaAditya Tripathihttp://arxiv.org/abs/2503.18215v1Can news and social media attention reduce the influence of problematic research?2025-03-23T21:35:26ZNews and social media are widely used to disseminate science, but do they also help raise awareness of problems in research? This study investigates whether high levels of news and social media attention might accelerate the retraction process and increase the visibility of retracted articles. To explore this, we analyzed 15,642 news mentions, 6,588 blog mentions, and 404,082 X mentions related to 15,461 retracted articles. Articles receiving high levels of news and X mentions were retracted more quickly than non-mentioned articles in the same broad field and with comparable publication years, author impact, and journal impact. 
However, this effect was not statistically significant for articles with high levels of blog mentions. Notably, articles frequently mentioned in the news experienced a significant increase in annual citation rates after their retraction, possibly because media exposure enhances the visibility of retracted articles, making them more likely to be cited. These findings suggest that increased public scrutiny can improve the efficiency of scientific self-correction, although mitigating the influence of retracted articles remains a gradual process.2025-03-23T21:35:26Z29 pagesEr-Te ZhengHui-Zhen FuXiaorui JiangZhichao FangMike Thelwallhttp://arxiv.org/abs/2210.04422v3Expertise diversity of teams predicts originality and long-term impact in science and technology2025-03-22T03:04:01ZDespite the growing importance of teams in producing innovative and high-impact science and technology, it remains unclear how expertise diversity among team members relates to the originality and impact of the work they produce. Here, we develop a new method to quantify the expertise distance of researchers based on their prior career histories and apply it to 23 million scientific publications and 4 million patents. We find that across science and technology, expertise-diverse teams tend to produce work with greater originality. Teams with more diverse expertise have no significant impact advantage in the short- (2 years) or mid-term (5 years). Instead, they exhibit substantially higher long-term impact (10 years), increasingly attracting larger cross-disciplinary influence. This impact premium of expertise diversity among team members becomes especially pronounced when other dimensions of team diversity are missing, as teams within the same institution or country appear to disproportionately reap the benefits of expertise diversity. 
While gender-diverse teams have relatively higher impact on average, teams with varied levels of gender diversity all seem to benefit from increased expertise diversity. Given the growing knowledge demands on individual researchers, implementation of incentives for original research, and the tradeoffs between short-term and long-term impacts, these results may have implications for funding, assembling, and retaining teams with originality and long-lasting impacts.2022-10-10T03:51:20Z15 pages, 5 figuresWeihua LiHongwei Zhenghttp://arxiv.org/abs/2412.15249v2LitLLMs, LLMs for Literature Review: Are we there yet?2025-03-21T14:56:58ZLiterature reviews are an essential component of scientific research, but they remain time-intensive and challenging to write, especially due to the recent influx of research papers. This paper explores the zero-shot abilities of recent Large Language Models (LLMs) in assisting with the writing of literature reviews based on an abstract. We decompose the task into two components: 1. Retrieving related works given a query abstract, and 2. Writing a literature review based on the retrieved results. We analyze how effective LLMs are for both components. For retrieval, we introduce a novel two-step search strategy that first uses an LLM to extract meaningful keywords from the abstract of a paper and then retrieves potentially relevant papers by querying an external knowledge base. Additionally, we study a prompting-based re-ranking mechanism with attribution and show that re-ranking doubles the normalized recall compared to naive search methods, while providing insights into the LLM's decision-making process. In the generation phase, we propose a two-step approach that first outlines a plan for the review and then executes steps in the plan to generate the actual review. 
To evaluate different LLM-based literature review methods, we create test sets from arXiv papers using a protocol designed for rolling use with newly released LLMs to avoid test set contamination in zero-shot evaluations. We release this evaluation protocol to promote additional research and development in this regard. Our empirical results suggest that LLMs show promising potential for writing literature reviews when the task is decomposed into smaller components of retrieval and planning. Our project page including a demonstration system and toolkit can be accessed here: https://litllm.github.io.2024-12-15T01:12:26ZShubham AgarwalGaurav SahuAbhay PuriIssam H. LaradjiKrishnamurthy DJ DvijothamJason StanleyLaurent CharlinChristopher Palhttp://arxiv.org/abs/2503.17419v1National Academic Depository: A Step Towards Digital India Vision2025-03-21T05:40:35ZThe National Academic Depository of India is a distinctive, novel and progressive step visualized by Ministry of Human Resources Development, Govt. of India towards maintaining a database to hold the academic awards issued by Educational Institutions in an electronic and digital form. NAD promises to abolish the difficulties / inefficiencies of collecting, maintaining, and presenting physical paper certificates that can be easily copied / created and the verification processes which are costly, time consuming and disorganized. The depository can eradicate the need to store academic awards in physical form. It can verify the awards issued by different Institutions to the students in an easy way. The secure digital depository is a good proposal to do away with fake and forged certificates. The concept of academic depository is identical to the concept of financial securities. The pilot project is successfully completed with the help of Central Board of Secondary Education and some universities. 
In order to become fully functional, the depository has to conquer a few challenges with respect to academic diversities in terms of duration of courses and equivalence. National Academic Depository is a revolutionary effort towards the vision of Digital India.2025-03-21T05:40:35Z4 pages, Indian Journal of Scientific Research; 2017Indian J.Sci.Res. Vol. 13, No. 1, 2017, pp. 204-207Satinder Bal GuptaMonika Guptabhttp://arxiv.org/abs/2312.17560v3Uncertain research country rankings. Should we continue producing uncertain rankings?2025-03-20T14:41:55ZPurpose: Citation-based assessments of countries' research capabilities often misrepresent their ability to achieve breakthrough advancements. These assessments commonly classify Japan as a developing country, which contradicts its prominent scientific standing. The purpose of this study is to investigate the underlying causes of such inaccurate assessments and to propose methods for conducting more reliable evaluations. Design/methodology/approach: The study evaluates the effectiveness of top-percentile citation metrics as indicators of breakthrough research. Using case studies of selected countries and research topics, the study examines how deviations from lognormal citation distributions impact the accuracy of these percentile indicators. A similar analysis is conducted using university data from the Leiden Ranking to investigate citation distribution deviations at the institutional level. Findings: The study finds that inflated lower tails in citation distributions lead to undervaluation of research capabilities in advanced technological countries, as captured by some percentile indicators. Conversely, research-intensive universities exhibit the opposite trend: a reduced lower tail relative to the upper tail, which causes percentile indicators to overestimate their actual research capacity. Research limitations: The descriptions are mathematical facts that are self-evident. 
Practical implications: Due to variations in citation patterns across countries and institutions, the Ptop 10%/P and Ptop 1%/P ratios are not universal predictors of breakthrough research. Evaluations should move away from these metrics. Relying on inappropriate citation-based measures could lead to poor decision-making in research policy, undermining the effectiveness of research strategies and their outcomes.2023-12-29T11:07:37Z29 pages, 6 figures, 4 tablesJournal of Data and Information Science 2025Alonso Rodriguez-Navarro10.2478/jdis-2025-0030http://arxiv.org/abs/2506.15040v1Enhancing the prediction of publications' long-term impact using early citations, readerships, and non-scientific factors2025-03-20T09:50:02ZThis study aims to improve the accuracy of long-term citation impact prediction by integrating early citation counts, Mendeley readership, and various non-scientific factors, such as journal impact factor, authorship and reference list characteristics, funding and open-access status. Traditional citation-based models often fall short by relying solely on early citations, which may not capture broader indicators of a publication's potential influence. By incorporating non-scientific predictors, this model provides a more nuanced and comprehensive framework that outperforms existing models in predicting long-term impact. Using a dataset of Italian-authored publications from the Web of Science, regression models were developed to evaluate the impact of these predictors over time. Results indicate that early citations and Mendeley readership are significant predictors of long-term impact, with additional contributions from factors like authorship diversity and journal impact factor. The study finds that open-access status and funding have diminishing predictive power over time, suggesting their influence is primarily short-term. 
This model benefits various stakeholders, including funders and policymakers, by offering timely and more accurate assessments of emerging research. Future research could extend this model by incorporating broader altmetrics and expanding its application to other disciplines and regions. The study concludes that integrating non-citation-based factors with early citations captures a more complex view of scholarly impact, aligning better with real-world research influence.2025-03-20T09:50:02Z27 pages, 6 figures, 5 tablesGiovanni AbramoTindaro CiceroCiriaco Andrea D'Angelohttp://arxiv.org/abs/1906.10969v3The UN Security Council debates 1992-20232025-03-17T10:07:09ZThis paper presents an updated dataset containing 106,302 speeches held in the public meetings of the UN Security Council (UNSC) between 1992 and 2023. The dataset is based on publicly available meeting transcripts with the S/PV document symbol and includes the full substance of individual speeches as well as automatically extracted and manually corrected metadata on the speaker, the position of the speech in the sequence of speeches of a meeting, and the date of the speech. After contextualizing the dataset in recent research on the UNSC, the paper presents descriptive statistics on UNSC meetings and speeches that characterize the period covered by the dataset. Data highlight the extensive presence of the UN bureaucracy in UNSC meetings as well as an emerging trend towards more lengthy open UNSC debates. These open debates cover key issues that have emerged only during the period that is covered by the dataset, for example the debates relating to Women, Peace and Security or Climate-related Disasters. The corpus is available online: https://doi.org/10.7910/DVN/KGVSYH2019-06-26T10:57:34ZThe UN Security Council Debates corpus is available online at https://doi.org/10.7910/DVN/KGVSYHMirco SchoenfeldSteffen EckhardRonny PatzHilde van MeegdenburgAntonio Pires