https://arxiv.org/api/g4UpwHSlpEubGGUHTQCb+jd17Dc
2026-06-14T06:49:34Z
606561515

http://arxiv.org/abs/2402.12928v6
A Literature Review of Literature Reviews in Pattern Analysis and Machine Intelligence
2025-09-06T03:51:53Z
The rapid growth of research in Pattern Analysis and Machine Intelligence (PAMI) has rendered literature reviews essential for consolidating and interpreting knowledge across its many subfields. In this work, we present a comprehensive tertiary analysis of PAMI reviews along three complementary dimensions: (i) identifying structural and statistical regularities in existing surveys; (ii) developing quantitative strategies that help researchers navigate and prioritize within the expanding review corpus; and (iii) critically assessing emerging AI-generated review systems. To support this study, we construct RiPAMI, a large-scale database containing more than 3,000 review articles, and combine narrative synthesis with statistical analysis to capture structural and content-level features. Our analyses reveal distinctive organizational patterns as well as persistent gaps in current review practices. Building on these insights, we propose practical, article-level strategies for indicator-guided navigation that move beyond simple citation counts. Finally, our evaluation of state-of-the-art AI-generated reviews indicates encouraging advances in coherence and organization, yet also highlights enduring weaknesses in reference retrieval, coverage of recent work, and the incorporation of visual elements. Together, these findings provide both a critical appraisal of existing review practices and a forward-looking perspective on how AI-generated reviews can evolve into trustworthy, customizable, and transformative complements to traditional human-authored surveys.
2024-02-20T11:28:50Z
V2, V3, and V4 with incremental quality improvements.
V5, V6 introduce major updates
Penghai Zhao, Xin Zhang, Jiayue Cao, Ming-Ming Cheng, Jian Yang, Xiang Li

http://arxiv.org/abs/2508.13182v2
Using Artificial Intuition in Distinct, Minimalist Classification of Scientific Abstracts for Management of Technology Portfolios
2025-09-05T23:50:47Z
Classification of scientific abstracts is useful for strategic activities but challenging to automate because the sparse text provides few contextual clues. Metadata associated with the scientific publication can be used to improve performance but still often requires a semi-supervised setting. Moreover, such schemes may generate labels that lack distinction -- namely, they overlap and thus do not uniquely define the abstract. In contrast, experts label and sort these texts with ease. Here we describe an application of a process we call artificial intuition to replicate the expert's approach, using a Large Language Model (LLM) to generate metadata. We use publicly available abstracts from the United States National Science Foundation to create a set of labels, and then we test this on a set of abstracts from the Chinese National Natural Science Foundation to examine funding trends. We demonstrate the feasibility of this method for research portfolio management, technology scouting, and other strategic activities.
2025-08-13T22:32:39Z
Prateek Ranka, Fred Morstatter, Alexandra Graddy-Reed, Andrea Belz

http://arxiv.org/abs/2509.04759v1
Toward Robust URL Extraction for Open Science: A Study of arXiv File Formats and Temporal Trends
2025-09-05T02:38:12Z
In this work, we study how URL extraction results depend on input format. We compiled a pilot dataset by extracting URLs from 10 arXiv papers and used the same heuristic method to extract URLs from four formats derived from the PDF files or the source LaTeX files. We found that accurate and complete URL extraction from any single format or a combination of multiple formats is challenging, with the best F1-score of 0.71.
Using the pilot dataset, we evaluate extraction performance across formats and show that structured formats like HTML and XML produce more accurate results than PDFs or Text. Combining multiple formats improves coverage, especially when targeting research-critical resources. We further apply URL extraction to two tasks, namely classifying URLs into open-access datasets and software versus all others, and analyzing the trend in URL usage in arXiv papers from 1992 to 2024. These results suggest that using a combination of multiple formats achieves better performance on URL extraction than a single format, and that the number of URLs in arXiv papers increased steadily from 1992 to 2014 and drastically from 2014 to 2024. The dataset and the Jupyter notebooks used for the preliminary analysis are publicly available at https://github.com/lamps-lab/arxiv-urls
2025-09-05T02:38:12Z
Peer reviewed and accepted at WADL 2025, 8 pages, 4 figures
Rochana R. Obadage, Lamia Salsabil, Sawood Alam, Bipasha Banarjee, William A. Ingram, Edward A. Fox, Jian Wu

http://arxiv.org/abs/2509.04190v1
The changing role of cited papers over time: An analysis of highly cited papers based on a large full-text dataset
2025-09-04T13:13:25Z
This paper examines how the role of cited papers evolves over time by analyzing nearly 900 highly cited papers (HCPs) published between 2000 and 2016 and the full text of over 220,000 papers citing them. We investigate multiple citation characteristics, including citation location within the full text, reference and in-text citation types, citation sentiment, and textual and bibliographic relatedness between citing and cited papers. Our findings reveal that as HCPs age, they tend to be cited earlier in papers citing them, mentioned fewer times in the full text, and more often cited alongside other references.
Citation sentiment remains predominantly neutral, while both textual and bibliographic similarity between HCPs and their citing papers decline over time. These patterns indicate a shift from direct topical and methodological engagement toward more general, background, and symbolic referencing. The findings highlight the importance of considering citation context rather than relying solely on simple citation counts. Large-scale full-text analyses such as ours can help refine measures of scientific impact and advance scholarly search and science mapping by uncovering more nuanced connections between papers.
2025-09-04T13:13:25Z
Gege Lin, Nees Jan van Eck, Haiyan Hou, Zhigang Hu

http://arxiv.org/abs/2509.04124v1
Authorship-contribution normalized Sh-index and citations are better research output indicators
2025-09-04T11:37:31Z
Bibliometric measures, such as total citations and h-index, have become a cornerstone for evaluating academic performance; however, these traditional metrics, being non-weighted, inadequately capture the nuances of individual contributions. To address this constraint, we developed GScholarLens, an open-access browser extension that integrates seamlessly with Google Scholar to enable detailed bibliometric analysis. GScholarLens categorizes publications by authorship roles, adjusts citation weightings accordingly, and introduces the Scholar h-index (Sh-index), an authorship-contribution normalized h-index. This tool proportionally weights citations based on authorship position using heuristic percentages, i.e., corresponding 100 percent, first 90 percent, second 50 percent, co-authors in publications with fewer than six authors 25 percent, and co-authors with more than six authors 10 percent.
Currently, no empirical data are available for author-contribution weights; however, this proof-of-concept framework can easily adapt to more precise author-contribution weights decided by authors at the time of manuscript submission along with CRediT, which journals and publishers can mandate. Furthermore, this tool incorporates retraction detection by mapping data from retraction databases into the Google Scholar interface. By aligning bibliometric evaluation more closely with actual scholarly contribution, GScholarLens presents a better open-access framework for academic recognition, particularly within interdisciplinary and highly collaborative research environments. This tool is freely accessible at https://project.iith.ac.in/sharmaglab/gscholarlens/.
2025-09-04T11:37:31Z
7 pages, 1 Figure
Vishvesh Karthik, Indupalli Sishir Anand, Utkarsha Mahanta, Gaurav Sharma

http://arxiv.org/abs/2508.20117v2
Is Artificial Intelligence Reshaping the Landscape of the International Academic Community of Geosciences?
2025-09-04T06:26:17Z
Through bibliometric analysis and topic modeling, we find that artificial intelligence (AI) is positively transforming geosciences research, with a notable increase in AI-related scientific output in recent years.
We are encouraged to observe that earth scientists from developing countries have gained better visibility in the recent AI for Science (AI4S) paradigm and that AI is also improving the landscape of international collaboration in geoscience-related research.
2025-08-21T11:17:24Z
miscommunication in the authorization process from the first author
Liang Li, Yuntian Li, Wenxin Zhao, Shan Ye, Yun Lu

http://arxiv.org/abs/2509.03391v1
More Parameters Than Populations: A Systematic Literature Review of Large Language Models within Survey Research
2025-09-03T15:15:31Z
Survey research has a long-standing history of being a human-powered field, but one that embraces various technologies for the collection, processing, and analysis of various behavioral, political, and social outcomes of interest, among others. At the same time, Large Language Models (LLMs) bring new technological challenges and prerequisites in order to fully harness their potential. In this paper, we report work-in-progress on a systematic literature review based on keyword searches from multiple large-scale databases as well as citation networks that assesses how LLMs are currently being applied within the survey research process. We synthesize and organize our findings according to the survey research process to include examples of LLM usage across three broad phases: pre-data collection, data collection, and post-data collection. We discuss selected examples of potential use cases for LLMs as well as their pitfalls based on examples from existing literature. Considering survey research has rich experience and history regarding data quality, we discuss some opportunities and describe future outlooks for survey research to contribute to the continued development and refinement of LLMs.
2025-09-03T15:15:31Z
Trent D. Buskirk, Florian Keusch, Leah von der Heyde, Adam Eck

http://arxiv.org/abs/2409.01120v3
Coverage and metadata completeness and accuracy of African research publications in OpenAlex: A comparative analysis
2025-09-03T14:48:40Z
Unlike traditional proprietary data sources such as Scopus and the Web of Science (WoS), OpenAlex emphasizes its comprehensiveness. This study analyzes OpenAlex coverage and metadata completeness and accuracy of African research publications. To achieve this, OpenAlex is compared with Scopus, WoS, and African Journals Online (AJOL). First, we examine the coverage of African research publications in OpenAlex relative to Scopus, WoS, and AJOL. Then, we assess and compare the availability and accuracy of metadata in OpenAlex, Scopus, and WoS. The findings indicate that OpenAlex offers the most extensive publication coverage. In terms of metadata, OpenAlex provides high coverage for publication and author information, though its coverage of affiliations, references, and funder information is comparatively lower. Metadata accuracy is similarly high for publication and author fields, while affiliation, reference, and funding information show higher rates of missing or incomplete data. Notably, the results demonstrate that both metadata availability and accuracy in OpenAlex improve significantly for publications also indexed in Scopus and WoS. These findings suggest that OpenAlex has the potential to replace proprietary data sources for certain types of analyses. However, for some metadata fields, there remains a trade-off between extensiveness and accuracy.
2024-09-02T09:56:55Z
Patricia Alonso-Alvarez, Nees Jan van Eck

http://arxiv.org/abs/2509.01530v2
Asymmetric Impact of Basic Scientists during Applied Shift
2025-09-03T08:04:33Z
Despite broad acclaim for basic research, science is undergoing an applied shift that marginalizes basic scientists.
This gap reflects an incomplete understanding of their distinctive roles, which prevents translating philosophical appreciation into effective support. We introduce a scalable metric--the application score--to position research along the basic-applied spectrum and apply it to 62 million publications (1970-2023) to reveal the distinctive contributions of basic scientists. We find a structural asymmetry: involvement of basic scientists substantially increases citation impact, even more so in applied contexts, while applied scientists show no such effect in basic domains. This asymmetric effect arises from their intellectual leadership in conceptualization, writing, and experimental design, amplified in large, multidisciplinary, and intermediate career teams. Yet basic scientists remain concentrated in historically prestigious institutions, while new entrants shift toward applied work, indicating critical undersupply. These findings provide large-scale evidence for the indispensable role of basic scientists, guiding policy and institutional strategy to sustain the foundations of discovery and innovation.
2025-09-01T15:05:41Z
Rikuei Kaku, Mikako Bito, Keita Nishimoto, Ichiro Sakata, Kimitaka Asatani

http://arxiv.org/abs/2509.02356v1
A World in Print: Introducing a Danish-Norwegian corpus of historical newspapers
2025-09-02T14:19:10Z
This Data Descriptor introduces the dataset Enevaeldens Nyheder Online (News during Absolutism Online). The Enevaeldens Nyheder Online (ENO) dataset provides a reconstruction of the contents of major newspapers in Denmark and Norway during the period of Absolutism (1660-1849). The dataset contains approx. 474 million words, created using neural networks designed to process digitised microfilm versions of Danish newspapers as well as a smaller selection of Norwegian publications that were all hitherto illegible for computers.
The contribution details this process and its results, including a way to derive standalone texts from the editions, and the accompanying BERT model trained on a beta version of the dataset.
2025-09-02T14:19:10Z
Johan Heinsen, Camilla Bøgeskov

http://arxiv.org/abs/2509.01304v1
Animer une base de connaissance: des ontologies aux modèles d'I.A. générative
2025-09-01T09:40:55Z
In a context where the social sciences and humanities are experimenting with non-anthropocentric analytical frames, this article proposes a semiotic (structural) reading of the hybridization between symbolic AI and neural (or sub-symbolic) AI based on a field of application: the design and use of a knowledge base for area studies. We describe the LaCAS ecosystem -- Open Archives in Linguistic and Cultural Studies (thesaurus; RDF/OWL ontology; LOD services; harvesting; expertise; publication), deployed at Inalco (National Institute for Oriental Languages and Civilizations) in Paris with the Okapi (Open Knowledge and Annotation Interface) software environment from Ina (National Audiovisual Institute), which now has around 160,000 documentary resources and ten knowledge macro-domains grouping together several thousand knowledge objects. We illustrate this approach using the knowledge domain "Languages of the world" (~540 languages) and the knowledge object "Quechua (language)". On this basis, we discuss the controlled integration of neural tools, more specifically generative tools, into the life cycle of a knowledge base: assistance with data localization/qualification, index extraction and aggregation, property suggestion and testing, dynamic file generation, and engineering of contextualized prompts (generic, contextual, explanatory, adjustment, procedural) aligned with a domain ontology.
We outline an ecosystem of specialized agents capable of animating the database while respecting its symbolic constraints, by articulating model-driven and data-driven methods.
2025-09-01T09:40:55Z
in French language
Peter Stockinger (ESCOM, PLIDAM, Inalco, CIS)

http://arxiv.org/abs/2409.12158v2
Publishing Instincts: An Exploration-Exploitation Framework for Studying Academic Publishing Behavior and "Home Venues"
2025-08-30T11:41:07Z
Scholarly communication is vital to scientific advancement, enabling the exchange of ideas and knowledge. When selecting publication venues, scholars consider various factors, such as journal relevance, reputation, outreach, and editorial standards and practices. However, some of these factors are inconspicuous or inconsistent across venues and individual publications. This study proposes that scholars' decision-making process can be conceptualized and explored through the biologically inspired exploration-exploitation (EE) framework, which posits that scholars balance between familiar and under-explored publication venues. Building on the EE framework, we introduce a grounded definition for "Home Venues" (HVs) - an informal concept used to describe the set of venues where a scholar consistently publishes - and investigate their emergence and key characteristics. Our analysis reveals that the publication patterns of roughly three-quarters of computer science scholars align with the expectations of the EE framework. For these scholars, HVs typically emerge and stabilize after approximately 15-20 publications.
Additionally, scholars with higher h-indexes, a greater number of publications, or higher academic age tend to have higher-ranking journals as their HVs.
2024-09-18T17:18:48Z
Teddy Lazebnik, Shir Aviv-Reuven, Ariel Rosenfeld
10.1016/j.joi.2025.101705

http://arxiv.org/abs/2303.00386v3
Authorship conflicts in academia: an international cross-discipline survey
2025-08-30T11:19:32Z
Collaboration among scholars has emerged as a significant characteristic of contemporary science. As a result, the number of authors listed in publications continues to rise steadily. Unfortunately, determining the authors to be included in the byline and their respective order entails multiple difficulties which often lead to conflicts. Despite the large volume of literature about conflicts in academia, it remains unclear how exactly these are distributed over the main socio-demographic properties, as well as the different types of interactions academics experience. To address this gap, we conducted an international and cross-disciplinary survey answered by 752 academics from 41 fields of research and 93 countries that statistically well-represent the overall academic workforce. Our findings are concerning and suggest that conflicts over authorship credit arise very early in one's academic career, even at the Master's and Ph.D. level, and become increasingly common over time.
2023-03-01T10:11:50Z
Elizaveta Savchenko, Ariel Rosenfeld
10.1007/s11192-024-04972-x

http://arxiv.org/abs/2508.19876v1
The IRMA Dataset: A Structured Audio-MIDI Corpus for Iranian Classical Music
2025-08-27T13:36:22Z
We present the IRMA Dataset (Iranian Radif MIDI Audio), a multi-level, open-access corpus designed for the computational study of Iranian classical music, with a particular emphasis on the radif, a structured repertoire of modal-melodic units central to pedagogy and performance.
The dataset combines symbolic MIDI representations, phrase-level audio-MIDI alignment, musicological transcriptions in PDF format, and comparative tables of theoretical information curated from a range of performers and scholars. We outline the multi-phase construction process, including segment annotation, alignment methods, and a structured system of identifier codes to reference individual musical units. The current release includes the complete radif of Karimi; MIDI files and metadata from Mirza Abdollah's radif; selected segments from the vocal radif of Davami, as transcribed by Payvar and Fereyduni; and a dedicated section featuring audio-MIDI examples of tahrir ornamentation performed by prominent 20th-century vocalists. While the symbolic and analytical components are released under an open-access license (CC BY-NC 4.0), some referenced audio recordings and third-party transcriptions are cited using discographic information to enable users to locate the original materials independently, pending copyright permission. Serving both as a scholarly archive and a resource for computational analysis, this dataset supports applications in ethnomusicology, pedagogy, symbolic audio research, cultural heritage preservation, and AI-driven tasks such as automatic transcription and music generation. We welcome collaboration and feedback to support its ongoing refinement and broader integration into musicological and machine learning workflows.
2025-08-27T13:36:22Z
Sepideh Shafiei, Shapour Hakam
10.1145/3748336.3748341

http://arxiv.org/abs/2508.21092v1
Artificial Intelligence in Management Studies (2021-2025): A Bibliometric Mapping of Themes, Trends, and Global Contributions
2025-08-27T12:53:18Z
AI has become one of the most influential research areas over the past decade, with growing applications across multiple disciplines. In management studies, artificial intelligence is increasingly recognized as a driver of innovation, sustainability, and decision-making support.
This bibliometric study examines the evolution of AI-related research in management between 2021 and 2025. Data were collected from the Scopus database and analyzed using the Bibliometrix R package, with visualizations generated through VOSviewer. The dataset consisted of 5,624 documents filtered by subject area, document type, and language. The analysis included annual scientific production, country and institutional contributions, leading journals, co-authorship networks, keyword co-occurrence, and thematic mapping. Results reveal a strong increase in publications from 2021 to 2024, followed by a decline in 2025. China, India, and the United States lead in publication output, while the United Kingdom shows higher citation impact. Thematic analysis indicates a shift from technical applications of AI to broader concerns such as sustainability, digital transformation, and decision-making processes. These findings highlight a changing landscape in AI research within management, where technological innovation, social responsibility, and organizational performance converge to shape future directions.
2025-08-27T12:53:18Z
Yassine Sekaki, Abdelhafid Khazzar, Hamza Ziane