Web Archives for Verifying Attribution in Twitter Screenshots

2025-10-27T02:50:35Z

Screenshots of social media posts are a common approach for information sharing. Unfortunately, before sharing a screenshot, users rarely verify whether the attribution of the post is fake or real. There are numerous legitimate reasons to share screenshots. However, sharing screenshots of social media posts is also a vector for mis-/disinformation spread on social media. We are exploring methods to verify the attribution of a social media post shown in a screenshot, using resources found on the live web and in web archives. We focus on the use of web archives, since the attribution of non-deleted posts can be relatively easily verified using the live web. We show how information from a Twitter screenshot (Twitter handle, timestamp, and tweet text) can be extracted and used for locating potential archived tweets in the Internet Archive's Wayback Machine. We evaluate our method on a dataset of 1,571 single tweet screenshots.

Fraudulent Publishing in Mathematics: A European Call to Action and How Information Infrastructure Can Help

2025-10-26T08:18:19Z

The IMU-ICIAM working group's new report on Fraudulent Publishing in the Mathematical Sciences documents how gaming of bibliometrics, predatory outlets and paper-mill activity are eroding trust in research, mathematics included. This short EMS note brings that analysis home to Europe. We urge readers to recognise the warning signs of fraudulent publishing, to report serious irregularities so that they can be investigated and sanctioned, and to reflect critically on their own editorial and reviewing practices. We then sketch why Europe is well placed to lead a structural response: a decade of policy development on open science; mature infrastructures for data, software and scholarly communication; and new capacity for community-led diamond open access. Finally, we outline developments towards non-print contributions across member countries including the growth of formal proofs (e.g. with Lean and Isabelle) and we highlight the role of zbMATH Open as a European quality signal that can help editors, reviewers and authors steer clear of problematic venues.

Measuring Research Interest Similarity with Transition Probabilities

2025-10-24T19:41:12Z

We introduce a family of paper and author similarity measures based on the concept that papers are more similar if they are more likely to be retrieved during a literature search following backward and forward citations. Since this browsing process resembles a walk in a citation network, we operationalize the concept using the transition probability (TP) of random walkers. The proposed measures are continuous, symmetric, and can be implemented on any citation network. We conduct validation tests of the TP concept and other extant alternatives to gauge which metric can classify papers and predict future co-authors most consistently across different scales of analysis (co-authorships, journals, and disciplines). Our results show that the proposed basic TP measure outperforms alternative metrics such as personalized PageRank and the Node2vec machine-learning technique in classification tasks at various scales. Additionally, we discuss how publication-level data can be leveraged to approximate the research interest similarity of individual scientists. This paper is accompanied by a Python package that implements all the tested metrics.

A Stylometric Application of Large Language Models

2025-10-24T18:35:17Z

We show that large language models (LLMs) can be used to distinguish the writings of different authors. Specifically, an individual GPT-2 model, trained from scratch on the works of one author, will predict held-out text from that author more accurately than held-out text from other authors. We suggest that, in this way, a model trained on one author's works embodies the unique writing style of that author. We first demonstrate our approach on books written by eight different (known) authors. We also use this approach to confirm R. P. Thompson's authorship of the well-studied 15th book of the Oz series, originally attributed to F. L. Baum.

HIKMA: Human-Inspired Knowledge by Machine Agents through a Multi-Agent Framework for Semi-Autonomous Scientific Conferences

2025-10-24T11:52:24Z

HIKMA Semi-Autonomous Conference is the first experiment in reimagining scholarly communication through an end-to-end integration of artificial intelligence into the academic publishing and presentation pipeline. This paper presents the design, implementation, and evaluation of the HIKMA framework, which includes AI dataset curation, AI-based manuscript generation, AI-assisted peer review, AI-driven revision, AI conference presentation, and AI archival dissemination. By combining language models, structured research workflows, and domain safeguards, HIKMA shows how AI can support - not replace traditional scholarly practices while maintaining intellectual property protection, transparency, and integrity. The conference functions as a testbed and proof of concept, providing insights into the opportunities and challenges of AI-enabled scholarship. It also examines questions about AI authorship, accountability, and the role of human-AI collaboration in research.

The Shift Towards Preprints in AI Policy Research: A Comparative Study of Preprint Trends in the U.S., Europe, and South Korea

2025-10-21T17:17:02Z

The adoption of open science has quickly changed how artificial intelligence (AI) policy research is distributed globally. This study examines the regional trends in the citation of preprints, specifically focusing on the impact of two major disruptive events: the COVID-19 pandemic and the release of ChatGPT, on research dissemination patterns in the United States, Europe, and South Korea from 2015 to 2024. Using bibliometrics data from the Web of Science, this study tracks how global disruptive events influenced the adoption of preprints in AI policy research and how such shifts vary by region. By marking the timing of these disruptive events, the analysis reveals that while all regions experienced growth in preprint citations, the magnitude and trajectory of change varied significantly. The United States exhibited sharp, event-driven increases; Europe demonstrated institutional growth; and South Korea maintained consistent, linear growth in preprint adoption. These findings suggest that global disruptions may have accelerated preprint adoption, but the extent and trajectory are shaped by local research cultures, policy environments, and levels of open science maturity. This paper emphasizes the need for future AI governance strategies to consider regional variability in research dissemination and highlights opportunities for further longitudinal and comparative research to deepen our understanding of open-access adoption in AI policy development.

ORDENA: ORigin-DEstiNAtion data exploration

2025-10-21T04:07:09Z

Analyzing origin-destination flows is an important problem that has been extensively investigated in several scientific fields, particularly by the visualization community. The problem becomes especially challenging when involving massive data, demanding mechanisms such as data aggregation and interactive filtering to make the exploratory process doable. However, data aggregation tends to smooth out certain patterns, and deciding which data should be filtered is not straightforward. In this work, we propose ORDENA, a visual analytic tool to explore origin and destination data. ORDENA is built upon a simple and intuitive scatter plot where the horizontal and vertical axes correspond to origins and destinations. Therefore, each origin-destination flow is represented as a point in the scatter plot. How the points are organized in the plot layout reveals important spatial phenomena present in the data. Moreover, ORDENA provides explainability resources that allow users to better understand the relation between origin-destination flows and associated attributes. We illustrate ORDENA's effectiveness in a set of case studies, which have also been elaborated in collaboration with domain experts. The proposed tool has also been evaluated by domain experts not involved in its development, which provided quite positive feedback about ORDENA.

Data-Driven Framework Development for Public Space Quality Assessment

2025-10-20T14:39:15Z

Public space quality assessment lacks systematic methodologies that integrate factors across diverse spatial typologies while maintaining context-specific relevance. Current approaches remain fragmented within disciplinary boundaries, limiting comprehensive evaluation and comparative analysis across different space types. This study develops a systematic, data-driven framework for assessing public space quality through the algorithmic integration of empirical research findings. Using a 7-phase methodology, we transform 1,207 quality factors extracted from 157 peer-reviewed studies into a validated hierarchical taxonomy spanning six public space typologies: urban spaces, open spaces, green spaces, parks and waterfronts, streets and squares, and public facilities. The methodology combines semantic analysis, cross-typology distribution analysis, and domain knowledge integration to address terminological variations and functional relationships across space types. The resulting framework organizes 1,029 unique quality factors across 14 main categories and 66 subcategories, identifying 278 universal factors applicable across all space types, 397 space-specific factors unique to particular typologies, and 124 cross-cutting factors serving multiple functions. Framework validation demonstrates systematic consistency in factor organization and theoretical alignment with established research on public spaces. This research provides a systematic methodology for transforming empirical public space research into practical assessment frameworks, supporting evidence-based policy development, design quality evaluation, and comparative analysis across diverse urban contexts.

Evolving interdisciplinary contributions to global societal challenges: A 50-year overview

2025-10-19T13:04:35Z

Addressing global societal challenges necessitates insights and expertise that transcend the boundaries of individual disciplines. In recent decades, interdisciplinary collaboration has been recognised as a vital driver of innovation and effective problem-solving, with the potential to profoundly influence policy and practice worldwide. However, quantitative evidence remains limited regarding how cross-disciplinary efforts contribute to societal challenges, as well as the evolving roles and relevance of specific disciplines in addressing these issues. To fill this gap, this study examines the long-term evolution of interdisciplinary contributions to the United Nations' Sustainable Development Goals (SDGs), drawing on extensive bibliometric data from OpenAlex. By analysing publication and citation trends across 19 research fields from 1970 to 2022, we reveal how the relative presence of different disciplines in addressing particular SDGs has shifted over time. Our results also provide unique evidence of the increasing interconnection between fields since the 2000s, coinciding with the United Nations' initiative to tackle global societal challenges through interdisciplinary efforts. These insights will benefit policymakers and practitioners as they reflect on past progress and plan for future action, particularly with the SDG target deadline approaching in the next five years.

Shifting 'AI Policy' Preprints and Citation Trends in the U.S., U.K and E.U., and South Korea (2015-2024)

2025-10-18T12:54:53Z

This study of literature focusing on 'AI Policy' over the past decade, found that citations of preprints, publications on platforms such as arXiv, have increased from five percent to forty percent across three major regions: the U.S., U.K. & E.U., and South Korea. We compare regional responses of preprint citations across the global disruptions of COVID-19 and the release of ChatGPT. We discuss driving factors and risks of preprint normalization, which follows the trend in computer science.

Institute Disambiguation using Author-Institution Co-Occurrence

2025-10-18T08:43:41Z

In this article we propose a novel method to perform unsupervised clustering of different forms of Institute names. We use only author and affiliation metadata to perform the clustering without any string or pattern matching. After analysing only 50000 articles from Crossref database, we see encouraging results which can be scaled up to provide even better results. We compare our clustering with what a well-known method using string matching does and found that the results were complementary. This can help perform institute disambiguation better when integrated with existing systems, especially to provide aliases for cases where traditional string matching fails. The code of this open-source methodology can be found at: https://github.com/Jeet009/Institute-Disambiguation-using-Author-Institution-Co-Occurrence

Does GenAI Rewrite How We Write? An Empirical Study on Two-Million Preprints

2025-10-18T01:37:40Z

Preprint repositories become central infrastructures for scholarly communication. Their expansion transforms how research is circulated and evaluated before journal publication. Generative large language models (LLMs) introduce a further potential disruption by altering how manuscripts are written. While speculation abounds, systematic evidence of whether and how LLMs reshape scientific publishing remains limited. This paper addresses the gap through a large-scale analysis of more than 2.1 million preprints spanning 2016--2025 (115 months) across four major repositories (i.e., arXiv, bioRxiv, medRxiv, SocArXiv). We introduce a multi-level analytical framework that integrates interrupted time-series models, collaboration and productivity metrics, linguistic profiling, and topic modeling to assess changes in volume, authorship, style, and disciplinary orientation. Our findings reveal that LLMs have accelerated submission and revision cycles, modestly increased linguistic complexity, and disproportionately expanded AI-related topics, while computationally intensive fields benefit more than others. These results show that LLMs act less as universal disruptors than as selective catalysts, amplifying existing strengths and widening disciplinary divides. By documenting these dynamics, the paper provides the first empirical foundation for evaluating the influence of generative AI on academic publishing and highlights the need for governance frameworks that preserve trust, fairness, and accountability in an AI-enabled research ecosystem.

AI-Powered Citation Auditing: A Zero-Assumption Protocol for Systematic Reference Verification in Academic Research

2025-10-17T16:53:03Z

Academic citation integrity faces persistent challenges, with research indicating 20% of citations contain errors and manual verification requiring months of expert time. This paper presents a novel AI-powered methodology for systematic, comprehensive reference auditing using agentic AI with tool-use capabilities. We develop a zero-assumption verification protocol that independently validates every reference against multiple academic databases (Semantic Scholar, Google Scholar, CrossRef) without assuming any citation is correct. The methodology was validated across 30 academic documents (2,581 references) spanning undergraduate projects to doctoral theses and peer-reviewed publications. Results demonstrate 91.7% average verification rate on published PLOS papers, with successful detection of fabricated references, retracted articles, orphan citations, and predatory journals. Time efficiency improved dramatically: 90-minute audits for 916-reference doctoral theses versus months of manual review. The system achieved <0.5% false positive rate while identifying critical issues manual review might miss. This work establishes the first validated AI-agent methodology for academic citation integrity, demonstrating practical applicability for supervisors, students, and institutional quality assurance.

Foundation Models for Scientific Discovery: From Paradigm Enhancement to Paradigm Transition

2025-10-17T03:40:26Z

Foundation models (FMs), such as GPT-4 and AlphaFold, are reshaping the landscape of scientific research. Beyond accelerating tasks such as hypothesis generation, experimental design, and result interpretation, they prompt a more fundamental question: Are FMs merely enhancing existing scientific methodologies, or are they redefining the way science is conducted? In this paper, we argue that FMs are catalyzing a transition toward a new scientific paradigm. We introduce a three-stage framework to describe this evolution: (1) Meta-Scientific Integration, where FMs enhance workflows within traditional paradigms; (2) Hybrid Human-AI Co-Creation, where FMs become active collaborators in problem formulation, reasoning, and discovery; and (3) Autonomous Scientific Discovery, where FMs operate as independent agents capable of generating new scientific knowledge with minimal human intervention. Through this lens, we review current applications and emerging capabilities of FMs across existing scientific paradigms. We further identify risks and future directions for FM-enabled scientific discovery. This position paper aims to support the scientific community in understanding the transformative role of FMs and to foster reflection on the future of scientific discovery. Our project is available at https://github.com/usail-hkust/Awesome-Foundation-Models-for-Scientific-Discovery.

Situated Epistemic Infrastructures: A Diagnostic Framework for Post-Coherence Knowledge

2025-10-17T00:12:29Z

Large Language Models (LLMs) such as ChatGPT have rendered visible the fragility of contemporary knowledge infrastructures by simulating coherence while bypassing traditional modes of citation, authority, and validation. This paper introduces the Situated Epistemic Infrastructures (SEI) framework as a diagnostic tool for analyzing how knowledge becomes authoritative across hybrid human-machine systems under post-coherence conditions. Rather than relying on stable scholarly domains or bounded communities of practice, SEI traces how credibility is mediated across institutional, computational, and temporal arrangements. Integrating insights from infrastructure studies, platform theory, and epistemology, the framework foregrounds coordination over classification, emphasizing the need for anticipatory and adaptive models of epistemic stewardship. The paper contributes to debates on AI governance, knowledge production, and the ethical design of information systems by offering a robust alternative to representationalist models of scholarly communication.