The independence paradox in scientific careers

2026-03-26T01:44:56Z

Establishing an independent academic identity is a central yet insufficiently understood challenge for early-career researchers. However, limited resources and mentor-driven research agendas often constrain early efforts toward autonomy. To provide large-scale quantitative evidence on how junior researchers develop independence, we introduce a framework that traces how mentees diverge from their mentors in both research topics and collaboration networks, and how these divergences relate to long-term scientific impact. Analyzing over 500,000 mentee-mentor pairs in Chemistry, Neuroscience, and Physics across six decades, we find that high-impact scientists often initiate work in secondary areas of their mentors' expertise while adaptively establishing distinct research trajectories. This pattern is most pronounced among mentees who eventually surpass their mentors' impact. We identify an inverted U-shaped relationship between topic divergence and mentees' enduring impact, with moderate divergence yielding the highest scientific impact, revealing an independence paradox in scientific careers. This pattern holds whether topic divergence is measured by citation network or semantic thematic distance. We further reveal that excessive direct mentor-mentee collaborations correlate with lower mentee impact, whereas expanding professional networks to include mentors' collaborators is beneficial. These findings not only offer actionable guidance for early-career researchers navigating independence but also inform institutional policies that promote mentorship structures supporting intellectual innovation and recognizing original contributions in promotion evaluations.

Where Do Your Citations Come From? Citation-Constellation: A Free, Open-Source, No-Code, and Auditable Tool for Citation Network Decomposition with Complementary BARON and HEROCON Scores

2026-03-25T11:44:47Z

Standard citation metrics treat all citations as equal, obscuring the social and structural pathways through which scholarly influence propagates. I introduce Citation-Constellation, a freely available no-code tool for citation network analysis with two complementary bibliometric scores that decompose a researcher's citation profile by network proximity between citing and cited authors. BARON (Boundary-Anchored Research Outreach Network score) is a strict binary metric counting only citations from outside the detected collaborative network. HEROCON (Holistic Equilibrated Research Outreach CONstellation score) applies graduated weights assigning partial credit to in-group citations based on relationship proximity. The gap between scores serves as a diagnostic of inner-circle dependence. An extended abstract with full details appears in the paper. The tool implements this through a phased architecture: (1) self-citation analysis, (2) co-authorship graph traversal, (3) temporal institutional affiliation matching via ROR, and (4) AI-agent-driven venue governance extraction using a local LLM. Phases 1-3 are fully operational; Phase 4 is under development. Key design choices include ORCID-validated author identity resolution, an UNKNOWN classification for citations with insufficient metadata, and comprehensive audit trails documenting every classification decision. A no-code web interface enables researchers to compute scores without programming, installation, or registration. I present these scores as structural diagnostics, not quality indicators. BARON and HEROCON describe where in the social graph citations originate. They should not be used for hiring, promotion, or funding decisions. HEROCON weights are experimental and require empirical calibration.
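
To make the difference between the two scores concrete, here is a minimal sketch. The abstract gives no weight values and calls the HEROCON weights experimental, so the category names, the weights, the choice to report scores as shares, and the exclusion of UNKNOWN citations from the denominator are all assumptions for illustration, not the tool's actual implementation.

```python
from collections import Counter

# Hypothetical proximity categories and weights (assumptions; the abstract
# gives no values and describes the HEROCON weights as experimental).
HEROCON_WEIGHTS = {
    "external": 1.0,          # outside the detected collaborative network
    "same_institution": 0.5,  # partial credit, illustrative value
    "coauthor": 0.25,         # partial credit, illustrative value
    "self": 0.0,              # self-citation
}


def baron_herocon(labels):
    """Return (baron, herocon) as shares of the classified citations.

    Citations labelled "unknown" are left out of the denominator here;
    that handling is an assumption of this sketch.
    """
    counts = Counter(label for label in labels if label != "unknown")
    total = sum(counts.values())
    if total == 0:
        return 0.0, 0.0
    # BARON: binary, only out-of-network citations count.
    baron = counts["external"] / total
    # HEROCON: graduated, in-group citations receive partial credit.
    herocon = sum(HEROCON_WEIGHTS[k] * n for k, n in counts.items()) / total
    return baron, herocon


labels = ["external"] * 6 + ["coauthor"] * 2 + ["self", "same_institution", "unknown"]
baron, herocon = baron_herocon(labels)
gap = herocon - baron  # a larger gap suggests more inner-circle dependence
```

Under these assumed weights HEROCON can never fall below BARON, so the gap is non-negative and grows with the share of in-group citations.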

Linking Global Science Funding to Research Publications

2026-03-25T10:12:55Z

Funding acknowledgments in scholarly publications provide large-scale trace data on organizations that support scientific research. We present a dataset for linking global science funding organizations to research publications by systematically disambiguating unique funding acknowledgment strings extracted from publication metadata. Funder names are matched to standardized organizational identifiers using a multi-stage pipeline that combines lexical normalization, similarity-based clustering, rule-based matching, named entity recognition assistance, and manual validation. The resulting dataset links 1.9 million unique funder strings to canonical organization identifiers and records match types and unresolved cases to support transparency. Technical validation includes paper-level comparisons across bibliometric sources and manual verification against full-text acknowledgment sections, with reported recall and precision metrics. This dataset supports analyses of funding flows, institutional funding portfolios, regional representation, and concentration patterns in the global research system.
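
The first stages of such a pipeline (lexical normalization followed by similarity-based matching, with the match type recorded) might look roughly like the sketch below. It uses only the Python standard library; the function names, the 0.9 threshold, the registry layout, and the `org-1` identifier are hypothetical, and the clustering, rule-based, NER, and manual-validation stages are not shown.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lexical normalization: strip accents and punctuation, lowercase, collapse spaces."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def match_funder(raw, registry, threshold=0.9):
    """Match a raw funder string against a registry.

    registry maps normalized canonical names to organization identifiers.
    Returns (identifier or None, match_type), where match_type is one of
    "exact", "fuzzy", or "unresolved". The threshold is an assumed value.
    """
    key = normalize(raw)
    if key in registry:
        return registry[key], "exact"
    best_id, best_score = None, 0.0
    for name, org_id in registry.items():
        score = SequenceMatcher(None, key, name).ratio()
        if score > best_score:
            best_id, best_score = org_id, score
    if best_score >= threshold:
        return best_id, "fuzzy"
    return None, "unresolved"


# Hypothetical registry with a made-up identifier.
registry = {normalize("National Science Foundation"): "org-1"}
print(match_funder("NATIONAL SCIENCE FOUNDATION.", registry))
print(match_funder("National Science Foundaton", registry))
print(match_funder("Deutsche Forschungsgemeinschaft", registry))
```

Keeping the match type and the unresolved cases alongside each identifier mirrors the transparency goal stated in the abstract.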

The Costs of Early-career Disciplinary Pivots: Evidence from Ph.D. Admissions

2026-03-25T03:21:15Z

Scientific innovation often comes from researchers who pivot across disciplines. However, prior work found that established researchers face productivity penalties when pivoting. Here, we investigate the consequences of pivoting at the beginning of a research career -- doctoral admissions -- when the benefits of importing new ideas might outweigh the switching costs. Using applications to all PhD programs at a large research-intensive university between 2013 and 2023, we find that pivoters (those applying to programs outside their prior disciplinary training) have lower GPAs and standardized test scores than non-pivoters. Yet even conditional on these predictors of admission, pivoters are 1.3 percentage points less likely to be admitted. Examining applicants who applied to multiple programs in the same admissions cycle provides suggestive evidence that the admissions pivot penalty is causal. This penalty is significantly smaller for applicants who secure a recommendation from someone within the target discipline. Among those admitted and enrolled, pivoters are 12.9 percentage points less likely to graduate and do not show superior publication performance on average or at the tail. Our results reveal the substantial costs of disciplinary pivoting even at the outset of research careers, which constrain the flow of new ideas into research communities.

Systemic Gendered Citation Imbalance in Computer Science: Evidence from Conferences and Journals

2026-03-24T14:37:34Z

Gender imbalance persists across science, technology, engineering, and mathematics (STEM) fields, including computer science, where it appears in researcher demographics, productivity, recognition, hiring, and career progression. Given computer science's rapid expansion and global influence, addressing this imbalance is essential for broadening participation and fueling innovation. Although journal-oriented disciplines exhibit consistent gender imbalances in citation practices, it remains unclear whether similar patterns arise in the conference-centric culture of computer science. Here, we systematically investigate gender imbalance in citations of conference and journal papers in computer science. We find that papers for which a woman is listed as either first or last author receive fewer citations than expected, partly because of homophilic citation tendencies (i.e., authors tend to cite papers that share specific attributes). This imbalance is especially pronounced for conference papers--particularly those published at top-tier venues--relative to journals. Moreover, we find that the prominence of the first or last author and the structure of their local co-authorship networks are potential drivers of these imbalances. By exploring how conference-centric publishing practices can amplify systemic imbalances in computer science, our study offers insights that may inform efforts to foster more equitable representation in academia.

Trends in Equal-Contribution Authorship: A Large-Scale Bibliometric Analysis of Biomedical Literature

2026-03-24T09:33:54Z

Equal-contribution authorship, in which two or more authors are designated as having contributed equally, is increasingly common in scientific publishing. Using approximately 480,000 tagged records from PubMed and PMC (2010-2024), we examine temporal trends, journal-level patterns, geographic distributions, and byline positions of equal-contributing authors. Results show a sharp rise after 2017, with both high-output mega-journals and smaller, discipline-specific journals contributing to the growth. Journal-level analysis indicates a median increase in the share of tagged articles from about 19% in 2015 to over 30% in 2024, with some journals exceeding 50%. Geographically, China accounts for the largest share (40.8% of fractionalized contributions), followed by the United States (15.2%) and Germany (5.2%). Normalizing to 2015 baselines, China shows a 13.1x increase by 2024, while even the slowest-growing countries more than tripled their levels. Analysis of normalized byline positions shows that equal-contribution designations are concentrated near the first-author position, with fewer cases in middle or last positions. These findings document a broad shift toward shared first-author credit across journal sizes and regions within the biomedical literature and suggest that journals and evaluators may need to rely more on transparent contributorship information and to monitor the use of such labels over time.

Do Large Language Models Reduce Research Novelty? Evidence from Information Systems Journals

2026-03-23T19:16:54Z

Large language models such as ChatGPT have increased scholarly output, but whether this productivity boost produces genuine intellectual advancement remains untested. I address this gap by measuring the semantic novelty of 13,847 articles published between 2020 and 2025 in 44 Information Systems journals. Using SPECTER2 embeddings, I operationalize novelty as the cosine distance between each paper and its nearest prior neighbors. A difference-in-differences design with the November 2022 release of ChatGPT as the treatment break reveals a heterogeneous pattern: authors affiliated with institutions in non-English-dominant countries show a 0.18 standard deviation decline in relative novelty compared to authors in English-dominant countries (beta = -0.176, p < 0.001), equivalent to a 7-percentile-point drop in the novelty distribution. This finding is robust across alternative novelty specifications, treatment break dates, and sub-samples, and survives a placebo test at a pre-treatment break. I interpret these results through the lens of construal level theory, proposing that LLMs function as proximity tools that shift researchers from abstract, exploratory thinking toward concrete, convention-following execution. The paper contributes to the growing debate on whether LLM-driven productivity gains come at the cost of intellectual diversity.
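
The novelty measure described above can be sketched as follows. The sketch assumes the SPECTER2 embeddings have already been computed and that the rows are sorted by publication date. The neighbor count `k` and the use of the mean over the nearest neighbors are assumptions, since the abstract does not specify them.

```python
import numpy as np


def novelty_scores(embeddings, k=5):
    """Mean cosine distance from each paper to its k nearest prior papers.

    embeddings: array of shape (n_papers, dim), rows sorted by publication
    date (oldest first). The first paper has no prior neighbors and gets NaN.
    """
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # unit-length rows
    scores = np.full(len(X), np.nan)
    for i in range(1, len(X)):
        dist = 1.0 - X[:i] @ X[i]        # cosine distance to every prior paper
        nearest = np.sort(dist)[:k]      # k smallest distances (or fewer if i < k)
        scores[i] = nearest.mean()
    return scores


# Toy vectors standing in for real embeddings.
toy = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(novelty_scores(toy, k=1))
```

A paper identical to an earlier one scores 0, while one orthogonal to everything before it scores 1. The difference-in-differences step would then be run on these scores and is not shown here.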

An Intelligent Framework for Real-Time Yoga Pose Detection and Posture Correction

2026-03-23T17:37:59Z

Yoga is widely recognized for improving physical fitness, flexibility, and mental well-being. However, these benefits depend strongly on correct posture execution. Improper alignment during yoga practice can reduce effectiveness and increase the risk of musculoskeletal injuries, especially in self-guided or online training environments. This paper presents a hybrid Edge AI-based framework for real-time yoga pose detection and posture correction. The proposed system integrates lightweight human pose estimation models with biomechanical feature extraction and a CNN-LSTM-based temporal learning architecture to recognize yoga poses and analyze motion dynamics. Joint angles and skeletal features are computed from detected keypoints and compared with reference pose configurations to evaluate posture correctness. A quantitative scoring mechanism is introduced to measure alignment deviations and generate real-time corrective feedback through visual, text-based, and voice-based guidance. In addition, Edge AI optimization techniques such as model quantization and pruning are applied to enable low-latency performance on resource-constrained devices. The proposed framework provides an intelligent and scalable digital yoga assistant that can improve user safety and training effectiveness in modern fitness applications.
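
As an illustration of the joint-angle comparison, the sketch below computes the angle at a keypoint and turns deviations from a reference pose into a score. The scoring rule (linear falloff to zero at a 15-degree tolerance) and the function names are hypothetical; the abstract does not describe the paper's actual scoring mechanism.

```python
import numpy as np


def joint_angle(a, b, c):
    """Angle in degrees at keypoint b, between segments b->a and b->c."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def alignment_score(angles, reference, tolerance=15.0):
    """Hypothetical score in [0, 1]: 1 at zero deviation, 0 at or beyond tolerance."""
    dev = np.abs(np.asarray(angles, dtype=float) - np.asarray(reference, dtype=float))
    return float(np.mean(np.clip(1.0 - dev / tolerance, 0.0, 1.0)))


# Example: elbow angle from (shoulder, elbow, wrist) keypoints in image coordinates.
elbow = joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
print(elbow, alignment_score([elbow], [90.0]))
```

The same `joint_angle` function works for 2D or 3D keypoints, since it relies only on vector differences and dot products.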

C$^2$-Cite: Contextual-Aware Citation Generation for Attributed Large Language Models

2026-03-23T15:22:01Z

The attribution technique enhances the credibility of LLMs by adding citations to the generated sentences, enabling users to trace back to the original sources and verify the reliability of the output. However, existing instruction-tuned attributed LLMs often fail to properly interpret the contextual semantics of citation symbols (e.g., [i]) during text generation. This shortcoming arises from their insufficient awareness of the context information surrounding citation markers, which in turn leads to disjointed references and poor integration of retrieved knowledge into the generated content. To address this issue, we propose a novel Contextual-aware Citation generation framework (C$^2$-Cite) that explicitly integrates the semantic relationships between citation markers and their referenced content. Specifically, a contextual citation alignment mechanism is adopted: it first encodes the retrieved document contexts into the symbol representation of citations, then aligns the marker numbers by decoding information from a citation router function. This mechanism enables the transformation of citation markers from generic placeholders into active knowledge pointers that link to the referenced source information. Experimental results on the ALCE benchmark across three datasets validate our framework C$^2$-Cite++: it outperforms the SOTA baseline by an average of 5.8% in citation quality and 17.4% in response correctness. The implementation is publicly available at https://github.com/BAI-LAB/c2cite

A Stock-Flow Framework for Editorial Board Dynamics: The Case of Economics Journals, 1866-2019

2026-03-23T07:50:20Z

Research on the editorial boards of scholarly journals has predominantly relied on static, cross-sectional data, focusing on their composition or interlocking editorships at single points in time. To address this gap, a formal stock-flow framework is developed for analyzing the longitudinal dynamics of editorial boards. The model integrates three interconnected layers: journal demographics, the dynamics of editorial positions, and the dynamics of board members. This framework is applied to the Gatekeepers of Economics Longitudinal Database (GOELD), which contains annual snapshots of editorial boards for approximately 1,700 economics journals from 1866 to 2006 (by decade), plus the years 2012 and 2019. The period until 1946 was characterized by its small scale: few journals and compact editorial communities. The decade from 1946 to 1956 marked the shift toward a "big science" model, initiating an era of expansionary growth fueled primarily by the founding of new journals. The contemporary period (2006-2019) appears to represent a structural break, characterized by low flux and more stable and more closed editorial communities. The results show that the proposed framework enables a dynamic, long-term analysis of how journals and their gatekeeping systems evolve, grow, and structure themselves.

Context Selection for Hypothesis and Statistical Evidence Extraction from Full-Text Scientific Articles

2026-03-22T12:28:21Z

Extracting hypotheses and their supporting statistical evidence from full-text scientific articles is central to the synthesis of empirical findings, but remains difficult due to document length and the distribution of scientific arguments across sections of the paper. This work studies a sequential full-text extraction setting, where the statement of a primary finding in an article's abstract is linked to (i) a corresponding hypothesis statement in the paper body and (ii) the statistical evidence that supports or refutes that hypothesis. This formulation induces a challenging within-document retrieval setting in which many candidate paragraphs are topically related to the finding but differ in rhetorical role, creating hard negatives for retrieval and extraction. Using a two-stage retrieve-and-extract framework, we conduct a controlled study of retrieval design choices, varying context quantity, context quality (standard Retrieval Augmented Generation, reranking, and a fine-tuned retriever paired with reranking), as well as an oracle paragraph setting to separate retrieval failures from extraction limits across four Large Language Model extractors. We find that targeted context selection consistently improves hypothesis extraction relative to full-text prompting, with gains concentrated in configurations that optimize retrieval quality and context cleanliness. In contrast, statistical evidence extraction remains substantially harder. Even with oracle paragraphs, performance remains moderate, indicating persistent extractor limitations in handling hybrid numeric-textual statements rather than retrieval failures alone.

AETAS: Analysis of Evolving Temporal Affect and Semantics for Legal History

2026-03-21T18:10:03Z

Digital-humanities work on semantic shift often alternates between handcrafted close readings and opaque embedding machinery. We present a reproducible expert-system style pipeline that quantifies lexical drift and its instability in the Old Bailey Corpus (1674-1913), coupling interpretable trajectories with legally meaningful axes. We bin proceedings by decade with dynamic merging for low-resource slices, train skip-gram embeddings, align spaces through orthogonal Procrustes to a 1900s anchor, and measure both geometric displacement and neighborhood turnover. We add split-half baselines and seed-sensitivity checks to separate within-bin instability from temporal change. Three visual analytics outputs (drift magnitudes, semantic trajectories, and movement along a mercy-versus-retribution axis) expose how justice, crime, poverty, and insanity evolve with penal reforms, transportation debates, and Victorian moral politics. The pipeline is implemented as auditable scripts so results can be reproduced in other historical corpora.
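
The alignment step can be sketched with the standard SVD solution to the orthogonal Procrustes problem. This is an illustrative sketch rather than the pipeline's own scripts: it assumes both embedding matrices have already been restricted to a shared vocabulary in the same row order, and it uses cosine distance as one possible measure of geometric displacement.

```python
import numpy as np


def procrustes_align(source, anchor):
    """Rotate `source` onto `anchor` with the orthogonal matrix R that
    minimizes ||source @ R - anchor||_F (R = U V^T from the SVD of source^T anchor).

    Rows of both matrices must correspond to the same words.
    """
    u, _, vt = np.linalg.svd(source.T @ anchor)
    return source @ (u @ vt)


def cosine_drift(aligned, anchor):
    """Per-word cosine distance between aligned and anchor vectors."""
    num = np.sum(aligned * anchor, axis=1)
    den = np.linalg.norm(aligned, axis=1) * np.linalg.norm(anchor, axis=1)
    return 1.0 - num / den


# Toy check: a rotated copy of the anchor space should align back with ~zero drift.
rng = np.random.default_rng(0)
anchor = rng.normal(size=(20, 4))           # stand-in for the 1900s anchor space
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
decade = anchor @ rotation                  # stand-in for another decade's space
drift = cosine_drift(procrustes_align(decade, anchor), anchor)
print(drift.max())
```

Because the learned map is constrained to be orthogonal, distances within each decade's space are preserved, so any drift that remains after alignment reflects changes in word geometry rather than an arbitrary rotation between training runs.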

The Innovation Recognition Paradox: How Science Undervalues the Boundary-Crossing Work Women Produce

2026-03-21T01:54:15Z

Women and men pursue different but complementary forms of scientific innovation. Analyzing 261,452 solo-authored papers by U.S. scholars, with patterns confirmed by millions of multi-authored articles, we show that women more often bridge distant disciplines through novel reference combinations, while men more often recombine concepts within fields. Women's interdisciplinary innovations prove more disruptive and more prescient, yet science penalizes them for it. For equally innovative work, women's papers land in lower-prestige journals and tend to receive less downstream citation credit, though their disruptive impact is greater. These gaps narrow only at extreme levels of novelty, suggesting women must produce exceptionally surprising work to achieve parity. Men's within-field concept innovations, by contrast, attract recognition from disciplinary gatekeepers who control careers. The asymmetry reveals not a deficit in women's contributions but a reward structure that systematically undervalues the boundary-crossing work most likely to transform fields.

Astrophysics Research Organizations in the 21st Century: Database and Comparative Dashboards

2026-03-20T19:35:52Z

As many research papers in astronomy have been written since the beginning of the 21st century as had been written previously. This exponential growth has been accompanied by substantial changes in the structure of astrophysics research, in which organizations perform it, and in where they are located. Using data from the Smithsonian/NASA Astrophysics Data System/Science Explorer (ADS/SciX), we have obtained a set of metrics based on article counts and citations as a function of the institutional affiliation of the first author; nearly every organization that has produced recent astronomy research is included. We use these data to examine changes in where astronomy research is being done. We demonstrate how to create custom rankings for the organizations. We develop a dashboard of key performance indicators (KPI) to examine the relative and absolute changes in the research performance for each of the 1949 organizations that have produced at least one first-authored, refereed astronomy journal article since 1997. We also present KPI dashboards for 65 countries and three regions.

Cenergy3: An Open Software Package for City Energy 3D Modeling

2026-03-20T17:05:11Z

The efficient management and planning of urban energy systems require integrated three-dimensional (3D) models that accurately represent both consumption nodes and distribution networks. This paper introduces our approach and openly released software that automate the generation of digital 3D urban energy models from open data. We synthesize data from OpenTopography, OpenStreetMap, and Overture Maps to generate the 3D models. The rendered model visualizes and contextualizes power distribution grids alongside the built environment and transportation networks. Our software, including an open Python library and a free API, provides interactive figures for the 3D models. The rendered models are essential for analyzing infrastructure alignment and spatially linking energy demand nodes (buildings) with energy supply (utility grids). The API leverages standard Web Mercator coordinates (EPSG:3857) and JSON serialization to ensure interoperability within smart city and energy simulation platforms. We also provide a graphical user interface (GUI) through which end-users can access our API via a cloud-based server, regardless of their programming skills and the devices and platforms they are using. We anticipate that our approach and software can support field researchers, developers, end-users, and policy-makers in a variety of applications such as urban energy monitoring, demand-supply analysis, and energy digital twins.
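
For readers unfamiliar with EPSG:3857, the sketch below shows the standard spherical Web Mercator forward projection and a JSON-serialized record. The projection formula is the standard one. The record layout and the `building_record` helper are hypothetical and are not taken from the Cenergy3 API.

```python
import json
import math

R = 6378137.0  # WGS 84 semi-major axis in metres, the sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Project longitude/latitude in degrees to Web Mercator x/y in metres."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def building_record(building_id, lon, lat, height_m):
    """Serialize one demand node; this schema is illustrative only."""
    x, y = lonlat_to_web_mercator(lon, lat)
    return json.dumps({"id": building_id, "x": x, "y": y, "height_m": height_m})


print(building_record("b-001", -73.9857, 40.7484, 30.0))
```

Web Mercator is undefined at the poles and is conventionally limited to latitudes within about 85.05 degrees of the equator, which is not a practical constraint for city-scale models.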