http://arxiv.org/abs/2506.00081v2 Artificial Empathy: AI based Mental Health 2025-10-31T05:23:32Z Many people suffer from mental health problems but not everyone seeks professional help or has access to mental health care. AI chatbots have increasingly become a go-to for individuals who either have mental disorders or simply want someone to talk to. This paper presents a study on participants who have previously used chatbots and a scenario-based testing of large language model (LLM) chatbots. Our findings indicate that AI chatbots were primarily utilized as a "Five minute therapist" or as a non-judgmental companion. Participants appreciated the anonymity and lack of judgment from chatbots. However, there were concerns about privacy and the security of sensitive information. The scenario-based testing of LLM chatbots highlighted additional issues. Some chatbots were consistently reassuring, used emojis and names to add a personal touch, and were quick to suggest seeking professional help. However, there were limitations such as inconsistent tone, occasional inappropriate responses (e.g., casual or romantic), and a lack of crisis sensitivity, particularly in recognizing red flag language and escalating responses appropriately. These findings can inform both the technology and mental health care industries on how to better utilize AI chatbots to support individuals during challenging emotional periods. 2025-05-30T02:36:56Z Aditya Naik Jovi Thomas Teja Sree Mandava Himavanth Reddy Vemula http://arxiv.org/abs/2511.00093v1 Pharmacovigilance Analysis of Drug-Induced Rhabdomyolysis Based on the FDA Adverse Event Reporting System (FAERS) 2025-10-30T07:08:11Z This study aimed to systematically identify and quantify risks for drug-induced rhabdomyolysis (DIR) using real-world data and to propose an evidence-based risk mitigation framework. 
We conducted a retrospective pharmacovigilance study using the FDA Adverse Event Reporting System (FAERS) database from Q1 2005 to Q1 2025. A two-stage analysis involved initial signal detection using the Reporting Odds Ratio (ROR), followed by a LASSO-optimized multivariate logistic regression to calculate adjusted odds ratios (aORs) for 54 target drugs while controlling for confounders. Our analysis confirmed potent DIR risks for known agents, such as gemfibrozil (aOR 173.67) and statins (lovastatin aOR 97.20, simvastatin aOR 85.12). Crucially, we identified strong, novel risk signals for drugs currently lacking warnings, most notably levetiracetam (aOR 11.02) and donepezil (aOR 8.90). A significant "labeling gap" was quantified: 61.1% of drugs with a statistically significant DIR risk lack a corresponding warning in U.S. drug labels. We subsequently developed a three-tiered risk stratification model. The proposed framework provides a data-driven foundation for developing tiered clinical decision support systems, enhancing prescribing safety, and guiding future regulatory action to bridge the identified evidence-to-labeling gap. 2025-10-30T07:08:11Z Enpu Liang http://arxiv.org/abs/2510.23783v1 Methylator: A Modular Framework for DNA Methylation Analysis in Mammals and Plants Using Galaxy 2025-10-27T19:14:00Z DNA cytosine methylation is a critical epigenetic mark regulating gene expression and thus playing an important role in development and differentiation across eukaryotes. Existing tools for high-throughput methylation analysis often lack cross-species flexibility or require command-line expertise. We present Methylator, a novel, end-to-end DNA methylation analysis framework integrated into the Galaxy platform, enabling accessible DNA methylation analysis for mammals and plants. Methylator supports analyses from data obtained using diverse protocols like WGBS, RRBS, and PBAT and handles all contexts of DNA methylation (CpG, CHG, and CHH). 
The Methylator framework includes quality control, alignment, methylation calling, differential analysis, and functional analysis through reproducible, user-friendly workflows. Its unique Dirty-Harry alignment method enhances mapping efficiency, while a Shiny-based interface allows for interactive, publication-ready visualizations. Methylator is freely available, offering researchers a versatile, user-friendly solution for epigenomic studies. 2025-10-27T19:14:00Z Jonas Bucher Ueli Grossniklaus Deepak Kumar Tanwar http://arxiv.org/abs/2510.23140v1 Fast Voxel-Wise Kinetic Modeling in Dynamic PET using a Physics-Informed CycleGAN 2025-10-27T09:17:02Z Tracer kinetic modeling serves a vital role in diagnosis, treatment planning, tracer development and oncology, but burdens practitioners with complex and invasive arterial input function estimation (AIF). We adopt a physics-informed CycleGAN showing promise in DCE-MRI quantification to dynamic PET quantification. Our experiments demonstrate sound AIF predictions and parameter maps closely resembling the reference. 2025-10-27T09:17:02Z 5 pages, 1 figure. Pre-review preprint. Submitted to MedEurIPS 2025 (EurIPS workshop) Christian Salomonsen Samuel Kuttner Michael Kampffmeyer Robert Jenssen Kristoffer Wickstrøm Jong Chul Ye Elisabeth Wetzer http://arxiv.org/abs/2507.22963v2 FedCVD++: Communication-Efficient Federated Learning for Cardiovascular Risk Prediction with Parametric and Non-Parametric Model Optimization 2025-10-25T10:38:41Z Cardiovascular diseases (CVD) cause over 17 million deaths annually worldwide, highlighting the urgent need for privacy-preserving predictive systems. We introduce FedCVD++, an enhanced federated learning (FL) framework that integrates both parametric models (logistic regression, SVM, neural networks) and non-parametric models (Random Forest, XGBoost) for coronary heart disease risk prediction. 
To address key FL challenges, we propose: (1) tree-subset sampling that reduces Random Forest communication overhead by 70%, (2) XGBoost-based feature extraction enabling lightweight federated ensembles, and (3) federated SMOTE synchronization for resolving cross-institutional class imbalance. Evaluated on the Framingham dataset (4,238 records), FedCVD++ achieves state-of-the-art results: federated XGBoost (F1 = 0.80) surpasses its centralized counterpart (F1 = 0.78), and federated Random Forest (F1 = 0.81) matches non-federated performance. Additionally, our communication-efficient strategies reduce bandwidth consumption by 3.2X while preserving 95% accuracy. Compared to existing FL frameworks, FedCVD++ delivers up to 15% higher F1-scores and superior scalability for multi-institutional deployment. This work represents the first practical integration of non-parametric models into federated healthcare systems, providing a privacy-preserving solution validated under real-world clinical constraints. 2025-07-30T06:17:33Z Need to add more results Abdelrhman Gaber Hassan Abd-Eltawab John Elgallab Youssif Abuzied Dineo Mpanya Turgay Celik Swarun Kumar Tamer ElBatt http://arxiv.org/abs/2510.22174v1 Identification of Shared Genetic Biomarkers to Discover Candidate Drugs for Cervical and Endometrial Cancer by Using the Integrated Bioinformatics Approaches 2025-10-25T06:02:52Z Cervical (CC) and endometrial cancers (EC) are two common types of gynecological tumors that threaten the health of females worldwide. Since their underlying mechanisms and associations remain unclear, computational bioinformatics analysis is required. In the present study, bioinformatics methods were used to screen for key candidate genes, their functions and pathways, and drug agents associated with CC and EC, aiming to reveal the possible molecular-level mechanisms. 
Four publicly available microarray datasets of CC and EC from the Gene Expression Omnibus database were downloaded, and 72 differentially expressed genes (DEGs) were selected through integrated analysis. Then, we performed the protein-protein interaction (PPI) analysis and identified 9 shared genetic biomarkers (SGBs). The GO functional and KEGG pathway enrichment analyses of these SGBs revealed some important functions and signaling pathways significantly associated with CC and EC. The interaction network analysis identified four transcription factors (TFs) and two miRNAs as key transcriptional and post-transcriptional regulators of SGBs. The expression of the AURKA, TOP2A, and UBE2C genes was higher in CC and EC tissues than in normal samples, and this gene expression was linked to disease progression. Furthermore, we performed docking analysis between 9 SGBs-based proteins and 145 meta-drugs, and identified the top-ranked 10 drugs as candidate drugs. Finally, we investigated the binding stability of the top-ranked three drugs (Sorafenib, Paclitaxel, Sunitinib) using 100 ns MD-based MM-PBSA simulations with UBE2C, AURKA, and TOP2A proteins, and observed their stable performance. Therefore, the proposed drugs might play a vital role in the treatment against CC and EC. 2025-10-25T06:02:52Z 25 pages, 8 figures Md. Selim Reza Mst. Ayesha Siddika Md. Tofazzal Hossain Md. Ashad Alam Md. Nurul Haque Mollah http://arxiv.org/abs/1706.04610v3 Transverse contributions to the longitudinal stiffness of the human foot 2025-10-24T20:23:23Z Humans rely on foot stiffness to withstand the propulsive forces of walking and running. Skeletal adaptations that increase foot stiffness include the medial longitudinal arch (MLA) and the transverse tarsal arch (TTA). The TTA has been hypothesized to stiffen the foot through cross-axis coupling of transverse intermetatarsal stiffness with sagittal-plane midfoot stiffness, but this has been tested only in cadaveric specimens. 
In vivo testing is essential because muscle contraction substantially modulates MLA function and may similarly affect the TTA's cross-axis coupling. Here we provide in vivo evidence for the TTA's contribution to foot stiffness by externally increasing intermetatarsal stiffness and measuring its effects on midfoot elasticity during walking. As predicted by the cross-axis coupling hypothesis, increasing intermetatarsal stiffness with an elastic tape wrapped around the forefoot reduced the energy absorbed in midfoot flattening and increased sagittal-plane midfoot stiffness concomitantly (mean $\pm$ standard error of the mean (SEM): $13.9\% \pm 3\%$ and $16.8\% \pm 5.8\%$, respectively). However, taping did not change the curvature of the TTA, thereby isolating the effects of cross-axis coupling from morphological changes to the TTA. Thus, forefoot taping modulates midfoot stiffness through cross-axis coupling and could provide a non-invasive means to manage pathological foot flexibility or enhance athletic performance. 2017-06-14T17:56:27Z 21 pages, 14 figures Ali Yawar Lucia Korpas Shreyas Mandre Madhusudhan Venkadesan http://arxiv.org/abs/2510.19869v1 Challenges and Recommendations in Establishing National Human Diversity Genomic Projects 2025-10-22T05:20:05Z Genomic approaches have revolutionized medical research, providing valuable insights into human physiology and disease. Despite major benefits from large collections of genomes, the lack of diversity in genomic data represents a significant challenge for advancing biomedical discovery and accessible health solutions worldwide. Establishing a national genomic project is not a one-size-fits-all endeavor, as each country presents distinct challenges and opportunities. 
We identify challenges in the way of obtaining and publishing data from Whole Genome Sequencing (WGS) of people in various countries, discuss the progress made by some in their efforts to study their genetic diversity, and assess the most common issues. We recognize that a successful national genome database requires addressing several major issues, including the variable awareness of the recent developments in genomics among government officials, healthcare administrators, and policymakers, the absence of regulations, and ethical considerations, the challenges in securing funding, establishing legal frameworks, and building the necessary infrastructure. By assembling a diverse team of experts across 19 countries, we aim to provide a balanced approach in our recommendations to establish national projects. Our study acknowledges and addresses major intricacies and nuances specific to various settings and regions while presenting diverse opinions of scientists from both high-resource and low-resource countries contributing to a more inclusive and globally relevant framework for advancing genomic research and its applications. 2025-10-22T05:20:05Z Taras K. Oleksyk Walter W. Wolfsberger Karishma Chhugani Yu-Ning Huang Valerii Pokrytiuk Khrystyna Shchubelka Alex Zelikovsky Bogdan Pasaniuc Viorel Jinga Octavian Bucur Scott C. Edmunds Heinner Guio Zane Lombard Brenna M. Henn Andrei Lobiuc Alexei Levitchi Dumitru Ciorba Viorel Bostan Viorel Munteanu Victor Gordeev Christian P. Schaaf Hoh Boon-Peng Andrés Moreno Estrada Mihai Covasa Mihai Dimian Ulykbek Kairov Victoria M. Pak Seow Shih Wee Charleston W. K. Chiang Emmanuel Nepolo Matteo Pellegrini Yosr Hamdi Malak S. Abedalthagafi Nicola Jane Mulder Jazlyn Mooney Javier E. Sanchez-Galan Sandro José de Souza Henriette Raventós Marina Muzzio Gabriela Chavarria-Soley Serghei Mangul http://arxiv.org/abs/2510.20851v1 Combinations of histone deacetylase inhibitors extend chronological lifespan in S. 
cerevisiae 2025-10-21T21:20:48Z Aging is the primary risk factor for nearly all forms of human death, yet pharmaceutical interventions hold the potential to prevent it. Combinations of drugs have been shown to increase the lifespan of model organisms more than individual drugs, and geroprotective histone deacetylase (HDAC) inhibitors that have different molecular targets within the longevity regulation network show considerably higher drug synergy than many other compounds. In this study, four HDAC inhibitors (curcumin, quercetin, resveratrol, and berberine) have been administered in pairwise, three-, and four-combinations to yeast (S. cerevisiae) and their maximum chronological lifespans (CLS) have been measured. In five of the six pairwise combinations, the drugs exhibited synergy according to the Bliss Independence Model, on average extending maximum CLS 68% over the individual drugs. Three- and four-combinations further extended maximum CLS 49% and 107% over pairwise combinations, respectively. Since the targets of the HDAC inhibitors used in this study are evolutionarily conserved between yeast and humans, the results obtained have implications on human longevity. 2025-10-21T21:20:48Z 8 pages, 9 figures Owen H. Wherry http://arxiv.org/abs/2510.18401v1 Relation between in vitro microbial fermentations and in vivo performance in pigs selected for their residual feed intake 2025-10-21T08:22:27Z Bioinformatic analysis of microbiota revealed that certain metabolic pathways are associated with low and high residual feed intake (LRFI and HRFI), such as the amino-acid biosynthesis pathway and the tRNA-aminoacyl synthesis pathway. The latter is associated with increased propionate production. 
Yet, in vitro fermentation-profile analyses revealed that LRFI pigs, from the most efficient genetic line, produced more acetate (+15%) and propionate (+56%) from the insoluble fraction (IF) containing the insoluble dietary fibre recovered after simulation of upper gastrointestinal digestion. Valerate was also more frequently abundant in LRFI pigs (P < 0.01). 16S sequencing analysis of the microbes responsible for fermentation suggested that propionate obtained from the fraction of feed that is indigestible by the host is produced mainly by Prevotella and Lactobacillus. This production was strongly correlated with backfat thickness in LRFI pigs (Spearman's correlation = 0.80), while a moderate correlation existed between butyrate production and feed efficiency in HRFI pigs (Spearman's correlation = 0.44). These results revealed that propionate production is related to fat metabolism, suggesting that GPR43 receptor activation by propionate could play a physiological role in adipose cells in RFI-pigs. These observations highlight significant functional differences between the microbiota of HRFI and LRFI pigs, as well as variability within more efficient pigs that could be exploited to improve performance. 
2025-10-21T08:22:27Z In French. Journées de la Recherche Porcine, Feb 2025, Saint-Malo, France Olivier Zemb GenPhySE, Comue de Toulouse Lauren Jouaron AGIR, GenPhySE, Comue de Toulouse Estelle Jordi GABI Anais Cazals GABI Caroline Achard GenPhySE, Comue de Toulouse Marion Schiavone GenPhySE, Comue de Toulouse Tiffany Page GenPhySE, Comue de Toulouse Laurent Cauquil GenPhySE Carole Bannelier GenPhySE, Comue de Toulouse Aliakbari Amir GenPhySE, Comue de Toulouse Yvon Billon GenESI Yves Farizon GenPhySE, Comue de Toulouse Hélène Gilbert GenPhySE http://arxiv.org/abs/2505.16619v2 Open and Sustainable AI: challenges, opportunities and the road ahead in the life sciences (October 2025 -- Version 2) 2025-10-14T08:23:04Z Artificial intelligence (AI) has recently seen transformative breakthroughs in the life sciences, expanding possibilities for researchers to interpret biological information at an unprecedented capacity, with novel applications and advances being made almost daily. In order to maximise return on the growing investments in AI-based life science research and accelerate this progress, it has become urgent to address the exacerbation of long-standing research challenges arising from the rapid adoption of AI methods. We review the increased erosion of trust in AI research outputs, driven by the issues of poor reusability and reproducibility, and highlight their consequent impact on environmental sustainability. Furthermore, we discuss the fragmented components of the AI ecosystem and lack of guiding pathways to best support Open and Sustainable AI (OSAI) model development. In response, this perspective introduces a practical set of OSAI recommendations directly mapped to over 300 components of the AI ecosystem. Our work connects researchers with relevant AI resources, facilitating the implementation of sustainable, reusable and transparent AI. 
Built upon life science community consensus and aligned to existing efforts, the outputs of this perspective are designed to aid the future development of policy and structured pathways for guiding AI implementation. 2025-05-22T12:52:34Z 1 PDF, 24 Pages, 2 figures within. Co-corresponding authors: Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece and Department of Biomedical Sciences, University of Padova, Padova, Italy. E-mails: fpsom[@]certh.gr, silvio.tosatto[@]unipd.it Gavin Farrell Department of Biomedical Sciences, University of Padova, Padova, Italy Eleni Adamidi Athena Research and Innovation Center, Marousi, Greece Rafael Andrade Buono VIB.AI Center for AI and Computational Biology, Ghent, Belgium Mihail Anton ELIXIR Europe Hub, EMBL-EBI, Hinxton, United Kingdom Omar Abdelghani Attafi Department of Biomedical Sciences, University of Padova, Padova, Italy Salvador Capella Gutierrez Barcelona Supercomputing Center Emidio Capriotti Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy and Computational Genomics Platform, IRCCS University Hospital of Bologna, Bologna, Italy Leyla Jael Castro ZB MED Information Centre for Life Sciences, Cologne, Germany Davide Cirillo Barcelona Supercomputing Center Lisa Crossman SequenceAnalysis.co.uk, United Kingdom and University of East Anglia, Norwich, United Kingdom Christophe Dessimoz Department of Computational Biology, University of Lausanne, Lausanne, Switzerland and Swiss Institute of Bioinformatics, Lausanne, Switzerland Alexandros Dimopoulos Institute for Fundamental Biomedical Science, Biomedical Sciences Research Center Alexander Fleming, Vari, Greece and Department of Informatics & Telematics, School of Digital Technology, Harokopio University, Athens, Greece Raul Fernandez-Diaz School of Medicine, University College Dublin, Dublin, Ireland and Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, 
Ireland and IBM Research Dublin, Dublin, Ireland Styliani-Christina Fragkouli Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece and Department of Biology, National & Kapodistrian University of Athens, Athens, Greece Carole Goble Department of Computer Science, University of Manchester, Manchester, United Kingdom Wei Gu Luxembourg National Data Service, Esch-sur-Alzette, Luxembourg John M. Hancock Institute of Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia Alireza Khanteymoori Department of Psychology, University of Freiburg, Freiburg, Germany Tom Lenaerts Machine Learning Group, Universite Libre de Bruxelles, Brussels, Belgium and Artificial Intelligence Lab, Vrije Universiteit Brussel, Brussels, Belgium and Interuniversity Institute of Bioinformatics in Brussels, ULB-VUB, Brussels, Belgium and FARI, AI for the common good institute, ULB-VUB, Brussels, Belgium and Center for Human-Compatible AI, UC Berkeley, Berkeley, CA, USA Fabio G. Liberante ELIXIR Europe Hub, EMBL-EBI, Hinxton, United Kingdom Peter Maccallum ELIXIR Europe Hub, EMBL-EBI, Hinxton, United Kingdom Alexander Miguel Monzon Department of Biomedical Sciences, University of Padova, Padova, Italy Magnus Palmblad Leiden University Medical Center, Leiden, Netherlands Lucy Poveda Swiss Institute of Bioinformatics, Lausanne, Switzerland Ovidiu Radulescu LPHI, University of Montpellier, CNRS, INSERM, Montpellier, France Denis C. Shields School of Medicine, University College Dublin, Dublin, Ireland and Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland Shoaib Sufi Department of Computer Science, University of Manchester, Manchester, United Kingdom Thanasis Vergoulis Athena Research and Innovation Center, Marousi, Greece Fotis Psomopoulos Institute of Applied Biosciences, Centre for Research and Technology Hellas, Thessaloniki, Greece Silvio C. E. 
Tosatto Department of Biomedical Sciences, University of Padova, Padova, Italy and Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council 10.1038/s41592-026-03037-6 http://arxiv.org/abs/2510.10871v1 Interplay of Fidelity and Diversity in the Evolution of the Genetic Code 2025-10-13T00:40:14Z The origin and organizing principles of the genetic code remain fundamental puzzles in life science. The vanishingly low probability of the natural codon-to-amino acid mapping arising by chance has spurred the hypothesis that its structure is a solution optimized for robustness against mutations and translational errors. For the construction of effective molecular machines, the dictionary of encoded amino acids must also be diverse enough in physicochemical features. Here, we examine whether the standard genetic code can be understood as a near-optimal solution balancing these two objectives: minimizing error load and aligning codon assignments with the naturally occurring amino acid composition. Using simulated annealing, we explore this trade-off across a broad range of parameters. We find that the standard genetic code lies near local optima within the multidimensional parameter space. It is a highly effective solution that balances fidelity against resource availability constraints. These results suggest that the present genetic code reflects coevolution under conflicting pressures of fidelity and diversity, offering new insight into its emergence and evolution. 2025-10-13T00:40:14Z 16 pages, 7 figures Yudam Seo Tsvi Tlusty Junghyo Jo http://arxiv.org/abs/2510.10655v1 Isotropy and Geometry of Pretrained Protein LMs 2025-10-12T15:19:40Z Large pretrained language models have transformed natural language processing, and their adaptation to protein sequences -- viewed as strings of amino acid characters -- has advanced protein analysis. 
However, the distinct properties of proteins, such as variable sequence lengths and lack of word-sentence analogs, necessitate a deeper understanding of protein language models (LMs). We investigate the isotropy of protein LM embedding spaces using average pairwise cosine similarity and the IsoScore method, revealing that models like ProtBERT and ProtXLNet are highly anisotropic, utilizing only 2--14 dimensions for global and local representations. In contrast, multi-modal training in ProteinBERT, which integrates sequence and gene ontology data, enhances isotropy, suggesting that diverse biological inputs improve representational efficiency. We also find that embedding distances weakly correlate with alignment-based similarity scores, particularly at low similarity. 2025-10-12T15:19:40Z Published in the Proceedings of the ICML 2025 Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences, Vancouver, Canada. 2025 Sheikh Azizul Hakim Kowshic Roy M Saifur Rahman http://arxiv.org/abs/2510.14997v1 Evaluation and Implementation of Machine Learning Algorithms to Predict Early Detection of Kidney and Heart Disease in Diabetic Patients 2025-10-12T13:28:26Z Cardiovascular disease and chronic kidney disease are major complications of diabetes, leading to high morbidity and mortality. Early detection of these conditions is critical, yet traditional diagnostic markers often lack sensitivity in the initial stages. This study integrates conventional statistical methods with machine learning approaches to improve early diagnosis of CKD and CVD in diabetic patients. Descriptive and inferential statistics were computed in SPSS to explore associations between diseases and clinical or demographic factors. Patients were categorized into four groups: Group A both CKD and CVD, Group B CKD only, Group C CVD only, and Group D no disease. 
Statistical analysis revealed significant correlations: Serum Creatinine and Hypertension with CKD, and Cholesterol, Triglycerides, Myocardial Infarction, Stroke, and Hypertension with CVD. These results guided the selection of predictive features for machine learning models. Logistic Regression, Support Vector Machine, and Random Forest algorithms were implemented, with Random Forest showing the highest accuracy, particularly for CKD prediction. Ensemble models outperformed single classifiers in identifying high-risk diabetic patients. SPSS results further validated the significance of the key parameters integrated into the models. While challenges such as interpretability and class imbalance remain, this hybrid statistical machine learning framework offers a promising advancement toward early detection and risk stratification of diabetic complications compared to conventional diagnostic approaches. 2025-10-12T13:28:26Z This thesis was completed under the supervision of Prof. Dr. Darakhshan Saleem. I am deeply grateful for her mentorship throughout my graduate studies Syed Ibad Hasnain http://arxiv.org/abs/2510.11748v1 Thinned COE random matrix models for DNA replication 2025-10-11T21:05:33Z This paper details an observation that for more primitive organisms, such as some yeasts, the statistical distribution of the origins of replication sometimes looks remarkably like the distribution of eigenvalues from the Circular Orthogonal Ensemble (COE) of random matrices. This does not hold for more complex organisms, but a uniform thinning of the COE eigenvalues (which interpolates between the COE and uncorrelated, Poisson statistics) gives a platform to investigate characteristics of replication origin distribution in other species where data is available. 2025-10-11T21:05:33Z Huw Day Nina C. Snaith