https://arxiv.org/api/0iibv0zAbdul8var520k/GDhygs 2026-06-13T20:44:04Z 1591 120 15 http://arxiv.org/abs/2511.18598v1 Assessing Gaze and Pointing: Human Cue Interpretation by Indian Free-Ranging Dogs in a Food Retrieval Task 2025-11-23T19:39:58Z The urban habitat provides a landscape that increases the chances of human-animal interactions, which can lead to increased human-animal conflict, but also coexistence. Some species show high levels of socio-cognitive abilities that enable them to perceive communicative gestures of humans and use them for their own benefit. This study investigated the ability of Indian free-ranging dogs (Canis lupus familiaris) to utilise human social-referential cues (pointing and gazing) to locate hidden food, focusing on the relative effectiveness of unimodal versus multimodal cues. A total of 352 adult free-ranging dogs were tested in an object-choice task involving six different cue conditions: control (no cue), negative control (one baited bowl, no cue), combined pointing and gazing, pointing-only, gazing-only, and conflicting cues (pointing and gazing at opposite bowls). The dogs successfully chose the correct target only in the combined pointing and gazing condition, while performance under unimodal and conflicting cue conditions did not differ significantly from chance. This highlights the importance of signal redundancy and clarity in interspecific communication for this population. A dog's demeanor was a significant predictor of its willingness to engage: affiliative dogs were significantly more likely to succeed in the overall experiment and displayed a significantly shorter approach latency compared to anxious and neutral dogs. While demeanor affected the approach latency, it did not affect the accuracy of the choice, decoupling a dog's personality from its cognitive ability to comprehend the clear cue. Neither the dogs' sex nor the experimental condition significantly predicted approach latency. 
2025-11-23T19:39:58Z 5 figures Srijaya Nandi Dipanjan Roy Aesha Lahiri Anamitra Roy Anindita Bhadra http://arxiv.org/abs/2511.16096v1 Modelling the impact of improving access to healthcare on Hepatitis B prevalence in the Thai-Myanmar border region 2025-11-20T06:39:48Z Introduction: In Thailand, Hepatitis B is still endemic despite a strong program to eliminate the disease. A higher prevalence is reported in the border region and among migrants due to physical, financial and cultural barriers. Policies and programs targeting the border region and migrant communities have been suggested. Models can be used to understand and quantify the impact of these policies, given they can capture the heterogeneity within the population. Methods: In this study, we developed an Agent-based model that captures the differences between the Thai and migrant populations living in this region, notably the higher level of mobility, lower access to healthcare, and the higher prevalence of Hepatitis B among migrants, by modelling the origin of each individual explicitly. We used the model to estimate future trends of Hepatitis B prevalence in Thailand near the border with Myanmar under different scenarios of intervention. Results: Our study shows that although the current intervention level is effective in the Thai population, it is insufficient to reach national elimination targets due to high prevalence in migrants. Improving access to healthcare for migrants and the border region could potentially help to reach elimination targets, and we quantified the level of improvement needed to achieve elimination. Conclusion: Although there already exist policies to make healthcare more accessible to migrants and the border regions, they are still not yet effective due to financial and cultural barriers. Bringing down those barriers could reduce Hepatitis B prevalence in those communities and regions and contribute to reaching elimination targets in a reasonable timeline. 
2025-11-20T06:39:48Z Anh D. Pham Robert Moss Wirichada Pan-ngum Rose McGready Nicholas Geard http://arxiv.org/abs/2511.15753v1 Prediction of Retention Time in Larger Antisense Oligonucleotide Datasets using Machine Learning 2025-11-19T05:45:42Z Antisense oligonucleotides (ASOs) are nucleic acid molecules with transformative therapeutic potential, especially for diseases that are untreatable by traditional drugs. However, the production and purification of ASOs remain challenging due to the presence of unwanted impurities. One tool successfully used to separate an ASO compound from the impurities is ion pair liquid chromatography (IPC). It is a critical step in separation, where each compound is identified by its retention time (tR) in the IPC. Due to the complex sequence-dependent behavior of ASOs and variability in chromatographic conditions, the accurate prediction of tR is a difficult task. This study addresses this challenge by applying machine learning (ML) to predict tR based on the sequence characteristics of ASOs. Four ML models (Gradient Boosting, Random Forest, Decision Tree, and Support Vector Regression) were evaluated on three large ASO datasets with different gradient times. Through feature engineering and grid search optimization, key predictors were identified, and the models were compared on root mean square error, coefficient of determination (R-squared), and run time. The results showed that Gradient Boosting is competitive with Support Vector Regression in two of the three datasets, but is 3.94 times faster to tune. Additionally, newly proposed features representing the sulfur count and the nucleotides residing at the first and last positions of a sequence were found to improve the predictive power of the models. This study demonstrates the advantages of ML-based tR prediction at scale and provides insights into interpretable and efficient utilization of ML in chromatographic applications. 
2025-11-19T05:45:42Z Machine learning with Application 2025 Manal Rahal Bestoun S. Ahmed Christoph A. Bauer Johan Ulander Jorgen Samuelsson http://arxiv.org/abs/2511.14523v1 Teaching Longitudinal Linear Mixed Models End-to-End: A Reproducible Case Study in Mouse Body-Weight Growth 2025-11-18T14:22:51Z Background: Linear mixed-effects models are central for analyzing longitudinal continuous data, yet many learners meet them as scattered formulas or software output rather than as a coherent workflow. There is a need for a single, reproducible case study that links questions, model building, diagnostics, and interpretation. Methods: We reanalyze a published mouse body-weight experiment with 31 mice in three groups weighed weekly for 12 weeks. After reshaping the data to long format and using profile plots to motivate linear time trends, we fit three random-intercept linear mixed models: a common-slope model, a fully interacted group-by-time model, and a parsimonious model with group-specific intercepts, a shared slope for two groups, and an extra slope for the third. Models are compared using maximum likelihood, AIC, BIC, and likelihood ratio tests, and linear contrasts are used to estimate group differences in weekly means and 12 week gains. Results: The parsimonious model fits as well as the fully interacted model and clearly outperforms the common-slope model, revealing small and similar gains in two groups and much steeper growth in the third, with highly significant contrasts for excess weight gain. Interpretation: This case study gives a complete, executable workflow for longitudinal linear mixed modeling, from raw data and exploratory plots through model selection, diagnostics, and targeted contrasts. 
By making explicit the mapping from scientific questions to model terms and estimable contrasts, and by providing R code and a stepwise checklist, it serves as a practical template for teaching and applied work in biostatistics, epidemiology, and related fields 2025-11-18T14:22:51Z 42 pages, 5 figures, 7 tables. Includes fully reproducible R code and teaching materials for longitudinal linear mixed models using a mouse body-weight case study Sunday A. Adetunji http://arxiv.org/abs/2511.14816v1 XGBoost-Powered Digital Twins Leverage Routine Blood Tests for Early Detection of Cancer and Cardiovascular Disease 2025-11-18T08:56:37Z Early detection of cancer and cardiovascular diseases is fundamental to improving patient outcomes and reducing healthcare expenditure. Current cancer screening programs are targeted towards specific cancers and are often inaccessible to large parts of the population, particularly in remote regions. This project aimed to develop digital blood twins: machine learning models that leverage routinely collected blood test data, demographics, comorbidities, and prescribed medications, for scalable and cost-effective disease screening. Digital blood twins were constructed using the UK Biobank dataset (n = 373,269). Using age, sex, comorbidities, medication profiles, and blood test z-scores, three iterations of XGBoost classifiers were trained for broad cancer, colorectal cancer, and cardiovascular disease prediction. Model interpretability was achieved through SHAP and dimensionality reduction analyses (UMAP, t-SNE). Broad-category cancer models achieved ROC-AUC = 0.607-0.706. Colorectal cancer prediction demonstrated excellent discrimination (ROC-AUC = 0.816-0.993), and cardiovascular models showed clinical utility, notably for hypertension (ROC-AUC = 0.813, F1 = 0.861). SHAP revealed consistent importance of age, sex, basophil count, and cystatin C. 
Immune digital blood twins as an agnostic tool demonstrate proof-of-concept feasibility for accessible, low-cost, and scalable screening of cancer and cardiovascular diseases, supporting future integration into predictive and preventive healthcare. 2025-11-18T08:56:37Z 43 pages with Supplementary Info Lo Kai Shun John Riya Nagar Abicumaran Uthamacumaran Hector Zenil http://arxiv.org/abs/2511.02340v2 Chronic Kidney Disease Prognosis Prediction Using Transformer 2025-11-18T01:31:17Z Chronic Kidney Disease (CKD) affects nearly 10% of the global population and often progresses to end-stage renal failure. Accurate prognosis prediction is vital for timely interventions and resource optimization. We present a transformer-based framework for predicting CKD progression using multi-modal electronic health records (EHR) from the Seoul National University Hospital OMOP Common Data Model. Our approach (ProQ-BERT) integrates demographic, clinical, and laboratory data, employing quantization-based tokenization for continuous lab values and attention mechanisms for interpretability. The model was pretrained with masked language modeling and fine-tuned for binary classification tasks predicting progression from stage 3a to stage 5 across varying follow-up and assessment periods. Evaluated on a cohort of 91,816 patients, our model consistently outperformed CEHR-BERT, achieving ROC-AUC up to 0.995 and PR-AUC up to 0.989 for short-term prediction. These results highlight the effectiveness of transformer architectures and temporal design choices in clinical prognosis modeling, offering a promising direction for personalized CKD care. 
2025-11-04T07:52:17Z 5 pages, 2 figures, 2 tables Yohan Lee DongGyun Kang SeHoon Park Sa-Yoon Park Kwangsoo Kim http://arxiv.org/abs/2511.12453v1 Self-Organization Dynamics Beyond Equilibrium: Discreteness, Computation, and Rules of Life 2025-11-16T04:46:48Z Living systems self-organize in ways that conventional physical frameworks (based on forces, energies, and continuous fields) cannot fully capture. Processes like gene regulation and cellular decision-making involve rule-based logic and computational interactions. Here, I introduce the concept of non-equilibrium capacity (NEC) to denote the finite capacity of living systems to generate and sustain life-associated dynamics, the very capacity that defines viability and whose irreversible loss constitutes death. I argue that two lines of inquiry are especially promising for understanding why this capacity is inevitably lost. First, experiments that slow or suspend all cellular processes reveal "low speed limits" below which life collapses. Second, generalized cellular automata, where cells interact over diffusion-defined neighborhoods and obey discrete rules, provide a framework to understand how order emerges or persists. Together, these approaches suggest a new grammar of biology that complements energy-based physics and explains how living systems sustain and ultimately lose their NEC. 2025-11-16T04:46:48Z Hyun Youk http://arxiv.org/abs/2311.10443v3 MIFA: Metadata, Incentives, Formats, and Accessibility guidelines to improve the reuse of AI datasets for bioimage analysis 2025-11-12T12:54:05Z Artificial Intelligence methods are powerful tools for biological image analysis and processing. High-quality annotated images are key to training and developing new methods, but access to such data is often hindered by the lack of standards for sharing datasets. We brought together community experts in a workshop to develop guidelines to improve the reuse of bioimages and annotations for AI applications. 
These include standards on data formats, metadata, data presentation and sharing, and incentives to generate new datasets. We are positive that the MIFA (Metadata, Incentives, Formats, and Accessibility) recommendations will accelerate the development of AI tools for bioimage analysis by facilitating access to high quality training data. 2023-11-17T10:49:58Z 16 pages, 3 figures Teresa Zulueta-Coarasa Florian Jug Aastha Mathur Josh Moore Arrate Muñoz-Barrutia Liviu Anita Kola Babalola Pete Bankhead Perrine Gilloteaux Nodar Gogoberidze Martin Jones Gerard J. Kleywegt Paul Korir Anna Kreshuk Aybüke Küpcü Yoldaş Luca Marconato Kedar Narayan Nils Norlin Bugra Oezdemir Jessica Riesterer Norman Rzepka Ugis Sarkans Beatriz Serrano Christian Tischer Virginie Uhlmann Vladimír Ulman Matthew Hartley http://arxiv.org/abs/2503.09649v4 Technical and Legal Aspects of Federated Learning in Bioinformatics: Applications, Challenges and Opportunities 2025-11-09T22:47:36Z Federated learning leverages data across institutions to improve clinical discovery while complying with data-sharing restrictions and protecting patient privacy. This paper provides a gentle introduction to this approach in bioinformatics, and is the first to review key applications in proteomics, genome-wide association studies (GWAS), single-cell and multi-omics studies in their legal as well as methodological and infrastructural challenges. As the evolution of biobanks in genetics and systems biology has proved, accessing more extensive and varied data pools leads to a faster and more robust exploration and translation of results. More widespread use of federated learning may have a similar impact in bioinformatics, allowing academic and clinical institutions to access many combinations of genotypic, phenotypic and environmental information that are undercovered or not included in existing biobanks. 
2025-03-12T08:45:31Z 28 pages, 4 figures Frontiers in Digital Health (2025), 7, 1644291 Daniele Malpetti Marco Scutari Francesco Gualdi Jessica van Setten Sander van der Laan Saskia Haitjema Aaron Mark Lee Isabelle Hering Francesca Mangili http://arxiv.org/abs/2511.03546v1 Physical fitness post VO2max -- a computational framework 2025-11-05T15:29:43Z This paper critically examines the conceptual and methodological limitations underlying the current understanding and application of VO2max in exercise science and physical activity prescription. Despite the establishment of WHO guidelines on physical activity, population-level adherence remains low. A key contributing factor is the continued reliance on VO2max as a central measure of physical fitness, an index that represents statistical relationships within large populations but provides limited insight or utility at the individual level. The authors demonstrate that the intrinsic constraints of VO2max make it unsuitable for the development of precise computational models capable of informing individualized exercise prescription. The paper reviews fundamental principles linking health, fitness, and exercise intensity, identifying critical gaps in how these constructs are conceptualized and operationalized. In response, alternative theoretical and methodological frameworks are proposed to more accurately capture the complex interplay between physiological and psychological factors that determine exercise performance and adherence. Building on this conceptual foundation, the authors introduce a novel computational approach that mathematically models these complex interdependencies and the variables that affect adherence outcomes. A comprehensive empirical research framework and corresponding methodologies for deriving the required model equations are presented. 
Successful empirical validation of this model could provide a transformative step toward personalized, adaptive training systems enhancing both the efficacy and long-term adherence of health-oriented exercise programs. 2025-11-05T15:29:43Z 30 pages. 9 Figures J Borresen H Burger http://arxiv.org/abs/2511.03755v1 Mathematical and Computational Nuclear Oncology: Toward Optimized Radiopharmaceutical Therapy via Digital Twins 2025-11-04T23:40:55Z This article presents the general framework of theranostic digital twins (TDTs) in computational nuclear medicine, designed to support clinical decision-making and improve cancer patient prognosis through personalized radiopharmaceutical therapies (RPTs). It outlines potential clinical applications of TDTs and proposes a roadmap for successful implementation. Additionally, the chapter provides a conceptual overview of the current state of the art in the mathematical and computational modeling of RPTs, highlighting key challenges and the strategies being pursued to address them. 2025-11-04T23:40:55Z 22 pages, 5 figures. Pet Clin, 2026; Published online Marc Ryhiner Yangmeihui Song Babak Saboury Gerhard Glatting Arman Rahmim Kuangyu Shi 10.1016/j.cpet.2025.09.005 http://arxiv.org/abs/2511.03041v1 A Roadmap for Predictive Human Immunology 2025-11-04T22:35:35Z For over a century, immunology has masterfully discovered and dissected the components of our immune system, yet its collective behavior remains fundamentally unpredictable. In this perspective, we argue that building on the learnings of reductionist biology and systems immunology, the field is poised for a third revolution. This new era will be driven by the convergence of purpose-built, large-scale causal experiments and predictive, generalizable AI models. Here, we propose the Predictive Immunology Loop as the unifying engine to harness this convergence. 
This closed loop iteratively uses AI to design maximally informative experiments and, in turn, leverages the resulting data to improve dynamic, in silico models of the human immune system across biological scales, culminating in a Virtual Immune System. This engine provides a natural roadmap for addressing immunology's grand challenges, from decoding molecular recognition to engineering tissue ecosystems. It also offers a framework to transform immunology from a descriptive discipline into one capable of forecasting and, ultimately, engineering human health. 2025-11-04T22:35:35Z Aly A. Khan Jason Perera James Zou Loïc A. Royer Alan R. Lowe Ambrose Carr Theofanis Karaletsos Patricia Brennan Roham Parsa Marcus R. Clark Joe DeRisi Jay Shendure Sandra L. Schmid Scott E. Fraser Andrea Califano Shana O. Kelley http://arxiv.org/abs/2510.23833v2 On the distributions of restriction sites in human and pangolin sarbecoviruses 2025-11-03T11:10:53Z Since early 2020, several theories have suggested that a distribution of restriction endonuclease recognition sites in the SARS-CoV-2 genome indicates a synthetic origin. The most influential of these, a 2022 preprint by Bruttel et al. claimed: "The BsaI/BsmBI restriction map of SARS-CoV-2 is unlike any wild-type coronavirus, and it is unlikely to evolve from its closest relatives." To test this, I reanalyzed the same 11 contested sites using an expanded set of sarbecovirus genomes, including bat coronaviruses published after the Bruttel et al. preprint. For each site, I identified the bat coronaviruses most closely matching SARS-CoV-2 in the surrounding sequences, excluding the sites themselves. The Bruttel et al. hypothesis predicts that these closely related viruses should differ from SARS-CoV-2 at many of the contested sites if restriction sites had been artificially introduced or removed. Contrary to this prediction, one or more of the most closely related bat coronaviruses are identical to SARS-CoV-2 at all 11 sites. 
Equivalent "synthetic fingerprints" were identified in natural pangolin sarbecoviruses. Finally, I conducted a re-analysis of the dataset that Bruttel et al. used to test whether the SARS-CoV-2 BsaI/BsmBI restriction map was significantly more "evenly spaced" than expected in a natural genome. I found technical and conceptual errors that resulted in Bruttel et al. reporting that their chosen metric was 0.07% likely to occur by chance rather than 4.2%, reducing the apparent rarity 60-fold. Using a more informative metric, I tested whether restriction sites in SARS-CoV-2 or two pangolin sarbecoviruses were significantly more evenly spaced than expected and found they were not. These results show that the restriction maps of SARS-CoV-2 and related pangolin viruses are unremarkable in the context of related bat coronaviruses. 2025-10-27T20:18:21Z Zach Hensel http://arxiv.org/abs/2511.00818v1 Deciphering Scientific Collaboration in Biomedical LLM Research: Dynamics, Institutional Participation, and Resource Disparities 2025-11-02T06:10:27Z Large language models (LLMs) are increasingly transforming biomedical discovery and clinical innovation, yet their impact extends far beyond algorithmic revolution: LLMs are restructuring how scientific collaboration occurs, who participates, and how resources shape innovation. Despite this profound transformation, how this rapid technological shift is reshaping the structure and equity of scientific collaboration in biomedical LLM research remains largely unknown. By analyzing 5,674 LLM-related biomedical publications from PubMed, we examine how collaboration diversity evolves over time, identify institutions and disciplines that anchor and bridge collaboration networks, and assess how resource disparities underpin research performance. 
We find that collaboration diversity has grown steadily, with a decreasing share of Computer Science and Artificial Intelligence authors, suggesting that LLMs are lowering technical barriers for biomedical investigators. Network analysis reveals central institutions, including Stanford University and Harvard Medical School, and bridging disciplines such as Medicine and Computer Science that anchor collaborations in this field. Furthermore, biomedical research resources are strongly linked to research performance, with high-performing resource-constrained institutions exhibiting larger collaboration volume with the top 1% most connected institutions in the network. Together, these findings reveal a complex landscape, where democratizing trends coexist with a persistent, resource-driven hierarchy, highlighting the critical role of strategic collaboration in this evolving field. 2025-11-02T06:10:27Z Lingyao Li Zhijie Duan Xuexin Li Xiaoran Xu Zhaoqian Xue Siyuan Ma Jin Jin http://arxiv.org/abs/2503.10710v3 How causal perspectives can inform neuroscience data analysis 2025-11-01T01:41:09Z Over the past two decades, considerable strides have been made in advancing neuroscientific techniques, yet challenges remain in attributing causality to observed associations. This review addresses a fundamental issue in observational neuroscience studies and advocates for incorporating causal inference frameworks into standard practice. We systematically introduce necessary definitions and concepts, emphasizing how causal assumptions underlie statistical analyses even when not explicitly stated. Through a running example on sleep quality and white matter integrity, we illustrate how persistent challenges, including confounding and selection biases, can be conceptualized and addressed using causal frameworks. 
We demonstrate practical approaches for making assumption violations transparent through hands-on examples: supplementary case studies using multi-site harmonization and head motion exclusion procedures provide step-by-step diagnostic techniques for checking covariate overlap and identifying selection bias through exclusion pattern analysis. We explore how these causal perspectives can inform both experimental design and analytical choices, particularly for observational studies where traditional randomization is infeasible. Together, we believe this framework offers concrete tools for strengthening causal interpretations and inspiring more robust approaches to problems in neuroscience. 2025-03-12T22:20:24Z Eric W. Bridgeford Brian S. Caffo Maya B. Mathur Russell A. Poldrack