https://arxiv.org/api/gPvFATd+qJpjuoIZ8n5/qIyB4L0 2026-06-13T13:49:18Z 1591 30 15 http://arxiv.org/abs/2605.00927v1 BioVeil MATRIX: Uncovering and categorizing vulnerabilities of agentic biological AI scientists 2026-04-30T19:22:44Z

Agentic AI scientists equipped with domain-specific tools are rapidly entering scientific workflows across disciplines, with especially strong uptake in the life sciences where they can be used for literature synthesis, sequence analysis, and experimental planning support. While these systems accelerate biological research, they also introduce risks for dual-use applications that are not captured by current model-centric safety evaluations. We present evidence that current agentic AI scientists, including Biomni and K-Dense, are willing to assist with dual-use tasks that are blocked by base model safeguards. We also found that in a paired evaluation framework for biology and chemistry prompts involving Weapons of Mass Destruction proxies (WMDP), agentic scaffolding of Biomni increased the benchmark performance relative to the underlying standalone model, producing measurable capability uplift. We believe it is necessary to include additional safeguards in existing models and build future tools from the ground up with agentic vulnerabilities in mind. To systematically categorize broader risks, we introduce BioVeil MATRIX, a defensive taxonomy that maps AI-enabled biosecurity risks using 10 tactical categories (TA01--TA10) and 22 different techniques. We propose to use this taxonomy as a baseline for future AI scientist development and generate specialized benchmarks and protocols for red-teaming these vulnerabilities before public deployment. BioVeil MATRIX can be found at: https://bioveilmatrix.com/

2026-04-30T19:22:44Z Kimon Antonios Provatas Avery Self Ioannis Mouratidis Ilias Georgakopoulos-Soares http://arxiv.org/abs/2605.00085v1 Tumor containment as an anti-percolation process 2026-04-30T16:37:59Z

Percolation theory from statistical physics has been applied to several aspects of tumor progression. Tumor growth on percolation clusters has been used to model spatial expansion, vascular percolation to describe nutrient supply, and transport-related percolation to investigate drug and gene delivery. At the molecular level, mutational percolation has been employed to account for the emergence of malignant phenotypes, and inverse percolation to represent treatment-induced structural disruption. We examined whether tumor containment can be interpreted as an anti-percolation problem, in which spatial expansion depends on the formation of a connected malignant domain. We implemented a spatial simulation with biologically scaled parameters to represent tissue heterogeneity, local growth, cell movement and clearance. We measured both total malignant area and connectivity metrics, including the largest connected component and the probability of forming a spanning cluster. Our results indicate that tumor size and spatial connectivity are partially independent, with configurations of similar size showing different connectivity patterns. A transition from fragmented to connected structures emerged within a limited parameter range, consistent with a threshold-like behavior. By incorporating spatial connectivity into quantitative analysis, our approach provides a complementary way to characterize tumor organization. Potential applications include integration of structural descriptors into computational models of tumor growth, design of experimental systems to probe spatial organization and interpretation of therapeutic approaches via connectivity-based metrics.
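
The connectivity metrics named in this abstract can be illustrated with a minimal sketch. It assumes a binary occupancy grid, 4-neighbour connectivity, and a top-to-bottom spanning criterion; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
from collections import deque

def connected_components(grid):
    """Group occupied sites (truthy cells) into 4-neighbour connected components."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                comp = []
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(comp)
    return components

def connectivity_metrics(grid):
    """Return (total occupied area, largest component size, spans top-to-bottom)."""
    comps = connected_components(grid)
    area = sum(len(comp) for comp in comps)
    largest = max((len(comp) for comp in comps), default=0)
    last_row = len(grid) - 1
    spans = any({y for y, _ in comp} >= {0, last_row} for comp in comps)
    return area, largest, spans

# Two configurations with the same area but different connectivity.
fragmented = [[1, 0, 1], [0, 0, 0], [1, 0, 1]]
connected = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]
print(connectivity_metrics(fragmented))  # (4, 1, False)
print(connectivity_metrics(connected))   # (4, 4, True)
```

Under these assumptions, a spanning probability would be estimated as the fraction of simulated configurations for which the third value is true.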

2026-04-30T16:37:59Z 9 pages, 2 figures Arturo Tozzi http://arxiv.org/abs/2604.27408v1 Personalizing Cancer Models under Data Scarcity via Parameter Decomposition 2026-04-30T04:18:58Z

Personalized cancer modeling for clinical applications requires robust and efficient parameter calibration, particularly in settings with limited patient data. This need is especially critical for medical digital twins (MDTs), which are virtual representations of disease continuously updated using longitudinal patient measurements. In this work, we propose a novel parameter personalization framework for dynamical cancer models under data scarcity. Our approach decomposes selected model parameters into a common component, shared across patients, and a personalized component, which is patient-specific and can be updated as new data become available. The common component captures population-level structure and is estimated once, providing an informed prior that enables rapid and accurate personalization. We demonstrate the effectiveness of this framework using synthetic data generated from canonical dynamical systems, such as logistic growth models with optimized treatment interventions. Our results show that parameter decomposition significantly improves calibration performance in limited-data regimes, facilitating fast and reliable personalization and supporting the development of patient-specific cancer models and MDTs.
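
The decomposition idea can be sketched for a logistic growth model. This is an illustration under stated assumptions, not the authors' framework: the carrying capacity K and initial size n0 are taken as known, only the growth rate r is decomposed as r = r_common + delta, and delta is estimated by a ridge-penalised least-squares fit on linearised data. All names and values are hypothetical.

```python
import math

def logistic(t, r, K=1.0, n0=0.1):
    """Logistic growth curve with rate r, carrying capacity K, initial size n0."""
    return K / (1.0 + (K / n0 - 1.0) * math.exp(-r * t))

def logit_offset(n, K=1.0, n0=0.1):
    """Linearised observation: equals r * t for noise-free logistic data."""
    return math.log(n / (K - n)) - math.log(n0 / (K - n0))

def fit_rate(times, sizes, r_common=0.0, penalty=0.0):
    """Estimate r = r_common + delta; the penalty shrinks delta toward zero."""
    z = [logit_offset(n) for n in sizes]
    num = sum(t * (zj - r_common * t) for t, zj in zip(times, z))
    den = sum(t * t for t in times) + penalty
    delta = num / den
    return r_common + delta, delta

# Common component: average of unpenalised fits over a synthetic population.
population_rates = [0.4, 0.5, 0.6]
times = [1.0, 2.0, 3.0]
fits = [fit_rate(times, [logistic(t, r) for t in times])[0] for r in population_rates]
r_common = sum(fits) / len(fits)

# Personalised component: a new patient with a single observation.
r_new, delta_new = fit_rate([1.0], [logistic(1.0, 0.55)], r_common=r_common, penalty=1.0)
print(r_common, r_new, delta_new)
```

With few observations the penalty keeps the personalised estimate close to the population value; as more data arrive, the data term dominates and delta can move further from zero.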

2026-04-30T04:18:58Z Logan Rose Jonathan Martinez Juho Kim Jing Qin Boris Aguilar David Murrugarra http://arxiv.org/abs/2604.26998v1 Entropy-Dominated Temporal Vocal Dynamics as Digital Biomarkers for Depression Detection 2026-04-29T02:42:45Z

Automated depression detection often relies on static aggregation of conversational signals, potentially obscuring clinically meaningful behavioral dynamics. We investigated whether entropy-driven temporal biomarkers improve depression detection beyond standard pooled features using the DAIC-WOZ corpus. Using 142 labeled participants, we reconstructed utterance-level acoustic trajectories and compared pooled temporal baselines, trajectory dynamics, Shannon entropy biomarkers, recurrence quantification, sample entropy, fractal complexity, and coupling biomarkers under leakage-aware validation. Static pooling achieved an AUC of 0.593, trajectory dynamics improved performance to 0.637, and entropy biomarkers produced the strongest statistically significant improvement over pooled baselines (AUC 0.646; nested cross-validated AUC 0.615; permutation p = 0.017). Entropy biomarkers outperformed recurrence, coupling, sample entropy, and fractal-based features, with several biomarkers stable across folds. These findings suggest depression-related signal may lie less in average acoustic levels than in entropy of conversational dynamics, supporting temporally informed digital phenotypes for mental-health assessment.
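
To make the contrast between pooled averages and entropy concrete, here is a minimal sketch of a Shannon entropy feature computed over one utterance-level trajectory. The equal-width binning and the bin count are assumptions made for illustration; the abstract does not describe how the entropy biomarkers were discretised.

```python
import math
from collections import Counter

def shannon_entropy(values, n_bins=4):
    """Shannon entropy (bits) of a trajectory after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant trajectory carries no uncertainty
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(bins).values())

# Two hypothetical trajectories with the same mean but different dynamics.
flat = [1.0, 1.0, 1.0, 1.0]
alternating = [0.0, 2.0, 0.0, 2.0]
print(sum(flat) / 4, shannon_entropy(flat))                # 1.0 0.0
print(sum(alternating) / 4, shannon_entropy(alternating))  # 1.0 1.0
```

Both trajectories pool to the same mean, so a static average cannot separate them, while the entropy feature can.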

2026-04-29T02:42:45Z 16 pages Himadri S Samanta http://arxiv.org/abs/2604.24157v1 OxyPOM: a biogeochemical model for Oxygen and Particulate Organic Matter dynamics with detailed temperature sensitivity 2026-04-27T08:13:31Z

Periods of low dissolved oxygen concentration -- hypoxia and anoxia -- threaten the health of aquatic ecosystems and the services they provide. Hypoxia is strongly influenced by temperature, but the different sensitivities and response functions of oxygen removal and production processes to temperature are not considered in most models. Here we present OxyPOM -- Oxygen and Particulate Organic Matter, a nuanced temperature-aware process-based biogeochemical model. OxyPOM incorporates nuanced temperature sensitivities for the key oxygen-related processes photosynthesis, re-aeration, respiration, mineralization, and nitrification. Further sensitive variables like optimal light intensity, winter grazing inhibition, and pathogenesis are also represented. Our model was tested in an idealized water column experiment, representing a typical estuarine seasonal low-oxygen environment. Differences between nuanced and uniform temperature sensitivities affect seasonal patterns of oxygen-related processes, resulting in under- or overestimation during different times of the year, particularly with higher differences in summer. While these changes may balance in the overall annual oxygen budget, uniform sensitivities underestimate particulate organic carbon production by up to a factor of four over the year and overestimate nutrient concentrations. This nuanced approach to temperature sensitivity allows us to explore and test new hypotheses related to climate warming and heatwaves, addressing the ecosystem changes demanded by climate change models.

2026-04-27T08:13:31Z Ovidio García-Oliva Carsten Lemmen http://arxiv.org/abs/2604.23773v1 Differential Analysis of Microbial Interaction Networks 2026-04-26T15:51:11Z

Microbiome studies increasingly indicate that disease-associated shifts cannot be understood from compositional changes alone. The functional architecture of microbial communities, encoded in patterns of association among microbial gene families, may reveal how these systems reorganize across biological conditions. Here, we present a network-based framework for characterizing microbiome rewiring across conditions. The approach combines condition-specific network inference, differential network analysis and pathway enrichment to identify interactions that are gained, lost or altered between groups, with a specific focus on sex-dependent differences. We apply the framework to inflammatory bowel disease, type 2 diabetes and atherosclerotic cardiovascular disease, comparing male- and female-specific microbial gene-family networks within each disease context. Across these settings, differential networks reveal extensive rewiring of microbial functional interactions, suggesting that microbiome alterations are shaped not only by changes in abundance but also by shifts in community organization. Importantly, pathway enrichment of rewired interactions uncovers functional signals that are not apparent from individual networks alone, highlighting latent disease- and sex-associated mechanisms. Code, data and supplementary information are available on the web site.

2026-04-26T15:51:11Z Marianna Milano Pietro Hiram Guzzi http://arxiv.org/abs/2604.24796v1 A multi-stage soft computing framework for complex disease modelling and decision support: A liver cirrhosis case study 2026-04-26T11:55:48Z

Liver cirrhosis is a major global health problem causing millions of deaths annually, and timely detection with aggressive treatment can significantly improve patients' quality of life. Modelling complex diseases from biomedical data is computationally challenging due to high dimensionality, strong feature correlations, noise, and limited labelled samples. Conventional Machine Learning (ML) pipelines often struggle with robustness, interpretability, and generalisation under such conditions. In this study, we propose an ML-driven multi-stage decision framework for complex disease modelling and therapeutic exploration. The framework integrates single-cell transcriptomic profiling, high-dimensional network-based feature stabilisation, multi-model learning, deep representation construction, and post-hoc decision support. Specifically, single-cell sequencing data were analysed to identify key cellular subpopulations, followed by high-dimensional weighted gene co-expression network analysis (hdWGCNA) to stabilise gene modules under sparsity and noise. To enhance non-linear feature interaction modelling, tabular molecular features were restructured into two-dimensional disease maps and analysed using a CNN. Finally, molecular docking was incorporated as a decision-support module to evaluate candidate therapeutic compounds. Using liver cirrhosis as a representative case, the framework identified a disease-associated endothelial subpopulation and extracted seven robust signature genes (HSPB1, GADD45A, CLDN5, ATP1B3, C1QBP, ENPP2, and PARL). The CNN-based representation learning module outperformed conventional pipelines in classification. The framework is disease-agnostic and readily extends to other omics-driven biomedical applications involving uncertainty, heterogeneity, and limited samples.

2026-04-26T11:55:48Z 20 pages, 8 figures Xueyuan Huang Yuheng Wang Yuanzhi He Siqi Gou Lu Bai Wenqian Wu Peifeng Liu Aijia Wang Tianhui Fan Ze Zhou Jiayu Xu http://arxiv.org/abs/2604.18316v2 Predictive Modelling of Natural Medicinal Compounds for Alzheimer disease Using Machine Learning and Cheminformatics 2026-04-25T02:02:58Z

Alzheimer disease (AD) is a neurodegenerative disease that lacks specific treatment options. Natural drugs have displayed neuroprotective effects; however, their high-throughput discovery is challenging because of the expense of experimental testing. The study proposed a machine learning approach to identify the anti-dementia activity of natural compounds based on molecular descriptors obtained from cheminformatics. The study used a set of active and inactive compounds obtained from public databases like ChEMBL and PubChem. Various molecular descriptors, including molecular weight, lipophilicity (LogP), topological polar surface area (TPSA), and hydrogen bonding descriptors, were calculated with RDKit. Data preprocessing and feature selection were applied, followed by the development of several classification models (Random Forest, XGBoost, Support Vector Machines, Logistic Regression) and their evaluation based on accuracy, precision, recall, F1-score and ROC-AUC. The outcome suggests that ensemble techniques, such as Random Forest, delivered the best predictive accuracy and ROC-AUC values. This study also highlights that critical physicochemical descriptors, in particular lipophilicity, molecular weight and polarity, are important in driving neuroprotective activity, as identified by feature importance analysis. The integrated machine learning approach shows the potential of combining natural product research and machine learning in early drug discovery for dementia. Such approaches provide a means of rapidly exploring large datasets and selecting candidates for experimental confirmation, thus minimising costs and time in the development of drugs for neurodegenerative diseases.

2026-04-20T14:19:15Z 12 pages, 9 figures, submitted as a conference paper Hafiza Syeda Yusra Tirmizi Syed Ibad Hasnain Muhammad Faris Rabail Khowaja Saad Abdullah http://arxiv.org/abs/2604.22890v1 AI-Derived Reproductive Phenotypes and Explainable ML for Concurrent Early Multimorbidity in U.S. Women: NHANES 2017-March 2020 2026-04-24T09:36:03Z

Background: Adverse reproductive history is a multisystemic risk factor, but evidence is constrained by isolated outcome studies, limited adjustment, and non-interpretable algorithmic models. We re-frame the estimand from prediction to concurrent risk classification and emphasize calibration, interpretability, and systematic error. Methods: We analyzed 1,602 U.S. women aged 20-44 years from NHANES 2017-March 2020 with reproductive-history variables, chronic-condition indicators, and PHQ-9 data. Restricted multimorbidity was defined as at least two of hypertension, hypercholesterolemia, cardiovascular disease, kidney disease, and kidney stones. Features were summarized using principal components analysis and k-means clustering. We compared multivariable logistic regression with XGBoost and used SHAP values to quantify contributions. Results: Early multimorbidity occurred in 6.6% (106/1,602); 71.0% had no chronic condition and 22.4% had one. Adverse reproductive burden was common: 58% had at least one adverse reproductive factor and 12.6% had three or more. Four latent phenotypes emerged (n=398, 508, 102, 594), including a fragile subgroup in which 77.5% met the multimorbidity definition. In holdout evaluation, XGBoost improved discrimination relative to logistic regression (ROC-AUC 0.766 vs 0.667), but showed worse probability accuracy and calibration (Brier 0.069 vs 0.059; expected calibration error 0.113 vs 0.037). Dominant drivers were age, PHQ-9 score, income-to-poverty ratio, race/ethnicity, education, and the adverse reproductive index. Conclusions: Principal components analysis and k-means phenotyping revealed that adverse reproductive life-course structure is strongly clustered with concurrent early multimorbidity in U.S. women aged 20-44 years. Although XGBoost improved discrimination, calibration and feature attribution remained essential for reliable translation into practice.
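
The two calibration measures reported above can be sketched as follows. The Brier score follows its standard definition; the expected calibration error uses equal-width probability bins, which is a common convention but an assumption here, since the abstract does not state the binning scheme or bin count.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Weighted mean gap between observed rate and mean predicted probability per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if members:
            observed = sum(y for y, _ in members) / len(members)
            predicted = sum(p for _, p in members) / len(members)
            ece += len(members) / n * abs(observed - predicted)
    return ece

# Hypothetical toy predictions: every case scored 0.5, but all cases are positive.
y = [1, 1]
p = [0.5, 0.5]
print(brier_score(y, p), expected_calibration_error(y, p))  # 0.25 0.5
```

Both measures depend on the predicted probabilities themselves, not only on how cases are ranked, which is why a model can gain ROC-AUC while losing calibration.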

2026-04-24T09:36:03Z Refereed (Peer-Reviewed) Conference Paper 2026 Symposium on Data Science and Statistics | American Statistical Association Sunday A. Adetunji http://arxiv.org/abs/2604.22887v1 StackFeat: a convergent algorithm for optimal predictor selection in genomic data 2026-04-24T09:00:28Z

In high-dimensional genomic data, the curse of dimensionality (d >> n) and limited sampling make feature selection inherently unstable - a critical barrier to biomarker discovery. We introduce StackFeat, an iterative algorithm that accumulates two statistics across repeated cross-validation: signed coefficients (measuring effect strength and direction) and selection frequencies (estimating selection probability). Only features ranking highly by both criteria are retained. On a COVID-19 miRNA dataset (GSE240888), StackFeat identified a stable 5-miRNA signature from 332 features (98.5% reduction), achieving AUC 0.922, significantly outperforming the benchmark 9-gene set (AUC 0.907, p = 0.0016). The signature includes hsa-miR-150-5p, a marker implicated in both COVID-19 survival and Dengue infection. This dual-criterion approach provides convergence guarantees absent in single-criterion methods, enabling discovery of known biomarkers, novel candidates, and previously unknown relationships. Keywords: marker selection, feature selection, bioinformatics, dimensionality reduction, robust algorithm, stacking, miRNA, COVID-19
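
The dual-criterion idea can be sketched with a small aggregation routine. This is not the StackFeat algorithm itself: the input format (one dictionary of non-zero coefficients per cross-validation run), the averaging of signed coefficients, and the top-k intersection rule are all assumptions chosen for illustration.

```python
def dual_criterion_select(runs, top_k):
    """Keep features that rank in the top_k by both |mean signed coefficient|
    and selection frequency across repeated runs.

    runs: list of dicts mapping feature name -> coefficient for that run;
          a feature that is absent (or zero) was not selected in that run.
    """
    n_runs = len(runs)
    coef_sum, counts = {}, {}
    for run in runs:
        for feature, coef in run.items():
            if coef != 0.0:
                coef_sum[feature] = coef_sum.get(feature, 0.0) + coef
                counts[feature] = counts.get(feature, 0) + 1
    # Averaging signed values lets coefficients with unstable sign cancel out.
    mean_coef = {f: s / n_runs for f, s in coef_sum.items()}
    frequency = {f: c / n_runs for f, c in counts.items()}
    by_strength = sorted(mean_coef, key=lambda f: abs(mean_coef[f]), reverse=True)[:top_k]
    by_frequency = sorted(frequency, key=lambda f: frequency[f], reverse=True)[:top_k]
    keep = set(by_strength) & set(by_frequency)
    return {f: (mean_coef[f], frequency[f]) for f in keep}

# Hypothetical coefficients from four cross-validation runs.
runs = [
    {"a": 0.9, "b": -0.1, "c": 2.0},
    {"a": 1.1, "b": -0.2},
    {"a": 1.0, "b": -0.1},
    {"a": 1.0, "b": 0.1, "d": 0.05},
]
print(sorted(dual_criterion_select(runs, top_k=2)))  # ['a']
```

In this toy input, "c" is strong but rarely selected and "b" is always selected but weak with an unstable sign, so only "a" passes both criteria.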

2026-04-24T09:00:28Z 10 pages. Presented at 16th International Conference on Bioscience, Biochemistry and Bioinformatics (ICBBB Kobe 2026) Akbar Yermekov D. A. Herrera-Martí http://arxiv.org/abs/2511.03769v2 Current validation practice undermines surgical AI development 2026-04-21T10:47:11Z

Surgical data science (SDS) is rapidly advancing, yet clinical adoption of artificial intelligence (AI) in surgery remains limited, with inadequate validation emerging as an important contributing factor. In fact, existing validation practices often neglect the temporal and hierarchical structure of intraoperative videos, producing misleading, unstable, or clinically irrelevant results. In a pioneering, consensus-driven effort, we introduce a comprehensive catalog of validation pitfalls in AI-based surgical video analysis that was derived from a multi-stage Delphi process with 92 international experts. The collected pitfalls span three categories: (1) data (e.g., incomplete annotation, spurious correlations), (2) metric selection and configuration (e.g., neglect of temporal stability, mismatch with clinical needs), and (3) aggregation and reporting (e.g., clinically uninformative aggregation, failure to account for frame dependencies in hierarchical data structures). A systematic review of surgical AI papers reveals that these pitfalls are widespread in current practice, with the majority of studies failing to account for temporal dynamics or hierarchical data structure, or relying on clinically uninformative metrics. Experiments on real surgical video datasets provide empirical evidence that ignoring temporal and hierarchical data structures can substantially understate uncertainty, obscure critical failure modes, and even alter algorithm rankings. To address these shortcomings, we provide a catalogue of best practices compiled in a multi-stage Delphi process. Together, this work provides an evidence-based framework to inform more rigorous validation of surgical video analysis algorithms and to guide future efforts in benchmarking, reporting, regulatory review, and clinical translation.

2025-11-05T16:44:01Z Under review in Nature BME Annika Reinke Ziying O. Li Minu D. Tizabi Pascaline André Marcel Knopp Mika M. Rother Ines P. Machado Maria S. Altieri Deepak Alapatt Sophia Bano Sebastian Bodenstedt Oliver Burgert Elvis C. S. Chen Justin W. Collins Olivier Colliot Evangelia Christodoulou Tobias Czempiel Adrito Das Reuben Docea Daniel Donoho Qi Dou Jennifer Eckhoff Sandy Engelhardt Gabor Fichtinger Philipp Fuernstahl Pablo García Kilroy Stamatia Giannarou Stephen Gilbert Ines Gockel Patrick Godau Jan Gödeke Teodor P. Grantcharov Tamas Haidegger Alexander Hann Makoto Hashizume Charles Heitz Rebecca Hisey Hanna Hoffmann Arnaud Huaulmé Paul F. Jäger Pierre Jannin Anthony Jarc Rohit Jena Yueming Jin Leo Joskowicz Luc Joyeux Max Kirchner Axel Krieger Gernot Kronreif Kyle Lam Shlomi Laufer Joël L. Lavanchy Gyusung I. Lee Robert Lim Peng Liu Hani J. Marcus Pietro Mascagni Ozanan R. Meireles Beat P. Mueller Lars Mündermann Hirenkumar Nakawala Nassir Navab Abdourahmane Ndong Juliane Neumann Felix Nickel Marco Nolden Chinedu Nwoye Namkee Oh Nicolas Padoy Thomas Pausch Micha Pfeiffer Tim Rädsch Hongliang Ren Nicola Rieke Dominik Rivoir Duygu Sarikaya Samuel Schmidgall Matthias Seibold Silvia Seidlitz Alexander Seitel Lalith Sharan Jeffrey H. Siewerdsen Vinkle Srivastav Raphael Sznitman Russell Taylor Thuy N. Tran Matthias Unberath Fons van der Sommen Martin Wagner Amine Yamlahi Shaohua K. Zhou Aneeq Zia Amin Madani Danail Stoyanov Stefanie Speidel Daniel A. Hashimoto Fiona R. Kolbinger Lena Maier-Hein http://arxiv.org/abs/2604.19842v1 Energy gradients as potential drivers of pre-cellular chemical organization 2026-04-21T10:11:49Z

The onset of life is often framed around membrane-bound compartments and encoded metabolism, leaving unresolved how spatial organization arose before stable boundaries. In this context, environmental gradients are usually treated as boundary conditions rather than variables structuring chemical dynamics. We ask whether spatial localization and functional coupling can emerge under realistic environmental gradients in the absence of membranes, proposing that spatial variations in energy availability act as organizing variables that bias transport and reaction. We introduce a reaction-diffusion model in which interacting chemical species evolve within an externally imposed activity landscape defined by coupled gradients in pH, redox potential and temperature, integrating diffusion, gradient-driven drift and position-dependent reaction kinetics. We performed simulations across a range of gradient strengths representative of hydrothermal vent-like conditions. Our results suggest that sufficiently strong gradients induce spontaneous accumulation of reactants, spatial alignment of reaction maxima and the emergence of stable, confined chemical states. Localization arises above a threshold at which gradient-driven transport overcomes diffusive and degradative losses. We conclude that spatially structured energy landscapes can support organized chemical dynamics without predefined compartments, providing a mechanism for coupling and persistence in continuous media. Potential applications include experimental platforms for studying prebiotic chemistry, microfluidic systems with controlled gradients and the design of chemically responsive materials.

2026-04-21T10:11:49Z 14 pages, 5 figures Arturo Tozzi http://arxiv.org/abs/2604.18784v1 Mathematical modeling and intuition in microbiology: a perspective 2026-04-20T19:47:42Z

Mathematical models are increasingly a part of microbiological research. Here, we share our perspective on how modeling advances the discipline by: (i) enforcing logical consistency, (ii) enabling quantitative prediction, (iii) extracting hidden parameters from data, and (iv) generating intuitive understanding. We map a spectrum of modeling frameworks, from whole-cell simulations to minimal logistic growth equations, and provide interactive examples for some common frameworks. Building on this overview, we outline pragmatic criteria for choosing an appropriate level of description to capture phenomena of interest. Finally, we present a case study in modeling of microbial ecosystems from our own work to illustrate how mechanistic modeling can yield generalizable intuition. This perspective aims to be an introductory roadmap for integrating mathematical modeling into experimental microbiology.

2026-04-20T19:47:42Z Environmental Microbiology 28 (4), e70266 (2026) Jamie A. Lopez Amir Erez 10.1111/1462-2920.70266 http://arxiv.org/abs/2604.13141v1 Baseline glycemia exhibits non-random, history-dependent variation across repeated meals 2026-04-14T14:24:59Z

Glycemic regulation is often described as maintaining glucose levels near a stable baseline. However, continuous glucose monitoring after meals displays intra-individual variability even under controlled conditions, suggesting intrinsic system dynamics beyond sensor noise, measurement error or short-term variability around a fixed set point. Therefore, we estimated pre-meal glucose baselines, tracking their changes across repeated identical meal challenges within individuals. The baseline was defined as the median glucose level in a pre-meal window, while successive displacements were computed between consecutive repetitions. Using a publicly available dataset of normoglycemic subjects, we observed systematic changes in baseline levels across repeated exposures. These displacements exceeded short-term fluctuations within the same pre-meal interval and were robust to alternative baseline definitions. Moreover, the magnitude of each baseline shift is positively related to the size of the preceding postprandial response. This association persisted under permutation testing, indicating that it cannot be explained by random temporal ordering. Overall, these findings suggest that glycemic dynamics cannot be fully described as independent fluctuations around a fixed baseline. Instead, baseline levels evolve across repeated perturbations through history-dependent adjustments, such that each perturbation influences subsequent system states. Potential applications include refined interpretation of continuous glucose monitoring data and development of models that incorporate temporal dependence in glucose dynamics.
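
The baseline definition and the displacement between repetitions can be written down in a few lines. This sketch assumes the glucose series is a plain list of equally spaced readings and that the window length is given in samples; the window length, the readings, and the function names are hypothetical.

```python
from statistics import median

def premeal_baseline(glucose, meal_index, window):
    """Median glucose over the `window` samples immediately before the meal."""
    if window < 1 or meal_index < window:
        raise ValueError("pre-meal window does not fit before the meal")
    return median(glucose[meal_index - window:meal_index])

def baseline_displacements(baselines):
    """Differences between baselines of consecutive meal repetitions."""
    return [later - earlier for earlier, later in zip(baselines, baselines[1:])]

# Hypothetical readings (mg/dL) for three repetitions of the same meal,
# each with three pre-meal samples followed by the postprandial rise.
repetitions = [
    [90, 92, 91, 130, 150],
    [95, 94, 96, 140, 155],
    [93, 93, 97, 128, 149],
]
baselines = [premeal_baseline(series, meal_index=3, window=3) for series in repetitions]
print(baselines)                          # [91, 95, 93]
print(baseline_displacements(baselines))  # [4, -2]
```

Using the median keeps a single outlying pre-meal reading from moving the baseline estimate.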

2026-04-14T14:24:59Z 8 pages, 1 figure Arturo Tozzi http://arxiv.org/abs/2604.11287v1 Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Using a Large Language Model 2026-04-13T10:50:44Z

Background: Large language models (LLMs) have been explored as tools for generating personalized exercise prescriptions, yet the consistency of outputs under identical conditions remains insufficiently examined. Objective: This study evaluated the intra-model consistency of LLM-generated exercise prescriptions using a repeated generation design. Methods: Six clinical scenarios were used to generate exercise prescriptions using Gemini 2.5 Flash (20 outputs per scenario; total n = 120). Consistency was assessed across three dimensions: (1) semantic consistency using SBERT-based cosine similarity, (2) structural consistency based on the FITT principle using an AI-as-a-judge approach, and (3) safety expression consistency, including inclusion rates and sentence-level quantification. Results: Semantic similarity was high across scenarios (mean cosine similarity: 0.879-0.939), with greater consistency in clinically constrained cases. Frequency showed consistent patterns, whereas variability was observed in quantitative components, particularly exercise intensity. Unclassifiable intensity expressions were observed in 10-25% of resistance training outputs. Safety-related expressions were included in 100% of outputs; however, safety sentence counts varied significantly across scenarios (H=86.18, p less than 0.001), with clinical cases generating more safety expressions than healthy adult cases. Conclusions: LLM-generated exercise prescriptions demonstrated high semantic consistency but showed variability in key quantitative components. Reliability depends substantially on prompt structure, and additional structural constraints and expert validation are needed before clinical deployment.
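
The semantic consistency measure can be illustrated with a sketch of mean pairwise cosine similarity. It assumes each generated prescription has already been embedded as a numeric vector (the study used SBERT for that step) and that similarity is averaged over all unordered pairs within a scenario; the pairing scheme is an assumption, and the short vectors below are placeholders, not real embeddings.

```python
import math
from itertools import combinations

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)

# Placeholder embeddings for three repeated outputs of one scenario.
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(mean_pairwise_cosine(embeddings))

# With 20 outputs per scenario, all-pairs averaging covers 190 comparisons.
print(math.comb(20, 2))  # 190
```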

2026-04-13T10:50:44Z 15 pages, 5 tables, 3 figures Kihyuk Lee