https://arxiv.org/api/zG6dRPE6ACGzdNceeRU8e6xfv54
2026-06-21T12:37:45Z

http://arxiv.org/abs/2606.08475v1
Parameter uncertainty in dynamical models: a practical identifiability index
2026-06-07T06:44:39Z
Ordinary differential equation models are widely used to understand and forecast complex dynamical systems, but their predictive value depends on reliable parameter estimation. Structural identifiability assesses whether parameters can be uniquely recovered from ideal observations, whereas practical identifiability depends on finite, noisy and partially observed data. We introduce the Practical Identifiability Index (PII), a marginal uncertainty-width metric based on the logarithmic span of confidence intervals. Expressed on an order-of-magnitude scale, the PII summarises how tightly individual positive-valued parameters are constrained by available observations, enabling comparison across parameters, models, error structures and observation designs. The PII is intended as a complementary diagnostic, not a standalone identifiability test, and should be interpreted alongside coverage, profile likelihoods, posterior summaries, sensitivity analysis or structural identifiability results. Using parametric bootstrap experiments across growth and compartmental epidemic models, we identify consistent principles: uncertainty decreases as calibration windows become more informative, increases with observation noise and parameter coupling, and remains high for latent or indirectly observed processes. Parameters governing early observable dynamics become constrained sooner, while additional observables can improve constraint for latent progression and recovery parameters.
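As a rough illustration of a logarithmic confidence-interval span, the sketch below computes log10(upper / lower) of a percentile interval from bootstrap-style estimates of a positive parameter. The abstract does not give the exact PII formula, so the formula, the percentile interval, the names `percentile` and `log_span_index`, and the synthetic samples are all assumptions made for illustration.

```python
import math
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of pre-sorted values, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def log_span_index(samples, level=0.95):
    """Hypothetical index: log10 width of a percentile interval.

    Assumes all samples are positive; a value of 1.0 means the interval
    spans one order of magnitude.
    """
    vals = sorted(float(v) for v in samples)
    if not vals or vals[0] <= 0.0:
        raise ValueError("samples must be non-empty and strictly positive")
    alpha = (1.0 - level) / 2.0
    lower = percentile(vals, alpha)
    upper = percentile(vals, 1.0 - alpha)
    return math.log10(upper / lower)


# Synthetic stand-ins for bootstrap estimates of two parameters.
rng = random.Random(0)
tight = [rng.lognormvariate(0.0, 0.05) for _ in range(2000)]
loose = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]

print("well-constrained parameter:", round(log_span_index(tight), 3))
print("poorly constrained parameter:", round(log_span_index(loose), 3))
```

Under this assumed definition, a smaller value means a tighter interval on an order-of-magnitude scale, which makes parameters with very different units comparable.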
The PII provides a simple, reportable summary of marginal parameter uncertainty for dynamical modelling.
2026-06-07T06:44:39Z
Hamed Karami, Alexandra Smirnova, Sunmi Lee, Gerardo Chowell

http://arxiv.org/abs/2606.08391v1
Cruise Ship-Associated Andes Virus Cluster aboard MV Hondius, 2026: A Stochastic Scenario Analysis
2026-06-07T00:59:19Z
In April 2026, the MV Hondius expedition cruise ship became the site of the first documented cruise ship-associated Andes hantavirus (ANDV) cluster, with 13 confirmed and probable cases and 3 deaths among 149 passengers and crew. We applied a stochastic epidemic model to evaluate four embarkation scenarios under reproductive numbers anchored to published ANDV estimates. Scenario D, involving two latent infected persons at embarkation, was most consistent with the observed outbreak, yielding P(final size >= 13) = 11.6% and P(takeoff) = 58.5% at R0 = 2.12. Approximate Bayesian computation provided complementary support for multiple latent infections at embarkation, especially E1(0)=1 and E3(0)=2, but R0 remained weakly identifiable. A day-35 transmission reduction changed takeoff probability little in this counterfactual model. Findings support exposure-history assessment, early onboard surveillance, rapid isolation of symptomatic cases, and postdisembarkation monitoring for travelers from ANDV-endemic regions.
2026-06-07T00:59:19Z
Raj Kumar Subedi, Hamed Karami, Kaustubh Wagh, Kenji Mizumoto, Gerardo Chowell

http://arxiv.org/abs/2606.08366v1
MetaboliSim: a Python implementation of the Mader model for dynamic and steady-state simulation of muscular energy metabolism
2026-06-06T22:59:09Z
The Mader model is the most widely used mathematical framework for muscular energy metabolism in German-language sport science, underpinning lactate diagnostics, maximal lactate steady state (MLSS) estimation and training prescription.
Despite decades of use, neither its dynamic ODE formulation nor its steady-state equations have been available as open code, leaving results based on the model impossible to reproduce independently. We close this gap with MetaboliSim, an open-source Python implementation of both formulations: a dynamic model that integrates the five-variable ODE system (phosphate potential, $\dot{V}\mathrm{O}_2$, muscle and blood lactate, and glycogen) with a fourth-order Runge-Kutta scheme, and a steady-state model that computes MLSS power and the lactate-power relationship in one- and two-compartment variants. We verified implementation correctness against published reference values and assessed physiological plausibility across constant-load, step-test, sprint and running protocols. The implementation reproduces the published reference output within stated tolerances and remains numerically stable throughout (halving the time step changes blood lactate by less than 0.01 mmol/L), with both formulations yielding congruent MLSS estimates. Key physiological behaviour ($\dot{V}\mathrm{O}_2$ on-kinetics, lactate accumulation, PCr dynamics and the sub/supra-MLSS separation) emerges directly from the model equations without protocol-specific tuning, and a sensitivity analysis shows MLSS power varying approximately linearly with $\dot{V}\mathrm{O}_{2\max}$ and nonlinearly with $\dot{V}\mathrm{La}_{\max}$. As the first openly available implementation of the complete Mader model (AGPL-3.0), MetaboliSim lets independent groups reproduce, verify and build on published model-based results. 
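The fourth-order Runge-Kutta integration and the step-halving stability check mentioned above can be sketched generically. This is a minimal illustration on a toy two-variable linear system, not the Mader equations and not MetaboliSim's code; `rk4_step`, `integrate`, `toy_rhs` and every rate constant are hypothetical.

```python
import math


def rk4_step(f, t, y, h):
    """Advance the state list y by one classical RK4 step of size h."""
    k1 = f(t, y)
    k2 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
    k3 = f(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
    k4 = f(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
    return [
        yi + h / 6 * (a + 2 * b + 2 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    ]


def integrate(f, y0, t_end, h):
    """Integrate from t = 0 to t_end with a fixed step h."""
    steps = round(t_end / h)
    t, y = 0.0, list(y0)
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y


def toy_rhs(t, y):
    """Placeholder dynamics: y[0] decays and feeds y[1], which decays slowly."""
    return [-0.5 * y[0], 0.5 * y[0] - 0.1 * y[1]]


# Step-halving check: rerun with half the step and compare final states.
coarse = integrate(toy_rhs, [1.0, 0.0], 10.0, 0.1)
fine = integrate(toy_rhs, [1.0, 0.0], 10.0, 0.05)
max_change = max(abs(a - b) for a, b in zip(coarse, fine))
print("largest change after halving the step:", max_change)
```

If halving the step changes the result by less than the tolerance of interest, the coarser step is usually considered adequate for that quantity.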
Source code: https://codeberg.org/3phos/metabolisim; Platform: https://metabolisim.org
2026-06-06T22:59:09Z
Katharina Dunst, Vincent Scharf, Clemens Hesse, Alexander Asteroth

http://arxiv.org/abs/2606.08191v1
Frequency-Domain Latent Attention Gating for Cross-Domain Token Aggregation
2026-06-06T14:21:55Z
Token aggregation is a common bottleneck in models that map token representations to sample-level predictions, yet most pooling methods operate only in the original token domain. We propose FLaG, a plug-in aggregation module that transforms token representations with the real FFT, summarizes spectral components with learnable latent queries, applies a channel-wise gate, and reconstructs enhanced time-domain tokens for final pooling. We evaluate FLaG on antimicrobial peptide (AMP) activity prediction with ESM2, image classification with ResNet18 on CIFAR-10 and CIFAR-100, and text classification with RoBERTa on IMDB and GLUE. FLaG achieves its clearest gains on the ESM2-8M antimicrobial peptide tasks and on CIFAR-100, while remaining competitive with strong text baselines on IMDB and GLUE. Then we probe its behavior on the AMP setting with band knockouts, gate summaries, residue perturbations, latent-query readouts, and structure-proxy stratification. We find that low-frequency bands contribute the most overall, and the remaining higher-band pattern is more sample-specific. The gate acts as a broadly shared spectral reweighting stage and the cross-attention patterns are sample-specific with mild query-wise differentiation, and higher-helix peptides exhibit stronger average spectral sensitivity in both bacteria.
The supplementary materials, source code and data are released at https://www.healthinformaticslab.org/supp/ and https://github.com/Kewei2023/AMPCliff/tree/FLaG.
2026-06-06T14:21:55Z
Kewei Li, Rongying Zhang, Xueli Wang, Xiwen Gong, Zhongjian Wang, Lan Huang, Ruochi Zhang, Fengfeng Zhou

http://arxiv.org/abs/2507.23146v2
Lightweight Language Models are Prone to Reasoning Errors for Complex Computational Phenotyping Tasks
2026-06-05T17:18:11Z
Although computational phenotyping is a central informatics activity with resulting cohorts supporting a wide variety of applications, it is time-intensive because of manual data review. We previously assessed the ability of LLMs to perform computational phenotyping tasks using computable phenotypes for ARF respiratory support therapies. They successfully performed concept classification and classification of single-therapy phenotypes but underperformed on multi-therapy phenotypes. To understand issues with these complex tasks, we expanded PHEONA, a generalizable framework for evaluation of LLMs, to include methods specifically for evaluating faulty reasoning. We assessed the responses of two lightweight non-reasoning LLMs (Mistral Small 24 billion and Phi-4 14 billion) and one lightweight reasoning LLM (Qwen-distilled DeepSeek-r1 32 billion) both with and without prompt modifications to identify explanation correctness and unfaithfulness errors for phenotyping. For experiments without prompt modifications, both errors were present across all models. For experiments assessing accuracy impact after prompt modifications, Mistral had the highest overall accuracy impact when compared to DeepSeek and Phi. Since reasoning errors were ubiquitous across models, our enhancement of PHEONA to include a component for assessing faulty reasoning provides critical support for LLM evaluation and evidence for reasoning errors for complex tasks.
While insights from reasoning errors can help prompt refinement, a deeper understanding of why LLM reasoning errors occur will likely require further development and refinement of interpretability methods.
2025-07-30T22:48:34Z
Sarah Pungitore, Shashank Yadav, David Maughan, Vignesh Subbian

http://arxiv.org/abs/2606.07301v1
Structure-guided taxonomic placement of divergent RNA viruses with ViraClass
2026-06-05T14:17:44Z
Metatranscriptomic sequencing has expanded our knowledge of the RNA virosphere far more rapidly than novel viruses can be taxonomically classified. Taxonomic assignment above the family level is particularly difficult because the RNA-dependent RNA polymerase (RdRp) is often the only gene retained across RNA viruses yet exhibits little sequence similarity among highly divergent viruses. Here we show that RdRp protein structure retains taxonomic signal at evolutionary depths where RdRp primary sequence similarity has largely collapsed, and that the organization of this signal is consistent with the current ICTV hierarchy. Based on this, we developed ViraClass, a hierarchical framework for RNA virus taxonomic placement that uses RdRp structure for rank-by-rank assignment from phylum to genus, stopping at the deepest rank supported by confidence thresholds, and calibrated structural clustering for viruses that remain outside existing reference space. Across random-split, prospective and taxonomic hold-out benchmarks, ViraClass outperforms sequence-based and genome-content baselines. The largest gains emerge at deep evolutionary distances, in benchmarks that withhold entire families, orders or classes from the reference, where sequence-based methods lose most of their signal. In challenging boundary cases such as the Flaviviridae, ViraClass's structure-based placements capture the taxonomic boundary tensions highlighted by recent phylogenetic studies.
When applied to a large collection of previously unclassified RdRp sequences, ViraClass places high-confidence queries into existing phyla and organizes the remainder into compact structural groups. ViraClass therefore provides a scalable approach from large-scale virus discovery to hierarchical taxonomic interpretation, particularly at the deep evolutionary ranges that current sequence-based pipelines cannot reach.
2026-06-05T14:17:44Z
Sheng Xu, Wenxuan Huang, Shutong Yue, Weiqiang Bai, Shiyang Feng, Xiaohan He, Bo Zhang, Qiantai Feng, Edward C. Holmes, Weifeng Shi, Siqi Sun

http://arxiv.org/abs/2606.07258v1
CaliPPer: quantifying, predicting and improving AI model performance for binding prediction
2026-06-05T13:34:47Z
Binding prediction models accelerate therapeutic antibody and TCR discovery, but their performance on new datasets is unpredictable, often leading to low discovery rates. Density-ratio methods (PAPE, M-CBPE) provide label-free performance estimation for binary classification, but their assumptions and aggregate-only outputs limit binding prediction on neoepitopes, antigen variants and chemical scaffolds. Here we present CaliPPer (Calibration and Prediction of Performance), a post-hoc framework pairing a multi-chain Sample-to-Domain Distance (S2DD) with distance-aware Bayesian recalibration, operating at three resolutions: generalisability score, aggregate performance prediction, and per-sample confidence. Across ten models, eight architectures and two immune-receptor domains, CaliPPer attains distance--performance correlations $|r|=0.80\text{--}0.92$, predicts AUROC/AP/F1 with mean absolute errors $0.008\text{--}0.070$, and improves AUROC by up to $+0.20$ on unseen epitopes/variants.
Applied retrospectively to five published TCR, BCR, MHC--peptide and small-molecule studies, CaliPPer raises true discovery rates in all five (e.g.\ $0/5 \to 3/5$ confirmed neoantigens), providing a triage layer between computational prediction and experimental validation.
2026-06-05T13:34:47Z
Jian-Qing Zheng, Hantao Lou, Zinan Yin, Sam Farrar, Yuze Zhou, Elie Antoun, Xiangxi Wang, Xuetao Cao, Tao Dong

http://arxiv.org/abs/2605.11197v2
The Same Problem by Different Names: Unifying Regression Dilution and Regression to the Mean
2026-06-05T12:59:11Z
Regression to the Mean (RTM) and Regression Dilution are traditionally treated as unrelated issues in the clinical and ecological literatures. In this work, we demonstrate that within a linear errors-in-variables framework where baseline variables are subject to transient temporal or measurement noise, these two phenomena share an identical underlying mathematical signature. We unify these disparate traditions by comparing specialized clinical tools, such as the Berry shrinkage correction, with standard sign-agnostic structural estimators like Major Axis (MA) and Reduced Major Axis (RMA) regression. Using an analytical framework, we evaluate the closed-form population limits and finite-sample performance of these methods across various noise-to-signal ratios and sample sizes. Our results show that the Berry method is a specialized tool designed for clinical scenarios where a 1:1 relationship is expected. However, applying it to ecological trade-offs with negative slopes can lead to severe errors. We provide maps of optimality to identify which estimator most accurately recovers the true biological signal under different conditions. By reconciling these disparate methods, we offer a principled guide for researchers to choose the correct tool based on their data's noise profile rather than their disciplinary tradition.
2026-05-11T20:04:13Z
Mathematics 14 (2026) 2052
José F.
Fontanari, Mauro Santos
10.3390/math14122052

http://arxiv.org/abs/2606.09898v1
TRAPS: Therapeutic Response Analysis via Pathway-informed Stratification
2026-06-05T04:59:09Z
Cancer treatment planning requires decisions across multiple clinical dimensions at once. Clinicians must determine whether a patient should receive targeted molecular therapy, radiation therapy, and whether they are likely to survive beyond six months. Existing pathway-informed deep learning models have been developed and tested in isolation, making fair comparison across architectures impossible. We present the first unified benchmark for pathway-guided therapy response modeling, evaluating three biologically informed architectures, BINN, GraphPath, and PATH, across five cancer cohorts drawn from The Cancer Genome Atlas, representing 2,622 patients encoded using Reactome pathway activity scores. Each model is trained jointly on all three clinical outcomes under identical data and evaluation conditions, the first study to treat pathway-structured deep learning as a combined therapy and survival prediction problem. Our results show that no single architecture wins across all tasks: PATH performs best for targeted molecular therapy prediction overall, BINN is most reliable for survival prediction, and no model produces useful predictions for radiation therapy, as the key drivers of that decision are clinical variables not captured in gene expression data.
Most strikingly, GraphPath achieves an AUROC of 0.92 on prostate targeted molecular therapy prediction, the highest score in the entire benchmark, demonstrating that lateral co-regulation structure produces exceptional discriminative power when matched to a cohort with a narrow targetable driver programme, even under conditions of extreme class imbalance at only 11\% positive prevalence.
2026-06-05T04:59:09Z
Sujoy Banik, Sayantan Chakraborty, Boishakhi Das Toma, Zainab Ghafoor, Ushashi Bhattacharjee, Koushik Howlader, Tirtho Roy

http://arxiv.org/abs/2606.06749v1
Deterministic access to global viral sequence data enables robust agentic scientific discovery
2026-06-04T22:19:42Z
Public viral genome resources such as the National Center for Biotechnology Information (NCBI) Virus database are central to outbreak response, evolutionary analysis, vaccine design, and genomic surveillance. Yet many high-value retrieval workflows remain optimized for interactive use rather than deterministic, reproducible programmatic interfaces. This creates a challenge for Large Language Model (LLM)-based scientific agents, where errors in metadata interpretation, filtering logic, or retrieval can propagate into incorrect datasets. To evaluate agentic viral data retrieval, we built VirBench, a manually curated benchmark of 120 queries spanning diverse pathogens, taxonomic levels, and metadata filters. When autonomous AI systems, including Biomni, Claude, GPT, and Edison Analysis, were tasked with these queries without a dedicated retrieval layer, performance varied widely: mean accuracy ranged from 16.9% for Claude Sonnet 4 to 91.3% for GPT-5.5, with newer frontier models showing progress but residual errors remaining consequential. To address this, we built gget virus, a deterministic query framework that formalizes NCBI Virus-style filtering as a reproducible programmatic system.
By staging retrieval, applying metadata constraints before sequence download, and retrieving structured GenBank records, gget virus reduces data transfer by more than 98% for high-volume queries while preserving exact-match semantics. Instructing autonomous AI systems to use gget virus increased accuracy to at least 90.0% across all evaluated systems and up to 99.7% for GPT-5.5, improved response stability to 0.92-1.00, reduced error magnitude, and generally decreased runtime and tool calls. Together, this work establishes deterministic data access as critical infrastructure for reliable agentic science and provides a reproducible retrieval layer for robust human- and AI-driven viral genomics workflows.
2026-06-04T22:19:42Z
Ferdous Nasri, Sarah Gurev, Patrick Varilly, Krithik Ramesh, Nuala A. O'Leary, Jonah Cool, Bernhard Y. Renard, Pardis C. Sabeti, Laura Luebbert

http://arxiv.org/abs/2606.06717v1
ShallowBench: Benchmarking Generative Drug Design Models on Shallow-Pocket Targets
2026-06-04T21:06:31Z
While generative AI models have demonstrated remarkable success in structure-based drug design, they predominantly rely on deep binding pockets and struggle to sample effective ligands for challenging low-pocketability targets, such as the historically "undruggable" oncology targets KRAS and MYC. To address this gap, we introduce ShallowBench, a strictly curated benchmark of 5,780 shallow-pocket targets extracted from CrossDocked2020. By computing the difference between an Alpha Shape "lid" volume and the underlying protein atom voxel volume, we successfully isolated targets with low concavity while ensuring sufficient surface area for binding. Evaluating various state-of-the-art generative models reveals weaker predicted binding affinity on these low-concavity interfaces.
ShallowBench therefore provides a rigorous benchmark for generative biology models and highlights the necessity of new architectural innovations or loss functions capable of navigating these challenging targets.
2026-06-04T21:06:31Z
Saket Reddy, Shiwei Liu

http://arxiv.org/abs/2606.06562v1
Iterative AI-guided optimisation of selective triple-drug combinations for breast cancer
2026-06-04T15:06:43Z
Personalised cancer therapy aims to tailor treatment to individual tumour profiles, yet tumour heterogeneity and adaptive resistance continue to limit clinical efficacy. Drug combinations offer a strategy to overcome resistance by simultaneously targeting multiple pathways, but their rational design is constrained by the vast combinatorial search space and experimental cost. Here, we present an AI-guided, QSAR-driven iterative optimisation framework that integrates machine learning with automated experimental screening to enable closed-loop discovery of selective multi-drug therapies. Starting from an initial random screen, the system iteratively predicts, tests, and refines three-drug combinations targeting MCF7 breast cancer cells. Incorporation of non-tumorigenic MCF10A cells enables explicit optimisation of tumour-selective efficacy, prioritising regimens that maximise cancer cell killing while sparing healthy cells. Across successive iterations, the framework rapidly enriched for highly selective, high-efficacy combinations, while maintaining chemical and mechanistic diversity and avoiding convergence on a narrow solution space. By continuously learning from experimental feedback, the approach efficiently navigates millions of combinations to identify a small set of validated, tumour-selective regimens.
These results establish a scalable proof-of-concept for AI-driven, closed-loop optimisation of higher-order drug combinations, demonstrating how iterative integration of computation and experimentation can enable adaptive and potentially personalised therapeutic design in precision oncology.
2026-06-04T15:06:43Z
4 figures, 3 tables
Oghenejokpeme Orhobor, Abbi Abdel-Rehim, Emma Tate, Holly X. Smith, Elizabeth Bourne, Ross J. Collins, Larisa N. Soldatova, Ross D. King

http://arxiv.org/abs/2606.06117v1
$p$-adic Bi-Filtrations for Topological Machine Learning on Genomic Sequences
2026-06-04T13:05:36Z
We introduce pVR, a topological machine learning framework for alignment-free genomic sequence classification that combines $p$-adic numbers with topological data analysis. Each DNA sequence is encoded along two complementary axes: a $p$-adic distance on $k$-mer prefixes, which captures hierarchical positional structure, and a compositional $L_1$ distance on $k$-mer frequencies, which captures local sequence content. The two distances jointly parameterise a bi-filtered Vietoris--Rips complex, and per-sequence topological summaries from this bi-filtration serve as features for standard machine learning classifiers. We establish theoretical guarantees for the construction: stability under metric perturbations and invariance to the choice of prime, alongside a result that explains why a single $p$-adic axis is topologically uninformative and why the bi-filtration recovers nontrivial homology. On twelve genomic benchmarks ($28$ to $500$ sequences, $3$ to $7$ classes), pVR outperforms four established alignment-free baselines on three of six low-sample datasets, with gains of up to $21$ percentage points; it underperforms only on a SARS-CoV-2 variant benchmark whose point-mutation divergence violates the hierarchical assumption, and all methods saturate in the large-sample regime.
pVR also outperforms zero-shot frozen embeddings from the 500M-parameter Nucleotide Transformer v2 by $6.7$ to $11.4$ percentage points on three low-sample benchmarks. The pVR codebase is publicly available at https://github.com/MAHI-Group/pVR.
2026-06-04T13:05:36Z
12 pages, 5 figures, 8 tables
Tirtharaj Dash, Gunja Sachdeva

http://arxiv.org/abs/2606.05980v1
On the Promises and Limits of Multi-omics Integration for Deconvolution: The HADACA3 Benchmark
2026-06-04T10:23:13Z
Understanding the cellular composition of complex tissues, such as tumors, is a key challenge in biology and medicine. A common approach, known as deconvolution, aims to estimate the cellular composition from bulk molecular measurements. With the growing availability of multiple types of molecular data, it is often assumed that combining data sources should improve deconvolution performance. Here, we present HADACA3, a community-driven benchmark designed to evaluate this assumption. We conducted a four-day collaborative competition followed by a large-scale computational benchmark, testing more than 250,000 analysis pipelines across nine datasets with matched DNA methylation (DNAm) and RNA profiles, representing a wide range of biological and experimental conditions. Our framework jointly evaluates the impact of preprocessing, feature selection, modeling, and integration strategies. We find that DNAm alone achieves the highest median performance across datasets, making it the most stable and reliable single-modality approach. However, multi-omics integration strategies can regularly achieve higher top performance in specific datasets and pipeline configurations. Among the tested strategies, late integration based on error-weighted averaging provides a strong and reliable baseline, while non-linear early integration methods, such as optimal transport, show promising results on real biological datasets.
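One plausible reading of late integration by error-weighted averaging is sketched below: each modality produces its own cell-type proportion estimate, and the estimates are combined with weights inversely proportional to an error score for that modality. The abstract does not specify the weighting rule, so the inverse-error weights, the function name `error_weighted_average`, and all numbers here are assumptions for illustration only.

```python
def error_weighted_average(predictions, errors):
    """Combine per-modality proportion estimates (hypothetical scheme).

    predictions: dict mapping modality name -> list of proportions
    errors:      dict mapping modality name -> positive error score
    Modalities with lower error receive proportionally larger weight.
    """
    weights = {m: 1.0 / errors[m] for m in predictions}
    total_weight = sum(weights.values())
    n_types = len(next(iter(predictions.values())))
    combined = [
        sum(weights[m] * predictions[m][i] for m in predictions) / total_weight
        for i in range(n_types)
    ]
    # Renormalise so the combined proportions sum to one.
    total = sum(combined)
    return [c / total for c in combined]


# Made-up estimates for three cell types from two modalities.
predictions = {
    "dnam": [0.6, 0.3, 0.1],
    "rna": [0.4, 0.4, 0.2],
}
# Made-up error scores, e.g. from a validation set.
errors = {"dnam": 0.05, "rna": 0.15}

combined = error_weighted_average(predictions, errors)
print([round(c, 3) for c in combined])
```

With these numbers the DNAm estimate receives three times the weight of the RNA estimate, so the combined proportions sit closer to the DNAm values.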
Overall, our results show that multi-omics integration does not systematically improve average performance over DNAm alone, but can improve best-case performance in specific settings. This highlights a trade-off between robustness and peak performance, and emphasizes the importance of aligning integration strategies with the statistical properties of the data. All data, code, and evaluation tools are publicly available to support reproducible research and future method development.
2026-06-04T10:23:13Z
Hugo Barbot (IMT, UT3), Elise Amblard (IMT, UT3), Nicolas Homberg (IMT, UT3), Lucie Lamothe (IMT, UT3), Morgane Térézol (IMT, UT3), Hadaca Consortium (IMT, UT3), Mira Ayadi (IMT, UT3), Aurélia Baurès (IMT, UT3), Yasmina Kermezli (IMT, UT3), Carl Herrmann (IMT, UT3), Sebastien Dejean (IMT, UT3), Lionel Spinelli (TAGC, CIML), David Causeur (APTIKAL, LIG), Florent Chuffart (APTIKAL, LIG), Anaïs Baudot (APTIKAL, LIG), Yuna Blum (APTIKAL, LIG), Magali Richard (APTIKAL, LIG)

http://arxiv.org/abs/2606.05918v1
Federated SPARQL querying for genomic variant functional annotation
2026-06-04T09:24:07Z
Sensitive health data should preferentially be analysed on site. In typical bioinformatics workflows, public databases are duplicated and used by specialised tools to enrich the local datasets. In the case of genomic variation data, this process is called variant annotation. In this session we demonstrate variant annotation using federated SPARQL queries. We first overview how clinico-genomic data can be modelled as a knowledge graph (KG), leveraging state-of-the-art biomedical ontologies. We then perform variant annotation by querying UniprotKB, a massive curated KG for gene and proteins. Our approach avoids public data duplication while maintaining genomic data on site and aligning it with FAIR principles.
Our use-case is based on the ICAN project, a research program aimed at studying the physiopathology of cerebral berry aneurysms.
2026-06-04T09:24:07Z
European Semantic Web Conference 2026, European Semantic Web Conference 2026 Organising Committee, May 2026, Dubrovnik, Croatia
Alexandrina Bodrug-Schepers (IFB-core), Romain Bourcier (IFB-core), Richard Redon (IFB-core), Alban Gaignard (IFB-core)