https://arxiv.org/api/Irv6F9P0isiPLPaP7xEiV3efl7s
2026-03-25T11:53:57Z

http://arxiv.org/abs/2504.11249v3
Cryo-em images are intrinsically low dimensional
2025-09-03T20:25:50Z
Simulation-based inference provides a powerful framework for cryo-electron microscopy, employing neural networks in methods like CryoSBI to infer biomolecular conformations via learned latent representations. This latent space represents a rich opportunity, encoding valuable information about the physical system and the inference process. Harnessing this potential hinges on understanding the underlying geometric structure of these representations. We investigate this structure by applying manifold learning techniques to CryoSBI representations of hemagglutinin (simulated and experimental). We reveal that these high-dimensional data inherently populate low-dimensional, smooth manifolds, with simulated data effectively covering the experimental counterpart. By characterizing the manifold's geometry using Diffusion Maps and identifying its principal axes of variation via coordinate interpretation methods, we establish a direct link between the latent structure and key physical parameters. Discovering this intrinsic low-dimensionality and interpretable geometric organization not only validates the CryoSBI approach but enables us to learn more from the data structure and provides opportunities for improving future inference strategies by exploiting this revealed manifold geometry.
2025-04-15T14:46:25Z
Luke Evans, Octavian-Vlad Murad, Lars Dingeldein, Pilar Cossio, Roberto Covino, Marina Meila
10.1103/txrb-fw3z

http://arxiv.org/abs/2509.01550v1
OTMol: Robust Molecular Structure Comparison via Optimal Transport
2025-09-01T15:27:31Z
Root-mean-square deviation (RMSD) is widely used to assess structural similarity in systems ranging from flexible ligand conformers to complex molecular cluster configurations.
Despite its wide utility, RMSD calculation is often challenged by inconsistent atom ordering, indistinguishable configurations in molecular clusters, and potential chirality inversion during alignment. These issues highlight the necessity of accurate atom-to-atom correspondence as a prerequisite for meaningful alignment. Traditional approaches often rely on heuristic cost matrices combined with the Hungarian algorithm, yet these methods underutilize the rich intra-molecular structural information and may fail to generalize across chemically diverse systems. In this work, we introduce OTMol, a method that formulates the molecular alignment task as a fused supervised Gromov-Wasserstein (fsGW) optimal transport problem. By leveraging the intrinsic geometric and topological relationships within each molecule, OTMol eliminates the need for manually defined cost functions and enables a principled, data-driven matching strategy. Importantly, OTMol preserves key chemical features such as molecular chirality and bond connectivity consistency. We evaluate OTMol across a wide range of molecular systems, including Adenosine triphosphate, Imatinib, lipids, small peptides, and water clusters, and demonstrate that it consistently achieves low RMSD values while preserving computational efficiency. Importantly, OTMol maintains molecular integrity by enforcing one-to-one mappings between entire molecules, thereby avoiding erroneous many-to-one alignments that often arise in comparing molecular clusters. 
Our results underscore the utility of optimal transport theory for molecular alignment and offer a generalizable framework applicable to structural comparison tasks in cheminformatics, molecular modeling, and related disciplines.
2025-09-01T15:27:31Z
Xiaoqi Wei, Xuhang Dai, Yaqi Wu, Yanxiang Zhao, Yingkai Zhang, Zixuan Cang

http://arxiv.org/abs/2509.01038v1
Learning residue level protein dynamics with multiscale Gaussians
2025-09-01T00:38:44Z
Many methods have been developed to predict static protein structures; however, understanding the dynamics of protein structure is essential for elucidating biological function. While molecular dynamics (MD) simulations remain the in silico gold standard, their high computational cost limits scalability. We present DynaProt, a lightweight, SE(3)-invariant framework that predicts rich descriptors of protein dynamics directly from static structures. By casting the problem through the lens of multivariate Gaussians, DynaProt estimates dynamics at two complementary scales: (1) per-residue marginal anisotropy as $3 \times 3$ covariance matrices capturing local flexibility, and (2) joint scalar covariances encoding pairwise dynamic coupling across residues. From these dynamics outputs, DynaProt achieves high accuracy in predicting residue-level flexibility (RMSF) and, remarkably, enables reasonable reconstruction of the full covariance matrix for fast ensemble generation. Notably, it does so using orders of magnitude fewer parameters than prior methods. Our results highlight the potential of direct protein dynamics prediction as a scalable alternative to existing methods.
2025-09-01T00:38:44Z
Mihir Bafna, Bowen Jing, Bonnie Berger

http://arxiv.org/abs/2508.21350v1
The Quantum Compass Mechanism in Cryptochromes
2025-08-29T06:24:27Z
Cryptochrome flavoproteins are prime candidates for mediating magnetic sensing in migratory animals via the radical pair mechanism (RPM), a spin-dependent process initiated by photoinduced electron transfer.
The canonical FAD-tryptophan radical pair exhibits pronounced anisotropic hyperfine couplings, enabling sensitivity to geomagnetic fields. However, maintaining spin coherence under physiological conditions and explaining responses to weak radiofrequency fields remain unresolved challenges. Alternative radicals, such as superoxide and ascorbate, have been proposed to enhance anisotropy or suppress decoherence. This review summarizes the quantum basis of magnetoreception, evaluates both canonical and alternative radical pair models, and discusses amplification strategies including triads, spin scavenging, and bystander radicals. Emphasis is placed on how molecular geometry, exchange and dipolar interactions, and hyperfine topology modulate magnetic sensitivity. Key open questions and future directions are outlined, highlighting the need for structural and dynamical data under physiological conditions.
2025-08-29T06:24:27Z
Zou Chengye, Liu Ya-jun, Wang Beibei

http://arxiv.org/abs/2504.16621v3
Stable electron-irradiated [1-$^{13}$C]alanine radicals for clinically viable metabolic imaging with Dynamic Nuclear Polarization
2025-08-28T21:12:08Z
Dissolution Dynamic Nuclear Polarisation (dDNP) increases the sensitivity of magnetic resonance experiments by $>10^4$-fold, permitting isotopically-labelled molecules to be transiently visible in MRI scans. dDNP requires a source of unpaired electrons in contact with labelled nuclei, cooled to $\sim$1K, and spin-pumped into a given state by microwaves. These electrons are usually chemical radicals, requiring removal by filtration prior to injection into humans. Alternative sources, such as UV irradiation, generate lower polarisation and require cryogenic transport. We present ultra-high-dose-rate electron irradiation as a novel alternative for generating non-persistent radicals in alanine/glycerol mixtures.
These are stable for months at room temperature, quench spontaneously upon dissolution, are present in dose-dependent concentrations, and generate comparable nuclear polarisation to trityl radicals used clinically (20\%) through a novel mechanism. This process is inherently sterilising, permitting imaging of alanine metabolism \textit{in vivo}. As well as scientific novelty, this overcomes the biggest barrier to clinically translating dDNP.
2025-04-23T11:23:23Z
This is the original version of a manuscript submitted to Science Advances. Per editorial policy this is not the version post the first revision, but rather the manuscript as originally submitted.
Catriona H. E. Rooney (Department of Physiology, Anatomy and Genetics, University of Oxford)
Justin Y. C. Lau (GE HealthCare, Waukesha, WI, USA)
Esben S. S. Hansen (The MR Research Centre, Aarhus University, Aarhus, Denmark)
Nichlas Vous Christensen (The MR Research Centre, Aarhus University, Aarhus, Denmark)
Duy A. Dang (The MR Research Centre, Aarhus University, Aarhus, Denmark)
Kristoffer Petersson (Department of Oncology, University of Oxford)
Iain D. C. Tullis (Department of Oncology, University of Oxford)
Borivoj Vojnovic (Department of Oncology, University of Oxford)
Sean Smart (Department of Oncology, University of Oxford)
Jarrod Lewis (Department of Material Science, University of Oxford)
William Myers (CAESR, Department of Chemistry, University of Oxford)
Zoe Richardson (Department of Physics, University of Oxford)
Brett W. C. Kennedy (Department of Physiology, Anatomy and Genetics, University of Oxford)
Alice M. Bowen (Department of Material Science, University of Oxford)
Lotte Bonde Bertelsen (The MR Research Centre, Aarhus University, Aarhus, Denmark)
Christoffer Laustsen (The MR Research Centre, Aarhus University, Aarhus, Denmark)
Damian J. Tyler (Department of Physiology, Anatomy and Genetics, University of Oxford; OCMR, Cardiovascular Medicine, University of Oxford)
Jack J. Miller (The MR Research Centre, Aarhus University, Aarhus, Denmark)
10.1126/sciadv.adz4334

http://arxiv.org/abs/2506.06294v2
GLProtein: Global-and-Local Structure Aware Protein Representation Learning
2025-08-28T09:38:11Z
Proteins are central to biological systems, participating as building blocks across all forms of life. Despite advancements in understanding protein functions through protein sequence analysis, there remains potential for further exploration in integrating protein structural information. We argue that the structural information of proteins is not limited to their 3D information but also encompasses information from amino acid molecules (local information) to protein-protein structure similarity (global information). To address this, we propose \textbf{GLProtein}, the first framework in protein pre-training that incorporates both global structural similarity and local amino acid details to enhance prediction accuracy and functional insights. GLProtein innovatively combines protein-masked modelling with triplet structure similarity scoring, protein 3D distance encoding and substructure-based amino acid molecule encoding. Experimental results demonstrate that GLProtein outperforms previous methods in several bioinformatics tasks, including protein-protein interaction prediction and contact prediction, among others.
2025-05-17T14:45:13Z
Accepted to EMNLP 2025 Findings
Yunqing Liu, Wenqi Fan, Xiaoyong Wei, Qing Li

http://arxiv.org/abs/2508.19829v1
Single-molecule biophysics
2025-08-27T12:27:08Z
Biological molecules, like all active matter, use free energy to generate force and motion which drive them out of thermal equilibrium, and undergo inherent dynamic interconversion between metastable free energy states separated by levels barely higher than stochastic thermal energy fluctuations.
Here, we explore the founding and emerging approaches of the field of single-molecule biophysics which, unlike traditional ensemble average approaches, enable the detection and manipulation of individual molecules and facilitate exploration of biomolecular heterogeneity and its impact on transitional molecular kinetics and underpinning molecular interactions. We discuss the ground-breaking technological innovations which scratch far beyond the surface into open questions of real physiology, that correlate orthogonal data types and interplay empirical measurement with theoretical and computational insights, many of which are enabling artificial matter to be designed inspired by biological systems. And finally, we examine how these insights are helping to develop new physics framed around biology.
2025-08-27T12:27:08Z
arXiv admin note: text overlap with arXiv:1704.06837
Mark C Leake

http://arxiv.org/abs/2508.19800v1
A Multi-Layered Framework for Modeling Human Biology: From Basic AI Agents to a Full-Body AI Agent
2025-08-27T11:35:29Z
We envision the Full-Body AI Agent as a comprehensive AI system designed to simulate, analyze, and optimize the dynamic processes of the human body across multiple biological levels. By integrating computational models, machine learning tools, and experimental platforms, this system aims to replicate and predict both physiological and pathological processes, ranging from molecules and cells to tissues, organs, and entire body systems. Central to the Full-Body AI Agent is its emphasis on integration and coordination across these biological levels, enabling analysis of how molecular changes influence cellular behaviors, tissue responses, organ function, and systemic outcomes. With a focus on biological functionality, the system is designed to advance the understanding of disease mechanisms, support the development of therapeutic interventions, and enhance personalized medicine.
We propose two specialized implementations to demonstrate the utility of this framework: (1) the metastasis AI Agent, a multi-scale metastasis scoring system that characterizes tumor progression across the initiation, dissemination, and colonization phases by integrating molecular, cellular, and systemic signals; and (2) the drug AI Agent, a system-level drug development paradigm in which a drug AI-Agent dynamically guides preclinical evaluations, including organoids and chip-based models, by providing full-body physiological constraints. This approach enables the predictive modeling of long-term efficacy and toxicity beyond what localized models alone can achieve. These two agents illustrate the potential of the Full-Body AI Agent to address complex biomedical challenges through multi-level integration and cross-scale reasoning.
2025-08-27T11:35:29Z
Aoqi Wang, Jiajia Liu, Jianguo Wen, Yangyang Luo, Zhiwei Fan, Liren Yang, Xi Hu, Ruihan Luo, Yankai Yu, Sophia Li, Weiling Zhao, Xiaobo Zhou

http://arxiv.org/abs/2508.19632v1
TopoBind: Multi-Modal Prediction of Antibody-Antigen Binding Free Energy via Sequence Embeddings and Structural Topology
2025-08-27T07:12:32Z
Predicting the binding free energy between antibodies and antigens is a key challenge in structure-aware biomolecular modeling, with direct implications for antibody design. Most existing methods either rely solely on sequence embeddings or struggle to capture complex structural relationships, thus limiting predictive performance. In this work, we present a novel framework that integrates sequence-based representations from pre-trained protein language models (ESM-2) with a set of topological features.
Specifically, we extract contact map metrics reflecting residue-level connectivity, interface geometry descriptors characterizing cross-chain interactions, distance map statistics quantifying spatial organization, and persistent homology invariants that systematically capture the emergence and persistence of multi-scale topological structures - such as connected components, cycles, and cavities - within individual proteins and across the antibody-antigen interface. By leveraging a cross-attention mechanism to fuse these diverse modalities, our model effectively encodes both global and local structural organization, thereby substantially enhancing the prediction of binding free energy. Extensive experiments demonstrate that our model consistently outperforms sequence-only and conventional structural models, achieving state-of-the-art accuracy in binding free energy prediction.
2025-08-27T07:12:32Z
17 pages, 9 figures
Ciyuan Yu, Hongzong Li, Jiahao Ma, Shiqin Tang, Ye-Fan Hu, Jian-Dong Huang

http://arxiv.org/abs/2508.01799v2
Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery
2025-08-27T04:11:51Z
Accurate prediction of protein-ligand interactions is essential for computer-aided drug discovery. However, existing methods often fail to capture solvent-dependent conformational changes and lack the ability to jointly learn multiple related tasks. To address these limitations, we introduce a pre-training method that incorporates ligand conformational ensembles generated under diverse solvent conditions as augmented input. This design enables the model to learn both structural flexibility and environmental context in a unified manner. The training process integrates molecular reconstruction to capture local geometry, interatomic distance prediction to model spatial relationships, and contrastive learning to build solvent-invariant molecular representations.
Together, these components lead to significant improvements, including a 3.7% gain in binding affinity prediction, an 82% success rate on the PoseBusters Astex docking benchmarks, and an area under the curve of 97.1% in virtual screening. The framework supports solvent-aware, multi-task modeling and produces consistent results across benchmarks. A case study further demonstrates sub-angstrom docking accuracy with a root-mean-square deviation of 0.157 angstroms, offering atomic-level insight into binding mechanisms and advancing structure-based drug design.
2025-08-03T15:25:42Z
10 pages, 4 figures
Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Nga-Chun Ng, Gerald W. Y. Cheng, Zongxi Li, Jing Cai, Liang-ting Lin, Jung Sun Yoo

http://arxiv.org/abs/2404.02360v2
FraGNNet: A Deep Probabilistic Model for Tandem Mass Spectrum Prediction
2025-08-27T03:28:45Z
Compound identification from tandem mass spectrometry (MS/MS) data is a critical step in the analysis of complex mixtures. Typical solutions for the MS/MS spectrum to compound (MS2C) problem involve comparing the unknown spectrum against a library of known spectrum-molecule pairs, an approach that is limited by incomplete library coverage. Compound to MS/MS spectrum (C2MS) models can improve retrieval rates by augmenting real libraries with predicted MS/MS spectra. Unfortunately, many existing C2MS models suffer from problems with mass accuracy, generalization, or interpretability. We develop a new probabilistic method for C2MS prediction, FraGNNet, that can efficiently and accurately simulate MS/MS spectra with high mass accuracy. Our approach formulates the C2MS problem as learning a distribution over molecule fragments.
FraGNNet achieves state-of-the-art performance in terms of prediction error and surpasses existing C2MS models as a tool for retrieval-based MS2C.
2024-04-02T23:16:15Z
Transactions on Machine Learning Research (TMLR), 08/2025
Adamo Young, Fei Wang, David S Wishart, Bo Wang, Russell Greiner, Hannes Röst

http://arxiv.org/abs/2508.19049v1
Evaluation of in vitro antibacterial activity and phytochemical profile of aqueous leaf extract of Asystasia variabilis
2025-08-26T14:03:50Z
This study evaluated the in vitro antibacterial effect and the phytochemical profile of an aqueous extract of fresh mature leaves of Asystasia variabilis, a Sri Lankan indigenous plant, against four common wound-infective bacteria (Staphylococcus aureus, Bacillus subtilis, Pseudomonas aeruginosa and Escherichia coli) using the Kirby-Bauer disk diffusion test. Gentamicin (10 μg/disk) and distilled water were used as positive and negative controls, respectively. The study revealed, for the first time, that the extract possessed significant antibacterial activity against all four test organisms in a concentration-dependent manner (r values ranging from 0.921 to 0.992, P<0.01), with inhibition zone diameters ranging between 8 and 28 mm. The highest antibacterial activity was exhibited against B. subtilis at 1000 μg/disk (27.43 ± 0.02 mm). The extract showed inhibitory effects comparable to gentamicin towards B. subtilis and P. aeruginosa at 500 μg/disk and towards E. coli and S. aureus at 1000 μg/disk. Qualitative phytochemical screening revealed the presence of flavonoids, tannins, phenols, cardiac glycosides, amino acids, carbohydrates, alkaloids and saponins. Therefore, it is likely that the antibacterial effect of the extract is mediated by synergistic mechanisms.
Furthermore, the results of this study scientifically justified the claims of traditional and folk medicine in the treatment of abscesses, wounds and ulcers and indicated the potential for the development of a novel drug from mature leaves of Asystasia variabilis.
2025-08-26T14:03:50Z
R Wijerathna, NAV Asanthi, WD Ratnasooriya, RN Pathirana, NRM Nelumdeniya

http://arxiv.org/abs/2508.19025v1
In-vitro Anti-bacterial Activity of Methanol and Aqueous Crude Extracts of Horsfieldia iryaghedhi
2025-08-26T13:43:40Z
Aims: Over the past two decades, the rise of multidrug resistance (MDR) in bacteria has posed a significant threat to global health. The urgent need for new treatment alternatives has brought attention to the potential of plants, which harbor a wealth of unexplored phytochemicals with therapeutic properties. This study aims to evaluate the anti-bacterial efficacy of methanol and aqueous extracts from the leaves and bark of Horsfieldia iryaghedhi in vitro. Methodology: Aqueous and methanol extracts were obtained by the cold maceration method. The in vitro anti-bacterial activity of methanol and aqueous leaf, bark, and combination extracts was determined against the gram-negative bacterium Escherichia coli (ATCC 25922) and the gram-positive bacterium Staphylococcus aureus (ATCC 25923). The anti-bacterial assay for different concentrations of each extract was conducted through the well-diffusion method, with gentamicin serving as the positive control. Results: Methanol leaf and combination extracts of Horsfieldia iryaghedhi showed a positive anti-bacterial response at their highest concentrations of 1000 mcg/mL and 500 mcg/mL against the gram-positive bacterium Staphylococcus aureus, while none of the extracts showed anti-bacterial activity against gram-negative E. coli at the tested concentrations.
Conclusion: The study concludes that methanol extracts of H. iryaghedhi should be further analyzed for their anti-bacterial activity, and there could be potential lead molecules that can be developed as antibiotics.
2025-08-26T13:43:40Z
RMHKK Rajapaksha, EMN Fernando, AWMKK Bandara, NRM Nelumdeniya, ARN Silva
10.9734/aprj/2024/v12i4259

http://arxiv.org/abs/2408.09896v2
Instruction-Based Molecular Graph Generation with Unified Text-Graph Diffusion Model
2025-08-26T07:02:24Z
Recent advancements in computational chemistry have increasingly focused on synthesizing molecules based on textual instructions. Integrating graph generation with these instructions is complex, leading most current methods to use molecular sequences with pre-trained large language models. In response to this challenge, we propose a novel framework, named $\textbf{UTGDiff (Unified Text-Graph Diffusion Model)}$, which utilizes language models for discrete graph diffusion to generate molecular graphs from instructions. UTGDiff features a unified text-graph transformer as the denoising network, derived from pre-trained language models and minimally modified to process graph data through attention bias. Our experimental results demonstrate that UTGDiff consistently outperforms sequence-based baselines in tasks involving instruction-based molecule generation and editing, achieving superior performance with fewer parameters given an equivalent level of pretraining corpus. Our code is available at https://github.com/ran1812/UTGDiff.
2024-08-19T11:09:15Z
ECAI 2025
Yuran Xiang, Haiteng Zhao, Chang Ma, Zhi-Hong Deng

http://arxiv.org/abs/2508.18493v1
Disordered But Rhythmic: the role of intrinsic protein disorder in eukaryotic circadian timing
2025-08-25T21:07:14Z
Intrinsically disordered protein regions (IDRs) are found across all domains of life and are characterized by a lack of stable 3D structure. Nevertheless, IDRs play critical roles in the most tightly regulated cellular processes, including in the core circadian clock.
The molecular oscillator at the heart of circadian regulation leverages IDRs as dynamic interaction modules for activation and repression to support robust timekeeping and expand clock output and regulation. Here, we cover the biophysical mechanisms conferred by IDRs and their modulators. We survey the intrinsically disordered regions in clock proteins that are widely prevalent from fungi to mammals and discuss the importance of IDRs to the core clock and beyond.
2025-08-25T21:07:14Z
Emery T. Usher, Jacqueline F. Pelham