http://arxiv.org/abs/2511.18893v1 A universal phase-plane model for in vivo protein aggregation 2025-11-24T08:50:44Z

Neurodegenerative diseases are driven by the accumulation of protein aggregates in the brains of affected individuals. The aggregation behaviour in vitro is well understood and driven by the equilibration of a super-saturated protein solution to its aggregated equilibrium state. However, the situation is altered fundamentally in living systems where active processes consume energy to remove aggregates. It remains unclear how and why cells transition from a state with predominantly monomeric protein, which is stable over decades, to one dominated by aggregates. Here, we develop a simple but universal theoretical framework to describe cellular systems that include both aggregate formation and removal. Using a two-dimensional phase-plane representation, we show that the interplay of aggregate formation and removal generates cell-level bistability, with a bifurcation structure that explains both the emergence of disease and the effects of therapeutic interventions. We explore a wide range of aggregate formation and removal mechanisms and show that phenomena such as seeding arise robustly when a minimal set of requirements on the mechanism is satisfied. By connecting in vitro aggregation mechanisms to changes in cell state, our framework provides a general conceptual link between molecular-level therapeutic interventions and their impact on disease progression.
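
The bistability described in this abstract can be illustrated with a deliberately reduced toy model. This is a sketch under stated assumptions, not the authors' two-dimensional framework: it tracks a single variable (aggregate mass `a`), and the rate laws (autocatalytic formation limited by a fixed total protein pool `m_tot`, plus saturable removal) and every parameter value are hypothetical choices made only to show two coexisting stable states.

```python
def net_rate(a, k0=0.001, k2=1.0, m_tot=1.0, v_max=0.2, k_half=0.1):
    """Net rate of change of aggregate mass a (all rate laws are assumptions)."""
    # Formation: slow primary nucleation plus autocatalytic growth that
    # stalls once the assumed fixed protein pool m_tot is used up.
    formation = k0 + k2 * a * (m_tot - a)
    # Removal: active clearance that saturates at v_max.
    removal = v_max * a / (k_half + a)
    return formation - removal


def relax(a0, dt=0.01, steps=20000):
    """Integrate da/dt = net_rate(a) with forward Euler from a0."""
    a = a0
    for _ in range(steps):
        a += dt * net_rate(a)
    return a


# Two initial aggregate loads on either side of the unstable threshold.
low_state = relax(0.05)   # relaxes to the mostly monomeric state
high_state = relax(0.20)  # relaxes to the aggregate-dominated state
print(low_state, high_state)
```

For these assumed parameters the unstable threshold sits at roughly `a = 0.13`, so starting above it, which loosely mimics adding seeds, flips the toy cell into the aggregate-dominated state.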

2025-11-24T08:50:44Z Matthew W. Cotton Alain Goriely David Klenerman Georg Meisl

http://arxiv.org/abs/2508.00578v2 Learning Potential Energy Surfaces of Hydrogen Atom Transfer Reactions in Peptides 2025-11-24T08:44:20Z

Hydrogen atom transfer (HAT) reactions are essential in many biological processes, such as radical migration in damaged proteins, but their mechanistic pathways remain incompletely understood. Simulating HAT is challenging due to the need for quantum chemical accuracy at biologically relevant scales; thus, neither classical force fields nor DFT-based molecular dynamics are applicable. Machine-learned potentials offer an alternative, able to learn potential energy surfaces (PESs) with near-quantum accuracy. However, training these models to generalize across diverse HAT configurations, especially at radical positions in proteins, requires tailored data generation and careful model selection. Here, we systematically generate HAT configurations in peptides to build large datasets using semiempirical methods and DFT. We benchmark three graph neural network architectures (SchNet, Allegro, and MACE) on their ability to learn HAT PESs and indirectly predict reaction barriers from energy predictions. MACE consistently outperforms the others in energy, force, and barrier prediction, achieving a mean absolute error of 1.13 kcal/mol on out-of-distribution DFT barrier predictions. Using molecular dynamics, we show our MACE potential is stable, reactive, and generalizes beyond training data to model HAT barriers in collagen I. This accuracy enables integration of ML potentials into large-scale collagen simulations to compute reaction rates from predicted barriers, advancing mechanistic understanding of HAT and radical migration in peptides. We analyze scaling laws, model transferability, and cost-performance trade-offs, and outline strategies for improvement by combining ML potentials with transition state search algorithms and active learning. Our approach is generalizable to other biomolecular systems, enabling quantum-accurate simulations of chemical reactivity in complex environments.

2025-08-01T12:21:49Z 20 pages, 12 figures, and 4 tables (references and SI included) Marlen Neubert Patrick Reiser Frauke Gräter Pascal Friederich

http://arxiv.org/abs/2405.16123v2 Gradient Propagation in Retrosynthetic Space: An Efficient Framework for Synthesis Plan Generation 2025-11-24T08:23:34Z

Retrosynthesis, which aims to identify viable synthetic pathways for target molecules by decomposing them into simpler precursors, is often treated as a search problem. However, its complexity arises from multi-branched tree-structured pathways rather than linear paths. Some algorithms have been successfully applied in this task, but they either overlook the uncertainties inherent in chemical space or face limitations in practical application scenarios. To address these challenges, this paper introduces a novel gradient-propagation-based algorithmic framework for retrosynthetic route exploration. The proposed framework obtains the contributions of different nodes to the target molecule's success probability through gradient propagation and then guides the algorithm to greedily select the node with the highest contribution for expansion, thereby conducting efficient search in the chemical space. Experimental validations demonstrate that our algorithm achieves broad applicability across diverse molecular targets and exhibits superior computational efficiency compared to existing methods.
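
The selection rule described in this abstract can be sketched on a small AND-OR search tree. The abstract does not give the formulas, so the following is an assumption-laden illustration: a molecule succeeds if any of its candidate reactions succeeds (OR), a reaction succeeds if all its precursors succeed (AND), unexpanded molecules carry a hypothetical estimated probability `prior`, and the "contribution" of a leaf is taken to be the partial derivative of the target's success probability with respect to that leaf's estimate.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class Molecule:
    """OR node: solved if any candidate reaction succeeds."""
    name: str
    prior: float = 0.0                      # assumed estimate for unexpanded leaves
    reactions: list = field(default_factory=list)


@dataclass
class Reaction:
    """AND node: succeeds only if every precursor succeeds."""
    precursors: list


def success(node):
    """Success probability of a node under the assumed AND/OR rules."""
    if isinstance(node, Molecule):
        if not node.reactions:
            return node.prior
        return 1.0 - prod(1.0 - success(r) for r in node.reactions)
    return prod(success(m) for m in node.precursors)


def leaf_gradients(node, upstream=1.0, out=None):
    """Chain rule from the root down: d(root success) / d(leaf prior)."""
    out = {} if out is None else out
    if isinstance(node, Molecule):
        if not node.reactions:
            out[node.name] = out.get(node.name, 0.0) + upstream
            return out
        for r in node.reactions:
            # d/dp_r [1 - prod(1 - p_i)] = prod over the other reactions of (1 - p_i)
            others = prod(1.0 - success(o) for o in node.reactions if o is not r)
            leaf_gradients(r, upstream * others, out)
    else:
        for m in node.precursors:
            # d/dp_m [prod p_i] = prod over the other precursors of p_i
            others = prod(success(o) for o in node.precursors if o is not m)
            leaf_gradients(m, upstream * others, out)
    return out


# Hypothetical target with two candidate routes: (A + B) or (C).
a, b, c = Molecule("A", 0.9), Molecule("B", 0.5), Molecule("C", 0.3)
target = Molecule("target", reactions=[Reaction([a, b]), Reaction([c])])

grads = leaf_gradients(target)
next_to_expand = max(grads, key=grads.get)  # greedy choice
print(success(target), grads, next_to_expand)
```

In this toy tree the greedy choice is `B`: it is the weak link of the more promising route, so improving it raises the target's success probability the most.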

2024-05-25T08:23:40Z Chengyang Tian Yuhang Chang Yangpeng Zhang Yang Liu

http://arxiv.org/abs/2510.20792v3 BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation 2025-11-23T04:07:18Z

The rapid progress of graph generation has raised new security concerns, particularly regarding backdoor vulnerabilities. While prior work has explored backdoor attacks in image diffusion and unconditional graph generation, conditional graph generation, especially text-guided generation, remains largely unexamined. This paper proposes BadGraph, a backdoor attack method against latent diffusion models for text-guided graph generation. BadGraph leverages textual triggers to poison training data, covertly implanting backdoors that induce attacker-specified subgraphs during inference when triggers appear, while preserving normal performance on clean inputs. Extensive experiments on four benchmark datasets (PubChem, ChEBI-20, PCDes, MoMu) demonstrate the effectiveness and stealth of the attack: a poisoning rate below 10% can achieve a 50% attack success rate, while 24% suffices for a success rate above 80%, with negligible performance degradation on benign samples. Ablation studies further reveal that the backdoor is implanted during VAE and diffusion training rather than pretraining. These findings reveal the security vulnerabilities in latent diffusion models for text-guided graph generation, highlight the serious risks in applications of these models such as drug discovery, and underscore the need for robust defenses against backdoor attacks in such diffusion models.

2025-10-23T17:54:17Z Liang Ye Shengqin Chen Jiazhu Dai

http://arxiv.org/abs/2511.18010v1 EscalNet: Learn isotropic representation space for biomolecular dynamics based on effective energy 2025-11-22T10:19:07Z

Deep learning has emerged as a powerful framework for analyzing biomolecular dynamics trajectories, enabling efficient representations that capture essential system dynamics and facilitate mechanistic studies. We propose a neural network architecture incorporating Fourier Transform analysis to process trajectory data, achieving dual objectives: eliminating high-frequency noise while preserving biologically critical slow conformational dynamics, and establishing an isotropic representation space through the last hidden layer for enhanced dynamical quantification. Comparative protein simulations demonstrate that our approach generates more uniform feature distributions than linear regression methods, as evidenced by smoother state similarity matrices and clearer classification boundaries. Moreover, using saliency scores, we identified key structural determinants linked to the effective energy landscapes governing system dynamics. We believe that the fusion of neural network features with physical order parameters creates a robust analytical framework for advancing biomolecular trajectory analysis.
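
The noise-removal idea in this abstract can be sketched with a plain Fourier low-pass filter applied to trajectory features. The abstract does not say how the Fourier step is wired into the network, so this is only a minimal sketch assuming a hard frequency cutoff applied as preprocessing; the function name `low_pass`, the cutoff value, and the synthetic signal are all hypothetical.

```python
import numpy as np


def low_pass(traj, dt, cutoff):
    """Zero every Fourier component above `cutoff` along the time axis.

    traj   : array of shape (n_frames, n_features)
    dt     : time between frames
    cutoff : highest frequency to keep, in units of 1/dt
    """
    spectrum = np.fft.rfft(traj, axis=0)
    freqs = np.fft.rfftfreq(traj.shape[0], d=dt)
    spectrum[freqs > cutoff] = 0.0          # drop fast fluctuations
    return np.fft.irfft(spectrum, n=traj.shape[0], axis=0)


# Synthetic one-feature "trajectory": a slow motion plus fast jitter.
t = np.arange(1000) * 0.001
slow = np.sin(2 * np.pi * 1.0 * t)
fast = 0.5 * np.sin(2 * np.pi * 40.0 * t)
traj = (slow + fast)[:, None]

smoothed = low_pass(traj, dt=0.001, cutoff=10.0)
print(np.abs(smoothed[:, 0] - slow).max())
```

A hard cutoff is the simplest choice; a smooth roll-off or a learned frequency mask would be alternative designs with less ringing on non-periodic signals.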

2025-11-22T10:19:07Z 21 pages, 4 figures Guanghong Zuo

http://arxiv.org/abs/2511.12135v2 RTMol: Rethinking Molecule-text Alignment in a Round-trip View 2025-11-21T09:48:05Z

Aligning molecular sequence representations (e.g., SMILES notations) with textual descriptions is critical for applications spanning drug discovery, materials design, and automated chemical literature analysis. Existing methodologies typically treat molecular captioning (molecule-to-text) and text-based molecular design (text-to-molecule) as separate tasks, relying on supervised fine-tuning or contrastive learning pipelines. These approaches face three key limitations: (i) conventional metrics like BLEU prioritize linguistic fluency over chemical accuracy, (ii) training datasets frequently contain chemically ambiguous narratives with incomplete specifications, and (iii) independent optimization of generation directions leads to bidirectional inconsistency. To address these issues, we propose RTMol, a bidirectional alignment framework that unifies molecular captioning and text-to-SMILES generation through self-supervised round-trip learning. The framework introduces novel round-trip evaluation metrics and enables unsupervised training for molecular captioning without requiring paired molecule-text corpora. Experiments demonstrate that RTMol enhances bidirectional alignment performance by up to 47% across various LLMs, establishing an effective paradigm for joint molecule-text understanding and generation.

2025-11-15T09:55:55Z Letian Chen Runhan Shi Gufeng Yu Yang Yang

http://arxiv.org/abs/2511.16868v1 The Joint Gromov Wasserstein Objective for Multiple Object Matching 2025-11-21T00:31:21Z

The Gromov-Wasserstein (GW) distance serves as a powerful tool for matching objects in metric spaces. However, its traditional formulation is constrained to pairwise matching between single objects, limiting its utility in scenarios and applications requiring multiple-to-one or multiple-to-multiple object matching. In this paper, we introduce the Joint Gromov-Wasserstein (JGW) objective and extend the original framework of GW to enable simultaneous matching between collections of objects. Our formulation provides a non-negative dissimilarity measure that identifies partially isomorphic distributions of mm-spaces, with point sampling convergence. We also show that the objective can be formulated and solved for point cloud object representations by adapting traditional algorithms in Optimal Transport, including entropic regularization. Our benchmarking with other variants of GW for partial matching indicates superior performance in accuracy and computational efficiency of our method, while experiments on both synthetic and real-world datasets show its effectiveness for multiple shape matching, including geometric shapes and biomolecular complexes, suggesting promising applications for solving complex matching problems across diverse domains, including computer graphics and structural biology.
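
For orientation, the standard pairwise GW objective that this abstract builds on can be evaluated directly for a given coupling. The sketch below covers only that classical squared-loss objective, not the joint (JGW) formulation the paper introduces; the function name `gw_objective` and the toy point clouds are assumptions for illustration.

```python
import numpy as np


def gw_objective(C1, C2, T):
    """Squared-loss GW cost: sum_{i,j,k,l} (C1[i,k] - C2[j,l])^2 T[i,j] T[k,l].

    C1 : (n, n) pairwise distances in the first space
    C2 : (m, m) pairwise distances in the second space
    T  : (n, m) coupling between the two sets of points
    """
    # diff[i, j, k, l] = C1[i, k] - C2[j, l]
    diff = C1[:, None, :, None] - C2[None, :, None, :]
    return float(np.einsum("ijkl,ij,kl->", diff ** 2, T, T))


def pairwise_distances(X):
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


# Toy example: Y is the same point cloud as X with its points reordered.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
perm = [2, 0, 1]
Y = X[perm]

C1, C2 = pairwise_distances(X), pairwise_distances(Y)

# Coupling that sends X[perm[j]] to Y[j]: the true correspondence.
T_true = np.zeros((3, 3))
for j, i in enumerate(perm):
    T_true[i, j] = 1.0 / 3.0

T_wrong = np.eye(3) / 3.0  # ignores the reordering

print(gw_objective(C1, C2, T_true), gw_objective(C1, C2, T_wrong))
```

The cost vanishes for the true correspondence and is strictly positive for the mismatched one. The dense four-index tensor is only practical for tiny inputs; real solvers avoid forming it.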

2025-11-21T00:31:21Z Aryan Tajmir Riahi Khanh Dao Duc

http://arxiv.org/abs/2511.16456v1 Entropy Transfer Throughout the Structure of PDZ-2 and TIM-Barrel Proteins. A Dynamic Gaussian Network Model Study 2025-11-20T15:23:41Z

This research reports the entropy transfer throughout the three-dimensional structure of PDZ-2 and TIM-barrel proteins using the dynamic Gaussian Network Model. The model predicts the location of the allosteric pathways of PDZ-2. Moreover, a visualization analysis reveals that entropy and information are transported towards the effector site in PDZ-2 and near the catalytic site of the TIM-barrel protein. The results suggest the presence of a functional hierarchy that determines the directionality of information and entropy flow.

2025-11-20T15:23:41Z 8 Figures German Mino Galaz Javier Patino Baez Nicolas Mino Berdu Jose Gonzalez Suarez

http://arxiv.org/abs/2512.02033v1 CONFIDE: Hallucination Assessment for Reliable Biomolecular Structure Prediction and Design 2025-11-20T03:38:46Z

Reliable evaluation of protein structure predictions remains challenging, as metrics like pLDDT capture energetic stability but often miss subtle errors such as atomic clashes or conformational traps reflecting topological frustration within the protein folding energy landscape. We present CODE (Chain of Diffusion Embeddings), a self-evaluating metric empirically found to quantify topological frustration directly from the latent diffusion embeddings of the AlphaFold3 series of structure predictors in a fully unsupervised manner. Integrating this with pLDDT, we propose CONFIDE, a unified evaluation framework that combines energetic and topological perspectives to improve the reliability of AlphaFold3 and related models. CODE strongly correlates with protein folding rates driven by topological frustration, achieving a correlation of 0.82 compared to pLDDT's 0.33 (a relative improvement of 148%). CONFIDE significantly enhances the reliability of quality evaluation in molecular glue structure prediction benchmarks, achieving a Spearman correlation of 0.73 with RMSD, compared to pLDDT's correlation of 0.42, a relative improvement of 73.8%. Beyond quality assessment, our approach applies to diverse drug design tasks, including all-atom binder design, enzymatic active site mapping, mutation-induced binding affinity prediction, nucleic acid aptamer screening, and flexible protein modeling. By combining data-driven embeddings with theoretical insight, CODE and CONFIDE outperform existing metrics across a wide range of biomolecular systems, offering robust and versatile tools to refine structure predictions, advance structural biology, and accelerate drug discovery.
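
The relative improvements quoted in this abstract follow from the usual definition, (new - baseline) / baseline. A quick check, where the helper name `relative_improvement` is purely illustrative:

```python
def relative_improvement(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Correlations quoted in the abstract.
code_vs_plddt = relative_improvement(0.82, 0.33)     # folding-rate correlation
confide_vs_plddt = relative_improvement(0.73, 0.42)  # Spearman correlation with RMSD

print(f"{code_vs_plddt:.1f}% {confide_vs_plddt:.1f}%")
```

This gives about 148.5% and 73.8%, in line with the quoted 148% and 73.8%.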

2025-11-20T03:38:46Z Zijun Gao Mutian He Shijia Sun Hanqun Cao Jingjie Zhang Zihao Luo Xiaorui Wang Xiaojun Yao Chang-Yu Hsieh Chunbin Gu Pheng Ann Heng

http://arxiv.org/abs/2511.15906v1 Unified all-atom molecule generation with neural fields 2025-11-19T22:18:13Z

Generative models for structure-based drug design are often limited to a specific modality, restricting their broader applicability. To address this challenge, we introduce FuncBind, a framework based on computer vision to generate target-conditioned, all-atom molecules across atomic systems. FuncBind uses neural fields to represent molecules as continuous atomic densities and employs score-based generative models with modern architectures adapted from the computer vision literature. This modality-agnostic representation allows a single unified model to be trained on diverse atomic systems, from small to large molecules, and handle variable atom/residue counts, including non-canonical amino acids. FuncBind achieves competitive in silico performance in generating small molecules, macrocyclic peptides, and antibody complementarity-determining region loops, conditioned on target structures. FuncBind also generated in vitro novel antibody binders via de novo redesign of the complementarity-determining region H3 loop of two chosen co-crystal structures. As a final contribution, we introduce a new dataset and benchmark for structure-conditioned macrocyclic peptide generation. The code is available at https://github.com/prescient-design/funcbind.

2025-11-19T22:18:13Z NeurIPS 2025 Matthieu Kirchmeyer Pedro O. Pinheiro Emma Willett Karolis Martinkus Joseph Kleinhenz Emily K. Makowski Andrew M. Watkins Vladimir Gligorijevic Richard Bonneau Saeed Saremi

http://arxiv.org/abs/2408.13479v5 Quantum-machine-assisted Drug Discovery 2025-11-19T17:18:51Z

Drug discovery is lengthy and expensive, with traditional computer-aided design facing limits. This paper examines integrating quantum computing across the drug development cycle to accelerate and enhance workflows and rigorous decision-making. It highlights quantum approaches for molecular simulation, drug-target interaction prediction, and clinical trial optimization. Leveraging quantum capabilities could shorten timelines and reduce costs for bringing therapies to market, improving efficiency and ultimately benefiting public health.

2024-08-24T05:38:31Z 23 pages, 4 figures NPJ Drug Discov. 3, 1 (2026) Yidong Zhou Jintai Chen Jinglei Cheng Xu Cao Yuanyuan Zhang Gopal Karemore Marinka Zitnik Frederic T. Chong Junyu Liu Tianfan Fu Zhiding Liang 10.1038/s44386-025-00033-2

http://arxiv.org/abs/2511.15628v1 Methods for Secondary and Tertiary Structure Prediction of Microproteins 2025-11-19T17:13:24Z

Microproteins are a newly recognized and rapidly growing class of small proteins, typically encoded by fewer than 100 to 150 codons and translated from small open reading frames (smORFs). Although research has shown that smORFs and their corresponding microproteins constitute a significant portion of the genome and proteome, there is still limited information available in the literature regarding the structural characteristics of microproteins. In this paper, we discuss the methods available for predicting their secondary and tertiary structures and provide examples of calculations done with three archetypical methods (AlphaFold, I-TASSER and ROSETTA). We present results predicting the structures of 44 microproteins. For this set of microproteins, the methods considered here show reasonable agreement with one another and with the very few cases in which experimental structures are available. Nonetheless, the agreement with experimental structures is not as good as for larger proteins, indicating that a much larger set of experimental microprotein structures is needed to better evaluate and eventually calibrate prediction methods.

2025-11-19T17:13:24Z Julio C. Facelli

http://arxiv.org/abs/2512.02031v1 Pharmacophore-based design by learning on voxel grids 2025-11-19T17:10:04Z

Ligand-based drug discovery (LBDD) relies on making use of known binders to a protein target to find structurally diverse molecules similarly likely to bind. This process typically involves a brute force search of the known binder (query) against a molecular library using some metric of molecular similarity. One popular approach overlays the pharmacophore-shape profile of the known binder to 3D conformations enumerated for each of the library molecules, computes overlaps, and picks a set of diverse library molecules with high overlaps. While this virtual screening workflow has had considerable success in hit diversification, scaffold hopping, and patent busting, it scales poorly with library sizes and restricts candidate generation to existing library compounds. Leveraging recent advances in voxel-based generative modelling, we propose a pharmacophore-based generative model and workflows that address the scaling and fecundity issues of conventional pharmacophore-based virtual screening. We introduce VoxCap, a voxel captioning method for generating SMILES strings from voxelised molecular representations. We propose two workflows as practical use cases as well as benchmarks for pharmacophore-based generation: de novo design, in which we aim to generate new molecules with high pharmacophore-shape similarities to query molecules, and fast search, which aims to combine generative design with a cheap 2D substructure similarity search for efficient hit identification. Our results show that VoxCap significantly outperforms previous methods in generating diverse de novo hits. When combined with our fast search workflow, VoxCap reduces computational time by orders of magnitude while returning hits for all query molecules, enabling the search of large libraries that are intractable to search by brute force.

2025-11-19T17:10:04Z Omar Mahmood Pedro O. Pinheiro Richard Bonneau Saeed Saremi Vishnu Sresht

http://arxiv.org/abs/2512.02030v1 Generative design and validation of therapeutic peptides for glioblastoma based on a potential target ATP5A 2025-11-19T11:17:17Z

Glioblastoma (GBM) remains the most aggressive tumor, urgently requiring novel therapeutic strategies. Here, we present a dry-to-wet framework combining generative modeling and experimental validation to optimize peptides targeting ATP5A, a potential peptide-binding protein for GBM. Our framework introduces the first lead-conditioned generative model, which focuses exploration on geometrically relevant regions around lead peptides and mitigates the combinatorial complexity of de novo methods. Specifically, we propose POTFlow, a Prior and Optimal Transport-based Flow-matching model for peptide optimization. POTFlow employs secondary structure information (e.g., helix, sheet, loop) as geometric constraints, which are further refined by optimal transport to produce shorter flow paths. With this design, our method achieves state-of-the-art performance compared with five popular approaches. When applied to GBM, our method generates peptides that selectively inhibit cell viability and significantly prolong survival in a patient-derived xenograft (PDX) model. As the first lead peptide-conditioned flow matching model, POTFlow holds strong potential as a generalizable framework for therapeutic peptide design.

2025-11-19T11:17:17Z Hao Qian Pu You Lin Zeng Jingyuan Zhou Dengdeng Huang Kaicheng Li Shikui Tu Lei Xu

http://arxiv.org/abs/2511.14676v1 Exploring AlphaFold 3 for CD47 Antibody-Antigen Binding Affinity: An Unexpected Discovery of Reverse docking 2025-11-18T17:18:24Z

AlphaFold 3 (AF3) is a powerful biomolecular structure prediction tool based on the latest deep learning algorithms and an overhauled AI model architecture. A few papers have already investigated its accuracy in predicting different biomolecular structures. However, the potential applications of AF3 beyond basic structure prediction have not been fully explored. In our study, we first focused on structure prediction of antibody-antigen (CD47) complexes, which is believed to be challenging for AF3 because few cognate crystallographic structures have been resolved. We then assessed the potential of AF3 as an auxiliary tool for pre-screening potent antibody candidates through binding affinity analysis, in comparison with the molecular docking modules of commercial software, which would greatly benefit lead identification and optimization in drug development. In principle, this is not limited to antibody-antigen binding affinity but extends to many other chemical or physical properties of any drug candidate, building on AF3's accurately predicted structures, which are extremely close to reality. According to our experimental results, AF3 is a very promising competitor that can efficiently produce highly reliable molecular structures and subsequent binding energy predictions for most subjects. Surprisingly, an unexpected and nonrandom phenomenon, "reverse docking", was observed for two of our antibody subjects, suggesting new issues arising from the architectural overhaul of AF3. Our analysis and error correction experiments show that this phenomenon is likely caused by the overhauled AI model architecture, which offers important lessons for the optimization and design of AI for structure prediction. All software copyrights belong to the China Pharmaceutical University (CPU) and its affiliated School of Pharmacy and School of Science.

2025-11-18T17:18:24Z 15 pages, 4 figures, submitted to ACS Omega Yiyang Xu Ziyou Shen Yanqing Lv Shutong Tan Chun Sun Juan Zhang