From Prediction to Simulation: AlphaFold 3 as a Differentiable Framework for Structural Biology

2025-08-25T19:49:28Z

AlphaFold 3 represents a transformative advancement in computational biology, enhancing protein structure prediction through novel multi-scale transformer architectures, biologically informed cross-attention mechanisms, and geometry-aware optimization strategies. These innovations dramatically improve predictive accuracy and generalization across diverse protein families, surpassing previous methods. Crucially, AlphaFold 3 embodies a paradigm shift toward differentiable simulation, bridging traditional static structural modeling with dynamic molecular simulations. By reframing protein folding predictions as a differentiable process, AlphaFold 3 serves as a foundational framework for integrating deep learning with physics-based molecular

ImmunoAI: Accelerated Antibody Discovery Using Gradient-Boosted Machine Learning with Thermodynamic-Hydrodynamic Descriptors and 3D Geometric Interface Topology

2025-08-25T19:41:35Z

Human metapneumovirus (hMPV) poses serious risks to pediatric, elderly, and immunocompromised populations. Traditional antibody discovery pipelines require 10-12 months, limiting their applicability for rapid outbreak response. This project introduces ImmunoAI, a machine learning framework that accelerates antibody discovery by predicting high-affinity candidates using gradient-boosted models trained on thermodynamic, hydrodynamic, and 3D topological interface descriptors. A dataset of 213 antibody-antigen complexes was curated to extract geometric and physicochemical features, and a LightGBM regressor was trained to predict binding affinity with high precision. The model reduced the antibody candidate search space by 89%, and fine-tuning on 117 SARS-CoV-2 binding pairs further reduced Root Mean Square Error (RMSE) from 1.70 to 0.92. In the absence of an experimental structure for the hMPV A2.2 variant, AlphaFold2 was used to predict its 3D structure. The fine-tuned model identified two optimal antibodies with predicted picomolar affinities targeting key mutation sites (G42V and E96K), making them excellent candidates for experimental testing. In summary, ImmunoAI shortens design cycles and enables faster, structure-informed responses to viral outbreaks.
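
For readers who want the error metric made explicit, the following is a minimal sketch of RMSE and of the relative improvement implied by the two quoted values. The function names and toy numbers are illustrative assumptions; the abstract does not state the affinity units or how the study computed its metrics.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(before, after):
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before


# Hypothetical toy values, NOT data from the study.
toy_error = rmse([9.1, 7.8, 10.2], [9.0, 8.0, 10.0])

# RMSE values quoted in the abstract (before and after fine-tuning).
improvement = relative_reduction(1.70, 0.92)
print(f"toy RMSE = {toy_error:.3f}, relative RMSE reduction = {improvement:.1%}")
```

Under these definitions, the reported drop from 1.70 to 0.92 corresponds to a reduction of roughly 46% in RMSE.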

Designing de novo TIM Barrels: Insights into Stabilization, Diversification, and Functionalization Strategies

2025-08-25T14:09:33Z

The TIM-barrel fold is one of the most versatile and ubiquitous protein folds in nature, hosting a wide variety of catalytic activities and functions while serving as a model system in protein biochemistry and engineering. This review explores its role as a key fold model in protein design, particularly in addressing challenges in stabilization and functionalization. We discuss historical and recent advances in de novo TIM barrel design, from the landmark creation of sTIM11 to the development of diversified variants, with a special focus on deepening our understanding of the determinants that modulate the sequence-structure-function relationships of this architecture. We also examine why the diversification of de novo TIM barrels towards functionalization remains a major challenge, given the absence of natural-like active site features. Current approaches have focused on incorporating structural extensions, modifying loops, and using cutting-edge AI-based strategies to create scaffolds with tailored characteristics. Despite significant advances, achieving enzymatically active de novo TIM barrels has proven difficult, with only recent breakthroughs demonstrating functionalized designs. We discuss the limitations of stepwise functionalization approaches and support an integrated approach that simultaneously optimizes scaffold structure and active site shape, using both physics-based and AI-driven methods. By combining computational and experimental insights, we highlight the TIM barrel as a powerful template for custom enzyme design and as a model system to explore the intersection of protein biochemistry, biophysics, and design.

One pocket to activate them all: Efforts on understanding the modulator pocket in K2P channels

2025-08-25T10:57:47Z

The modulator pocket is a cryptic site discovered in the TREK1 K2P channel that accommodates agonists capable of increasing the channel's activity. Since its discovery, equivalent sites in other K2P channels have been shown to bind various ligands, both endogenous and exogenous. In this review, we attempt to elucidate how the modulator pocket contributes to K2P channel activation. To this end, we first describe the gating mechanisms reported in the literature and rationalize their modes of action. We then highlight previous experimental and computational evidence for agonists that bind to the modulator pocket, together with mutations at this site that affect gating. Finally, we elaborate on how the activation signal arising from the modulator pocket is transduced to the gates in K2P channels. In doing so, we outline a potential common modulator pocket architecture across K2P channels: a largely amphipathic structure (consistent with the expected properties of a pocket exposed at the interface between a hydrophobic membrane and the aqueous solvent), but with some important sequence variations between channels. This architecture and its key differences can be leveraged for the design of new selective and potent modulators.

Multi-domain Distribution Learning for De Novo Drug Design

2025-08-25T09:12:01Z

We introduce DrugFlow, a generative model for structure-based drug design that integrates continuous flow matching with discrete Markov bridges, demonstrating state-of-the-art performance in learning chemical, geometric, and physical aspects of three-dimensional protein-ligand data. We endow DrugFlow with an uncertainty estimate that is able to detect out-of-distribution samples. To further enhance the sampling process towards distribution regions with desirable metric values, we propose a joint preference alignment scheme applicable to both flow matching and Markov bridge frameworks. Furthermore, we extend our model to also explore the conformational landscape of the protein by jointly sampling side chain angles and molecules.

Boltzina: Efficient and Accurate Virtual Screening via Docking-Guided Binding Prediction with Boltz-2

2025-08-24T23:40:10Z

In structure-based drug discovery, virtual screening using conventional molecular docking methods can be performed rapidly but suffers from limitations in prediction accuracy. Recently, Boltz-2 was proposed, achieving extremely high accuracy in binding affinity prediction, but requiring approximately 20 seconds per compound per GPU, making it difficult to apply to large-scale screening of hundreds of thousands to millions of compounds. This study proposes Boltzina, a novel framework that leverages Boltz-2's high accuracy while significantly improving computational efficiency. Boltzina achieves both accuracy and speed by omitting the rate-limiting structure prediction from Boltz-2's architecture and directly predicting affinity from AutoDock Vina docking poses. We evaluate on eight assays from the MF-PCBA dataset and show that while Boltzina performs below Boltz-2, it provides significantly higher screening performance compared to AutoDock Vina and GNINA. Additionally, Boltzina achieved up to an 11.8$\times$ speedup through reduced recycling iterations and batch processing. Furthermore, we investigated multi-pose selection strategies and two-stage screening combining Boltzina and Boltz-2, presenting optimization methods for accuracy and efficiency according to application requirements. This study represents the first attempt to apply Boltz-2's high-accuracy predictions to practical-scale screening, offering a pipeline that combines both accuracy and efficiency in computational biology. Boltzina is available on GitHub: https://github.com/ohuelab/boltzina.
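
The throughput figures quoted above can be turned into a back-of-the-envelope cost estimate. The sketch below assumes that the best-case 11.8x figure is measured relative to Boltz-2's roughly 20 seconds per compound (the abstract implies this but does not state it), ignores docking time, and uses a hypothetical library of one million compounds with a hypothetical 1% carried into a second stage. None of the names belong to the Boltzina code base.

```python
SECONDS_PER_DAY = 86_400


def gpu_days(n_compounds, seconds_per_compound):
    """Single-GPU wall time, in days, needed to score a library."""
    return n_compounds * seconds_per_compound / SECONDS_PER_DAY


def two_stage_gpu_days(n_compounds, fast_seconds, slow_seconds, keep_fraction):
    """Score everything with the fast model, then re-score the top fraction
    with the slow model."""
    n_kept = round(n_compounds * keep_fraction)
    return gpu_days(n_compounds, fast_seconds) + gpu_days(n_kept, slow_seconds)


BOLTZ2_SECONDS = 20.0     # per compound per GPU, as quoted in the abstract
BEST_CASE_SPEEDUP = 11.8  # "up to" figure from the abstract (assumed vs. Boltz-2)
boltzina_seconds = BOLTZ2_SECONDS / BEST_CASE_SPEEDUP

library_size = 1_000_000  # hypothetical library size
full = gpu_days(library_size, BOLTZ2_SECONDS)
fast = gpu_days(library_size, boltzina_seconds)
two_stage = two_stage_gpu_days(
    library_size, boltzina_seconds, BOLTZ2_SECONDS, keep_fraction=0.01
)

print(f"Boltz-2 only:  {full:7.1f} GPU-days")
print(f"Boltzina only: {fast:7.1f} GPU-days")
print(f"Two-stage:     {two_stage:7.1f} GPU-days")
```

Under these assumptions, a one-million-compound screen drops from about 231 GPU-days to about 20, and re-scoring the top 1% with Boltz-2 adds only a little over 2 GPU-days.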

Chemical classification program synthesis using generative artificial intelligence

2025-08-24T01:27:40Z

Accurately classifying chemical structures is essential for cheminformatics and bioinformatics, including tasks such as identifying bioactive compounds of interest, screening molecules for toxicity to humans, finding non-organic compounds with desirable material properties, or organizing large chemical libraries for drug discovery or environmental monitoring. However, manual classification is labor-intensive and difficult to scale to large chemical databases. Existing automated approaches either rely on manually constructed classification rules or are deep learning methods that lack explainability. This work presents an approach that uses generative artificial intelligence to automatically write chemical classifier programs for classes in the Chemical Entities of Biological Interest (ChEBI) database. These programs can be used for efficient deterministic run-time classification of SMILES structures, with natural language explanations. The programs themselves constitute an explainable computable ontological model of chemical class nomenclature, which we call the ChEBI Chemical Class Program Ontology (C3PO). We validated our approach against the ChEBI database, and compared our results against deep learning models and a naive SMARTS-pattern-based classifier. C3PO outperforms the naive classifier, but does not reach the performance of state-of-the-art deep learning methods. However, C3PO has a number of strengths that complement deep learning methods, including explainability and reduced data dependence. C3PO can be used alongside deep learning classifiers to provide an explanation of the classification where both methods agree. The programs can be used as part of the ontology development process, and iteratively refined by expert human curators.

Nonequilibrium protein complexes as molecular automata

2025-08-21T14:26:13Z

Biology stores information and computes at the molecular scale, yet the ways in which it does so are often distinct from human-engineered computers. Mapping biological computation onto architectures familiar to computer science remains an outstanding challenge. Here, inspired by Crick's proposal for molecular memory, we analyse a thermodynamically consistent model of a protein complex subject to driven, nonequilibrium enzymatic reactions. In the strongly driven limit, we find that the system maps onto a stochastic, asynchronous variant of cellular automata, where each rule corresponds to a different set of enzymes being present. We find a broad class of phenomena in these 'molecular automata' that can be exploited for molecular computation, including error-tolerant memory via multistable attractors, and long transients that can be used as molecular stopwatches. By systematically enumerating all possible dynamical rules, we identify those that allow molecular automata to implement simple computational architectures such as finite-state machines. Overall, our results provide a framework for engineering synthetic molecular automata, and offer a route to building protein-based computation in living cells.
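
To make the phrase "stochastic, asynchronous variant of cellular automata" concrete, here is a minimal sketch of a generic asynchronous elementary cellular automaton on a ring, in which one randomly chosen site is updated per step. It is not the thermodynamic model analysed in the paper: the binary site states, the nearest-neighbour ring geometry, and the use of Wolfram rule numbers are all illustrative assumptions.

```python
import random


def rule_table(rule_number):
    """Map each (left, centre, right) neighbourhood to its new centre state,
    using the standard Wolfram numbering of elementary rules."""
    return {
        (l, c, r): (rule_number >> (l * 4 + c * 2 + r)) & 1
        for l in (0, 1)
        for c in (0, 1)
        for r in (0, 1)
    }


def async_step(state, table, rng):
    """Update a single randomly chosen site in place (periodic boundaries)."""
    n = len(state)
    i = rng.randrange(n)
    state[i] = table[(state[i - 1], state[i], state[(i + 1) % n])]
    return state


def run(initial_state, rule_number, n_steps, seed=0):
    """Apply n_steps asynchronous single-site updates and return the final state."""
    rng = random.Random(seed)
    table = rule_table(rule_number)
    state = list(initial_state)
    for _ in range(n_steps):
        async_step(state, table, rng)
    return state


# Rule 232 is the local majority rule: all-0 and all-1 are both fixed points.
corrupted = [0, 0, 1, 0, 0, 0]  # an all-0 "memory" with one flipped site
print(run(corrupted, rule_number=232, n_steps=200))
```

With the majority rule, both uniform states are fixed points, and an isolated flipped site is restored as soon as it is selected for update. This gives a toy picture of error-tolerant memory through multiple attractors.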

Sesame: Opening the door to protein pockets

2025-08-21T12:22:56Z

Molecular docking is a cornerstone of drug discovery, relying on high-resolution ligand-bound structures to achieve accurate predictions. However, obtaining these structures is often costly and time-intensive, limiting their availability. In contrast, ligand-free structures are more accessible but suffer from reduced docking performance due to pocket geometries being less suited for ligand accommodation in apo structures. Traditional methods for artificially inducing these conformations, such as molecular dynamics simulations, are computationally expensive. In this work, we introduce Sesame, a generative model designed to predict this conformational change efficiently. By generating geometries better suited for ligand accommodation at a fraction of the computational cost, Sesame aims to provide a scalable solution for improving virtual screening workflows.

Visualizing Poloidal Orientation in DNA Minicircles

2025-08-20T22:01:51Z

A short (<150 bp) double-stranded DNA (dsDNA) molecule ligated end-to-end forms a DNA minicircle. Due to sequence-dependent, nonuniform bending energetics, such a minicircle is predicted to adopt a certain inside-out orientation, known as the poloidal orientation. Despite theoretical and computational predictions, experimental evidence for this phenomenon has been lacking. In this study, we introduce a single-molecule approach to visualize the poloidal orientation of DNA minicircles. We constructed a set of DNA minicircles, each containing a single biotin located at a different position along one helical turn of the dsDNA, and imaged the location of biotin-bound NeutrAvidin relative to the DNA minicircle using atomic force microscopy (AFM). We applied this approach to two DNA sequences previously predicted to exhibit strongly preferred poloidal orientations. The observed relative positions of NeutrAvidin shifted between the inside and outside of the minicircle with different phases, indicating distinct poloidal orientations for the two sequences. Coarse-grained simulations revealed narrowly distributed poloidal orientations with different mean orientations for each sequence, consistent with the AFM results. Together, our findings provide experimental confirmation of preferred poloidal orientations in DNA minicircles, offering insights into the intrinsic dynamics of circular DNA.

Yeast growth is controlled by the proportional scaling of mRNA and ribosome concentrations

2025-08-20T18:33:45Z

Despite growth being fundamental to all aspects of cell biology, we do not yet know its organizing principles in eukaryotic cells. Classic models derived from the bacterium E. coli posit that protein-synthesis rates are set by mass-action collisions between charged tRNAs produced by metabolic enzymes and mRNA-bound ribosomes. These models show that faster growth is achieved by simultaneously raising both ribosome content and peptide elongation speed. Here, we test whether these models are valid for eukaryotes by combining single-molecule tracking, spike-in RNA sequencing, and proteomics in 15 carbon- and nitrogen-limited conditions using the budding yeast S. cerevisiae. Ribosome concentration increases linearly with growth rate, as in bacteria, but the peptide elongation speed remains constant (~9 amino acids/s) and charged tRNAs are not limiting. Total mRNA concentration rises in direct proportion to ribosomes, driven by enhanced RNA polymerase II occupancy of the genome. We show that a simple kinetic model of mRNA-ribosome binding predicts the fraction of active ribosomes, the growth rate, and responses to transcriptional perturbations. Yeast accelerate growth by coordinately and proportionally up-regulating total mRNA and ribosome concentrations, not by speeding elongation. Taken together, our work establishes a new framework for eukaryotic growth control and resource allocation.

DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning

2025-08-19T18:39:25Z

The synthesis of complex natural products remains one of the grand challenges of organic chemistry. We present DeepRetro, a major advancement in computational retrosynthesis that enables the discovery of viable synthetic routes for complex molecules typically considered beyond the reach of existing retrosynthetic methods. DeepRetro is a novel, open-source framework that tightly integrates large language models (LLMs), traditional retrosynthetic engines, and expert human feedback in an iterative design loop. Prior approaches rely solely on template-based methods or unconstrained LLM outputs. In contrast, DeepRetro combines the precision of template-based methods with the generative flexibility of LLMs, controlled by rigorous chemical validity checks and enhanced by recursive refinement. This hybrid system dynamically explores and revises synthetic pathways, guided by both algorithmic checks and expert chemist feedback through an interactive user interface. While DeepRetro achieves strong performance on standard retrosynthesis benchmarks, its true strength lies in its ability to propose novel, viable pathways to highly complex natural products, targets that have historically eluded automated planning. Through detailed case studies, we illustrate how this approach enables new routes for total synthesis and facilitates human-machine collaboration in organic chemistry. Beyond retrosynthesis, DeepRetro represents a working model for how to leverage LLMs in scientific discovery. We provide a transparent account of the system's design, algorithms, and human-feedback loop, enabling broad adaptation across scientific domains. By releasing DeepRetro as an open-source tool, we aim to empower chemists to tackle increasingly ambitious synthetic targets, accelerating progress in drug discovery, materials design, and beyond.

QUBODock: A Pip-Installable QUBO Tool for Ligand Pose Generation

2025-08-18T15:34:08Z

We present QUBODock, a pip-installable tool that formulates ligand pose generation as a Quadratic Unconstrained Binary Optimization (QUBO) problem and solves it efficiently on CPU or GPU. QUBODock focuses exclusively on pose generation and deliberately excludes any built-in scoring function, allowing researchers to pair its poses with external scorers of their choice. The software provides a minimal, reproducible interface for (i) protein-ligand structure ingestion and preprocessing, (ii) QUBO model construction from geometric/compatibility constraints, and (iii) decoding solutions into candidate poses for downstream ranking. Implemented in Python with GPU acceleration, QUBODock emphasizes usability and reproducibility: it is distributed on PyPI and can be installed with a single command. We release the source to support benchmarking, teaching, and method development around QUBO-based docking pose generation.
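
As an illustration of what casting pose generation as a QUBO can look like, the sketch below encodes a toy assignment of ligand atoms to candidate sites with binary variables x[a, s], adds quadratic penalties for the constraints, and solves the result by exhaustive search. This is not QUBODock's API or its actual encoding: the cost matrix, the penalty weight, and every function name are hypothetical, and brute force is only feasible for a handful of variables.

```python
from itertools import product


def build_qubo(cost, penalty):
    """Build an upper-triangular QUBO {(i, j): weight} for a toy assignment.

    cost[a][s] is a hypothetical mismatch score for placing atom a at site s.
    Variable index a * n_sites + s is 1 when atom a occupies site s.
    """
    n_atoms, n_sites = len(cost), len(cost[0])
    qubo = {}

    def index(a, s):
        return a * n_sites + s

    def add(i, j, weight):
        key = (min(i, j), max(i, j))
        qubo[key] = qubo.get(key, 0.0) + weight

    for a in range(n_atoms):
        for s in range(n_sites):
            # Linear cost plus the diagonal part of penalty * (sum_s x - 1)^2.
            add(index(a, s), index(a, s), cost[a][s] - penalty)
            for t in range(s + 1, n_sites):
                # Off-diagonal part: one atom must not occupy two sites.
                add(index(a, s), index(a, t), 2.0 * penalty)
    for s in range(n_sites):
        for a in range(n_atoms):
            for b in range(a + 1, n_atoms):
                # Clash term: two atoms must not share one site.
                add(index(a, s), index(b, s), penalty)
    return qubo, n_atoms * n_sites


def energy(qubo, x):
    """Evaluate x^T Q x for a binary vector x."""
    return sum(w * x[i] * x[j] for (i, j), w in qubo.items())


def brute_force(qubo, n_vars):
    """Exhaustively search all 2^n_vars bitstrings (tiny problems only)."""
    return min(product((0, 1), repeat=n_vars), key=lambda x: energy(qubo, x))


def decode(x, n_sites):
    """Turn a bitstring back into an {atom: site} assignment."""
    return {i // n_sites: i % n_sites for i, bit in enumerate(x) if bit}


toy_cost = [[0.2, 1.0, 0.7],   # atom 0 at sites 0, 1, 2 (hypothetical values)
            [0.1, 0.9, 0.3]]   # atom 1 at sites 0, 1, 2
qubo, n_vars = build_qubo(toy_cost, penalty=5.0)
best = brute_force(qubo, n_vars)
print(decode(best, n_sites=3))
```

Both atoms prefer site 0 in this toy problem, so the clash penalty forces a compromise assignment. Practical QUBO solvers replace the exhaustive search with heuristics such as simulated annealing, and decoded poses would then go to an external scoring function, in line with the separation of concerns described above.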

Generative Modeling of Full-Atom Protein Conformations using Latent Diffusion on Graph Embeddings

2025-08-18T10:37:31Z

Generating diverse, all-atom conformational ensembles of dynamic proteins such as G-protein-coupled receptors (GPCRs) is critical for understanding their function, yet most generative models simplify atomic detail or ignore conformational diversity altogether. We present latent diffusion for full protein generation (LD-FPG), a framework that constructs complete all-atom protein structures, including every side-chain heavy atom, directly from molecular dynamics (MD) trajectories. LD-FPG employs a Chebyshev graph neural network (ChebNet) to obtain low-dimensional latent embeddings of protein conformations, which are processed using three pooling strategies: blind, sequential and residue-based. A diffusion model trained on these latent representations generates new samples that a decoder, optionally regularized by dihedral-angle losses, maps back to Cartesian coordinates. Using D2R-MD, a 2-microsecond MD trajectory (12,000 frames) of the human dopamine D2 receptor in a membrane environment, the sequential and residue-based pooling strategies reproduce the reference ensemble with high structural fidelity (all-atom lDDT of approximately 0.7; C-alpha-lDDT of approximately 0.8) and recover backbone and side-chain dihedral-angle distributions with a Jensen-Shannon divergence of less than 0.03 compared to the MD data. LD-FPG thereby offers a practical route to system-specific, all-atom ensemble generation for large proteins, providing a promising tool for structure-based therapeutic design on complex, dynamic targets. The D2R-MD dataset and our implementation are freely available to facilitate further research.
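
The Jensen-Shannon divergence used above to compare dihedral-angle distributions can be computed from two histograms as in the following sketch. The bin count, the use of base-2 logarithms (which bounds the divergence between 0 and 1), and the toy angle lists are assumptions; the abstract does not specify how the authors binned the angles or which logarithm base they used.

```python
import math


def angle_histogram(angles_deg, n_bins=36):
    """Normalised histogram of angles wrapped into [-180, 180) degrees."""
    counts = [0] * n_bins
    width = 360.0 / n_bins
    for angle in angles_deg:
        wrapped = (angle + 180.0) % 360.0
        counts[min(int(wrapped // width), n_bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical dihedral samples in degrees, NOT data from the study.
reference = [-62.0, -58.5, -65.1, -60.3, 175.0, -61.7]
generated = [-60.9, -57.2, -66.4, -63.0, 178.2, -55.0]

p = angle_histogram(reference)
q = angle_histogram(generated)
print(f"JS divergence = {js_divergence(p, q):.4f} bits")
```

Identical histograms give a divergence of 0 and histograms with no overlapping bins give 1, so a value below 0.03 on this scale would indicate closely matching distributions.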

Deep Learning-based QSAR Model for Therapeutic Strategies Targeting SmTGR Protein's Immune Modulating Role in Host-Parasite Interaction

2025-08-18T06:33:39Z

Schistosomiasis, a neglected tropical disease caused by Schistosoma parasites, remains a major global health challenge. The Schistosoma mansoni thioredoxin glutathione reductase (SmTGR) is essential for parasite redox balance and immune evasion, making it a key therapeutic target. This study employs predictive Quantitative Structure-Activity Relationship (QSAR) modeling to identify potential SmTGR inhibitors. Using deep learning, a robust QSAR model was developed and validated, achieving high predictive accuracy. The predicted novel inhibitors were further validated through molecular docking studies, which demonstrated strong binding affinities, with the highest docking score of -10.76 ± 0.01 kcal/mol. Visualization of the docked structures in both 2D and 3D confirmed similar interactions for the inhibitors and commercial drugs, further supporting their therapeutic effectiveness and the predictive ability of the model. This study demonstrates the potential of QSAR modeling in accelerating drug discovery, offering a promising avenue for developing novel therapeutics targeting SmTGR to improve schistosomiasis treatment.