http://arxiv.org/abs/2512.10115v2 Decoding How Proteins Fold 2026-01-10T14:08:13Z

One of the most puzzling and unsolved challenges in molecular biology is understanding how proteins fold. Despite having advanced predictive tools that can accurately estimate the native structures of proteins, we still lack a comprehensive model that explains how amino acid sequences dictate folding pathways and trajectories. This manuscript takes a fresh approach to this problem by resorting to the principle of least action. This approach enables us to explore an intriguing question: how does a protein achieve its native state at a constant folding rate and within a time frame that is biologically plausible? A response to this inquiry will help us understand why proteins must fold along specific pathways and identify the boundary conditions that restrict their availability. It will also clarify why different folding pathways could be characterized by a common effective folding trajectory. Finally, it will provide a clear explanation for Levinthal's paradox. Our results are expected to pave the way for a more profound understanding of how proteins fold, shedding light on how the amino acid sequence and its surrounding environment encode the protein's folding pathways and, consequently, the protein's three-dimensional structure.

2025-12-10T22:06:59Z 18 pages, 3 figures Jorge Vila http://arxiv.org/abs/2401.17894v5 Recent methods from statistical inference and machine learning to improve integrative modeling of macromolecular assemblies 2026-01-10T04:59:42Z

Integrative modeling of macromolecular assemblies allows for structural characterization of large assemblies that are recalcitrant to direct experimental observation. A Bayesian inference approach facilitates combining data from complementary experiments along with physical principles, statistics of known structures, and prior models, for structure determination. Here, we review recent methods for integrative modeling based on statistical inference and machine learning. These methods improve over the current state-of-the-art by enhancing the data collection, optimizing coarse-grained model representations, making scoring functions more accurate, sampling more efficient, and model analysis more rigorous. We also discuss three new frontiers in integrative modeling: incorporating recent deep learning-based methods, integrative modeling with in situ data, and metamodeling.

2024-01-31T15:00:53Z Updated DOI of published version Shreyas Arvindekar Kartik Majila Shruthi Viswanath http://arxiv.org/abs/2601.14273v1 New water oxidation mechanism in Photosystem II resolves major experimental controversies 2026-01-09T20:15:15Z

Light-driven oxygen formation in the Photosystem II protein is a fundamental process that sustains our biosphere and, owing to its high energy conversion efficiency, serves as a blueprint for future clean energy solutions. The last decade of intense research using advanced physical techniques has delivered new insights into the structure and function of the Mn4CaO5 cluster, the center of the oxygen-evolving complex (OEC). However, discrepancies between experimental observations and computational models persist, impeding the understanding of O-O bond formation and of the role of the protein environment in the process. Here we show that i) the assignment of the unique OEC oxygen O3, ligated by histidine (His337) via a dynamic H-bond, as a slow-exchanging substrate and ii) its coupling with the O6 oxygen generated at Mn1 in the S2 to S3 transition give the O-O bond formation mechanism most consistent with all currently available experimental data. The proposal shows how the protein environment can steer O-O bond formation by charge control via the H-bond and the open coordination of Mn1. The resulting O3-O6 peroxide is lower in energy than the peroxides in the most-studied O5-O6 bond formation pathway. His337 appears to be similar to the distal His in globins used for management of the O2 and H2O2 intermediates. The new mechanism breaks the prior impasse and will undoubtedly invigorate future studies uncovering its further details.

2026-01-09T20:15:15Z Yulia Pushkar http://arxiv.org/abs/2601.05792v1 Tensor-DTI: Enhancing Biomolecular Interaction Prediction with Contrastive Embedding Learning 2026-01-09T13:39:49Z

Accurate drug-target interaction (DTI) prediction is essential for computational drug discovery, yet existing models often rely on single-modality predefined molecular descriptors or sequence-based embeddings with limited representativeness. We propose Tensor-DTI, a contrastive learning framework that integrates multimodal embeddings from molecular graphs, protein language models, and binding-site predictions to improve interaction modeling. Tensor-DTI employs a siamese dual-encoder architecture, enabling it to capture both chemical and structural interaction features while distinguishing interacting from non-interacting pairs. Evaluations on multiple DTI benchmarks demonstrate that Tensor-DTI outperforms existing sequence-based and graph-based models. We also conduct large-scale inference experiments on CDK2 across billion-scale chemical libraries, where Tensor-DTI produces chemically plausible hit distributions even when CDK2 is withheld from training. In enrichment studies against Glide docking and Boltz-2 co-folder, Tensor-DTI remains competitive on CDK2 and improves the screening budget required to recover moderate fractions of high-affinity ligands on out-of-family targets under strict family-holdout splits. Additionally, we explore its applicability to protein-RNA and peptide-protein interactions. Our findings highlight the benefits of integrating multimodal information with contrastive objectives to enhance interaction-prediction accuracy and to provide more interpretable and reliability-aware models for virtual screening.

2026-01-09T13:39:49Z Accepted at the Generative and Experimental Perspectives for Biomolecular Design Workshop at ICLR 2025 and at the Learning Meaningful Representations of Life Workshop at ICLR 2025 Manel Gil-Sorribes Júlia Vilalta-Mor Isaac Filella-Mercè Robert Soliva Álvaro Ciudad Víctor Guallar Alexis Molina http://arxiv.org/abs/2601.01740v2 Fold-switching proteins push the boundaries of conformational ensemble prediction 2026-01-08T13:20:48Z

A protein's function depends critically on its conformational ensemble, a collection of energy weighted structures whose balance depends on temperature and environment. Though recent deep learning (DL) methods have substantially advanced predictions of single protein structures, computationally modeling conformational ensembles remains a challenge. Here, we focus on modeling fold-switching proteins, which remodel their secondary and/or tertiary structures and change their functions in response to cellular stimuli. These underrepresented members of the protein universe serve as test cases for a method's generalizability. They reveal that DL models often predict conformational ensembles by association with training-set structures, limiting generalizability. These observations suggest use cases for when DL methods will likely succeed or fail. Developing computational methods that successfully identify new fold-switching proteins from large pools of candidates may advance modeling conformational ensembles more broadly.

2026-01-05T02:32:20Z Myeongsang Lee Lauren L. Porter 10.1146/annurev-biodatasci-092524-114822 http://arxiv.org/abs/2601.04874v1 Structural-dynamic behavior of histamine in solution: the role of water models 2026-01-08T12:19:20Z

A highly dilute aqueous solution of histamine was studied by molecular dynamics using the TIP3P and SPC/E water models. The local structure of the solution around histamine is determined by local Coulomb interactions and hydrogen bonds and is practically independent of the choice of water model. Dynamic analysis based on the mean square displacement functions revealed a significant dependence of the diffusion behavior of histamine on the water model. The TIP3P water model leads to overestimated diffusion coefficients for water and histamine and to a transition to the diffusive mode of motion. The SPC/E water model yields slower dynamics of the solution components, and its diffusion coefficients are in better agreement with experimental data. The dynamics of histamine is thus highly sensitive to the choice of water model, and the SPC/E model is more suitable for correctly describing the dynamic properties of the "histamine-water" system under physiological conditions.
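As an illustration of how a diffusion coefficient is commonly extracted from a mean square displacement (MSD) curve, the sketch below applies the three-dimensional Einstein relation, MSD(t) ≈ 6Dt, to a synthetic random walk. It is not the authors' analysis code: the function names, the fitting window, and the trajectory are all assumptions made for the example, and a real analysis would use unwrapped coordinates from the simulation and fit only the linear regime.

```python
import numpy as np

def mean_square_displacement(positions, max_lag):
    """MSD for lags 0..max_lag, averaged over all time origins.

    positions: (n_frames, 3) array of unwrapped coordinates of one molecule.
    """
    msd = np.zeros(max_lag + 1)
    for lag in range(1, max_lag + 1):
        disp = positions[lag:] - positions[:-lag]
        msd[lag] = np.mean(np.sum(disp ** 2, axis=1))
    return msd

def diffusion_coefficient(msd, dt, first_lag=1):
    """Slope of a linear fit of MSD(t), divided by 6 (Einstein relation in 3D)."""
    t = np.arange(len(msd)) * dt
    slope, _ = np.polyfit(t[first_lag:], msd[first_lag:], 1)
    return slope / 6.0

# Synthetic Brownian trajectory (hypothetical data in arbitrary units,
# standing in for a histamine centre-of-mass trajectory).
rng = np.random.default_rng(0)
d_true, dt = 0.5, 1.0
steps = rng.normal(scale=np.sqrt(2.0 * d_true * dt), size=(20000, 3))
trajectory = np.cumsum(steps, axis=0)

msd = mean_square_displacement(trajectory, max_lag=50)
d_est = diffusion_coefficient(msd, dt)
print(f"estimated D = {d_est:.3f} (input D = {d_true})")
```

Running the same fit on trajectories produced with two water models would give the kind of side-by-side diffusion coefficients the abstract compares.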

2026-01-08T12:19:20Z Dmytro A. Gavryushenko N. Atamas Oleg K. Myronenko http://arxiv.org/abs/2407.16580v2 Assessment of scoring functions for computational models of protein-protein interfaces 2026-01-07T15:31:02Z

A goal of computational studies of protein-protein interfaces (PPIs) is to predict the binding site between two monomers that form a heterodimer. The simplest version of this problem is to rigidly re-dock the bound forms of the monomers, which involves generating computational models of the heterodimer and then scoring them to determine the most native-like models. Scoring functions have been assessed previously using rank- and classification-based metrics; however, these methods are sensitive to the number and quality of models in the scoring function training set. We assess the accuracy of seven PPI scoring functions by comparing their scores to a measure of structural similarity to the X-ray crystal structure (i.e., the DockQ score) for a non-redundant set of heterodimers from the Protein Data Bank. For each heterodimer, we generate re-docked models uniformly sampled over DockQ and calculate the Spearman correlation between the PPI scores and DockQ. For some targets, the scores and DockQ are highly correlated; however, for many targets, the correlations are weak. Several physical features can explain the difference between difficult- and easy-to-score targets. For example, strong correlations exist between the score and DockQ for targets with highly intertwined monomers and many interface contacts. We also develop a new score based on only three physical features that matches or exceeds the performance of current PPI scoring functions. These results emphasize that PPI prediction can be improved by focusing on correlations between the PPI score and DockQ and by incorporating more discriminating physical features into PPI scoring functions.
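The per-target evaluation described above can be sketched as a rank correlation between model scores and DockQ. The snippet below is a minimal illustration and not the authors' pipeline: the DockQ values and scores are invented, the sign convention (lower score means a better model) is an assumption, and the rank helper ignores ties for brevity.

```python
import numpy as np

def ranks(values):
    """Ranks 0..n-1 of the entries (assumes no tied values)."""
    values = np.asarray(values)
    order = np.argsort(values)
    r = np.empty(len(values), dtype=float)
    r[order] = np.arange(len(values))
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

# Hypothetical re-docked models for one target, sampled uniformly over DockQ.
dockq = np.array([0.05, 0.20, 0.35, 0.50, 0.65, 0.80, 0.95])

# Assumed convention: more negative score = predicted to be more native-like.
score_easy = np.array([-1.0, -2.5, -3.1, -4.8, -5.2, -7.0, -9.3])
score_hard = np.array([-4.0, -1.0, -6.0, -2.0, -5.5, -3.0, -4.5])

rho_easy = spearman(dockq, score_easy)
rho_hard = spearman(dockq, score_hard)
print(f"easy target: rho = {rho_easy:.2f}; hard target: rho = {rho_hard:.2f}")
```

Under this convention a well-behaved scoring function gives a correlation close to -1 for a target, while a correlation near zero marks a target on which the score carries little information about DockQ.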

2024-07-23T15:36:47Z 21 pages, 7 figures Jacob Sumner Grace Meng Naomi Brandt Alex T. Grigas Andrés Córdoba Mark D. Shattuck Corey S. O'Hern http://arxiv.org/abs/2510.09668v2 A Hybrid Computational Intelligence Framework with Metaheuristic Optimization for Drug-Drug Interaction Prediction 2026-01-07T10:34:27Z

Drug-drug interactions (DDIs) are a leading cause of preventable adverse events, often complicating treatment and increasing healthcare costs. At the same time, knowing which drugs do not interact is equally important, as such knowledge supports safer prescriptions and better patient outcomes. In this study, we propose an interpretable and efficient framework that blends modern machine learning with domain knowledge to improve DDI prediction. Our approach combines two complementary molecular embeddings - Mol2Vec, which captures fragment-level structural patterns, and SMILES-BERT, which learns contextual chemical features - together with a leakage-free, rule-based clinical score (RBScore) that injects pharmacological knowledge without relying on interaction labels. A lightweight neural classifier is then optimized using a novel three-stage metaheuristic strategy (RSmpl-ACO-PSO), which balances global exploration and local refinement for stable performance. Experiments on real-world datasets demonstrate that the model achieves high predictive accuracy (ROC-AUC 0.911, PR-AUC 0.867 on DrugBank) and generalizes well to a clinically relevant Type 2 Diabetes Mellitus cohort. Beyond raw performance, studies show how embedding fusion, RBScore, and the optimizer each contribute to precision and robustness. Together, these results highlight a practical pathway for building reliable, interpretable, and computationally efficient models that can support safer drug therapies and clinical decision-making.

2025-10-08T09:55:18Z After further internal review, we identified that the methodological contribution claimed in Section 3 substantially overlaps with prior published work and lacks sufficient novel theoretical or empirical justification. As this affects the core contribution, the authors request withdrawal rather than replacement Maryam Abdollahi Shamami Babak Teimourpour Farshad Sharifi http://arxiv.org/abs/2601.03704v1 Investigating Knowledge Distillation Through Neural Networks for Protein Binding Affinity Prediction 2026-01-07T08:43:08Z

The trade-off between predictive accuracy and data availability makes it difficult to predict protein-protein binding affinity accurately. The lack of experimentally resolved protein structures limits the performance of structure-based machine learning models, which generally outperform sequence-based methods. In order to overcome this constraint, we suggest a regression framework based on knowledge distillation that uses protein structural data during training and only needs sequence data during inference. The suggested method uses binding affinity labels and intermediate feature representations to jointly supervise the training of a sequence-based student network under the guidance of a structure-informed teacher network. Leave-One-Complex-Out (LOCO) cross-validation was used to assess the framework on a non-redundant protein-protein binding affinity benchmark dataset. A maximum Pearson correlation coefficient (P_r) of 0.375 and an RMSE of 2.712 kcal/mol were obtained by sequence-only baseline models, whereas a P_r of 0.512 and an RMSE of 2.445 kcal/mol were obtained by structure-based models. With a P_r of 0.481 and an RMSE of 2.488 kcal/mol, the distillation-based student model greatly enhanced sequence-only performance. Improved agreement and decreased bias were further confirmed by thorough error analyses. With the potential to close the performance gap between sequence-based and structure-based models as larger datasets become available, these findings show that knowledge distillation is an efficient method for transferring structural knowledge to sequence-based predictors. The source code for running inference with the proposed distillation-based binding affinity predictor can be accessed at https://github.com/wajidarshad/ProteinAffinityKD.
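One common way to combine label supervision with feature-level guidance from a teacher is a weighted sum of two squared-error terms. The sketch below shows that generic form together with the two reported metrics (Pearson correlation and RMSE). It is an assumption-laden illustration rather than the loss used in the paper: the weighting factor `alpha`, the function names, and the toy arrays are all hypothetical.

```python
import numpy as np

def distillation_loss(student_pred, student_feat, teacher_feat, labels, alpha=0.5):
    """Label regression error plus a feature-matching term weighted by alpha."""
    label_term = np.mean((student_pred - labels) ** 2)
    feature_term = np.mean((student_feat - teacher_feat) ** 2)
    return float(label_term + alpha * feature_term)

def pearson_r(pred, target):
    """Pearson correlation coefficient between predictions and targets."""
    return float(np.corrcoef(pred, target)[0, 1])

def rmse(pred, target):
    """Root-mean-square error, in the units of the target (e.g. kcal/mol)."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Toy batch of three complexes (invented numbers, not from the benchmark).
labels = np.array([-10.0, -8.0, -12.0])       # binding affinities
student_pred = np.array([-9.0, -8.0, -13.0])  # sequence-only student output
teacher_feat = np.ones((3, 4))                # structure-informed features
student_feat = np.zeros((3, 4))               # student's intermediate features

loss = distillation_loss(student_pred, student_feat, teacher_feat, labels)
print(f"loss = {loss:.4f}")
print(f"RMSE = {rmse(student_pred, labels):.4f}")
print(f"Pearson r = {pearson_r(student_pred, labels):.4f}")
```

At inference time only the student's prediction path is needed, which is why a model trained this way can run on sequence data alone.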

2026-01-07T08:43:08Z Wajid Arshad Abbasi Syed Ali Abbas Maryum Bibi Saiqa Andleeb Muhammad Naveed Akhtar http://arxiv.org/abs/2601.03677v1 Roadmap for Condensates in Cell Biology 2026-01-07T07:59:06Z

Biomolecular condensates govern essential cellular processes yet elude description by traditional equilibrium models. This roadmap, distilled from structured discussions at a workshop and reflecting the consensus of its participants, clarifies key concepts for researchers, funding bodies, and journals. After unifying terminology that often separates disciplines, we outline the core physics of condensate formation, review their biological roles, and identify outstanding challenges in nonequilibrium theory, multiscale simulation, and quantitative in-cell measurements. We close with a forward-looking outlook to guide coordinated efforts toward predictive, experimentally anchored understanding and control of biomolecular condensates.

2026-01-07T07:59:06Z 14 pages, 5 figures Dilimulati Aierken Sebastian Aland Stefano Bo Steven Boeynaems Danfeng Cai Serena Carra Lindsay B. Case Hue Sun Chan Jorge R. Espinosa Trevor K. GrandPre Alexander Y. Grosberg Ivar S. Haugerud William M. Jacobs Jerelle A. Joseph Frank Jülicher Kurt Kremer Guido Kusters Liedewij Laan Keren Lasker Katrin S. Laxhuber Hyun O. Lee Kathy F. Liu Dimple Notani Yicheng Qiang Paul Robustelli Leonor Saiz Omar A. Saleh Helmut Schiessel Jeremy Schmit Meng Shen Krishna Shrinivas Antonia Statt Andres R. Tejedor Tatjana Trcek Christoph A. Weber Stephanie C. Weber Ned S. Wingreen Huaiying Zhang Yaojun Zhang Huan Xiang Zhou David Zwicker http://arxiv.org/abs/2601.02265v1 Predicting Early and Complete Drug Release from Long-Acting Injectables Using Explainable Machine Learning 2026-01-05T16:49:17Z

Polymer-based long-acting injectables (LAIs) have transformed the treatment of chronic diseases by enabling controlled drug delivery, thus reducing dosing frequency and extending therapeutic duration. Achieving controlled drug release from LAIs requires extensive optimization of the complex underlying physicochemical properties. Machine learning (ML) can accelerate LAI development by modeling the complex relationships between LAI properties and drug release. However, recent ML studies have provided limited information on key properties that modulate drug release, due to the lack of custom modeling and analysis tailored to LAI data. This paper presents a novel data transformation and explainable ML approach to synthesize actionable information from 321 LAI formulations through three experiments: prediction of early drug release at 24, 48, and 72 hours, classification of release profile types, and prediction of complete release profiles. These three experiments investigate the contribution and control of LAI material characteristics in early and complete drug release profiles. A strong correlation (>0.65) is observed between the true and predicted drug release at 72 hours, while a 0.87 F1-score is obtained in classifying release profile types. A time-independent ML framework predicts delayed biphasic and triphasic curves with better performance than current time-dependent approaches. Shapley additive explanations reveal the relative influence of material characteristics on early and complete release, filling several gaps in previous in vitro and ML-based studies. The novel approach and findings can provide a quantitative strategy and recommendations for scientists to optimize the drug-release dynamics of LAIs. The source code for the model implementation is publicly available.

2026-01-05T16:49:17Z Karla N. Robles Manar D. Samad http://arxiv.org/abs/2310.08061v2 ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking 2026-01-03T02:40:21Z

Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep learning-based docking approaches usually neglect the 3D spatial information of proteins and ligands, as well as the graph-level features of ligands, which limits their performance. To address these limitations, we propose an equivariant transformer neural network for protein-ligand docking pose prediction. Our approach involves the fusion of ligand graph-level features by feature processing, followed by the learning of ligand and protein representations using our proposed TAMformer module. Additionally, we employ an iterative optimization approach based on the predicted distance matrix to generate refined ligand poses. The experimental results on real datasets show that our model can achieve state-of-the-art performance.

2023-10-12T06:23:12Z The article has been accepted by Frontiers of Computer Science (FCS), with the DOI: 10.1007/s11704-026-51026-x Yiqiang Yi Xu Wan Yatao Bian Le Ou-Yang Peilin Zhao http://arxiv.org/abs/2601.00769v1 Evolutionary and Structural Constraints Define a Mutation-Resistant Catalytic Core in E. coli Serine Hydroxymethyltransferase (SHMT) 2026-01-02T18:00:31Z

Serine hydroxymethyltransferase is an essential enzyme in the Escherichia coli folate pathway, yet it has not been adopted as an antibacterial target, unlike DHFR, DHPS, or thymidylate synthase. To investigate this discrepancy, we applied a multi-scale computational framework that integrates large-scale sequence analysis of 1000 homologs, coevolutionary interaction mapping, structural community analysis, intrinsic disorder profiling, and adaptive fitness modelling. These analyses converge on a single conclusion: the catalytic core of SHMT forms an exceptionally conserved and tightly coupled structural unit. This region exhibits dense coevolution, strong intramolecular connectivity, minimal disorder, and extremely low mutational tolerance. Peripheral loops and termini, in contrast, are far more flexible. Relative to established folate-pathway antibiotic targets, the SHMT active site is even more rigid and evolutionarily constrained. This extreme constraint may limit the emergence of resistance-compatible mutations, providing a plausible explanation for the absence of natural-product inhibitors. Fitness trajectory modelling supports this interpretation, showing that nearly all active-site residues tolerate only rare or neutral substitutions. Together, these findings identify SHMT as a structurally stable and evolutionarily restricted enzyme whose catalytic architecture is unusually protected. This makes SHMT an underexplored yet promising target for the rational design of next-generation antibacterial agents.

2026-01-02T18:00:31Z 50 pages, 6 figures Deeptanshu Pandey Dwipanjan Sanyal Vladimir N. Uversky Daniel C. Zielinski Sourav Chowdhury http://arxiv.org/abs/2601.00618v1 Quantifying the uncertainty of molecular dynamics simulations: Good-Turing statistics revisited 2026-01-02T09:21:18Z

We have previously shown that Good-Turing statistics can be applied to molecular dynamics trajectories to estimate the probability of observing completely new (thus far unobserved) biomolecular structures, and that the method is stable and dependable and its predictions are verifiable. The major problem with that initial algorithm was the requirement to calculate and store in memory the two-dimensional RMSD matrix of the currently available trajectory. This requirement precluded the application of the method to very long simulations. Here we describe a new variant of the Good-Turing algorithm whose memory requirements scale linearly with the number of structures in the trajectory, making it suitable even for extremely long simulations. We show that the new method gives results essentially identical to those of the older implementation, and present results obtained from trajectories containing up to 22 million structures. A computer program implementing the new algorithm is available from standard repositories.
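For orientation, the textbook Good-Turing estimate of the probability that the next observation belongs to a not-yet-seen class is N1/N, where N1 is the number of classes seen exactly once and N is the total number of observations. The sketch below applies that basic estimator to a list of structure labels. It is not the linear-memory algorithm described in the abstract: the function name is hypothetical, and it assumes each trajectory frame has already been assigned a discrete structure label by some clustering step that is outside the scope of the example.

```python
from collections import Counter

def good_turing_unseen_probability(structure_labels):
    """Basic Good-Turing estimate of the unseen probability mass: N1 / N.

    structure_labels: one hashable label per trajectory frame, identifying
    the distinct structure (cluster) that frame belongs to.
    """
    n_total = len(structure_labels)
    if n_total == 0:
        raise ValueError("need at least one observation")
    counts = Counter(structure_labels)
    n_singletons = sum(1 for c in counts.values() if c == 1)
    return n_singletons / n_total

# Hypothetical labelled trajectory: 10 frames, four distinct structures,
# two of which (C and D) were each visited only once.
frames = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
p_new = good_turing_unseen_probability(frames)
print(f"estimated probability of a new structure: {p_new:.2f}")
```

A well-sampled trajectory has few singleton structures relative to its length, so the estimate falls as sampling converges.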

2026-01-02T09:21:18Z Vasiliki Tsampazi Nicholas M. Glykos http://arxiv.org/abs/2601.00599v1 The thermodynamics of pressure activated assembly of supramolecules in isochoric and isobaric systems 2026-01-02T07:40:53Z

The efficacy of cryopreservation is constrained by the difficulty of achieving sufficiently high intracellular concentrations of cryoprotective solutes without inducing osmotic injury or chemical toxicity during loading. This thermodynamic study introduces a new conceptual mechanism for cryoprotectant delivery into cells directly or through vascular perfusion. In this framework, effective cryoprotection could be achieved through the in situ generation of high intracellular concentrations of cryoprotective solutes via pressure-activated disassembly of membrane-permeant supramolecular assemblies composed of cryoprotectant monomers or oligomers. These supramolecules, present initially at low concentrations, are envisioned to enter cells through passive partitioning or endocytosis with minimal osmotic effect, and subsequently transform into a high intracellular concentration of cryoprotectants upon disassembly. We propose that elevated hydrostatic pressure, generated intrinsically during isochoric (constant-volume) freezing or applied externally under isobaric (constant-pressure) conditions, can destabilize supramolecular assemblies whose dissociated state occupies a smaller molar volume than the assembled state. Under isochoric freezing, ice formation within a fixed volume produces a substantial pressure increase as a thermodynamic consequence of phase change, rendering pressure a dependent variable governed by the Helmholtz free energy. Under isobaric conditions, pressure acts as an externally controlled variable through the Gibbs free energy. In both formulations, pressure-activated disassembly decouples membrane transport from cryoprotectant availability and enables synchronized solute generation precisely during cooling or freezing, without pre-loading of osmotically active solutes.
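The thermodynamic argument above rests on standard relations, which can be written out as follows. The symbols are introduced here for illustration and are not taken from the paper: K is the equilibrium constant for disassembly (assembled to dissociated), ΔV is the molar volume of the dissociated state minus that of the assembled state, A is the Helmholtz free energy, and G is the Gibbs free energy.

```latex
% Pressure dependence of the disassembly equilibrium at constant temperature.
% A negative reaction volume means that raising P shifts the equilibrium
% toward the dissociated state.
\left(\frac{\partial \ln K}{\partial P}\right)_{T} = -\frac{\Delta V}{RT},
\qquad
\Delta V = V_{\mathrm{dissociated}} - V_{\mathrm{assembled}} < 0
\;\Rightarrow\; K \text{ increases with } P .

% Isochoric system (T and V controlled): pressure is a dependent variable
% obtained from the Helmholtz free energy.
P = -\left(\frac{\partial A}{\partial V}\right)_{T}

% Isobaric system (T and P controlled): pressure is imposed externally and
% volume follows from the Gibbs free energy.
V = \left(\frac{\partial G}{\partial P}\right)_{T}
```

The first relation is why the proposal requires the dissociated state to occupy the smaller molar volume: only then does a pressure increase favor disassembly.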

2026-01-02T07:40:53Z 16 pages, one figure, one table Boris Rubinsky