http://arxiv.org/abs/2511.07024v2 The role of topology on protein thermal stability 2026-03-12T15:41:54Z For several decades, experimental and computational studies have been used to investigate the potential functional role of knots in protein structures. A property that has attracted considerable attention is thermal stability, i.e., the extent to which a protein retains its native conformation and biological activity at high temperatures, without undergoing denaturation or aggregation. Thermal stability is quantified by the melting temperature Tm, an equilibrium property that corresponds to the peak of heat capacity in differential scanning calorimetry (DSC) experiments. Experimental and computational studies report conflicting effects of knotting on protein thermal stability. Here, we use extensive Monte Carlo simulations of a simple C-alpha model of protein YibK, with energetics modeled by the Go potential, to show that Tm does not depend on the topological state of the protein. Our simulations further support the view that the discrepancy between the experimental and computational results stems from a pronounced separation of timescales for unknotting and unfolding that is inherent to deeply knotted proteins like YibK. In particular, the timescale separation implies that the complete unfolding-untying transition may not be accessible within the duration of a DSC experiment, whose apparent Tm measurements likely reflect a non-equilibrium distribution lacking unfolded states that are also unknotted. 2025-11-10T12:10:38Z João N. C. Especial Beatriz P. Teixeira Ana Nunes Miguel Machuqueiro Patrícia F. N. Faísca 10.1103/9kzy-w2tz http://arxiv.org/abs/2603.12053v1 Topological Enhancement of Protein Kinetic Stability 2026-03-12T15:24:59Z Knotted proteins embed a physical (i.e., open) knot within their native structures. 
For decades, significant effort has been devoted to elucidating the functional role of knots in proteins, yet no consensus has been reached. Here, using extensive Monte Carlo off-lattice simulations of a simple structure-based model, we isolate the effect of topology by comparing simulations that preserve the linear topology of the chain with simulations that allow chain crossings. This controlled framework enables us to isolate topological effects from sequence, structure and energetic contributions. We show that protein kinetic stability, defined as resistance to unfolding at a fixed temperature, is higher in knotted proteins. Additionally, kinetic stability increases significantly with knot depth, whereas foldability (or folding efficiency) is comparatively less affected. By considering a simple model of protein evolution in which amino-acid alphabet size is used as a proxy for evolutionary time, we find that increasing primary-sequence complexity through the addition of biotic amino acids predominantly enhances kinetic stability. Taken together, these results indicate that kinetic stability is a functional advantage conferred by protein knots and suggest that evolutionary pressure for kinetic stability could contribute to the persistence of knotted proteins. 2026-03-12T15:24:59Z João NC Especial Patrícia FN Faísca http://arxiv.org/abs/2603.11732v1 Scaling Laws and Paradoxical Metastable States in Nanofilament Entropic Separation 2026-03-12T09:38:21Z Entropic forces play a fundamental role in nanoscale phenomena, from colloidal self-assembly to biomolecular disaggregation. Here, we develop an exact analytical theory and find general scaling laws for the entropic separation of tether-mediated nanofilament bundles, revealing that a single dimensionless parameter--the ratio of the excluded-volume radius to the tether length--dictates whether filaments are pushed apart or, contrary to the usual expectation, pulled together. 
This unexpected regime challenges the view that entropic forces invariably promote disaggregation, instead uncovering conditions under which the bundles can remain in attractive metastable states. Brownian dynamics simulations confirm this paradoxical effect, offering predictive insights for applications in biophysics, soft matter physics, and nanotechnology. 2026-03-12T09:38:21Z 17 pages, 7 figures Jose M. G. Vilar J. Miguel Rubi Leonor Saiz http://arxiv.org/abs/2603.09860v1 Joint Geometric-Chemical Distance for Protein Surfaces 2026-03-10T16:20:58Z Protein function is executed at the molecular surface, where shape and chemistry act together to govern interaction. Yet most comparison methods treat these aspects separately, privileging either global fold or local descriptors and missing their coupled organization. Here we introduce IFACE (Intrinsic Field-Aligned Coupled Embedding), a correspondence-based framework that aligns protein surfaces through probabilistic coupling of intrinsic geometry with spatially distributed chemical fields. From this alignment, we derive a joint geometric--chemical distance that integrates structural and physicochemical discrepancies within a single formulation. Across diverse proteins, this distance separates conformational variability from true structural divergence more effectively than fold-based similarity measures. Applied to the cytochrome P450 family, it reveals coherent family-level organization and identifies conserved buried catalytic pockets despite the complex topology. By linking interpretable surface correspondences with a unified distance, IFACE establishes a principled basis for comparing protein interfaces and detecting functionally related interaction patches across proteins. 2026-03-10T16:20:58Z Himanshu Swami John M. 
McBride Jean-Pierre Eckmann Tsvi Tlusty http://arxiv.org/abs/2501.17901v3 Molecular Fingerprints Are Strong Models for Peptide Function Prediction 2026-03-10T09:47:07Z Understanding peptide properties is often assumed to require modeling long-range molecular interactions, motivating the use of complex graph neural networks and pretrained transformers. Yet, whether such long-range dependencies are essential remains unclear. We investigate if simple, domain-specific molecular fingerprints can capture peptide function without these assumptions. Atomic-level representation aims to provide richer information than purely sequence-based models and better efficiency than structural ones. Across 132 datasets, including LRGB and five other peptide benchmarks, models using count-based ECFP, Topological Torsion, and RDKit fingerprints with LightGBM achieve state-of-the-art accuracy. Despite encoding only short-range molecular features, these models outperform GNNs and transformer-based approaches. Control experiments with sequence shuffling and amino acid counts confirm that fingerprints, though inherently local, suffice for robust peptide property prediction. Our results challenge the presumed necessity of long-range interaction modeling and highlight molecular fingerprints as efficient, interpretable, and computationally lightweight alternatives for peptide prediction. 2025-01-29T10:05:27Z Jakub Adamczyk Piotr Ludynia Wojciech Czech http://arxiv.org/abs/2603.08300v1 A thermodynamic metric quantitatively predicts disordered protein partitioning and multicomponent phase behavior 2026-03-09T12:24:36Z Intrinsically disordered regions (IDRs) of proteins mediate sequence-specific interactions underlying diverse cellular processes, including the formation of biomolecular condensates. Although IDRs strongly influence condensate compositions, quantitative frameworks that predict and explain their phase behavior in complex mixtures remain lacking. 
Here we introduce a thermodynamic model that quantitatively predicts the behavior of arbitrary combinations of IDRs across a wide range of concentrations, with accuracy comparable to state-of-the-art simulations. The model learns low-dimensional, context-independent representations of IDR sequences that combine to form mixture representations, producing context-dependent interactions. These representations define a thermodynamic metric space in which distances between IDRs correspond directly to differences in their thermodynamic properties. We show that the model predicts multicomponent phase diagrams in quantitative agreement with molecular simulations without being trained on free-energy or phase-coexistence data. The metric space provides geometrically intuitive predictions of IDR partitioning, multicomponent condensation, and context-dependent mutational effects, addressing several central problems in IDR biophysics within a single model. Systematic interrogation of the learned representations reveals how amino-acid composition and sequence patterning jointly determine mixture thermodynamics. Together, our results establish a unified and interpretable framework for predicting and understanding the behavior of complex mixtures of IDRs and other sequence-dependent biomolecules. 2026-03-09T12:24:36Z Includes Supplementary Information Zhuang Liu Beijia Yuan Mihir Rao Gautam Reddy William M. Jacobs http://arxiv.org/abs/2602.22263v2 CryoNet.Refine: A One-step Diffusion Model for Rapid Refinement of Structural Models with Cryo-EM Density Map Restraints 2026-03-09T08:34:19Z High-resolution structure determination by cryo-electron microscopy (cryo-EM) requires the accurate fitting of an atomic model into an experimental density map. Traditional refinement pipelines such as Phenix.real_space_refine and Rosetta are computationally expensive, demand extensive manual tuning, and present a significant bottleneck for researchers. 
We present CryoNet.Refine, an end-to-end deep learning framework that automates and accelerates molecular structure refinement. Our approach utilizes a one-step diffusion model that integrates a density-aware loss function with robust stereochemical restraints, enabling rapid optimization of a structure against experimental data. CryoNet.Refine provides a unified and versatile solution capable of refining protein complexes as well as DNA/RNA-protein complexes. In benchmarks against Phenix.real_space_refine, CryoNet.Refine consistently achieves substantial improvements in both model-map correlation and overall geometric quality metrics. By offering a scalable, automated, and powerful alternative, CryoNet.Refine aims to serve as an essential tool for next-generation cryo-EM structure refinement. Web server: https://cryonet.ai/refine; Source code: https://github.com/kuixu/cryonet.refine. 2026-02-25T04:18:18Z Published as a conference paper at ICLR 2026 Fuyao Huang Xiaozhu Yu Kui Xu Qiangfeng Cliff Zhang http://arxiv.org/abs/2603.07710v1 Reverse Distillation: Consistently Scaling Protein Language Model Representations 2026-03-08T16:24:05Z Unlike the predictable scaling laws in natural language processing and computer vision, protein language models (PLMs) scale poorly: for many tasks, models within the same family plateau or even decrease in performance, with mid-sized models often outperforming the largest in the family. We introduce Reverse Distillation, a principled framework that decomposes large PLM representations into orthogonal subspaces guided by smaller models of the same family. The resulting embeddings have a nested, Matryoshka-style structure: the first k dimensions of a larger model's embedding are exactly the representation from the smaller model. This ensures that larger reverse-distilled models consistently outperform smaller ones. 
A motivating intuition is that smaller models, constrained by capacity, preferentially encode broadly-shared protein features. Reverse distillation isolates these shared features and orthogonally extracts additional contributions from larger models, preventing interference between the two. On ProteinGym benchmarks, reverse-distilled ESM-2 variants outperform their respective baselines at the same embedding dimensionality, with the reverse-distilled 15 billion parameter model achieving the strongest performance. Our framework is generalizable to any model family where scaling challenges persist. Code and trained models are available at https://github.com/rohitsinghlab/plm_reverse_distillation. 2026-03-08T16:24:05Z Proceedings of ICLR 2026 Darius Catrina Christian Bepler Samuel Sledzieski Rohit Singh http://arxiv.org/abs/2505.23354v5 Representing local protein environments with machine learning force fields 2026-03-07T13:03:29Z The local structure of a protein strongly impacts its function and interactions with other molecules. Therefore, a concise, informative representation of a local protein environment is essential for modeling and designing proteins and biomolecular interactions. However, these environments' extensive structural and chemical variability makes them challenging to model, and such representations remain under-explored. In this work, we propose a novel representation for a local protein environment derived from the intermediate features of atomistic foundation models (AFMs). We demonstrate that this embedding effectively captures both local structure (e.g., secondary motifs), and chemical features (e.g., amino-acid identity and protonation state). We further show that the AFM-derived representation space exhibits meaningful structure, enabling the construction of data-driven priors over the distribution of biomolecular environments. 
Finally, in the context of biomolecular NMR spectroscopy, we demonstrate that the proposed representations enable a first-of-its-kind physics-informed chemical shift predictor that achieves state-of-the-art accuracy. Our results demonstrate the surprising effectiveness of atomistic foundation models and their emergent representations for protein modeling beyond traditional molecular simulations. We believe this will open new lines of work in constructing effective functional representations for protein environments. 2025-05-29T11:25:47Z Meital Bojan Sanketh Vedula Advaith Maddipatla Nadav Bojan Sellam Anar Rzayev Federico Napoli Paul Schanda Alex M. Bronstein http://arxiv.org/abs/2603.07137v1 Preservation Constraints on aDNA Information Generation and the HSF Posterior Sourcing Framework: A First-Principles Critique of Conventional Methods 2026-03-07T10:11:38Z Fossil DNA preservation varies with depositional environments and diagenesis, producing fragments of heterogeneous origins and degradation states. We use first-principles biomolecular analysis to classify fossil molecular environments into four system types, distinguished by three orthogonal indicators: origin (H/h: host/heterologous), deamination status (D/d), and similarity ratio (S/s). Conventional aDNA pipelines assume a binary mix of endogenous host DNA and modern contaminants, overlooking multisource complexity from multiple species and time-averaged deposits. This leads to bias: authentic signals suppressed during enrichment, alignment, or damage filtering, and exogenous/ancient admixed fragments misassigned as endogenous, particularly in open systems. We introduce the HSF (Host/Species-specific Fragment) posterior traceability framework to address this. It treats fragments as primary units, maximizes source diversity, detects isolated sequences, defers lineage assignment to preserve uncertainty, and applies phylogenetic consistency to discriminate origins. 
Combined with preservation characterization (e.g., 3D imaging and volumetric openness assessment), it improves authenticity evaluation and reduces misassignment in mixed-signal samples. Case studies identify novel fossil DNA patterns (CRSRR and SRRA) and demonstrate superior performance compared with conventional methods. The HSF framework enhances aDNA reliability, extends molecular archaeology to challenging contexts, and aids genome evolution and lineage reconstruction. 2026-03-07T10:11:38Z 29 pages, 3 figures, 4 tables, 23 references Wan-Qian Zhao Shu-Jie Zhang Zhan-Yong Guo Mei-Jun Li http://arxiv.org/abs/2603.06559v1 Sampling-based Continuous Optimization for Messenger RNA Design 2026-03-06T18:47:57Z Designing messenger RNA (mRNA) sequences for a fixed target protein requires searching an exponentially large synonymous space while optimizing properties that affect stability and downstream performance. This is challenging because practical mRNA design involves multiple coupled objectives beyond classical folding criteria, and different applications prefer different trade-offs. We propose a general sampling-based continuous optimization framework, inspired by SamplingDesign, that iteratively samples candidate synonymous sequences, evaluates them with black-box metrics, and updates a parameterized sampling distribution. Across a diverse UniProt protein set and the SARS-CoV-2 spike protein, our method consistently improves the chosen objective, with particularly strong gains on average unpaired probability and accessible uridine percentage compared to LinearDesign and EnsembleDesign. Moreover, our multi-objective COMBO formulation enables weight-controlled exploration of the design space and naturally extends to incorporate additional computable metrics. 2026-03-06T18:47:57Z Feipeng Yue Ning Dai Wei Yu Tang Tianshuo Zhou David H. 
Mathews Liang Huang http://arxiv.org/abs/2405.16861v3 BInD: Bond and Interaction-generating Diffusion Model for Multi-objective Structure-based Drug Design 2026-03-06T12:36:27Z Recent remarkable advancements in geometric deep generative models, coupled with accumulated structural data, enable structure-based drug design (SBDD) using only target protein information. However, existing models often struggle to balance multiple objectives, excelling only in specific tasks. BInD, a diffusion model with knowledge-based guidance, is introduced to address this limitation by co-generating molecules and their interactions with a target protein. This approach ensures balanced consideration of key objectives, including target-specific interactions, molecular properties, and local geometry. Comprehensive evaluations demonstrate that BInD achieves robust performance across all objectives, matching or surpassing state-of-the-art methods. Additionally, an NCI-driven molecule design and optimization method is proposed, enabling the enhancement of target binding and specificity by elaborating the adequate interaction patterns. 2024-05-27T06:26:55Z Published in Advanced Science 12(35), e02702 (2025) Joongwon Lee Wonho Zhung Jisu Seo Woo Youn Kim 10.1002/advs.202502702 http://arxiv.org/abs/2507.03197v3 Quantifying Cross-Attention Interaction in Transformers for Interpreting TCR-pMHC Binding 2026-03-05T23:10:05Z CD8+ "killer" T cells and CD4+ "helper" T cells play a central role in the adaptive immune system by recognizing antigens presented by Major Histocompatibility Complex (pMHC) molecules via T Cell Receptors (TCRs). Modeling binding between T cells and the pMHC complex is fundamental to understanding basic mechanisms of human immune response as well as in developing therapies. 
While transformer-based models such as TULIP have achieved impressive performance in this domain, their black-box nature precludes interpretability and thus limits a deeper mechanistic understanding of T cell response. Most existing post-hoc explainable AI (XAI) methods are confined to encoder-only, co-attention, or model-specific architectures and cannot handle encoder-decoder transformers used in TCR-pMHC modeling. To address this gap, we propose Quantifying Cross-Attention Interaction (QCAI), a new post-hoc method designed to interpret the cross-attention mechanisms in transformer decoders. Quantitative evaluation is a challenge for XAI methods; we have compiled TCR-XAI, a benchmark consisting of 274 experimentally determined TCR-pMHC structures to serve as ground truth for binding. Using these structures we compute physical distances between relevant amino acid residues in the TCR-pMHC interaction region and evaluate how well our method and others estimate the importance of residues in this region across the dataset. We show that QCAI achieves state-of-the-art performance on both interpretability and prediction accuracy under the TCR-XAI benchmark. 2025-07-03T22:18:54Z The Fourteenth International Conference on Learning Representations (Project Page: https://qcai.jiarui.li/) Jiarui Li Zixiang Yin Haley Smith Zhengming Ding Samuel J. Landry Ramgopal R. Mettu http://arxiv.org/abs/2602.24007v2 Inference-time optimization for experiment-grounded protein ensemble generation 2026-03-04T22:31:58Z Protein function relies on dynamic conformational ensembles, yet current generative models like AlphaFold3 often fail to produce ensembles that match experimental data. Recent experiment-guided generators attempt to address this by steering the reverse diffusion process. However, these methods are limited by fixed sampling horizons and sensitivity to initialization, often yielding thermodynamically implausible results. 
We introduce a general inference-time optimization framework to solve these challenges. First, we optimize over latent representations to maximize ensemble log-likelihood, rather than perturbing structures post hoc. This approach eliminates dependence on diffusion length, removes initialization bias, and easily incorporates external constraints. Second, we present novel sampling schemes for drawing Boltzmann-weighted ensembles. By combining structural priors from AlphaFold3 with force-field-based priors, we sample from their product distribution while balancing experimental likelihoods. Our results show that this framework consistently outperforms state-of-the-art guidance, improving diversity, physical energy, and agreement with data in X-ray crystallography and NMR, often fitting the experimental data better than deposited PDB structures. Finally, inference-time optimization experiments maximizing ipTM scores reveal that perturbing AlphaFold3 embeddings can artificially inflate model confidence. This exposes a vulnerability in current design metrics, whose mitigation could offer a pathway to reduce false discovery rates in binder engineering. 2026-02-27T13:31:07Z Advaith Maddipatla Anar Rzayev Marco Pegoraro Martin Pacesa Paul Schanda Ailie Marx Sanketh Vedula Alex M. Bronstein http://arxiv.org/abs/2507.08474v2 RNA Dynamics and Interactions Revealed through Atomistic Simulations 2026-03-04T21:13:56Z RNA function is deeply intertwined with its conformational dynamics. In this review, we survey recent advances in the use of atomistic molecular dynamics simulations to characterize RNA dynamics in diverse contexts, including isolated molecules and complexes with ions, small molecules, or proteins. We highlight how enhanced sampling techniques and integrative approaches can improve both the precision and accuracy of the resulting structural ensembles. 
Finally, we examine the emerging role of artificial intelligence in accelerating progress in RNA modeling and simulation. 2025-07-11T10:34:19Z Accepted Manuscript Olivier Languin-Cattoën Giovanni Bussi 10.1146/annurev-physchem-082624-013453