https://arxiv.org/api/PKV198fCu0awbEUmp00cUE2BrZ8
2026-03-22T16:16:25Z
664221015

http://arxiv.org/abs/2511.14663v1
ApexGen: Simultaneous design of peptide binder sequence and structure for target proteins
2025-11-18T17:03:18Z
Peptide-based drugs can bind to protein interaction sites that small molecules often cannot, and are easier to produce than large protein drugs. However, designing effective peptide binders is difficult. A typical peptide has an enormous number of possible sequences, and only a few of these will fold into the right 3D shape to match a given protein target. Existing computational methods either generate many candidate sequences without considering how they will fold, or build peptide backbones and then find suitable sequences afterward. Here we introduce ApexGen, a new AI-based framework that simultaneously designs a peptide's amino-acid sequence and its three-dimensional structure to fit a given protein target. For each target, ApexGen produces a full all-atom peptide model in a small number of deterministic integration steps. In tests on hundreds of protein targets, the peptides designed by ApexGen fit tightly onto their target surfaces and cover nearly the entire binding site. These peptides have shapes similar to those found in natural protein-peptide complexes, and they show strong predicted binding affinity in computational experiments. Because ApexGen couples sequence and structure design at every step of Euler integration within a flow-matching sampler, it is much faster and more efficient than prior approaches.
This unified method could greatly accelerate the discovery of new peptide-based therapeutics.
2025-11-18T17:03:18Z
Xiaoqiong Xia, Cesar de la Fuente-Nunez

http://arxiv.org/abs/2511.12931v2
cryoSENSE: Compressive Sensing Enables High-throughput Microscopy with Sparse and Generative Priors on the Protein Cryo-EM Image Manifold
2025-11-18T15:32:11Z
Cryo-electron microscopy (cryo-EM) enables the atomic-resolution visualization of biomolecules; however, modern direct detectors generate data volumes that far exceed the available storage and transfer bandwidth, thereby constraining practical throughput. We introduce cryoSENSE, the computational realization of a hardware-software co-designed framework for compressive cryo-EM sensing and acquisition. We show that cryo-EM images of proteins lie on low-dimensional manifolds that can be independently represented using sparse priors in predefined bases and generative priors captured by a denoising diffusion model. cryoSENSE leverages these low-dimensional manifolds to enable faithful image reconstruction from spatial and Fourier-domain undersampled measurements while preserving downstream structural resolution. In experiments, cryoSENSE increases acquisition throughput by up to 2.5$\times$ while retaining the original 3D resolution, offering controllable trade-offs between the number of masked measurements and the level of downsampling. Sparse priors favor faithful reconstruction from Fourier-domain measurements and moderate compression, whereas generative diffusion priors achieve accurate recovery from pixel-domain measurements and more severe undersampling.
Project website: https://cryosense.github.io.
2025-11-17T03:37:35Z
Zain Shabeeb, Daniel Saeedi, Darin Tsui, Vida Jamali, Amirali Aghazadeh

http://arxiv.org/abs/2511.14559v1
Apo2Mol: 3D Molecule Generation via Dynamic Pocket-Aware Diffusion Models
2025-11-18T15:01:27Z
Deep generative models are rapidly advancing structure-based drug design, offering substantial promise for generating small molecule ligands that bind to specific protein targets. However, most current approaches assume a rigid protein binding pocket, neglecting the intrinsic flexibility of proteins and the conformational rearrangements induced by ligand binding, which limits their applicability in practical drug discovery. Here, we propose Apo2Mol, a diffusion-based generative framework for 3D molecule design that explicitly accounts for conformational flexibility in protein binding pockets. To support this, we curate a dataset of over 24,000 experimentally resolved apo-holo structure pairs from the Protein Data Bank, enabling the characterization of protein structure changes associated with ligand binding. Apo2Mol employs a full-atom hierarchical graph-based diffusion model that simultaneously generates 3D ligand molecules and their corresponding holo pocket conformations from input apo states. Empirical studies demonstrate that Apo2Mol can achieve state-of-the-art performance in generating high-affinity ligands and accurately capture realistic protein pocket conformational changes.
2025-11-18T15:01:27Z
Accepted by AAAI 2026
Xinzhe Zheng, Shiyu Jiang, Gustavo Seabra, Chenglong Li, Yanjun Li

http://arxiv.org/abs/2511.13916v1
Structural Flexibility of the TCF7L2-DNA Complex with the Type 2 Diabetes SNP rs7903146
2025-11-17T21:13:42Z
The single nucleotide polymorphism (SNP) rs7903146 in the TCF7L2 gene has been identified as one of the strongest common genetic risk factors for Type 2 Diabetes (T2D).
The location of the SNP in a non-coding region suggests a regulatory mechanism, meaning the SNP doesn't change the protein's own structure but rather affects how the TCF7L2 protein binds to DNA to control other genes. This binding, however, is highly dependent on the shape and flexibility of the DNA. This study aims to reveal the atomic-level effects of the SNP's cytosine-to-thymine substitution on the TCF7L2-DNA complex. We first utilized AlphaFold to generate individual high-confidence structures of the TCF7L2 protein and two 15-base pair DNA duplexes: one containing the reference C allele and one containing the variant T allele. These structures were then used as inputs for Neurosnap's Boltz2 deep learning model to generate two complete protein-DNA complexes of the TCF7L2 HMG-box bound to each DNA variant. Using the iMODS server, we conducted a Normal Mode Analysis (NMA) to predict and compare large-scale flexibility and differences in interactions between the complexes. The protein-DNA interface was dissected using PDBsum to locate atomic contacts, clefts, and interaction maps. Overall, our results show that the T allele variant exhibits increased global stiffness with a higher eigenvalue and reduced flexibility, suggesting that the SNP disrupts the mechanism and biomechanical balance needed for efficient TCF7L2-DNA binding, thus affecting downstream gene regulation.
2025-11-17T21:13:42Z
10 pages, 6 figures. Accepted to the 2025 IEEE International Conference on E-Health and Bioengineering (EHB); conference proceedings to be indexed/published by Springer Nature
Karthik Venuturimilli (Berkeley National Laboratory), Yang Ha (Berkeley National Laboratory)

http://arxiv.org/abs/2511.13583v1
Evaluating and Scoring Ebolavirus Protein-protein Docking Models Using PIsToN
2025-11-17T16:46:34Z
Protein-protein docking is crucial for understanding how proteins interact. Numerous docking tools have been developed to discover possible conformations of two interacting proteins.
However, the reliability and success of these docking tools rely on their scoring function. Accurate and efficient scoring functions are necessary to distinguish between native and non-native docking models to ensure the accuracy of a docking tool. As in other fields where deep learning methods have been applied successfully, these methods have introduced innovative scoring functions for docking as well. An outstanding tool for scoring and differentiating native-like docking models from non-native or incorrect conformations is Protein binding Interfaces with Transformer Networks (PIsToN). PIsToN significantly outperforms state-of-the-art scoring functions. Using models of complexes obtained from binding the Ebola Virus Protein VP40 to the host cell's Sec24c protein as an example, we show how to evaluate docking models using PIsToN.
2025-11-17T16:46:34Z
Azam Shirali, Vitalii Stebliankin, Jimeng Shi, Prem Chapagain, Giri Narasimhan

http://arxiv.org/abs/2511.12843v1
Treatment of phenol wastewater by electro-Fenton oxidative degradation based on efficient iron-based-gas diffusion-photocatalysis
2025-11-17T00:14:00Z
This study introduces a novel iron-based gas diffusion electrode-photocatalytic system aimed at enhancing the degradation of phenolic compounds in wastewater. Phenolic compounds are toxic environmental pollutants with significant resistance to biodegradation. The traditional methods for treating phenol wastewater, including biological treatments and adsorption techniques, often fall short of achieving complete mineralization. Our approach utilizes a dual-chamber electrochemical setup integrating stainless steel felt-2-EAQ gas diffusion electrodes with TiO2 photocatalysis. This combination significantly boosts hydroxyl radical production, which is critical for effective pollutant breakdown. Experimentally, the system achieved up to 92% degradation efficiency for phenol at an optimized operating current of 10 mA/cm^2 in 3 hours, surpassing traditional methods.
Additionally, energy consumption was reduced by 40% compared to conventional electro-Fenton systems. The stability tests indicated that the electrodes maintain over 80% of their initial activity after five cycles of use. These findings suggest that our system offers a more sustainable and efficient solution for treating phenolic wastewater by enhancing both degradation rates and energy efficiency.
2025-11-17T00:14:00Z
Zhang Junye, Zheng Hongyu, Cheng Jingran, Zhang Shengli

http://arxiv.org/abs/2511.11860v1
Understanding Molecular Basis of PTPN11-Related Diseases
2025-11-14T20:37:27Z
The PTPN11 gene encodes the Src homology 2 domain-containing protein tyrosine phosphatase (SHP2), a key regulator of cell growth, differentiation, and apoptosis through its modulation of various signaling pathways, including the RAS/MAPK signaling pathway. Missense variants in PTPN11 disrupt SHP2's proper catalytic activity and the regulation of signaling pathways, leading to disorders such as Noonan syndrome (NS), LEOPARD syndrome (LS), or juvenile myelomonocytic leukemia (JMML). These missense variants cause molecular disruptions that result in gains and losses of function at both the molecular and phenotypic levels. Depending on their location within SHP2, missense substitutions disrupt inter-domain regulation or impair phosphatase function, resulting in altered phosphatase activity. In this study, we investigate the molecular basis underlying the differential pathogenicity of PTPN11 missense variants and predict the structural consequences of these variants using MutPred2 and AlphaFold2.
We find that LOF and GOF variants display distinct functional mechanisms in sodium and DNA binding, and that NS-associated missense variants identified in fetuses with ultrasound-detected anomalies and familial cases are more likely to be pathogenic.
2025-11-14T20:37:27Z
Seungha Um, Tulika Kakati, Lilia M Iakoucheva, Yile Chen, Sean Mooney

http://arxiv.org/abs/2511.11128v1
Effects of diode laser photobiomodulation on peri-implant inflammation and stability in orthodontic mini-implants: A randomized controlled trial
2025-11-14T10:01:18Z
Peri-implant inflammation in orthodontic mini-implants may lead to patient discomfort and treatment failure. This study aims to evaluate the effects of diode laser application on the health of mini-implants, preventing peri-implantitis and promoting healing. A randomized controlled trial was conducted involving 30 orthodontic patients (12 males and 18 females, aged 18-32) who had mini-implants placed on both sides of the maxilla for anterior teeth retraction. One side of each patient was assigned to either an experimental group receiving diode laser irradiation (650 nm, 25 mW) at specific postoperative intervals or a control group receiving simulated radiation. Clinical assessments included plaque index, modified sulcus bleeding index, probing depth, and incidence of peri-implant mucositis and implant mobility, measured at 1, 4, and 12 weeks post-implantation. Additionally, interleukin-1 beta (IL-1β) levels in peri-implant fluid were analyzed via enzyme-linked immunosorbent assay (ELISA). Results indicated that the experimental group exhibited significantly lower plaque indices, sulcus bleeding indices, and probing depths (p < 0.05) compared to the control group. Moreover, the experimental group had fewer cases of peri-implant mucositis (p < 0.05), while differences in implant stability were not statistically significant (p > 0.05).
IL-1β levels were consistently lower in the experimental group throughout the study duration (p < 0.05). In conclusion, adjunctive diode laser therapy appears to enhance peri-implant health and reduce complications associated with orthodontic mini-implants, suggesting a promising direction for improving patient outcomes in orthodontics. Future research should explore long-term effects and the mechanisms underlying these benefits.
2025-11-14T10:01:18Z
Jun Liu, Linlin Li, Xiaofei Sun, Qiang Zhang

http://arxiv.org/abs/2507.01725v3
More sophisticated is not always better: comparison of similarity measures for unsupervised learning of pathways in biomolecular simulations
2025-11-13T16:25:59Z
Finding process pathways in molecular simulations, such as the unbinding paths of small molecule ligands from their binding sites at protein targets, in a set of trajectories via unsupervised learning approaches requires the definition of a suitable similarity measure between trajectories. We here evaluate the performance of four such measures with varying degrees of sophistication, i.e., Euclidean and Wasserstein distances, Procrustes analysis, and dynamic time warping, when analyzing trajectory data from two different biased simulation driving protocols in the form of constant velocity constraint targeted MD and steered MD. In a streptavidin-biotin benchmark system with known ground truth clusters, Wasserstein distances yielded the best clustering performance, closely followed by Euclidean distances, both being the most computationally efficient similarity measures. In a more complex A2a receptor-inhibitor system, however, the simplest measure, i.e., Euclidean distances, was sufficient to reveal meaningful and interpretable clusters.
2025-07-02T13:58:46Z
This preprint is the unedited version of a manuscript that has been published as a peer-reviewed article in J. Phys. Chem. B. Copyright with the authors and ACS
Jäger, M., Wolf, S.
More sophisticated is not always better: comparison of similarity measures for unsupervised learning of pathways in biomolecular simulations. J. Phys. Chem. B 2025, 129, 42, 10956-10966
Miriam Jäger, Steffen Wolf
10.1021/acs.jpcb.5c04586

http://arxiv.org/abs/2501.05457v2
How Evaluation Choices Distort the Outcome of Generative Drug Discovery
2025-11-13T11:05:27Z
"How to evaluate the de novo designs proposed by a generative model?" Despite the transformative potential of generative deep learning in drug discovery, this seemingly simple question has no clear answer. The absence of standardized guidelines challenges both the benchmarking of generative approaches and the selection of molecules for prospective studies. In this work, we take a fresh - critical and constructive - perspective on de novo design evaluation. By training chemical language models, we analyze approximately 1 billion molecule designs and discover principles consistent across different neural networks and datasets. We uncover a key confounder: the size of the generated molecular library significantly impacts evaluation outcomes, often leading to misleading model comparisons. We find that increasing the number of designs is a remedy, and we propose new, compute-efficient metrics that can be computed at large scale. We also identify critical pitfalls in commonly used metrics - such as uniqueness and distributional similarity - that can distort assessments of generative performance. To address these issues, we propose new and refined strategies for reliable model comparison and design evaluation. Furthermore, when examining molecule selection and sampling strategies, our findings reveal constraints on diversifying the generated libraries and draw new parallels and distinctions between deep learning and drug discovery.
We anticipate that our findings will help reshape evaluation pipelines in generative drug discovery, paving the way for more reliable and reproducible generative modeling approaches.
2024-12-24T15:41:37Z
Özçelik, R., Grisoni, F. How evaluation choices distort the outcome of generative drug discovery. J Cheminform 17, 169 (2025)
Rıza Özçelik, Francesca Grisoni
10.1186/s13321-025-01108-y

http://arxiv.org/abs/2511.09636v1
Prebiotic Chemistry Insights for Dragonfly: Thermodynamics of Amino Acid Synthesis in Selk Crater on Titan
2025-11-12T19:00:04Z
Saturn's moon Titan presents a compelling testbed for probing prebiotic chemistry beyond early Earth. Impact-generated melt pools provide transient aqueous habitats in an otherwise cryogenic environment. We use Cantera equilibrium models to assess whether mixtures of hydrogen cyanide (HCN), acetylene (C2H2), and ammonia (NH3) can drive amino acid synthesis in Selk-sized craters. Across twenty-one amino acids (twenty proteinogenic plus beta-alanine), NH3-free systems yield only proline, alanine, and beta-alanine, whereas adding as little as 1% NH3 (relative to H2O) renders almost the full suite accessible, with yields peaking at 2% and tapering thereafter. The NH3-free alanine result implies alternative pathways beyond classical Strecker or aminonitrile hydrolysis, suggesting acetylene, abundant on Titan but scarce on early Earth, as a plausible feedstock. We identify acrylonitrile (detected on Titan) as a thermodynamically favorable intermediate that can convert to alanine under aqueous conditions in an NH3-free pathway. For glycine and alanine production from nitrile hydrolysis, comparison with laboratory kinetics shows that our equilibrium models predict near-complete conversion, while observed rates yield only partial products over weeks. Yet estimated chemical equilibration times (years-centuries) are far shorter than melt lifetimes, supporting the plausibility of equilibrium in situ.
These predictions are directly testable with the Dragonfly Mass Spectrometer (DraMS), for which we recommend pre-flight standards to test proline, alanine, beta-alanine, cysteine, and methionine. The first three offer the best chances for amino acid detection regardless of ammonia availability; the latter two offer diagnostic tools for determining the presence of reactive sulfur in post-impact Titan ponds.
2025-11-12T19:00:04Z
Note that there is an appendix after the references of the main manuscript
Ishaan Madan, Ben K. D. Pearce
10.3847/PSJ/ae1c18

http://arxiv.org/abs/2503.05738v2
Learning conformational ensembles of proteins based on backbone geometry
2025-11-12T11:45:06Z
Deep generative models have recently been proposed for sampling protein conformations from the Boltzmann distribution, as an alternative to often prohibitively expensive Molecular Dynamics simulations. However, current state-of-the-art approaches rely on fine-tuning pre-trained folding models and evolutionary sequence information, limiting their applicability and efficiency, and introducing potential biases. In this work, we propose a flow matching model for sampling protein conformations based solely on backbone geometry - BBFlow. We introduce a geometric encoding of the backbone equilibrium structure as input and propose to condition not only the flow but also the prior distribution on the respective equilibrium structure, eliminating the need for evolutionary information. The resulting model is orders of magnitude faster than current state-of-the-art approaches at comparable accuracy, is transferable to multi-chain proteins, and can be trained from scratch in a few GPU days. In our experiments, we demonstrate that the proposed model achieves competitive performance with reduced inference time, across not only an established benchmark of naturally occurring proteins but also de novo proteins, for which evolutionary information is scarce or absent.
BBFlow is available at https://github.com/graeter-group/bbflow.
2025-02-19T17:16:27Z
To be published in proceedings of NeurIPS 2025
Nicolas Wolf, Leif Seute, Vsevolod Viliuga, Simon Wagner, Jan Stühmer, Frauke Gräter

http://arxiv.org/abs/2511.09183v1
Measuring irreversibility in stochastic systems by categorizing single-molecule displacements
2025-11-12T10:25:09Z
Quantifying the irreversibility and dissipation of non-equilibrium processes is crucial to understanding their behavior, assessing their possible capabilities, and characterizing their efficiency. We introduce a physical quantity that quantifies the irreversibility of stochastic Langevin systems from the observation of individual molecules' displacements. Categorizing these displacements into a few groups based on their initial and final position allows us to measure irreversibility precisely without the need to know the forces and magnitude of the fluctuations acting on the system. Our model-free estimate of irreversibility is related to entropy production by a conditional fluctuation theorem and provides a lower bound to the average entropy production. We validate the method on single-molecule force spectroscopy experiments of proteins subject to force ramps.
We show that irreversibility is sensitive to detailed features of the energy landscape underlying the protein folding dynamics and suggest how our methods can be employed to unveil key properties of protein folding processes.
2025-11-12T10:25:09Z
12 pages Main Text, 5 pages Appendix, 18 pages Supplementary Material
Alvaro Lanza, Inés Martínez-Martín, Rafael Tapia-Rojo, Stefano Bo

http://arxiv.org/abs/2511.08564v1
Genetically encoding stimulated Raman-scattering probes for cell imaging using infrared fluorescent proteins
2025-11-11T18:47:17Z
Stimulated Raman scattering (SRS) microscopy offers great potential to surpass fluorescence-based approaches, owing to the sharp linewidth of Raman vibrations amenable to super-multiplex cell imaging, but currently lacks one crucial component: genetically encodable tags equivalent to fluorescent proteins. Here, we show that infrared fluorescent proteins (IRFPs) can be used as genetically encoded SRS probes and benefit from the electronic pre-resonant SRS enhancement effect with near-infrared exciting pulses, comparable to synthetic dyes reported in the literature. SRS imaging of the nucleus in mammalian cells is demonstrated where a histone protein is fused to an IRFP. This work opens the route towards Raman-based cell imaging using genetically encoded probes, motivating efforts in solving the challenges of photostability and creating a vibrational palette.
2025-11-11T18:47:17Z
David Regan, Ozan Aksakal, Athena Zitti, John McLarnon, Magdalena Lipka-Lloyd, Pierre J. Rizkallah, Anna J. Warren, Peter D. Watson, Wolfgang Langbein, D. Dafydd Jones, Paola Borri

http://arxiv.org/abs/2510.07286v2
Evolutionary Profiles for Protein Fitness Prediction
2025-11-11T18:18:41Z
Predicting the fitness impact of mutations is central to protein engineering but constrained by limited assays relative to the size of sequence space.
Protein language models (pLMs) trained with masked language modeling (MLM) exhibit strong zero-shot fitness prediction; we provide a unifying view by interpreting natural evolution as implicit reward maximization and MLM as inverse reinforcement learning (IRL), in which extant sequences act as expert demonstrations and pLM log-odds serve as fitness estimates. Building on this perspective, we introduce EvoIF, a lightweight model that integrates two complementary sources of evolutionary signal: (i) within-family profiles from retrieved homologs and (ii) cross-family structural-evolutionary constraints distilled from inverse folding logits. EvoIF fuses sequence-structure representations with these profiles via a compact transition block, yielding calibrated probabilities for log-odds scoring. On ProteinGym (217 mutational assays; >2.5M mutants), EvoIF and its MSA-enabled variant achieve state-of-the-art or competitive performance while using only 0.15% of the training data and fewer parameters than recent large models. Ablations confirm that within-family and cross-family profiles are complementary, improving robustness across function types, MSA depths, taxa, and mutation depths. The code will be made publicly available at https://github.com/aim-uofa/EvoIF.
2025-10-08T17:46:02Z
Jigang Fan, Xiaoran Jiao, Shengdong Lin, Zhanming Liang, Weian Mao, Chenchen Jing, Hao Chen, Chunhua Shen
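
The log-odds scoring mentioned in the EvoIF abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes only that some model supplies a per-position probability table over the 20 amino acids, and it scores a variant as the sum of log p(mutant) - log p(wild type) over the mutated positions. The names `log_odds_score` and `toy_profile`, and all probabilities, are hypothetical and chosen purely for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_score(profile, wild_type, mutations):
    """Sum log p(mutant) - log p(wild type) over the given substitutions.

    profile:   list of dicts, one per position, mapping residue -> probability
    wild_type: wild-type sequence as a string
    mutations: list of (position, mutant_residue) pairs, 0-based positions
    """
    score = 0.0
    for pos, mutant in mutations:
        wt = wild_type[pos]
        score += math.log(profile[pos][mutant]) - math.log(profile[pos][wt])
    return score


def toy_profile(wild_type, wt_prob=0.62):
    """Hypothetical profile: wt_prob on the wild-type residue, rest uniform."""
    other = (1.0 - wt_prob) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: (wt_prob if aa == wt else other) for aa in AMINO_ACIDS}
        for wt in wild_type
    ]


wild_type = "MKT"
profile = toy_profile(wild_type)

# Single substitution K2A and double substitution K2A + T3S (0-based indices).
single = log_odds_score(profile, wild_type, [(1, "A")])
double = log_odds_score(profile, wild_type, [(1, "A"), (2, "S")])
print(f"single: {single:.3f}, double: {double:.3f}")
```

Under this additive assumption, a negative score means the model considers the variant less likely than the wild type, and multi-mutant scores are simply sums of single-site terms; a real model's probabilities would replace the toy profile.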