https://arxiv.org/api/OWUWD2yVtLhRqxGeJyn3STONvAg 2026-03-23T02:15:53Z 6644 315 15 http://arxiv.org/abs/2510.04176v1 Relief of EGFR/FOS-downregulated miR-103a by loganin alleviates NF-kappaB-triggered inflammation and gut barrier disruption in colitis 2025-10-05T12:36:31Z

Due to the steadily rising global incidence of inflammatory bowel disease (IBD) and the lack of effective clinical treatments, elucidating its detailed pathogenesis, identifying novel targets, and developing promising drugs are top priorities for IBD treatment. Here, we demonstrate that the levels of microRNA (miR)-103a were significantly downregulated in the inflamed mucosa of ulcerative colitis (UC) patients, along with elevated inflammatory cytokines (IL-1beta/TNF-alpha) and reduced tight junction protein (Occludin/ZO-1) levels, as compared with healthy control subjects. Consistently, miR-103a-deficient Caco-2 intestinal epithelial cells showed severe inflammatory responses and increased permeability, and DSS induced more severe colitis in miR-103a-/- mice than in wild-type mice. Mechanistic studies revealed that c-FOS suppresses miR-103a transcription by binding to its promoter; because miR-103a targets TAB2 and TAK1, its loss permits NF-kappaB activation, which contributes to inflammatory responses and barrier disruption. Notably, the traditional Chinese medicine Cornus officinalis (CO) and its core active ingredient loganin potently mitigated inflammation and barrier disruption in UC by specifically blocking the EGFR/RAS/ERK/c-FOS signaling axis. These effects were mainly attributable to modulated miR-103a levels, as the therapeutic activities of both were almost completely abolished in miR-103a KO mice. Taken together, this work reveals that loganin relieves EGFR/c-FOS axis-suppressed epithelial miR-103a expression, thereby inhibiting NF-kappaB pathway activation, suppressing inflammatory responses, and preserving tight junction integrity in UC. Thus, our data enrich mechanistic insights and suggest promising targets for UC treatment.

2025-10-05T12:36:31Z Yan Li Teng Hui Xinhui Zhang Zihan Cao Ping Wang Shirong Chen Ke Zhao Yiran Liu Yue Yuan Dou Niu Xiaobo Yu Gan Wang Changli Wang Yan Lin Fan Zhang Hefang Wu Guodong Feng Yan Liu Jiefang Kang Yaping Yan Hai Zhang Xiaochang Xue Xun Jiang http://arxiv.org/abs/2506.03157v3 UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules 2025-10-04T14:13:31Z

Molecular Dynamics (MD) simulations are essential for understanding the atomic-level behavior of molecular systems, giving insights into their transitions and interactions. However, classical MD techniques are limited by the trade-off between accuracy and efficiency, while recent deep learning-based improvements have mostly focused on single-domain molecules, lacking transferability to unfamiliar molecular systems. Therefore, we propose Unified Simulator (UniSim), which leverages cross-domain knowledge to enhance the understanding of atomic interactions. First, we employ a multi-head pretraining approach to learn a unified atomic representation model from a large and diverse set of molecular data. Then, based on the stochastic interpolant framework, we learn the state transition patterns over long timesteps from MD trajectories, and introduce a force guidance module for rapidly adapting to different chemical environments. Our experiments demonstrate that UniSim achieves highly competitive performance across small molecules, peptides, and proteins.

2025-05-20T14:29:06Z ICML 2025 poster Ziyang Yu Wenbing Huang Yang Liu http://arxiv.org/abs/2412.05788v2 On diffusion posterior sampling via sequential Monte Carlo for zero-shot scaffolding of protein motifs 2025-10-03T12:25:21Z

With the advent of diffusion models, new proteins can be generated at an unprecedented rate. The motif scaffolding problem requires steering this generative process to yield proteins with a desirable functional substructure called a motif. While models have been trained to take the motif as conditional input, recent techniques in diffusion posterior sampling can be leveraged as zero-shot alternatives whose approximations can be corrected with sequential Monte Carlo (SMC) algorithms. In this work, we introduce a new set of guidance potentials for describing scaffolding tasks and solve them by adapting SMC-aided diffusion posterior samplers with an unconditional model, Genie, as a prior. In single-motif problems, we find that (i) the proposed potentials perform comparably to, if not better than, the conventional masking approach, (ii) samplers based on reconstruction guidance outperform their replacement-method counterparts, and (iii) measurement-tilted proposals and twisted targets improve performance substantially. Furthermore, as a demonstration, we provide solutions to two multi-motif problems by pairing reconstruction guidance with an SE(3)-invariant potential. We also produce designable internally symmetric monomers with a guidance potential for point symmetry constraints. Our code is available at: https://github.com/matsagad/mres-project.

2024-12-08T02:38:59Z Published in Transactions on Machine Learning Research (09/2025). Reviewed on OpenReview: https://openreview.net/forum?id=KXRYY7iwqh James Matthew Young O. Deniz Akyildiz http://arxiv.org/abs/2510.02734v1 SAE-RNA: A Sparse Autoencoder Model for Interpreting RNA Language Model Representations 2025-10-03T05:34:59Z

Deep learning, particularly with the advancement of Large Language Models, has transformed biomolecular modeling, with protein advances (e.g., ESM) inspiring emerging RNA language models such as RiNALMo. Yet how and what these RNA language models internally encode about messenger RNA (mRNA) or non-coding RNA (ncRNA) families remains unclear. We present SAE-RNA, an interpretability model that analyzes RiNALMo representations and maps them to known human-level biological features. Our work frames RNA interpretability as concept discovery in pretrained embeddings, without end-to-end retraining, and provides practical tools to probe what RNA LMs may encode about ncRNA families. The model can be extended to close comparisons between RNA groups, supporting hypothesis generation about previously unrecognized relationships.

2025-10-03T05:34:59Z preprint Taehan Kim Sangdae Nam http://arxiv.org/abs/2510.00352v2 AReUReDi: Annealed Rectified Updates for Refining Discrete Flows with Multi-Objective Guidance 2025-10-03T00:49:30Z

Designing sequences that satisfy multiple, often conflicting, objectives is a central challenge in therapeutic and biomolecular engineering. Existing generative frameworks largely operate in continuous spaces with single-objective guidance, while discrete approaches lack guarantees for multi-objective Pareto optimality. We introduce AReUReDi (Annealed Rectified Updates for Refining Discrete Flows), a discrete optimization algorithm with theoretical guarantees of convergence to the Pareto front. Building on Rectified Discrete Flows (ReDi), AReUReDi combines Tchebycheff scalarization, locally balanced proposals, and annealed Metropolis-Hastings updates to bias sampling toward Pareto-optimal states while preserving distributional invariance. Applied to peptide and SMILES sequence design, AReUReDi simultaneously optimizes up to five therapeutic properties (including affinity, solubility, hemolysis, half-life, and non-fouling) and outperforms both evolutionary and diffusion-based baselines. These results establish AReUReDi as a powerful, sequence-based framework for multi-property biomolecule generation.
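The combination of Tchebycheff scalarization with annealed Metropolis-Hastings updates named in this abstract can be illustrated with a minimal sketch. It is not the AReUReDi implementation: the two toy objectives, the weights, the ideal point, the geometric temperature schedule, and the symmetric single-site proposal (used here in place of locally balanced proposals and without any ReDi prior) are all assumptions made for illustration.

```python
import math
import random


def tchebycheff(scores, weights, ideal):
    """Weighted Tchebycheff scalarization: worst weighted gap to the ideal point (lower is better)."""
    return max(w * (z - s) for s, w, z in zip(scores, weights, ideal))


def toy_objectives(seq):
    """Hypothetical conflicting objectives: fraction of 'A' and fraction of 'B' in the sequence."""
    n = len(seq)
    return (seq.count("A") / n, seq.count("B") / n)


def annealed_mh(length=12, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Annealed Metropolis-Hastings over 'A'/'B' sequences with a symmetric single-site proposal."""
    rng = random.Random(seed)
    weights, ideal = (0.5, 0.5), (1.0, 1.0)  # assumed trade-off weights and ideal point
    seq = ["A"] * length
    cur = tchebycheff(toy_objectives(seq), weights, ideal)
    best_seq, best = list(seq), cur
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(length)
        proposal = list(seq)
        proposal[i] = "B" if seq[i] == "A" else "A"
        new = tchebycheff(toy_objectives(proposal), weights, ideal)
        # Always accept improvements; accept worse states with Boltzmann probability.
        if new <= cur or rng.random() < math.exp(-(new - cur) / temp):
            seq, cur = proposal, new
            if cur < best:
                best_seq, best = list(seq), cur
    return "".join(best_seq), best


if __name__ == "__main__":
    sequence, score = annealed_mh()
    print(sequence, score)
```

Because the two toy objectives sum to one, the scalarized score cannot drop below 0.25 with equal weights, which is attained by any sequence with equal numbers of 'A' and 'B'. Changing the weights moves the preferred trade-off along the Pareto front.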

2025-09-30T23:33:33Z Tong Chen Yinuo Zhang Pranam Chatterjee http://arxiv.org/abs/2411.15684v7 Disentangling the Complex Multiplexed DIA Spectra in De Novo Peptide Sequencing 2025-10-02T21:16:38Z

Data-Independent Acquisition (DIA) was introduced to improve sensitivity by covering all peptides in a range rather than only sampling high-intensity peaks as in Data-Dependent Acquisition (DDA) mass spectrometry. However, it is unclear how useful DIA data are for de novo peptide sequencing, as DIA data are marred by coeluted peptides, high noise, and varying data quality. We present a new deep learning method, DIANovo, that addresses each of these difficulties and improves on previously established systems by a large margin by equipping the model with a deeper understanding of coeluted DIA spectra. This paper also provides criteria for when DIA data can and cannot be used for de novo peptide sequencing, by comparing DDA and DIA in both de novo and database search modes. We find that while DIA excels with narrow isolation windows on older-generation instruments, it loses its advantage with wider windows. However, with Orbitrap Astral, DIA consistently outperforms DDA because narrow-window mode is enabled. We also provide a theoretical explanation of this phenomenon, emphasizing the critical role of the signal-to-noise profile in the successful application of de novo sequencing.

2024-11-24T02:10:29Z Zheng Ma Zeping Mao Ruixue Zhang Jiazhen Chen Lei Xin Baozhen Shan Ali Ghodsi Ming Li http://arxiv.org/abs/2510.02259v1 Transformers Discover Molecular Structure Without Graph Priors 2025-10-02T17:42:10Z

Graph Neural Networks (GNNs) are the dominant architecture for molecular machine learning, particularly for molecular property prediction and machine learning interatomic potentials (MLIPs). GNNs perform message passing on predefined graphs often induced by a fixed radius cutoff or k-nearest neighbor scheme. While this design aligns with the locality present in many molecular tasks, a hard-coded graph can limit expressivity due to the fixed receptive field and slow down inference with sparse graph operations. In this work, we investigate whether pure, unmodified Transformers trained directly on Cartesian coordinates, without predefined graphs or physical priors, can approximate molecular energies and forces. As a starting point for our analysis, we demonstrate how to train a Transformer to competitive energy and force mean absolute errors under a matched training compute budget, relative to a state-of-the-art equivariant GNN on the OMol25 dataset. We discover that the Transformer learns physically consistent patterns, such as attention weights that decay inversely with interatomic distance, and flexibly adapts them across different molecular environments due to the absence of hard-coded biases. The use of a standard Transformer also unlocks predictable improvements with respect to scaling training resources, consistent with empirical scaling laws observed in other domains. Our results demonstrate that many favorable properties of GNNs can emerge adaptively in Transformers, challenging the necessity of hard-coded graph inductive biases and pointing toward standardized, scalable architectures for molecular modeling.

2025-10-02T17:42:10Z Tobias Kreiman Yutong Bai Fadi Atieh Elizabeth Weaver Eric Qu Aditi S. Krishnapriyan http://arxiv.org/abs/2506.06305v2 Template-Guided 3D Molecular Pose Generation via Flow Matching and Differentiable Optimization 2025-10-02T11:50:49Z

Predicting the 3D conformation of small molecules within protein binding sites is a key challenge in drug design. When a crystallized reference ligand (template) is available, it provides geometric priors that can guide 3D pose prediction. We present a two-stage method for ligand conformation generation guided by such templates. In the first stage, we introduce a molecular alignment approach based on flow-matching to generate 3D coordinates for the ligand, using the template structure as a reference. In the second stage, a differentiable pose optimization procedure refines this conformation based on shape and pharmacophore similarities, internal energy, and, optionally, the protein binding pocket. We introduce a new benchmark of ligand pairs co-crystallized with the same target to evaluate our approach and show that it outperforms standard docking tools and open-access alignment methods, especially in cases involving low similarity to the template or high ligand flexibility.

2025-05-22T09:50:51Z Noémie Bergues Arthur Carré Paul Join-Lambert Brice Hoffmann Arnaud Blondel Hamza Tajmouati http://arxiv.org/abs/2510.01890v1 Folding lattice proteins confined on minimal grids using a quantum-inspired encoding 2025-10-02T10:58:31Z

Steric clashes pose a challenge when exploring dense protein systems using conventional explicit-chain methods. A minimal example is a single lattice protein confined on a minimal grid, with no free sites. Finding its minimum energy is a hard optimization problem, with similarities to scheduling problems. It can be recast as a quadratic unconstrained binary optimization (QUBO) problem amenable to classical and quantum approaches. We show that this problem in its QUBO form can be swiftly and consistently solved for chain length 48, using either classical simulated annealing or hybrid quantum-classical annealing on a D-Wave system. In fact, the latter computations required about 10 seconds. We also test linear and quadratic programming methods, which work well for a lattice gas but struggle with chain constraints. All methods are benchmarked against exact results obtained from exhaustive structure enumeration, at a high computational cost.
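To make the QUBO recasting concrete, the sketch below shows how a hard constraint becomes a quadratic penalty and how the resulting problem can be attacked with simulated annealing and checked against exhaustive enumeration. It does not reproduce the lattice-protein encoding of the paper: the toy problem (choose exactly one of several options at minimal cost), the penalty strength, and the cooling schedule are assumptions chosen for illustration.

```python
import itertools
import math
import random


def build_one_hot_qubo(costs, penalty):
    """Upper-triangular QUBO for: minimize sum(c_i x_i) + penalty * (sum(x_i) - 1)^2.

    Using x_i^2 = x_i, the penalty expands to -penalty on the diagonal and
    +2*penalty on each pair (i < j); the constant +penalty is dropped.
    """
    n = len(costs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = costs[i] - penalty
        for j in range(i + 1, n):
            Q[i][j] = 2.0 * penalty
    return Q


def energy(Q, x):
    """QUBO energy x^T Q x over the upper triangle."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))


def brute_force(Q):
    """Exact minimum by enumerating all 2^n bit strings (only feasible for small n)."""
    n = len(Q)
    return min((energy(Q, x), x) for x in itertools.product((0, 1), repeat=n))


def simulated_annealing(Q, steps=2000, t_start=5.0, t_end=0.05, seed=0):
    """Single-bit-flip simulated annealing; returns the best energy and state seen."""
    rng = random.Random(seed)
    x = [0] * len(Q)
    cur = energy(Q, x)
    best_e, best_x = cur, list(x)
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i = rng.randrange(len(x))
        x[i] ^= 1  # propose flipping bit i
        new = energy(Q, x)
        if new <= cur or rng.random() < math.exp(-(new - cur) / temp):
            cur = new
            if cur < best_e:
                best_e, best_x = cur, list(x)
        else:
            x[i] ^= 1  # reject: undo the flip
    return best_e, best_x


if __name__ == "__main__":
    Q = build_one_hot_qubo([3.0, 1.0, 2.0], penalty=10.0)
    print("exact:", brute_force(Q))
    print("annealed:", simulated_annealing(Q))
```

The penalty must exceed the largest cost difference so that violating the constraint is never worthwhile. Enumeration gives the exact reference but scales as 2^n, which is why it serves only as a benchmark for small instances.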

2025-10-02T10:58:31Z 22 pages, 5 figures Phys. Rev. E 112, 045302 (2025) Anders Irbäck Lucas Knuthson Sandipan Mohanty 10.1103/8n7p-7lh2 http://arxiv.org/abs/2510.01632v1 BioBlobs: Differentiable Graph Partitioning for Protein Representation Learning 2025-10-02T03:25:02Z

Protein function is driven by coherent substructures which vary in size and topology, yet current protein representation learning (PRL) models distort these signals by relying on rigid substructures such as k-hop and fixed-radius neighbourhoods. We introduce BioBlobs, a plug-and-play, fully differentiable module that represents proteins by dynamically partitioning structures into flexibly-sized, non-overlapping substructures ("blobs"). The resulting blobs are quantized into a shared and interpretable codebook, yielding a discrete vocabulary of function-relevant protein substructures used to compute protein embeddings. We show that BioBlobs representations improve the performance of widely used protein encoders such as GVP-GNN across various PRL tasks. Our approach highlights the value of architectures that directly capture function-relevant protein substructures, enabling both improved predictive performance and mechanistic insight into protein function.

2025-10-02T03:25:02Z Xin Wang Carlos Oliver http://arxiv.org/abs/2510.01571v1 From Supervision to Exploration: What Does Protein Language Model Learn During Reinforcement Learning? 2025-10-02T01:31:10Z

Protein language models (PLMs) have advanced computational protein science through large-scale pretraining and scalable architectures. In parallel, reinforcement learning (RL) has broadened exploration and enabled precise multi-objective optimization in protein design. Yet whether RL can push PLMs beyond their pretraining priors to uncover latent sequence-structure-function rules remains unclear. We address this by pairing RL with PLMs across four domains: antimicrobial peptide design, kinase variant optimization, antibody engineering, and inverse folding. Using diverse RL algorithms and model classes, we ask if RL improves sampling efficiency and, more importantly, if it reveals capabilities not captured by supervised learning. Across benchmarks, RL consistently boosts success rates and sample efficiency. Performance follows a three-factor interaction: task headroom, reward fidelity, and policy capacity jointly determine gains. When rewards are accurate and informative, policies have sufficient capacity, and tasks leave room beyond supervised baselines, improvements scale; when rewards are noisy or capacity is constrained, gains saturate despite exploration. This view yields practical guidance for RL in protein design: prioritize reward modeling and calibration before scaling policy size, match algorithm and regularization strength to task difficulty, and allocate capacity where marginal gains are largest. Implementation is available at https://github.com/chq1155/RL-PLM.

2025-10-02T01:31:10Z 24 pages, 7 figures, 4 tables Hanqun Cao Hongrui Zhang Junde Xu Zhou Zhang Lingdong Shen Minghao Sun Ge Liu Jinbo Xu Wu-Jun Li Jinren Ni Cesar de la Fuente-Nunez Tianfan Fu Yejin Choi Pheng-Ann Heng Fang Wu http://arxiv.org/abs/2510.17826v1 Speak to a Protein: An Interactive Multimodal Co-Scientist for Protein Analysis 2025-10-01T22:12:34Z

Building a working mental model of a protein typically requires weeks of reading, cross-referencing crystal and predicted structures, and inspecting ligand complexes, an effort that is slow, unevenly accessible, and often requires specialized computational skills. We introduce Speak to a Protein, a new capability that turns protein analysis into an interactive, multimodal dialogue with an expert co-scientist. The AI system retrieves and synthesizes relevant literature, structures, and ligand data; grounds answers in a live 3D scene; and can highlight, annotate, manipulate and see the visualization. It also generates and runs code when needed, explaining results in both text and graphics. We demonstrate these capabilities on relevant proteins, posing questions about binding pockets, conformational changes, or structure-activity relationships to test ideas in real-time. Speak to a Protein reduces the time from question to evidence, lowers the barrier to advanced structural analysis, and enables hypothesis generation by tightly coupling language, code, and 3D structures. Speak to a Protein is freely accessible at https://open.playmolecule.org.

2025-10-01T22:12:34Z Carles Navarro Mariona Torrens Philipp Thölke Stefan Doerr Gianni De Fabritiis http://arxiv.org/abs/2509.25479v2 Discontinuous Epitope Fragments as Sufficient Target Templates for Efficient Binder Design 2025-10-01T20:52:17Z

Recent advances in structure-based protein design have accelerated de novo binder generation, yet interfaces on large domains or spanning multiple domains remain challenging due to high computational cost and declining success with increasing target size. We hypothesized that protein folding neural networks (PFNNs) operate in a "local-first" manner, prioritizing local interactions while displaying limited sensitivity to global foldability. Guided by this hypothesis, we propose an epitope-only strategy that retains only the discontinuous surface residues surrounding the binding site. Compared to intact-domain workflows, this approach improves in silico success rates by up to 80% and reduces the average time per successful design by up to forty-fold, enabling binder design against previously intractable targets such as ClpP and ALS3. Building on this foundation, we further developed a tailored pipeline that incorporates a Monte Carlo-based evolution step to overcome local minima and a position-specific biased inverse folding step to refine sequence patterns. Together, these advances not only establish a generalizable framework for efficient binder design against structurally large and otherwise inaccessible targets, but also support the broader "local-first" hypothesis as a guiding principle for PFNN-based design.

2025-09-29T20:33:32Z Accepted by NeurIPS2025-AI4Science Zhenfeng Deng Ruijie Hou Ningrui Xie Mike Tyers Michał Koziarski http://arxiv.org/abs/2510.14989v1 Constrained Diffusion for Protein Design with Hard Structural Constraints 2025-10-01T17:55:45Z

Diffusion models offer a powerful means of capturing the manifold of realistic protein structures, enabling rapid design for protein engineering tasks. However, existing approaches exhibit critical failure modes when precise constraints are necessary for functional design. To this end, we present a constrained diffusion framework for structure-guided protein design, ensuring strict adherence to functional requirements while maintaining precise stereochemical and geometric feasibility. The approach integrates proximal feasibility updates with ADMM decomposition into the generative process, scaling effectively to the complex constraint sets of this domain. We evaluate on challenging protein design tasks, including motif scaffolding and vacancy-constrained pocket design, while introducing a novel curated benchmark dataset for motif scaffolding in the PDZ domain. Our approach achieves state-of-the-art performance, providing perfect satisfaction of bonding and geometric constraints with no degradation in structural diversity.
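The idea of a proximal feasibility update under an ADMM splitting can be sketched on a toy problem. The code below is not the framework described in this abstract: it projects a point onto a simple box constraint, whereas the paper's bonding and geometric constraints are far more complex and are applied inside a diffusion sampler. The function names, the box constraint, and the value of `rho` are assumptions for illustration.

```python
def project_box(v, lo=0.0, hi=1.0):
    """Projection onto the box [lo, hi]^n (the proximal operator of its indicator function)."""
    return [min(max(c, lo), hi) for c in v]


def admm_project(x0, project, rho=1.0, iters=200):
    """Scaled-form ADMM for: minimize 0.5 * ||x - x0||^2 subject to x = z, z in C.

    x-update: closed-form minimizer of the quadratic term plus the augmented penalty.
    z-update: projection of (x + u) onto the constraint set C.
    u-update: running sum of the constraint residual (scaled dual variable).
    """
    z = [0.0] * len(x0)
    u = [0.0] * len(x0)
    for _ in range(iters):
        x = [(a + rho * (zc - uc)) / (1.0 + rho) for a, zc, uc in zip(x0, z, u)]
        z = project([xc + uc for xc, uc in zip(x, u)])
        u = [uc + xc - zc for uc, xc, zc in zip(u, x, z)]
    return z


if __name__ == "__main__":
    # A sample that violates the box in two coordinates is pulled back to the nearest feasible point.
    print(admm_project([1.5, -0.3, 0.4], project_box))
```

For a convex set such as this box, the iterates converge to the Euclidean projection of `x0`. Splitting the objective from the constraint in this way means only a projection routine is needed per constraint set.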

2025-10-01T17:55:45Z Jacob K. Christopher Austin Seamann Jingyi Cui Sagar Khare Ferdinando Fioretto http://arxiv.org/abs/2510.01108v1 Integrative modelling of biomolecular dynamics 2025-10-01T16:54:10Z

Much of our mechanistic understanding of the functions of biological macromolecules is based on static structural experiments, which can be modelled either as single structures or conformational ensembles. While these provide us with invaluable insights, they do not directly reveal the inherently dynamic nature of molecules. Advances in time-dependent and time-resolved experimental methods have made it possible to capture the dynamics of biomolecules at increasingly high spatial and temporal resolutions. To complement these, computational models can represent the structural and dynamical behaviour of biomolecules at atomistic resolution and femtosecond timescales, and are therefore useful for interpreting these experiments. Here, we review the progress in integrating simulations with dynamical experiments, focusing on the combination of simulations with time-resolved and time-dependent experimental data.

2025-10-01T16:54:10Z 7 pages, 2 figures, 1 table Daria Gusew Carl G. Henning Hansen Kresten Lindorff-Larsen