https://arxiv.org/api/VX4XpyHJkYqoYs8VKpbO4CXIWV8 2026-03-23T00:33:12Z 6642 300 15 http://arxiv.org/abs/2510.24732v1 Flows, straight but not so fast: Exploring the design space of Rectified Flows in Protein Design 2025-10-13T07:36:40Z

Generative modeling techniques such as Diffusion and Flow Matching have achieved significant successes in generating designable and diverse protein backbones. However, many current models are computationally expensive, requiring hundreds or even thousands of function evaluations (NFEs) to yield samples of acceptable quality, which can become a bottleneck in practical design campaigns that often generate $10^4\ -\ 10^6$ designs per target. In image generation, Rectified Flows (ReFlow) can significantly reduce the required NFEs for a given target quality, but their application in protein backbone generation has been less studied. We apply ReFlow to improve the low NFE performance of pretrained SE(3) flow matching models for protein backbone generation and systematically study ReFlow design choices in the context of protein generation in data curation, training and inference time settings. In particular, we (1) show that ReFlow in the protein domain is particularly sensitive to the choice of coupling generation and annealing, (2) demonstrate how useful design choices for ReFlow in the image domain do not directly translate to better performance on proteins, and (3) make improvements to ReFlow methodology for proteins.
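
The link between trajectory straightness and the number of function evaluations (NFEs) can be illustrated with a toy example. The sketch below is not the authors' SE(3) implementation: it works on plain Euclidean vectors, and the names `euler_sample`, `reflow_training_pair`, `straight_velocity`, and `curved_velocity` are hypothetical. It assumes the usual rectified-flow recipe, in which a coupling (x0, x1) produced by a pretrained model is turned into a straight-line interpolant with a constant velocity target.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, nfe: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 using `nfe` Euler steps."""
    x = list(x0)
    dt = 1.0 / nfe
    for k in range(nfe):
        v = velocity(x, k * dt)  # one function evaluation per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def reflow_training_pair(x0: Vector, x1: Vector, t: float) -> Tuple[Vector, Vector]:
    """Straight-line interpolant x_t and its constant velocity target x1 - x0."""
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    target = [b - a for a, b in zip(x0, x1)]
    return x_t, target


def straight_velocity(x: Vector, t: float) -> Vector:
    # Constant velocity: trajectories are straight, so a single Euler step is exact.
    return [2.0 for _ in x]


def curved_velocity(x: Vector, t: float) -> Vector:
    # State-dependent velocity dx/dt = x: the exact endpoint is e * x(0),
    # which Euler integration only approaches as the number of steps grows.
    return list(x)
```

With the straight field, one step and four steps reach the same endpoint; with the curved field, a single step lands at 2.0 while the exact answer is about 2.718, which is the low-NFE error that straightening the flow is meant to reduce.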

2025-10-13T07:36:40Z Junhua Chen Simon Mathis Charles Harris Kieran Didi Pietro Lio http://arxiv.org/abs/2506.04235v2 AbBiBench: A Benchmark for Antibody Binding Affinity Maturation and Design 2025-10-10T23:13:44Z

We introduce AbBiBench (Antibody Binding Benchmarking), a benchmarking framework for antibody binding affinity maturation and design. Unlike previous strategies that evaluate antibodies in isolation, typically by comparing them to natural sequences with metrics such as amino acid recovery rate or structural RMSD, AbBiBench instead treats the antibody-antigen (Ab-Ag) complex as the fundamental unit. It evaluates an antibody design's binding potential by measuring how well a protein model scores the full Ab-Ag complex. We first curate, standardize, and share more than 184,500 experimental measurements of antibody mutants across 14 antibodies and 9 antigens (including influenza, lysozyme, HER2, VEGF, integrin, Ang2, and SARS-CoV-2), covering both heavy-chain and light-chain mutations. Using these datasets, we systematically compare 15 protein models including masked language models, autoregressive language models, inverse folding models, diffusion-based generative models, and geometric graph models by comparing the correlation between model likelihood and experimental affinity values. Additionally, to demonstrate AbBiBench's generative utility, we apply it to antibody F045-092 in order to introduce binding to influenza H1N1. We sample new antibody variants with the top-performing models, rank them by the structural integrity and biophysical properties of the Ab-Ag complex, and assess them with in vitro ELISA binding assays. Our findings show that structure-conditioned inverse folding models outperform others in both affinity correlation and generation tasks. Overall, AbBiBench provides a unified, biologically grounded evaluation framework to facilitate the development of more effective, function-aware antibody design models.
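
The evaluation described above, correlating model likelihoods with measured affinities, can be sketched as follows. The abstract does not say which correlation coefficient is used; Spearman rank correlation is assumed here purely for illustration, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(log_likelihoods: Sequence[float], affinities: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(log_likelihoods), average_ranks(affinities))
```

A rank-based coefficient is a natural choice when likelihoods and affinities live on different scales, since only the ordering of mutants matters.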

2025-05-23T21:09:04Z Xinyan Zhao Yi-Ching Tang Akshita Singh Victor J Cantu KwanHo An Junseok Lee Adam E Stogsdill Ibraheem M Hamdi Ashwin Kumar Ramesh Zhiqiang An Xiaoqian Jiang Yejin Kim http://arxiv.org/abs/2508.09143v2 An Angle-Based Algorithmic Framework for the Interval Discretizable Distance Geometry Problem 2025-10-10T17:06:57Z

Distance Geometry plays a central role in determining protein structures from Nuclear Magnetic Resonance (NMR) data, a task known as the Molecular Distance Geometry Problem (MDGP). A subclass of this problem, the Discretizable Distance Geometry Problem (DDGP), allows a recursive solution via the combinatorial Branch-and-Prune (BP) algorithm by exploiting specific vertex orderings in protein backbones. To accommodate the inherent uncertainty in NMR data, the interval Branch-and-Prune (\textit{i}BP) algorithm was introduced, incorporating interval distance constraints through uniform sampling. In this work, we propose two new algorithmic frameworks for solving the three-dimensional interval DDGP (\textit{i}DDGP): the interval Angular Branch-and-Prune (\textit{i}ABP), and its extension, the interval Torsion-angle Branch-and-Prune (\textit{i}TBP). These methods convert interval distances into angular constraints, enabling structured sampling over circular arcs. The \textit{i}ABP method guarantees feasibility by construction and removes the need for explicit constraint checking. The \textit{i}TBP algorithm further incorporates known torsion angle intervals, enforcing local chirality and planarity conditions critical for protein geometry. We present formal mathematical foundations for both methods and a systematic strategy for generating biologically meaningful \textit{i}DDGP instances from the Protein Data Bank (PDB) structures. Computational experiments demonstrate that both \textit{i}ABP and \textit{i}TBP consistently outperform \textit{i}BP in terms of solution rate and computational efficiency. In particular, \textit{i}TBP yields solutions with lower RMSD variance relative to the original PDB structures, better reflecting biologically plausible conformations.
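
The idea of turning an interval distance into an angular constraint can be illustrated with the law of cosines. This is only a sketch of the general geometric principle, not the iABP or iTBP algorithm itself; the function names and the three-point setup (two exact distances d12 and d23, one interval distance d13) are assumptions made for the example.

```python
import math
from typing import List, Tuple


def angle_from_distances(d12: float, d23: float, d13: float) -> float:
    """Angle at point 2 (radians) of a triangle with side lengths d12, d23, d13."""
    c = (d12 ** 2 + d23 ** 2 - d13 ** 2) / (2.0 * d12 * d23)
    c = max(-1.0, min(1.0, c))  # guard against rounding just outside [-1, 1]
    return math.acos(c)


def angle_interval(d12: float, d23: float,
                   d13_lo: float, d13_hi: float) -> Tuple[float, float]:
    """Map a distance interval [d13_lo, d13_hi] to an angle interval.

    The angle grows monotonically with d13, so the endpoints map to endpoints.
    """
    return (angle_from_distances(d12, d23, d13_lo),
            angle_from_distances(d12, d23, d13_hi))


def sample_arc(theta_lo: float, theta_hi: float, n: int) -> List[float]:
    """Return n evenly spaced angles covering the arc [theta_lo, theta_hi]."""
    if n == 1:
        return [(theta_lo + theta_hi) / 2.0]
    step = (theta_hi - theta_lo) / (n - 1)
    return [theta_lo + k * step for k in range(n)]
```

Sampling in angle rather than in distance means every sampled value lies on the feasible arc by construction, which is in the spirit of the "feasibility by construction" property claimed in the abstract.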

2025-07-29T13:31:39Z Wagner A. A. da Rocha Carlile Lavor Leo Liberti Leticia de Melo Costa Leonardo D. Secchin Therese E. Malliavin http://arxiv.org/abs/2510.09372v1 Design of DNA Strand Displacement Reactions 2025-10-10T13:28:10Z

DNA strand displacement (SD) reactions are central to the operation of many synthetic nucleic acid systems, including molecular circuits, sensors, and machines. Over the years, a broad set of design frameworks has emerged to accommodate various functional goals, initial configurations, and environmental conditions. Nevertheless, key challenges persist, particularly in reliably predicting reaction kinetics. This review examines recent approaches to SD reaction design, with emphasis on the properties of single reactions, including kinetics, structural factors, and limitations in current modelling practices. We identify promising innovations while analysing the factors that continue to hinder predictive accuracy. We conclude by outlining future directions for achieving more robust and programmable behaviour in DNA-based systems.

2025-10-10T13:28:10Z 16 pages, 3 figures. Invited review article Križan Jurinović Merry Mitra Rakesh Mukherjee Thomas E. Ouldridge http://arxiv.org/abs/2510.08971v1 Communication System Design using Synthetic Photoisomerizable Azobenzene-Regulated K+(SPARK) channel 2025-10-10T03:25:42Z

Biomolecules exhibit a remarkable property of transforming signals from their environment. This paper presents a communication system design using a light-modulated protein channel: Synthetic Photoisomerizable Azobenzene-regulated K+ (SPARK). Our approach involves a comprehensive design incorporating the SPARK-based receiver, encoding methods, modulation techniques, and detection processes. By analyzing the resulting communication system, we determine how different parameters influence its performance. Furthermore, we explore the potential design in terms of bioengineering and demonstrate that the data rate scales up with the number of receptors, indicating the possibility of achieving high-speed communication.

2025-10-10T03:25:42Z arXiv admin note: text overlap with arXiv:2411.05236 Taha Sajjad Andrew W. Eckford http://arxiv.org/abs/2403.08744v2 GTP before ATP: The energy currency at the origin of genes 2025-10-09T10:24:43Z

Life is an exergonic chemical reaction. Many individual reactions in metabolism entail slightly endergonic steps that are coupled to free energy release, typically as ATP hydrolysis, in order to go forward. ATP is almost always supplied by the rotor-stator ATP synthase, which harnesses chemiosmotic ion gradients. Because the ATP synthase is a protein, it arose after the ribosome did. What was the energy currency of metabolism before the origin of the ATP synthase and how (and why) did ATP come to be the universal energy currency? About 27% of a cell's energy budget is consumed as GTP during translation. The universality of GTP-dependence in ribosome function indicates that GTP was the ancestral energy currency of protein synthesis. The use of GTP in translation and ATP in small molecule synthesis are conserved across all lineages, representing energetic compartments that arose in the last universal common ancestor, LUCA. And what came before GTP? Recent findings indicate that the energy supporting the origin of LUCA's metabolism stemmed from H2-dependent CO2 reduction along routes that strongly resemble the reactions and transition metal catalysts of the acetyl-CoA pathway.

2024-03-13T17:46:50Z 32 pages, 5 figures, 2 tables Natalia Mrnjavac William F. Martin http://arxiv.org/abs/2503.16278v3 Unified Cross-Scale 3D Generation and Understanding via Autoregressive Modeling 2025-10-09T02:59:45Z

3D structure modeling is essential across scales, enabling applications from fluid simulation and 3D reconstruction to protein folding and molecular docking. Yet, despite shared 3D spatial patterns, current approaches remain fragmented, with models narrowly specialized for specific domains and unable to generalize across tasks or scales. We propose Uni-3DAR, a unified autoregressive framework for cross-scale 3D generation and understanding. At its core is a coarse-to-fine tokenizer based on octree data structures, which compresses diverse 3D structures into compact 1D token sequences. We further propose a two-level subtree compression strategy, which reduces the octree token sequence by up to 8x. To address the challenge of dynamically varying token positions introduced by compression, we introduce a masked next-token prediction strategy that ensures accurate positional modeling, significantly boosting model performance. Extensive experiments across multiple 3D generation and understanding tasks, including small molecules, proteins, polymers, crystals, and macroscopic 3D objects, validate its effectiveness and versatility. Notably, Uni-3DAR surpasses previous state-of-the-art diffusion models by a substantial margin, achieving up to 256\% relative improvement while delivering inference speeds up to 21.8x faster.
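
A coarse-to-fine octree tokenization can be sketched with a generic occupancy encoding. The abstract does not specify the Uni-3DAR token format or its subtree compression, so the following is an assumed, simplified scheme: each non-empty node emits one 8-bit token whose bits mark which of its eight children contain points, and nodes are visited breadth-first. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def child_index(point: Point, center: Point) -> int:
    """Octant of `point` relative to `center`: bit 0 = x, bit 1 = y, bit 2 = z."""
    return (int(point[0] >= center[0])
            | (int(point[1] >= center[1]) << 1)
            | (int(point[2] >= center[2]) << 2))


def octree_tokens(points: Sequence[Point], depth: int) -> List[int]:
    """Breadth-first occupancy tokens (0..255) for points inside the unit cube."""
    tokens: List[int] = []
    # Each entry: (cell center, half of the cell's side length, points in the cell).
    level = [((0.5, 0.5, 0.5), 0.5, list(points))]
    for _ in range(depth):
        next_level = []
        for center, half, pts in level:
            buckets: List[List[Point]] = [[] for _ in range(8)]
            for p in pts:
                buckets[child_index(p, center)].append(p)
            tokens.append(sum(1 << i for i, b in enumerate(buckets) if b))
            quarter = half / 2.0
            for i, b in enumerate(buckets):
                if b:  # only occupied children are refined further
                    child_center = tuple(
                        c + (quarter if (i >> axis) & 1 else -quarter)
                        for axis, c in enumerate(center))
                    next_level.append((child_center, quarter, b))
        level = next_level
    return tokens
```

Because empty octants are never expanded, the sequence length tracks the occupied volume rather than the full grid, which is what makes a 1D token sequence practical for sparse 3D structures.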

2025-03-20T16:07:04Z Shuqi Lu Haowei Lin Lin Yao Zhifeng Gao Xiaohong Ji Yitao Liang Weinan E Linfeng Zhang Guolin Ke http://arxiv.org/abs/2509.03487v2 SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models 2025-10-08T17:47:56Z

Proteins play crucial roles in almost all biological processes. The advancement of deep learning has greatly accelerated the development of protein foundation models, leading to significant successes in protein understanding and design. However, the lack of systematic red-teaming for these models has raised serious concerns about their potential misuse, such as generating proteins with biological safety risks. This paper introduces SafeProtein, the first red-teaming framework designed for protein foundation models to the best of our knowledge. SafeProtein combines multimodal prompt engineering and heuristic beam search to systematically design red-teaming methods and conduct tests on protein foundation models. We also curated SafeProtein-Bench, which includes a manually constructed red-teaming benchmark dataset and a comprehensive evaluation protocol. SafeProtein achieved continuous jailbreaks on state-of-the-art protein foundation models (up to 70% attack success rate for ESM3), revealing potential biological safety risks in current protein foundation models and providing insights for the development of robust security protection technologies for frontier models. The codes will be made publicly available at https://github.com/jigang-fan/SafeProtein.

2025-09-03T17:13:56Z Jigang Fan Zhenghong Zhou Ruofan Jin Le Cong Mengdi Wang Zaixi Zhang http://arxiv.org/abs/2509.07627v4 LSMTCR: A Scalable Multi-Architecture Model for Epitope-Specific T Cell Receptor de novo Design 2025-10-08T13:31:39Z

Designing full-length, epitope-specific TCR αβ remains challenging due to vast sequence space, data biases and incomplete modeling of immunogenetic constraints. We present LSMTCR, a scalable multi-architecture framework that separates specificity from constraint learning to enable de novo, epitope-conditioned generation of paired, full-length TCRs. A diffusion-enhanced BERT encoder learns time-conditioned epitope representations; conditional GPT decoders, pretrained on CDR3β and transferred to CDR3α, generate chain-specific CDR3s under cross-modal conditioning with temperature-controlled diversity; and a gene-aware Transformer assembles complete α/β sequences by predicting V/J usage to ensure immunogenetic fidelity. Across GLIPH, TEP, MIRA, McPAS and our curated dataset, LSMTCR achieves higher predicted binding than baselines on most datasets, more faithfully recovers positional and length grammars, and delivers superior, temperature-tunable diversity. For α-chain generation, transfer learning improves predicted binding, length realism and diversity over representative methods. Full-length assembly from known or de novo CDR3s preserves k-mer spectra, yields low edit distances to references, and, in paired α/β co-modelling with epitope, attains higher pTM/ipTM than single-chain settings. LSMTCR outputs diverse, gene-contextualized, full-length TCR designs from epitope input alone, enabling high-throughput screening and iterative optimization.
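
Temperature-controlled diversity in an autoregressive decoder usually means rescaling the next-token logits before sampling. The sketch below shows that standard mechanism only; it is an assumption that LSMTCR applies temperature in this way, and the function name is hypothetical.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by `temperature`.

    Low temperatures concentrate probability on the top token (less diverse
    samples); high temperatures flatten the distribution (more diverse samples).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

For logits `[2.0, 1.0]`, a temperature of 0.5 puts more mass on the first token than a temperature of 2.0 does, which is the knob that trades fidelity against diversity.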

2025-09-09T11:55:53Z 13 main pages, 5 figures, 2 tables Ruihao Zhang Xiao Liu http://arxiv.org/abs/2510.05747v1 Physicochemically Informed Dual-Conditioned Generative Model of T-Cell Receptor Variable Regions for Cellular Therapy 2025-10-07T10:05:54Z

Physicochemically informed biological sequence generation has the potential to accelerate computer-aided cellular therapy, yet current models fail to \emph{jointly} ensure novelty, diversity, and biophysical plausibility when designing variable regions of T-cell receptors (TCRs). We present \textbf{PhysicoGPTCR}, a large generative protein Transformer that is \emph{dual-conditioned} on peptide and HLA context and trained to autoregressively synthesise TCR sequences while embedding residue-level physicochemical descriptors. The model is optimised on curated TCR--peptide--HLA triples with a maximum-likelihood objective and compared against ANN, GPTCR, LSTM, and VAE baselines. Across multiple neoantigen benchmarks, PhysicoGPTCR substantially improves edit-distance, similarity, and longest-common-subsequence scores, while populating a broader region of sequence space. Blind in-silico docking and structural modelling further reveal a higher proportion of binding-competent clones than the strongest baseline, validating the benefit of explicit context conditioning and physicochemical awareness. Experimental results demonstrate that dual-conditioned, physics-grounded generative modelling enables end-to-end design of functional TCR candidates, reducing the discovery timeline from months to minutes without sacrificing wet-lab verifiability.
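
The sequence-level metrics named above can be made concrete with textbook definitions. The sketch assumes Levenshtein edit distance and longest-common-subsequence length; the abstract does not state the exact variants or normalisation used, and the example CDR3-like strings in the note are made up.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                current.append(previous[j - 1] + 1)
            else:
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]
```

For the illustrative pair `"CASSLG"` and `"CASSIG"`, the edit distance is 1 and the longest common subsequence has length 5.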

2025-10-07T10:05:54Z Jiahao Ma Hongzong Li Ye-Fan Hu Jian-Dong Huang http://arxiv.org/abs/2510.05626v1 Paraplume: A fast and accurate paratope prediction method provides insights into repertoire-scale binding dynamics 2025-10-07T07:16:20Z

The specific region of an antibody responsible for binding to an antigen, known as the paratope, is essential for immune recognition. Accurate identification of this small yet critical region can accelerate the development of therapeutic antibodies. Determining paratope locations typically relies on modeling the antibody structure, which is computationally intensive and difficult to scale across large antibody repertoires. We introduce Paraplume, a sequence-based paratope prediction method that leverages embeddings from protein language models (PLMs) without requiring structural input and that achieves superior performance across multiple benchmarks compared to current methods. In addition, reweighting PLM embeddings using Paraplume predictions yields more informative sequence representations, improving downstream tasks such as affinity prediction, binder classification, and epitope binning. Applied to large antibody repertoires, Paraplume reveals that antigen-specific somatic hypermutations are associated with larger paratopes, suggesting a potential mechanism for affinity enhancement. Our findings position PLM-based paratope prediction as a powerful, scalable alternative to structure-dependent approaches, opening new avenues for understanding antibody evolution.
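
One simple way to reweight per-residue PLM embeddings by predicted paratope probability is weighted mean pooling. The abstract does not describe the exact reweighting scheme, so the following is an assumed illustration with a hypothetical function name, not Paraplume's implementation.

```python
from typing import List, Sequence


def weighted_mean_pool(embeddings: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Pool per-residue embeddings into one vector, weighting each residue.

    `weights` would be per-residue paratope probabilities in this setting;
    equal weights reduce to ordinary mean pooling.
    """
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per residue")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(embeddings[0])
    return [sum(w * e[k] for e, w in zip(embeddings, weights)) / total
            for k in range(dim)]
```

Residues with a high predicted paratope probability then dominate the pooled representation, which is one plausible reason such a representation could help affinity-related downstream tasks.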

2025-10-07T07:16:20Z Gabriel Athènes Adam Woolfe Thierry Mora Aleksandra M. Walczak http://arxiv.org/abs/2510.04897v1 Impact of Force Field Polarization on Correlated Motions of Proteins 2025-10-06T15:15:05Z

Correlated motions of proteins underpin many physiological mechanisms, such as substrate binding, signal transduction, enzymatic activity and allostery. These motions arise from low-frequency collective movements of biomolecules and have mostly been studied using molecular dynamics simulations. Here, we present the effects on correlated motions of two different empirical force fields used for molecular dynamics simulations: the non-polarizable CHARMM 36m additive force field and the polarizable Drude-2019 force field. The study was conducted on two proteins: ubiquitin, a small protein with well-described dynamics, and the nuclear receptor protein PPARγ. The ligand binding domain of PPARγ was of particular interest since its function is to regulate transcription through ligand and coregulator protein binding. It has been previously shown that a dynamical network of correlated motions ensures the transmission of information related to PPARγ ligand binding. We present classical MD simulations and analyze them in terms of residue fluctuations, residue correlation maps, community network analysis and hydrophobic cluster analysis. We find that RMS fluctuations tend to be greater and correlated motions less intense with the Drude-2019 force field than with the non-polarizable all-atom additive force field. Analysis of large hydrophobic clusters in the respective proteins shows a greater loss of native contacts in the simulations using the Drude-2019 force field than in the simulations using the all-atom additive force field. Our results provide the first quantification of the impact of using a polarizable force field in computational studies that focus on correlated motions.
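
Residue fluctuations and correlation maps are typically derived from per-atom displacements about the mean structure. The sketch below uses the common definitions of RMS fluctuation and normalised dynamic cross-correlation; it assumes the trajectory frames are already superposed on a reference, and it is not taken from the paper's analysis code. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]
Frame = Sequence[Coord]  # one (x, y, z) per atom


def displacements(traj: Sequence[Frame]) -> List[List[Coord]]:
    """Per-frame displacement of every atom from its trajectory-averaged position."""
    n_frames, n_atoms = len(traj), len(traj[0])
    mean = [tuple(sum(f[a][k] for f in traj) / n_frames for k in range(3))
            for a in range(n_atoms)]
    return [[tuple(f[a][k] - mean[a][k] for k in range(3)) for a in range(n_atoms)]
            for f in traj]


def rmsf(traj: Sequence[Frame]) -> List[float]:
    """Root-mean-square fluctuation of each atom about its mean position."""
    disp = displacements(traj)
    n_frames = len(traj)
    return [math.sqrt(sum(sum(c * c for c in frame[a]) for frame in disp) / n_frames)
            for a in range(len(traj[0]))]


def cross_correlation(traj: Sequence[Frame], i: int, j: int) -> float:
    """Normalised correlation of atoms i and j: +1 in phase, -1 anti-phase."""
    disp = displacements(traj)
    dot = sum(sum(a * b for a, b in zip(frame[i], frame[j])) for frame in disp)
    norm_i = sum(sum(c * c for c in frame[i]) for frame in disp)
    norm_j = sum(sum(c * c for c in frame[j]) for frame in disp)
    return dot / math.sqrt(norm_i * norm_j)
```

Evaluating `cross_correlation` for every residue pair gives the correlation map; weaker off-diagonal values correspond to the "less intense" correlated motions reported for the polarizable force field.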

2025-10-06T15:15:05Z Journal of Physical Chemistry B, 2025 Ana Milinski IGBMC Annick Dejaegere IGBMC Roland Stote IGBMC http://arxiv.org/abs/2510.04408v1 Twist dominates bending in the liquid crystal organization of bacteriophage DNA 2025-10-06T00:35:06Z

DNA frequently adopts liquid-crystalline conformations in both cells and viruses. The Oseen--Frank framework provides a powerful continuum description of these phases through three elastic moduli: splay ($K_1$), twist or cholesteric ($K_2$), and bending ($K_3$). While $K_1$ is typically assumed to dominate, the relative magnitude of $K_2$ and $K_3$ in confined DNA remains poorly understood. Here, we combine cryo-electron microscopy, liquid-crystal modeling, and knot theory to quantify this relationship in bacteriophage P4, whose genome is partially organized in a spool-like liquid-crystalline phase. We first show experimentally that the ordered DNA occupies three concentric layers within the capsid. We then formulate an Oseen--Frank model for this geometry and use it, together with the measured layer radii, to estimate the elastic ratio $\alpha = K_3/K_2$. We find $\alpha \approx 0.0064$, indicating that twist elasticity overwhelmingly dominates bending. To validate this result, we perform Langevin dynamics simulations of DNA trajectories and classify the resulting knots. The predicted knot distribution agrees with experimental data from P4, demonstrating consistency between elasticity, topology, and observed genome organization.
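
For reference, the standard Oseen--Frank free-energy density and the meaning of the reported ratio can be written out as below. The director field $\mathbf{n}$ is a symbol introduced here and not used in the abstract, and the model actually formulated in the paper for the P4 capsid geometry may contain additional or modified terms; this is only the textbook form.

```latex
f_{\mathrm{OF}}
  = \tfrac{1}{2} K_1 \left(\nabla \cdot \mathbf{n}\right)^2
  + \tfrac{1}{2} K_2 \left(\mathbf{n} \cdot (\nabla \times \mathbf{n})\right)^2
  + \tfrac{1}{2} K_3 \left|\mathbf{n} \times (\nabla \times \mathbf{n})\right|^2,
\qquad
\alpha = \frac{K_3}{K_2} \approx 0.0064
\;\Longrightarrow\;
\frac{K_2}{K_3} = \frac{1}{\alpha} \approx 156 .
```

Read this way, the estimate says the twist modulus is roughly two orders of magnitude larger than the bending modulus.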

2025-10-06T00:35:06Z Pei Liu Tamara Christiani Zhijie Wang Fei Guo Mariel Vazquez M. Carme Calderer Javier Arsuaga http://arxiv.org/abs/2510.04176v1 Relief of EGFR/FOS-downregulated miR-103a by loganin alleviates NF-kappaB-triggered inflammation and gut barrier disruption in colitis 2025-10-05T12:36:31Z

Given the ever-rising global incidence of inflammatory bowel disease (IBD) and the lack of effective clinical drugs, elucidating the detailed pathogenesis, identifying novel targets, and developing promising drugs are top priorities for IBD treatment. Here, we demonstrate that microRNA (miR)-103a levels were significantly downregulated in the inflamed mucosa of ulcerative colitis (UC) patients, along with elevated inflammatory cytokines (IL-1beta/TNF-alpha) and reduced tight junction protein (Occludin/ZO-1) levels, as compared with healthy control subjects. Consistently, miR-103a-deficient Caco-2 intestinal epithelial cells showed severe inflammatory responses and increased permeability, and DSS induced more severe colitis in miR-103a-/- mice than in wild-type mice. Mechanistic studies revealed that c-FOS suppresses miR-103a transcription by binding to its promoter, and that the resulting NF-kappaB activation, which miR-103a normally restrains by targeting TAB2 and TAK1, contributes to inflammatory responses and barrier disruption. Notably, the traditional Chinese medicine Cornus officinalis (CO) and its core active ingredient loganin potently mitigated inflammation and barrier disruption in UC by specifically blocking the EGFR/RAS/ERK/c-FOS signaling axis. These effects were mainly attributable to modulated miR-103a levels, as the therapeutic activities of CO and loganin were almost completely abolished in miR-103a KO mice. Taken together, this work reveals that loganin relieves EGFR/c-FOS axis-suppressed epithelial miR-103a expression, thereby inhibiting NF-kappaB pathway activation, suppressing inflammatory responses, and preserving tight junction integrity in UC. Thus, our data provide mechanistic insights and promising targets for UC treatment.

2025-10-05T12:36:31Z Yan Li Teng Hui Xinhui Zhang Zihan Cao Ping Wang Shirong Chen Ke Zhao Yiran Liu Yue Yuan Dou Niu Xiaobo Yu Gan Wang Changli Wang Yan Lin Fan Zhang Hefang Wu Guodong Feng Yan Liu Jiefang Kang Yaping Yan Hai Zhang Xiaochang Xue Xun Jiang http://arxiv.org/abs/2506.03157v3 UniSim: A Unified Simulator for Time-Coarsened Dynamics of Biomolecules 2025-10-04T14:13:31Z

Molecular Dynamics (MD) simulations are essential for understanding the atomic-level behavior of molecular systems, giving insights into their transitions and interactions. However, classical MD techniques are limited by the trade-off between accuracy and efficiency, while recent deep learning-based improvements have mostly focused on single-domain molecules, lacking transferability to unfamiliar molecular systems. Therefore, we propose \textbf{Uni}fied \textbf{Sim}ulator (UniSim), which leverages cross-domain knowledge to enhance the understanding of atomic interactions. First, we employ a multi-head pretraining approach to learn a unified atomic representation model from a large and diverse set of molecular data. Then, based on the stochastic interpolant framework, we learn the state transition patterns over long timesteps from MD trajectories, and introduce a force guidance module for rapidly adapting to different chemical environments. Our experiments demonstrate that UniSim achieves highly competitive performance across small molecules, peptides, and proteins.

2025-05-20T14:29:06Z ICML 2025 poster Ziyang Yu Wenbing Huang Yang Liu