arXiv API feed https://arxiv.org/api/XIf9qx9ullZS0eB9d7dj7BZhzf8 (updated 2026-03-18T08:47:22Z)

RNAGenScape: Property-Guided, Optimized Generation of mRNA Sequences with Manifold Langevin Dynamics
URL: http://arxiv.org/abs/2510.24736v2 (updated 2026-01-29T16:34:46Z)
Generating property-optimized mRNA sequences is central to applications such as vaccine design and protein replacement therapy, but remains challenging due to limited data, complex sequence-function relationships, and the narrow space of biologically viable sequences. Generative methods that drift away from the data manifold can yield sequences that fail to fold, translate poorly, or are otherwise nonfunctional. We present RNAGenScape, a property-guided manifold Langevin dynamics framework for mRNA sequence generation that operates directly on a learned manifold of real data. By performing iterative local optimization constrained to this manifold, RNAGenScape preserves biological viability, accesses reliable guidance, and avoids excursions into nonfunctional regions of the ambient sequence space. The framework integrates three components: (1) an autoencoder jointly trained with a property predictor to learn a property-organized latent manifold, (2) a denoising autoencoder that projects updates back onto the manifold, and (3) a property-guided Langevin dynamics procedure that performs optimization along the manifold. Across three real-world mRNA datasets spanning two orders of magnitude in size, RNAGenScape increases median property gain by up to 148% and success rate by up to 30% while ensuring biological viability of generated sequences, and achieves competitive inference efficiency relative to existing generative approaches.
Published: 2025-10-14T19:55:41Z
Comment: ICML 2025 Generative AI and Biology (GenBio) Workshop, Oral presentation (top 9.7%)
Authors: Danqi Liao, Chen Liu, Xingzhi Sun, Dié Tang, Haochen Wang, Scott Youlten, Srikar Krishna Gopinath, Haejeong Lee, Ethan C. Strayer, Antonio J. Giraldez, Smita Krishnaswamy

Multiple binding modes of AKT on PIP3-containing membranes
URL: http://arxiv.org/abs/2601.21216v1 (updated 2026-01-29T03:26:27Z)
The PI3K/AKT signaling pathway is triggered by recruitment of AKT to cellular membranes. Although AKT is a multidomain serine/threonine kinase composed of an N-terminal pleckstrin homology (PH) domain and a C-terminal kinase domain, how these domains cooperate to regulate AKT activation on membranes remains unclear at the molecular level. Here, using molecular dynamics simulations of full-length AKT on PIP3-containing lipid bilayers, we identify four distinct membrane-binding modes that differ in the orientations and membrane contacts of the PH and kinase domains. In addition to PIP3 binding to the PH domain, we observe specific PIP3 interactions with basic residues in the kinase domain. In the most stable mode, PIP3 interacts with both the canonical and a secondary binding site in the PH domain, while the kinase domain adopts an orientation in which the activation-loop phosphorylation site is exposed to the solvent. Interestingly, the populations of these binding modes depend on the PIP3 concentration in the membrane, leading to changes in the preferred orientation of AKT. These findings shed light on how lipid recognition by the PH domain and the kinase domain of AKT cooperatively shape its membrane-bound conformations.
Published: 2026-01-29T03:26:27Z
Authors: Yuki Nakagaki, Eiji Yamamoto

AI Developments for T and B Cell Receptor Modeling and Therapeutic Design
URL: http://arxiv.org/abs/2601.17138v2 (updated 2026-01-28T19:03:26Z)
Artificial intelligence (AI) is accelerating progress in modeling T and B cell receptors by enabling predictive and generative frameworks grounded in sequence data and immune context. This chapter surveys recent advances in the use of protein language models, machine learning, and multimodal integration for immune receptor modeling.
We highlight emerging strategies to leverage single-cell and repertoire-scale datasets and to optimize immune receptor candidates for therapeutic design. These developments point toward a new generation of data-efficient, generalizable, and clinically relevant models that better capture the diversity and complexity of adaptive immunity.
Published: 2026-01-23T19:28:08Z
Authors: Linhui Xie, Aurelien Pelissier, Yanjun Shao, Maria Rodriguez Martinez

De novo emergence of metabolically active protocells
URL: http://arxiv.org/abs/2601.11013v2 (updated 2026-01-27T16:28:47Z)
A continuous route from a disordered soup of simple chemical feedstocks to a functional protocell (a compartment that metabolizes, grows, and propagates) remains elusive. Here, we show that a homogeneous aqueous chemical mixture containing phosphorus, iron, molybdenum salts and formaldehyde spontaneously self-organizes into compartments that couple robust non-equilibrium chemical dynamics to their own growth. These structures mature to a sustained, dissipative steady state and support an organic synthetic engine, producing diverse molecular species including many core biomolecular classes. Internal spherules that are themselves growth-competent are produced within the protocells, establishing a rudimentary mode of self-perpetuation. The chemical dynamics we observe in controlled laboratory conditions also occur in reaction mixtures exposed to natural day-night cycles. Strikingly, the morphology and chemical composition of the protocells in our experiments closely resemble molybdenum-rich microspheres recently discovered in current oceanic environments. Our work establishes a robust, testable route to de novo protocell formation.
The emergence of life-like spatiotemporal organization and chemical dynamics from minimal initial conditions is more facile than previously thought and could be a recurring natural phenomenon.
Published: 2026-01-16T06:08:36Z
Authors: Nayan Chakraborty, Shashi Thutupalli

PCEvo: Path-Consistent Molecular Representation via Virtual Evolutionary
URL: http://arxiv.org/abs/2601.19257v1 (updated 2026-01-27T06:40:11Z)
Molecular representation learning aims to learn vector embeddings that capture molecular structure and geometry, thereby enabling property prediction and downstream scientific applications. In many AI for science tasks, labeled data are expensive to obtain and therefore limited in availability. Under the few-shot setting, models trained with scarce supervision often learn brittle structure-property relationships, resulting in substantially higher prediction errors and reduced generalization to unseen molecules. To address this limitation, we propose PCEvo, a path-consistent representation method that learns from virtual paths through dynamic structural evolution. PCEvo enumerates multiple chemically feasible edit paths between retrieved similar molecular pairs under topological dependency constraints. It transforms the labels of the two molecules into stepwise supervision along each virtual evolutionary path. It introduces a path-consistency objective that enforces prediction invariance across alternative paths connecting the same two molecules. Comprehensive experiments on the QM9 and MoleculeNet datasets demonstrate that PCEvo substantially improves the few-shot generalization performance of baseline methods.
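One way to picture a path-consistency objective is as a loss over several edit paths between the same molecule pair. The Python sketch below is illustrative only and is not PCEvo's formulation. It assumes a hypothetical model that outputs one predicted property change per edit step, uses the label difference y_B - y_A as the target for each path's accumulated change, and penalizes disagreement among the path totals. The function name `path_consistency_loss` and the weight `lam` are hypothetical.

```python
from statistics import mean


def path_consistency_loss(step_deltas, y_a, y_b, lam=1.0):
    """Hypothetical loss for one molecule pair (A, B); not PCEvo's actual objective.

    step_deltas: list of paths; each path is a list of predicted property
        changes, one per edit step from A to B (an assumed model output).
    y_a, y_b: known property labels of the two endpoint molecules.
    lam: weight of the consistency term (hypothetical hyperparameter).
    """
    if not step_deltas:
        raise ValueError("need at least one path")
    # Total predicted change accumulated along each alternative path.
    totals = [sum(path) for path in step_deltas]
    target = y_b - y_a
    # Supervision term: each path's total should match the label difference.
    supervision = mean((t - target) ** 2 for t in totals)
    # Consistency term: alternative paths between the same pair should agree.
    center = mean(totals)
    consistency = mean((t - center) ** 2 for t in totals)
    return supervision + lam * consistency
```

For example, `path_consistency_loss([[0.5, 0.5], [0.25, 0.25, 0.5]], 1.0, 2.0)` is 0.0 because both paths accumulate exactly the label difference. With paths `[[1.0], [0.0]]`, labels 0.0 and 0.5, and `lam=1.0`, the result is 0.5: 0.25 from supervision plus 0.25 from inconsistency.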
The code is available at https://anonymous.4open.science/r/PCEvo-4BF2.
Published: 2026-01-27T06:40:11Z
Comment: 10 pages, 4 figures, 5 tables
Authors: Kun Li, Longtao Hu, Yida Xiong, Jiajun Yu, Hongzhi Zhang, Jiameng Chen, Xiantao Cai, Jia Wu, Wenbin Hu

EnzyPGM: Pocket-conditioned Generative Model for Substrate-specific Enzyme Design
URL: http://arxiv.org/abs/2601.19205v1 (updated 2026-01-27T05:07:55Z)
Designing enzymes with substrate-binding pockets is a critical challenge in protein engineering, as catalytic activity depends on the precise interaction between pockets and substrates. Currently, generative models dominate functional protein design but cannot model pocket-substrate interactions, which limits the generation of enzymes with precise catalytic environments. To address this issue, we propose EnzyPGM, a unified framework that jointly generates enzymes and substrate-binding pockets conditioned on functional priors and substrates, with a particular focus on learning accurate pocket-substrate interactions. At its core, EnzyPGM includes two main modules: a Residue-atom Bi-scale Attention (RBA) that jointly models intra-residue dependencies and fine-grained interactions between pocket residues and substrate atoms, and a Residue Function Fusion (RFF) that incorporates enzyme function priors into residue representations. We also curate EnzyPock, an enzyme-pocket dataset comprising 83,062 enzyme-substrate pairs across 1,036 four-level enzyme families. Extensive experiments demonstrate that EnzyPGM achieves state-of-the-art performance on EnzyPock. Notably, EnzyPGM reduces the average binding energy by 0.47 kcal/mol relative to EnzyGen, showing its superior performance on substrate-specific enzyme design.
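The residue-atom interaction that the RBA module is described as capturing can be pictured as cross-attention in which pocket residues query substrate atoms. The following Python sketch is generic scaled dot-product cross-attention, not EnzyPGM's architecture. It assumes identity query/key/value projections for brevity, and `residue_to_atom_attention` is a hypothetical name.

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residue_to_atom_attention(residues, atoms):
    """Each residue feature vector attends over all substrate-atom vectors.

    residues: list of d-dimensional residue features (queries).
    atoms: list of d-dimensional atom features (used as keys and values).
    Returns (updated residue features, attention weights per residue).
    """
    d = len(atoms[0])
    scale = 1.0 / math.sqrt(d)
    updated, weights = [], []
    for q in residues:
        # Scaled dot-product score between this residue and every atom.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in atoms]
        w = softmax(scores)
        # Attention-weighted sum of atom features (values = keys, an assumption).
        ctx = [sum(wj * atom[i] for wj, atom in zip(w, atoms)) for i in range(d)]
        updated.append(ctx)
        weights.append(w)
    return updated, weights
```

In practice the queries, keys, and values would come from learned projections, and this single-scale attention omits the intra-residue modeling that the abstract also attributes to RBA.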
The code and dataset will be released later.
Published: 2026-01-27T05:07:55Z
Comment: 9 pages, 4 figures, under review
Authors: Zefeng Lin, Zhihang Zhang, Weirong Zhu, Tongchang Han, Xianyong Fang, Tianfan Fu, Xiaohua Xu

Conditioned Generative Modeling of Molecular Glues: A Realistic AI Approach for Synthesizable Drug-like Molecules
URL: http://arxiv.org/abs/2601.18716v1 (updated 2026-01-26T17:39:59Z)
Alzheimer's disease (AD) is marked by the pathological accumulation of amyloid beta-42 (Abeta-42), contributing to synaptic dysfunction and neurodegeneration. While extracellular amyloid plaques are well-studied, increasing evidence highlights intracellular Abeta-42 as an early and toxic driver of disease progression. In this study, we present a novel, AI-assisted drug design approach to promote targeted degradation of Abeta-42 via the ubiquitin-proteasome system (UPS), using E3 ligase-directed molecular glues. We systematically evaluated the ternary complex formation potential of Abeta-42 with three E3 ligases: CRBN, VHL, and MDM2, through structure-based modeling, ADMET screening, and docking. We then developed a Ligase-Conditioned Junction Tree Variational Autoencoder (LC-JT-VAE) to generate ligase-specific small molecules, incorporating protein sequence embeddings and torsional angle-aware molecular graphs. Our results demonstrate that this generative model can produce chemically valid, novel, and target-specific molecular glues capable of facilitating Abeta-42 degradation. This integrated approach offers a promising framework for designing UPS-targeted therapies for neurodegenerative diseases.
Published: 2026-01-26T17:39:59Z
Comment: 30 pages, 8 figures
Journal reference: Biomolecules 2025, 15, 849
Authors: Naeyma N. Islam, Thomas R. Caulfield
DOI: 10.3390/biom15060849

Structure-Aware Contrastive Learning with Fine-Grained Binding Representations for Drug Discovery
URL: http://arxiv.org/abs/2509.14788v2 (updated 2026-01-26T08:19:42Z)
Accurate identification of drug-target interactions (DTI) remains a central challenge in computational pharmacology, where sequence-based methods offer scalability. This work introduces a sequence-based drug-target interaction framework that integrates structural priors into protein representations while maintaining high-throughput screening capability. Evaluated across multiple benchmarks, the model achieves state-of-the-art performance on Human and BioSNAP datasets and remains competitive on BindingDB. In virtual screening tasks, it surpasses prior methods on LIT-PCBA, yielding substantial gains in AUROC and BEDROC. Ablation studies confirm the critical role of learned aggregation, bilinear attention, and contrastive alignment in enhancing predictive robustness. Embedding visualizations reveal improved spatial correspondence with known binding pockets and highlight interpretable attention patterns over ligand-residue contacts. These results validate the framework's utility for scalable and structure-aware DTI prediction.
Published: 2025-09-18T09:38:46Z
Comment: Accepted by 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2026)
Authors: Jing Lan, Hexiao Ding, Hongzhao Chen, Yufeng Jiang, Nga-Chun Ng, Gwing Kei Yip, Gerald W. Y. Cheng, Yunlin Mao, Jing Cai, Liang-ting Lin, Jung Sun Yoo

Multiview Random Vector Functional Link Network for Predicting DNA-Binding Proteins
URL: http://arxiv.org/abs/2409.02588v2 (updated 2026-01-26T06:22:52Z)
The identification of DNA-binding proteins (DBPs) is essential due to their significant impact on various biological activities. Understanding the mechanisms underlying protein-DNA interactions is essential for elucidating many biological processes. In recent years, machine learning-based models have been prominently utilized for DBP prediction. In this paper, to predict DBPs, we propose a novel framework termed a multiview random vector functional link (MvRVFL) network, which fuses neural network architecture with multiview learning.
The MvRVFL model combines the advantages of both early and late fusion, enabling separate regularization parameters for each view, and uses a closed-form solution to determine the unknown parameters efficiently. The primal objective function incorporates a coupling term aimed at minimizing a composite of errors stemming from all views. From each of the three protein views of the DBP datasets, we extract five features. These features are then fused together by incorporating a hidden feature during the model training process. The performance of the proposed MvRVFL model on the DBP dataset surpasses that of baseline models, demonstrating its superior effectiveness. We further validate the practicality of the proposed model across diverse benchmark datasets, and both theoretical analysis and empirical results consistently demonstrate its superior generalization performance over baseline models.
Published: 2024-09-04T10:14:17Z
Authors: A. Quadir, M. Sajid, M. Tanveer

Rethinking Drug-Drug Interaction Modeling as Generalizable Relation Learning
URL: http://arxiv.org/abs/2601.15771v1 (updated 2026-01-22T09:00:30Z)
Drug-drug interaction (DDI) prediction is central to drug discovery and clinical development, particularly in the context of increasingly prevalent polypharmacy. Although existing computational methods achieve strong performance on standard benchmarks, they often fail to generalize to realistic deployment scenarios, where most candidate drug pairs involve previously unseen drugs and validated interactions are scarce. We demonstrate that proximity in the embedding spaces of prevailing molecule-centric DDI models does not reliably correspond to interaction labels, and that simply scaling up model capacity therefore fails to improve generalization. To address these limitations, we propose GenRel-DDI, a generalizable relation learning framework that reformulates DDI prediction as a relation-centric learning problem, in which interaction representations are learned independently of drug identities. This relation-level abstraction enables the capture of transferable interaction patterns that generalize to unseen drugs and novel drug pairs. Extensive experiments across multiple benchmarks demonstrate that GenRel-DDI consistently and significantly outperforms state-of-the-art methods, with particularly large gains on strict entity-disjoint evaluations, highlighting the effectiveness and practical utility of relation learning for robust DDI prediction. The code is available at https://github.com/SZU-ADDG/GenRel-DDI.
Published: 2026-01-22T09:00:30Z
Comment: 9 pages, 5 figures
Authors: Dong Xu, Jiantao Wu, Qihua Pan, Sisi Yuan, Zexuan Zhu, Junkai Ji

De novo design of protein binders targeting the human sweet taste receptor as potential sweet proteins
URL: http://arxiv.org/abs/2601.14574v1 (updated 2026-01-21T01:18:55Z)
Excessive consumption of dietary sugars is a major contributor to metabolic disorders, driving global interest in finding alternative sweeteners with reduced caloric impact. Natural sweet proteins, such as brazzein, offer exceptional sweetness intensity with little caloric contribution. However, their widespread use is limited by restricted natural diversity, low stability, and high production costs. Recent advances in structural biology and de novo protein design provide new opportunities to overcome these limitations through rational engineering. In this study, we report an integrated computational pipeline for the de novo design of protein binders targeting the human sweet taste receptor subunit TAS1R2, a key component of the heterodimeric class C G protein-coupled receptor mediating sweetness perception. The workflow combines diffusion-based backbone generation (RFdiffusion), neural network-guided sequence design (ProteinMPNN), structure-based filtering using Boltz-1, and binding energy evaluation via MM/GBSA calculations. Using the recently resolved cryo-EM structure of the TAS1R2 receptor, protein binders were designed to target both the Venus Flytrap Domain and the cysteine-rich domain of TAS1R2.
A few designed binders exhibited favorable structural confidence and predicted binding energetics. In particular, Binder2 exhibited brazzein-like structural plausibility through specific short-range CRD contacts, while Binder1 displayed the strongest predicted binding affinity. Structural analyses of the binder-receptor complexes revealed distinct binding modes and secondary structure profiles among the designs. This study demonstrates the feasibility of the de novo design of protein binders that emulate key functional properties of natural sweet proteins, establishing a computational framework for the rational development of next-generation protein-based sweeteners.
Published: 2026-01-21T01:18:55Z
Authors: Saisai Ding, Yi Zhang

End-to-End Reverse Screening Identifies Protein Targets of Small Molecules Using HelixFold3
URL: http://arxiv.org/abs/2601.13693v1 (updated 2026-01-20T07:45:53Z)
Identifying protein targets for small molecules, or reverse screening, is essential for understanding drug action, guiding compound repurposing, predicting off-target effects, and elucidating the molecular mechanisms of bioactive compounds. Despite its critical role, reverse screening remains challenging because accurately capturing interactions between a small molecule and structurally diverse proteins is inherently complex, and conventional step-wise workflows often propagate errors across decoupled steps such as target structure modeling, pocket identification, docking, and scoring. Here, we present an end-to-end reverse screening strategy leveraging HelixFold3, a high-accuracy biomolecular structure prediction model akin to AlphaFold3, which simultaneously models the folding of proteins from a protein library and the docking of small-molecule ligands within a unified framework. We validate this approach on a diverse and representative set of approximately one hundred small molecules. Compared with conventional reverse docking, our method improves screening accuracy and demonstrates enhanced structural fidelity, binding-site precision, and target prioritization. By systematically linking small molecules to their protein targets, this framework establishes a scalable and straightforward platform for dissecting molecular mechanisms, exploring off-target interactions, and supporting rational drug discovery.
Published: 2026-01-20T07:45:53Z
Authors: Shengjie Xu, Xianbin Ye, Mengran Zhu, Xiaonan Zhang, Shanzhuo Zhang, Xiaomin Fang

Multi-objective fluorescent molecule design with a data-physics dual-driven generative framework
URL: http://arxiv.org/abs/2601.13564v1 (updated 2026-01-20T03:41:02Z)
Designing fluorescent small molecules with tailored optical and physicochemical properties requires navigating vast, underexplored chemical space while satisfying multiple objectives and constraints. Conventional generate-score-screen approaches become impractical under such realistic design specifications, owing to their low search efficiency, the unreliable generalizability of machine-learning predictions, and the prohibitive cost of quantum chemical calculations. Here we present LUMOS, a data- and physics-driven framework for inverse design of fluorescent molecules. LUMOS couples generator and predictor within a shared latent representation, enabling direct specification-to-molecule design and efficient exploration. Moreover, LUMOS combines neural networks with a fast time-dependent density functional theory (TD-DFT) calculation workflow to build a suite of complementary predictors spanning different trade-offs in speed, accuracy, and generalizability, enabling reliable property prediction across diverse scenarios. Finally, LUMOS employs a property-guided diffusion model integrated with multi-objective evolutionary algorithms, enabling de novo design and molecular optimization under multiple objectives and constraints.
Across comprehensive benchmarks, LUMOS consistently outperforms baseline models in terms of accuracy, generalizability and physical plausibility for fluorescence property prediction, and demonstrates superior performance in multi-objective scaffold- and fragment-level molecular optimization. Further validation using TD-DFT and molecular dynamics (MD) simulations demonstrates that LUMOS can generate valid fluorophores that meet various target specifications. Overall, these results establish LUMOS as a data-physics dual-driven framework for general fluorophore inverse design.
Published: 2026-01-20T03:41:02Z
Comment: 43 pages total: 32 pages main text + 11 pages SI
Authors: Yanheng Li, Zhichen Pu, Lijiang Yang, Zehao Zhou, Yi Qin Gao

Multimodal Spatial Omics: From Data Acquisition to Computational Integration
URL: http://arxiv.org/abs/2601.12381v1 (updated 2026-01-18T12:28:14Z)
Recent developments in spatial omics technologies have enabled the generation of high-dimensional molecular data, such as transcriptomes, proteomes, and epigenomes, within their spatial tissue context, either through coprofiling on the same slice or through serial tissue sections. These datasets, which are often complemented by images, have given rise to multimodal frameworks that capture both the cellular and architectural complexity of tissues across multiple molecular layers. Integrating such multimodal data poses significant computational challenges due to differences in scale, resolution, and data modality. In this review, we present a comprehensive overview of computational methods developed to integrate multimodal spatial omics and imaging datasets. We highlight key algorithmic principles underlying these methods, ranging from probabilistic models to the latest deep learning approaches.
Published: 2026-01-18T12:28:14Z
Authors: Esra Busra Isik, Yusuf Hakan Usta, Haozhe Liu, Maryam Riazi, William Roach, Hongpeng Zhou, Magnus Rattray, Sokratia Georgaka

Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions
URL: http://arxiv.org/abs/2408.16245v6 (updated 2026-01-17T22:19:06Z)
The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. To date, most biosequence transformers have been trained on single-omic data (either proteins or nucleic acids) and have seen incredible success in downstream tasks in each domain, with particularly noteworthy breakthroughs in protein structural modeling. However, single-omic pretraining limits the ability of these models to capture cross-modal interactions. Here we present OmniBioTE, the largest open-source multi-omic model trained on over 250 billion tokens of mixed protein and nucleic acid data. We show that despite only being trained on unlabeled sequence data, OmniBioTE learns joint representations mapping genes to their corresponding protein sequences. We further demonstrate that OmniBioTE achieves state-of-the-art results predicting the change in Gibbs free energy (ΔG) of the binding interaction between a given nucleic acid and protein. Remarkably, we show that multi-omic biosequence transformers emergently learn useful structural information without any a priori structural training, allowing us to predict which protein residues are most involved in the protein-nucleic acid binding interaction. Compared to single-omic controls trained with identical compute, OmniBioTE also demonstrates superior performance-per-FLOP across both multi-omic and single-omic benchmarks. Together, these results highlight the power of a unified modeling approach for biological sequences and establish OmniBioTE as a foundation model for multi-omic discovery.
Published: 2024-08-29T03:56:40Z
Comment: 47 pages, 5 figures
Authors: Sully F. Chen, Robert J. Steele, Glen M. Hocky, Beakal Lemeneh, Shivanand P. Lad, Eric K. Oermann