Rethinking Drug-Drug Interaction Modeling as Generalizable Relation Learning

2026-01-22T09:00:30Z

Drug-drug interaction (DDI) prediction is central to drug discovery and clinical development, particularly in the context of increasingly prevalent polypharmacy. Although existing computational methods achieve strong performance on standard benchmarks, they often fail to generalize to realistic deployment scenarios, where most candidate drug pairs involve previously unseen drugs and validated interactions are scarce. We demonstrate that proximity in the embedding spaces of prevailing molecule-centric DDI models does not reliably correspond to interaction labels, and that simply scaling up model capacity therefore fails to improve generalization. To address these limitations, we propose GenRel-DDI, a generalizable relation learning framework that reformulates DDI prediction as a relation-centric learning problem, in which interaction representations are learned independently of drug identities. This relation-level abstraction enables the capture of transferable interaction patterns that generalize to unseen drugs and novel drug pairs. Extensive experiments across multiple benchmarks demonstrate that GenRel-DDI consistently and significantly outperforms state-of-the-art methods, with particularly large gains on strict entity-disjoint evaluations, highlighting the effectiveness and practical utility of relation learning for robust DDI prediction. The code is available at https://github.com/SZU-ADDG/GenRel-DDI.

De novo design of protein binders targeting the human sweet taste receptor as potential sweet proteins

2026-01-21T01:18:55Z

Excessive consumption of dietary sugars is a major contributor to metabolic disorders, driving global interest in finding alternative sweeteners with reduced caloric impact. Natural sweet proteins, such as brazzein, offer exceptional sweetness intensity with little caloric contribution. However, their widespread use is limited by restricted natural diversity, low stability, and high production costs. Recent advances in structural biology and de novo protein design provide new opportunities to overcome these limitations through rational engineering. In this study, we report an integrated computational pipeline for the de novo design of protein binders targeting the human sweet taste receptor subunit TAS1R2, a key component of the heterodimeric class C G protein-coupled receptor mediating sweetness perception. The workflow combines diffusion-based backbone generation (RFdiffusion), neural network-guided sequence design (ProteinMPNN), structure-based filtering using Boltz-1, and binding energy evaluation via MM/GBSA calculations. Using the recently resolved cryo-EM structure of the TAS1R2 receptor, protein binders were designed to target both the Venus Flytrap Domain and the cysteine-rich domain of TAS1R2. A few designed binders exhibited favorable structural confidence and predicted binding energetics. In particular, Binder2 exhibited brazzein-like structural plausibility through specific short-range CRD contacts, while Binder1 displayed the strongest predicted binding affinity. Structural analyses of the binder-receptor complex revealed distinct binding modes and secondary structure profiles among the designs. This study demonstrates the feasibility of de novo designing protein binders that emulate key functional properties of natural sweet proteins, establishing a computational framework for the rational development of next-generation protein-based sweeteners.

End-to-End Reverse Screening Identifies Protein Targets of Small Molecules Using HelixFold3

2026-01-20T07:45:53Z

Identifying protein targets for small molecules, or reverse screening, is essential for understanding drug action, guiding compound repurposing, predicting off-target effects, and elucidating the molecular mechanisms of bioactive compounds. Despite its critical role, reverse screening remains challenging because accurately capturing interactions between a small molecule and structurally diverse proteins is inherently complex, and conventional step-wise workflows often propagate errors across decoupled steps such as target structure modeling, pocket identification, docking, and scoring. Here, we present an end-to-end reverse screening strategy leveraging HelixFold3, a high-accuracy biomolecular structure prediction model akin to AlphaFold3, which simultaneously models the folding of proteins from a protein library and the docking of small-molecule ligands within a unified framework. We validate this approach on a diverse and representative set of approximately one hundred small molecules. Compared with conventional reverse docking, our method improves screening accuracy and demonstrates enhanced structural fidelity, binding-site precision, and target prioritization. By systematically linking small molecules to their protein targets, this framework establishes a scalable and straightforward platform for dissecting molecular mechanisms, exploring off-target interactions, and supporting rational drug discovery.

Multi-objective fluorescent molecule design with a data-physics dual-driven generative framework

2026-01-20T03:41:02Z

Designing fluorescent small molecules with tailored optical and physicochemical properties requires navigating vast, underexplored chemical space while satisfying multiple objectives and constraints. Conventional generate-score-screen approaches become impractical under such realistic design specifications, owing to their low search efficiency, unreliable generalizability of machine-learning prediction, and the prohibitive cost of quantum chemical calculation. Here we present LUMOS, a data-and-physics driven framework for inverse design of fluorescent molecules. LUMOS couples generator and predictor within a shared latent representation, enabling direct specification-to-molecule design and efficient exploration. Moreover, LUMOS combines neural networks with a fast time-dependent density functional theory (TD-DFT) calculation workflow to build a suite of complementary predictors spanning different trade-offs in speed, accuracy, and generalizability, enabling reliable property prediction across diverse scenarios. Finally, LUMOS employs a property-guided diffusion model integrated with multi-objective evolutionary algorithms, enabling de novo design and molecular optimization under multiple objectives and constraints. Across comprehensive benchmarks, LUMOS consistently outperforms baseline models in terms of accuracy, generalizability and physical plausibility for fluorescence property prediction, and demonstrates superior performance in multi-objective scaffold- and fragment-level molecular optimization. Further validation using TD-DFT and molecular dynamics (MD) simulations demonstrates that LUMOS can generate valid fluorophores that meet various target specifications. Overall, these results establish LUMOS as a data-physics dual-driven framework for general fluorophore inverse design.

Multimodal Spatial Omics: From Data Acquisition to Computational Integration

2026-01-18T12:28:14Z

Recent developments in spatial omics technologies have enabled the generation of high-dimensional molecular data, such as transcriptomes, proteomes, and epigenomes, within their spatial tissue context, either through co-profiling on the same slice or through serial tissue sections. These datasets, which are often complemented by images, have given rise to multimodal frameworks that capture both the cellular and architectural complexity of tissues across multiple molecular layers. Integrating such multimodal data poses significant computational challenges due to differences in scale, resolution, and data modality. In this review, we present a comprehensive overview of computational methods developed to integrate multimodal spatial omics and imaging datasets. We highlight key algorithmic principles underlying these methods, ranging from probabilistic models to the latest deep learning approaches.

Large-Scale Multi-omic Biosequence Transformers for Modeling Protein-Nucleic Acid Interactions

2026-01-17T22:19:06Z

The transformer architecture has revolutionized bioinformatics and driven progress in the understanding and prediction of the properties of biomolecules. To date, most biosequence transformers have been trained on single-omic data, either proteins or nucleic acids, and have seen incredible success in downstream tasks in each domain, with particularly noteworthy breakthroughs in protein structural modeling. However, single-omic pretraining limits the ability of these models to capture cross-modal interactions. Here we present OmniBioTE, the largest open-source multi-omic model trained on over 250 billion tokens of mixed protein and nucleic acid data. We show that despite only being trained on unlabeled sequence data, OmniBioTE learns joint representations mapping genes to their corresponding protein sequences. We further demonstrate that OmniBioTE achieves state-of-the-art results predicting the change in Gibbs free energy (ΔG) of the binding interaction between a given nucleic acid and protein. Remarkably, we show that multi-omic biosequence transformers emergently learn useful structural information without any a priori structural training, allowing us to predict which protein residues are most involved in the protein-nucleic acid binding interaction. Compared to single-omic controls trained with identical compute, OmniBioTE also demonstrates superior performance-per-FLOP across both multi-omic and single-omic benchmarks. Together, these results highlight the power of a unified modeling approach for biological sequences and establish OmniBioTE as a foundation model for multi-omic discovery.

Calibrating Generative Models to Distributional Constraints

2026-01-17T01:24:40Z

Generative models frequently suffer from miscalibration, wherein statistics of the sampling distribution, such as class probabilities, deviate from desired values. We frame calibration as a constrained optimization problem and seek the closest model in Kullback-Leibler divergence satisfying calibration constraints. To address the intractability of imposing these constraints exactly, we introduce two surrogate objectives for fine-tuning: (1) the relax loss, which replaces the constraint with a miscalibration penalty, and (2) the reward loss, which converts calibration into a reward fine-tuning problem. We demonstrate that these approaches substantially reduce calibration error across hundreds of simultaneous constraints and models with up to one billion parameters, spanning applications in protein design, image generation, and language modeling.
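
The penalty idea behind the relax loss can be illustrated on a toy categorical distribution. The abstract does not give the loss formula, so everything below is an assumption made for illustration: the KL direction KL(q || p), the squared penalty, the weight `lam`, the step size, and the names `relax_loss` and `calibrate` are all hypothetical. The "model" here is just a softmax over four outcomes, not a real generative model.

```python
import math

def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def relax_loss(q, p, h, target, lam):
    """Assumed surrogate: KL(q || p) + lam * (E_q[h] - target)^2."""
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))
    miscal = sum(qi * hi for qi, hi in zip(q, h)) - target
    return kl + lam * miscal ** 2

def calibrate(p, h, target, lam=50.0, lr=0.1, steps=5000):
    """Plain gradient descent on the logits, starting from the base model p."""
    theta = [math.log(pi) for pi in p]
    for _ in range(steps):
        q = softmax(theta)
        miscal = sum(qi * hi for qi, hi in zip(q, h)) - target
        # Gradient of the loss with respect to each probability q_i.
        g = [math.log(qi / pi) + 1.0 + 2.0 * lam * miscal * hi
             for qi, pi, hi in zip(q, p, h)]
        # Chain rule through softmax: dL/dtheta_j = q_j * (g_j - sum_i q_i g_i).
        g_mean = sum(qi * gi for qi, gi in zip(q, g))
        theta = [t - lr * qi * (gi - g_mean) for t, qi, gi in zip(theta, q, g)]
    return softmax(theta)

# Toy setup: the base model puts 0.7 mass on outcomes {0, 1}; we want 0.5.
base = [0.4, 0.3, 0.2, 0.1]
stat = [1.0, 1.0, 0.0, 0.0]   # indicator statistic h(x)
calibrated = calibrate(base, stat, target=0.5)
```

Because the constraint is only penalized, the calibrated statistic ends up close to the target rather than exactly on it, and the gap shrinks as `lam` grows. In a real generative model the expectation would have to be estimated from samples, which this sketch does not attempt.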

Principles of Client Enrichment in Multicomponent Biomolecular Condensates

2026-01-16T17:15:19Z

Biomolecular condensates are commonly organized by a small number of scaffold molecules that drive phase separation together with client molecules that do not condense on their own but become selectively recruited into the dense phase. A central open question is how client recruitment feeds back on scaffold interactions to determine condensate composition. Here we address this problem in a reconstituted focal adhesion system composed of focal adhesion kinase (FAK) and phosphorylated p130Cas (Cas) as scaffolds and the adaptor protein paxillin (PXN) as a client. We show that both FAK phosphorylation and PXN recruitment produce a common compositional response in which FAK becomes enriched while Cas is depleted within the condensate. To interpret these observations, we develop two complementary theoretical descriptions. First, within a two-component Flory-Huggins framework, we show that phosphorylation can be captured by either strengthening heterotypic FAK-Cas interactions or increasing the effective number of interaction-relevant segments on FAK, both of which bias partitioning toward FAK-rich condensates. Second, we introduce a minimal three-component Flory-Huggins theory without an explicit solvent and map it onto an effective two-component description, demonstrating that client recruitment renormalizes homotypic and heterotypic scaffold interactions. Analytical predictions for the location of the critical point are tested in reconstituted multicomponent systems through PXN addition, showing that client recruitment alone tunes proximity to criticality and reshapes condensate composition. Together, our results reveal distinct yet convergent physical routes by which post-translational modification and client recruitment control scaffold composition in multicomponent condensates.
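
As a rough illustration of how segment number moves the critical point in a two-component Flory-Huggins description, the sketch below uses the generic textbook binary form of the mixing free energy. It is not the paper's parameterization: the function names, the segment numbers `n_a` and `n_b`, and the reading of component A as the FAK-like species are assumptions.

```python
import math

def fh_free_energy(phi, n_a, n_b, chi):
    """Textbook binary Flory-Huggins mixing free energy per site, in units of kT.

    phi is the volume fraction of component A; n_a and n_b are segment numbers.
    """
    return ((phi / n_a) * math.log(phi)
            + ((1.0 - phi) / n_b) * math.log(1.0 - phi)
            + chi * phi * (1.0 - phi))

def fh_second_derivative(phi, n_a, n_b, chi):
    """d^2 f / d phi^2; the mixture is locally unstable where this is negative."""
    return 1.0 / (n_a * phi) + 1.0 / (n_b * (1.0 - phi)) - 2.0 * chi

def fh_critical_point(n_a, n_b):
    """Critical composition and interaction parameter (phi_c, chi_c)."""
    sa, sb = math.sqrt(n_a), math.sqrt(n_b)
    phi_c = sb / (sa + sb)
    chi_c = 0.5 * (1.0 / sa + 1.0 / sb) ** 2
    return phi_c, chi_c

# Hypothetical segment numbers: raising n_a lowers the critical chi,
# so demixing sets in at weaker effective interaction.
phi_c_low, chi_c_low = fh_critical_point(4, 1)
phi_c_high, chi_c_high = fh_critical_point(9, 1)
```

In this generic form, increasing the effective number of segments on one component lowers `chi_c` and shifts `phi_c` toward smaller fractions of that component, which is one simple way to see how a change in segment number can tune proximity to criticality.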

Effects of 2.45 GHz radiofrequency upon Leuconostoc mesenteroides Glucose-6-phosphate dehydrogenase enzymatic activity

2026-01-16T15:43:22Z

In this report we evaluate the effect of irradiation with 2.45 GHz radiofrequency (RF), at a power output of 0.1 W over a 91 h period, on the enzymatic activity of glucose-6-phosphate dehydrogenase from Leuconostoc mesenteroides. The results show that RF irradiation preserves the activity of treated samples of this enzyme relative to a non-treated sample, which instead suffers an increased rate of activity loss. Our estimates indicate that the enzyme activation is due to a non-thermal effect. The results are consistent with reports about the effect of 2.45 GHz radiation upon other enzymatic systems.

The Protective Effects of the Ethyl Acetate Part of Er Miao San on Adjuvant Arthritis Rats by Regulating the Function of Bone Marrow-Derived Dendritic Cells

2026-01-16T09:35:35Z

Aims. The aim of this study was to evaluate the protective effects of Er Miao San (EMS) and the regulative function of bone marrow-derived dendritic cells (BMDCs) on adjuvant arthritis (AA) in rats. Methods. The ethyl acetate part of EMS (3 g/kg, 1.5 g/kg, and 0.75 g/kg) was orally administered from day 15 after immunization to day 29. The polyarthritis index and paw swelling were measured, the ankle joint pathological changes were observed using hematoxylin-eosin (HE) staining, and the spleen and thymus index were determined. Moreover, T and B cell proliferation was determined using the CCK-8 assay. The expression of BMDC surface costimulatory molecules and the levels of inflammatory factors were determined using flow cytometry and ELISA kits, respectively. Results. Compared with the AA model rats, the ethyl acetate fraction of EMS obviously reduced paw swelling (from 1.0 to 0.7) and the polyarthritis index (from 12 to 9) (P < 0.01) and improved the severity of histopathology (P < 0.01). Treatment with the ethyl acetate fraction of EMS significantly reduced the spleen and thymus index (P < 0.01) and inhibited T and B cell proliferation (P < 0.01). Moreover, EMS significantly modulated the expression of surface costimulatory molecules in BMDCs, including CD40, CD80, CD86, and major histocompatibility complex class II (MHC-II) (P < 0.01). The results also showed that the ethyl acetate part of EMS significantly inhibited the levels of the proinflammatory cytokines interleukin-23 (IL-23) and tumor necrosis factor-α (TNF-α) and of the inflammatory factor prostaglandin E2 (PGE2) in the supernatant of BMDCs. However, the level of the anti-inflammatory cytokine IL-10 was significantly increased (P < 0.01). Conclusion. These results suggest that the ethyl acetate part of EMS has better protective effects on AA rats, probably by regulating the function of BMDCs and modulating the balance of cytokines.

AutoBinder Agent: An MCP-Based Agent for End-to-End Protein Binder Design

2026-01-16T08:57:03Z

Modern AI technologies for drug discovery are distributed across heterogeneous platforms (including web applications, desktop environments, and code libraries), leading to fragmented workflows, inconsistent interfaces, and high integration overhead. We present an agentic end-to-end drug design framework that leverages a Large Language Model (LLM) in conjunction with the Model Context Protocol (MCP) to dynamically coordinate access to biochemical databases, modular toolchains, and task-specific AI models. The system integrates four state-of-the-art components: MaSIF (MaSIF-site and MaSIF-seed-search) for geometric deep learning-based identification of protein-protein interaction (PPI) sites, Rosetta for grafting protein fragments onto protein backbones to form mini proteins, ProteinMPNN for amino acid sequence redesign, and AlphaFold3 for near-experimental accuracy in complex structure prediction. Starting from a target structure, the framework supports de novo binder generation via surface analysis, scaffold grafting and pose construction, sequence optimization, and structure prediction. Additionally, by replacing rigid, script-based workflows with a protocol-driven, LLM-coordinated architecture, the framework improves reproducibility, reduces manual overhead, and ensures extensibility, portability, and auditability across the entire drug design process.

HERMES: Holographic Equivariant neuRal network model for Mutational Effect and Stability prediction

2026-01-16T02:56:31Z

Predicting the stability and fitness effects of amino acid mutations in proteins is a cornerstone of biological discovery and engineering. Various experimental techniques have been developed to measure mutational effects, providing us with extensive datasets across a diverse range of proteins. By training on these data, traditional computational modeling and more recent machine learning approaches have advanced significantly in predicting mutational effects. Here, we introduce HERMES, a 3D rotationally equivariant structure-based neural network model for mutational effect and stability prediction. Pre-trained to predict amino acid propensity from its surrounding 3D structure, HERMES can be fine-tuned for mutational effects using our open-source code. We present a suite of HERMES models, pre-trained with different strategies, and fine-tuned to predict the stability effect of mutations. Benchmarking against other models shows that HERMES often outperforms or matches their performance in predicting mutational effect on stability, binding, and fitness. HERMES offers versatile tools for evaluating mutational effects and can be fine-tuned for specific predictive objectives.

ProteinGuide: On-the-fly property guidance for protein sequence generative models

2026-01-16T02:24:47Z

Sequence generative models are transforming protein engineering. However, no principled framework exists for conditioning these models on auxiliary information, such as experimental data, without additional training of a generative model. Herein, we present ProteinGuide, a method for such "on-the-fly" conditioning, amenable to a broad class of protein generative models including Masked Language Models (e.g. ESM3), any-order auto-regressive models (e.g. ProteinMPNN) as well as diffusion and flow matching models (e.g. MultiFlow). ProteinGuide stems from our unifying view of these model classes under a single statistical framework. As proof of principle, we perform several in silico experiments. We first guide pre-trained generative models to design proteins with user-specified properties, such as higher stability or activity. Next, we design for optimizing two desired properties that are in tension with each other. Finally, we apply our method in the wet lab, using ProteinGuide to increase the editing activity of an adenine base editor in vivo with data from only a single pooled library of 2,000 variants. We find that a single round of ProteinGuide achieves a higher editing efficiency than was previously achieved using seven rounds of directed evolution.

Network Pharmacology Framework Characterizes Polypharmacological Properties of Dietary Flavonoids: Integration of Computational, Experimental, and Epidemiological Evidence

2026-01-13T02:21:48Z

Dietary flavonoids are associated with disease prevention in epidemiological studies, yet their polypharmacological mechanisms remain unclear. We establish network pharmacology as a systematic framework to characterize flavonoid therapeutic properties through integrated computational, experimental, and epidemiological validation. We constructed a master network of 17,869 human proteins, 14 dietary flavonoids, and 1,496 FDA-approved drugs with 278,768 interactions. Flavonoids averaged 45.3 target proteins per compound compared to 16.8 for FDA-approved drugs (2.7-fold higher; p = 7.5 × 10^-4), reflecting multi-target architecture. Statistical analysis revealed that 71.4% of flavonoids targeted proteins associated with cardiovascular drugs and 78.6% aligned with antineoplastic drug targets. MTT-based Jurkat cell assays confirmed network predictions: high-association flavonoids (luteolin LC50 = 31.4 µM, myricetin LC50 = 29.5 µM) produced strong cytotoxicity, while low-association flavonoids showed minimal activity (LC50 > 200 µM). Network-predicted association strengths correlated with experimental bioactivity (Pearson r = 0.918; R^2 = 0.843). We translated network associations into food-level predictions across 506 foods, identifying 685 food-drug therapeutic combinations. Systematic literature searches confirmed 96 associations supported by 132 unique references. Cardiovascular domains achieved 47.1% validation. Top-validated foods included tea (31 evidence items), blueberries (18 items), tomato (13 items), grape juice (10 items), and plum (9 items). Network pharmacology characterizes dietary polypharmacological properties and generates evidence-based food-therapeutic predictions, bridging nutritional science and systems pharmacology.
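
The summary statistics quoted above are internally consistent, which a few lines of arithmetic can confirm. The flavonoid counts of 10 and 11 are inferred here from the reported percentages and the stated total of 14; they are not given in the abstract, and the variable names are illustrative.

```python
# Mean target counts reported in the abstract.
flavonoid_mean_targets = 45.3
drug_mean_targets = 16.8
fold_change = flavonoid_mean_targets / drug_mean_targets   # about 2.70

# Coefficient of determination implied by the reported Pearson r.
pearson_r = 0.918
r_squared = pearson_r ** 2                                  # about 0.843

# Whole-number flavonoid counts implied by the percentages (inferred, not stated).
n_flavonoids = 14
cardio_count = round(0.714 * n_flavonoids)      # 10 of 14
antineo_count = round(0.786 * n_flavonoids)     # 11 of 14
cardio_pct = 100.0 * cardio_count / n_flavonoids
antineo_pct = 100.0 * antineo_count / n_flavonoids

print(f"fold change: {fold_change:.2f}")
print(f"R^2 from r: {r_squared:.3f}")
print(f"cardiovascular: {cardio_count}/{n_flavonoids} = {cardio_pct:.1f}%")
print(f"antineoplastic: {antineo_count}/{n_flavonoids} = {antineo_pct:.1f}%")
```

With only 14 flavonoids, each compound accounts for about 7.1 percentage points, so the two overlap figures differ by a single compound.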

Contribution of Water to Pressure and Cold Denaturation of Proteins

2026-01-12T11:20:59Z

The mechanisms of cold and pressure denaturation of proteins are a matter of debate and are commonly understood as due to water-mediated interactions. Here we study several cases of proteins, with or without a unique native state, with or without hydrophilic residues, by means of a coarse-grained protein model in explicit solvent. We show, using Monte Carlo simulations, that taking into account how water at the protein interface changes its hydrogen bond properties and its density fluctuations is enough to predict protein stability regions with elliptic shapes in the temperature-pressure plane, consistent with previous theories. Our results clearly identify the different mechanisms by which water participates in denaturation and open the way to developing advanced computational design tools for protein engineering.