http://arxiv.org/abs/2511.14669v1 Hyperbolic Graph Embeddings Reveal the Host-Pathogen Interactome 2025-11-18T17:08:37Z

Infections depend on interactions between pathogen and host proteins, but comprehensively mapping these interactions is challenging and labor intensive. Many biological networks have hierarchical, scale-free structure, so we developed a deep learning framework, ApexPPI, that represents protein networks in hyperbolic Riemannian space to capture these features. Our model integrates multimodal biological data (protein sequences, gene perturbation experiments, and complementary interaction networks) to predict likely interactions between pathogen and host proteins through multi-task hyperbolic graph neural networks. Mapping protein features into hyperbolic space led to much higher accuracy than previous methods in predicting host-pathogen interactions. From tens of millions of possible protein pairs, our model identified thousands of high-confidence interactions, including many involving human G-protein-coupled receptors (GPCRs). We validated dozens of these predicted complexes using AlphaFold 3 structural modeling, supporting the accuracy of our predictions. This comprehensive map of host-pathogen protein interactions provides a resource for discovering new treatments and illustrates how advanced AI can unravel complex biological systems.
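
As a rough illustration of why hyperbolic space suits hierarchical networks, the sketch below computes geodesic distance in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which model ApexPPI uses, so the choice of the Poincaré ball and the function name `poincare_distance` are assumptions made for illustration only.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Assumption: the Poincare ball model is used here for illustration;
    the abstract does not specify the hyperbolic model.
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v))
    return math.acosh(arg)

# The same Euclidean gap (0.1) costs more hyperbolic distance near the boundary,
# which leaves room for the many leaves of a tree-like hierarchy.
near_origin = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_boundary = poincare_distance((0.8, 0.0), (0.9, 0.0))
```

Distances grow rapidly toward the boundary of the ball, so tree-like structures can be embedded with low distortion in few dimensions, which is the usual motivation for hyperbolic embeddings of scale-free graphs.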

2025-11-18T17:08:37Z Xiaoqiong Xia Cesar de la Fuente-Nunez http://arxiv.org/abs/2511.14431v1 BiRNe: Symbolic bifurcation analysis of reaction networks with Python 2025-11-18T12:33:58Z

Computer algebra methods for analyzing reaction networks often rely on the assumption of mass-action kinetics, which transforms the governing ODEs into polynomial systems amenable to techniques such as Gröbner basis computation and related algebraic tools. However, these methods face significant computational complexity, limiting their applicability to relatively small networks involving only a handful of species. In contrast, building on recent theoretical advances, we introduce here the \textsc{BiRNe} (BIfurcations in Reaction NEtworks) Python module, which relies on a symbolic approach designed to detect bifurcations in larger reaction networks (up to 10-20 species, depending on the network's connectivity) equipped with parameter-rich kinetics. This class includes enzymatic kinetics such as Michaelis--Menten, ligand-binding kinetics like Hill functions, and generalized mass-action kinetics. For a given network, the current algorithm identifies all minimal autocatalytic subnetworks and fully characterizes the presence of bifurcations associated with zero eigenvalues, thus determining whether the network admits multistationarity. It also detects oscillatory bifurcations arising from positive-feedback structures, capturing a significant class of possible oscillations.

2025-11-18T12:33:58Z 19 pages, 1 figure, Applications of Computer Algebra 2025 in Crete, Session: Computer Algebra Applications in the Life Sciences Richard Golnik Thomas Gatter Peter F. Stadler Nicola Vassena http://arxiv.org/abs/2505.02712v3 Graph Neural Network-Based Reinforcement Learning for Controlling Biological Networks - the GATTACA Framework 2025-11-17T17:35:01Z

Cellular reprogramming, the artificial transformation of one cell type into another, has been attracting increasing research attention due to its therapeutic potential for complex diseases. However, identifying effective reprogramming strategies through classical wet-lab experiments is hindered by lengthy time commitments and high costs. In this study, we explore the use of deep reinforcement learning (DRL) to control Boolean network models of complex biological systems, such as gene regulatory and signalling pathway networks. We formulate a novel control problem for Boolean network models under the asynchronous update mode, specifically in the context of cellular reprogramming. To solve it, we devise GATTACA, a scalable computational framework. To facilitate scalability of our framework, we consider the previously introduced concept of a pseudo-attractor and improve the procedure for effective identification of pseudo-attractor states. We then incorporate graph neural networks with graph convolution operations into the artificial neural network approximator of the DRL agent's action-value function. This allows us to leverage the available knowledge on the structure of a biological system and to indirectly, yet effectively, encode the system's modelled dynamics into a latent representation. Experiments on several large-scale, real-world biological networks from the literature demonstrate the scalability and effectiveness of our approach.
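
To make the asynchronous update mode concrete, here is a minimal sketch of a Boolean network in which exactly one randomly chosen node is re-evaluated per step. The three-gene network, its update rules, and the names `RULES`, `async_step`, and `simulate` are hypothetical and are not taken from GATTACA.

```python
import random

# Hypothetical three-gene network; the update rules are illustrative only.
RULES = {
    "A": lambda s: not s["C"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and s["B"],
}

def async_step(state, rng):
    """Re-evaluate one randomly chosen node; every other node keeps its value."""
    node = rng.choice(sorted(RULES))
    new_state = dict(state)
    new_state[node] = bool(RULES[node](state))
    return new_state

def simulate(state, steps, seed=0):
    """Return the list of visited states, starting with the initial state."""
    rng = random.Random(seed)
    trajectory = [state]
    for _ in range(steps):
        state = async_step(state, rng)
        trajectory.append(state)
    return trajectory

trajectory = simulate({"A": True, "B": False, "C": False}, steps=20, seed=1)
```

Because only one node changes per step, the dynamics are stochastic, and the long-run behaviour is described by sets of states rather than a single deterministic cycle. This is the setting in which attractor-like state sets become the targets of control.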

2025-05-05T15:07:20Z Andrzej Mizera Jakub Zarzycki http://arxiv.org/abs/2504.21011v2 Bifunctional enzyme action as a source of robustness in biochemical reaction networks: a novel hypergraph approach 2025-11-17T00:36:52Z

Substrate modification networks are ubiquitous in living, biochemical systems. A higher-level hypergraph "skeleton" captures key information about which substrates are transformed in the presence of modification-specific enzymes. Many different detailed models can be associated with the same skeleton; however, uncertainty related to model fitting increases with the level of detail. We show that essential dynamical properties such as existence of positive steady states and concentration robustness can be extracted directly from the skeleton independent of the detailed model. The novel formalism of directed hypergraphs is used to prove that bifunctional enzyme action plays a key role in generating robustness. Moreover, we use another novel concept of "current" on a directed hypergraph to establish a link between potentially remote network components. Current is an essential notion required for existence of positive steady states, and furthermore, current-matching combined with bifunctionality generates concentration robustness.

2025-04-15T23:31:48Z Badal Joshi Tung D. Nguyen http://arxiv.org/abs/2511.12805v1 Practical Causal Evaluation Metrics for Biological Networks 2025-11-16T22:18:15Z

Estimating causal networks from biological data is a critical step in systems biology. When evaluating the inferred network, assessing the networks based on their intervention effects is particularly important for downstream probabilistic reasoning and the identification of potential drug targets. In the context of gene regulatory network inference, biological databases are often used as reference sources. These databases typically describe relationships in a qualitative rather than quantitative manner. However, few evaluation metrics have been developed that take this qualitative nature into account. To address this, we developed a metric, the sign-augmented Structural Intervention Distance (sSID), and a weighted sSID that incorporates the net effects of the intervention. Through simulations and analyses of real transcriptomic datasets, we found that our proposed metrics could identify a different algorithm as optimal compared to conventional metrics, and the network selected by sSID had a superior performance in the classification task of clinical covariates using transcriptomic data. This suggests that sSID can distinguish networks that are structurally correct but functionally incorrect, highlighting its potential as a more biologically meaningful and practical evaluation metric.

2025-11-16T22:18:15Z 15 pages, 1 figure Noriaki Sato Marco Scutari Shuichi Kawano Rui Yamaguchi Seiya Imoto http://arxiv.org/abs/2507.14941v2 Stability conditions of chemical networks in a linear framework 2025-11-14T16:11:53Z

Autocatalytic chemical reaction networks can collectively replicate or maintain their constituents despite degradation reactions only above a certain threshold, which we refer to as the decay threshold. When the chemical network has a Jacobian matrix with the Metzler property, we leverage analytical methods developed for Markov processes to show that the decay threshold can be calculated by solving a linear problem, instead of the standard eigenvalue problem. We explore how this decay threshold depends on the network parameters, such as its size, the directionality of the reactions (reversible or irreversible), and its connectivity, and we deduce from this design principles that might be relevant to research on the Origin of Life.

2025-07-20T12:33:47Z Armand Despons Jérémie Unterberger David Lacoste http://arxiv.org/abs/2511.10609v1 Multistationarity in semi-open Phosphorylation-Dephosphorylation Cycles 2025-11-13T18:43:36Z

Multistationarity underlies biochemical switching and cellular decision-making. We study how multistationarity in the sequential n-site phosphorylation-dephosphorylation cycle is affected when only some species are open, meaning allowed to exchange with the environment (so-called semi-open networks). Working under mass action kinetics, we obtain two complementary structural results for every $n\geq 2$. First, opening any nonempty subset of the substrate species preserves the network's capacity for nondegenerate multistationarity. Second, opening the enzyme species (both kinase and phosphatase), possibly together with any subset of substrates, always destroys multistationarity. The latter result is proved by a general reduction framework combining the detection of absolute concentration robustness (ACR) with projection onto the remaining species; when the projection is monostationary, the full semi-open system is monostationary. We also illustrate the general method on multi-layer cascade variants and discuss biological implications: opening enzymes acts as a robust switch that converts a potentially multistationary phosphorylation module into a monostationary one, while substrate exchange preserves switching capacity and thus the ability to couple cycles to downstream processes.

2025-11-13T18:43:36Z 24 pages Praneet Nandan Beatriz Pascual-Escudero Diego Rojas La Luz http://arxiv.org/abs/2511.10337v1 Stochastic Thermodynamics of Cooperative Biomolecular Machines: Fluctuation Relations and Hidden Detailed Balance Breaking 2025-11-13T14:11:07Z

We examine a biomolecular machine involving a driven, observable process coupled to a hidden process in a kinetically cooperative manner. A stochastic thermodynamics framework is employed to analyze a fluctuation theorem for the first-passage time of the observable process under nonequilibrium steady-state conditions. Based on a generic kinetic model, we demonstrate that, along first-passage trajectories, entropy production remains constant when the changes in stochastic entropy and free energy of the machine are balanced, which corresponds to zero net hidden flux through the initial state manifold. Under this condition, which we define quite generally, this first-passage time fluctuation theorem can be established, with its violation serving as an experimentally detectable signature of hidden detailed balance breaking (which we subsequently characterize). In addition, using an enzymatic model, we show that the violation of our first-passage time fluctuation theorem can be thought of as a consequence of the breakdown of local detailed balance in the steps linking coarse-grained states that correspond to the initial and intermediate state manifolds. In the absence of hidden current, the fluctuation theorem is restored, and a mesoscopic local detailed balance condition can be established, which has implications for the thermodynamic analysis of driven, coarse-grained systems. This work sheds significant light on the unique connections between stochastic thermodynamic quantities and kinetic measurements in complex cooperative networks.

2025-11-13T14:11:07Z D. Evan Piephoff Jianshu Cao http://arxiv.org/abs/2501.02409v5 Interpretable Neural ODEs for Gene Regulatory Network Discovery under Perturbations 2025-11-12T23:46:55Z

Modern high-throughput biological datasets with thousands of perturbations provide the opportunity for large-scale discovery of causal graphs that represent the regulatory interactions between genes. Differentiable causal graphical models have been proposed to infer a gene regulatory network (GRN) from large scale interventional datasets, capturing the causal gene regulatory relationships from genetic perturbations. However, existing models are limited in their expressivity and scalability while failing to address the dynamic nature of biological processes such as cellular differentiation. We propose PerturbODE, a novel framework that incorporates biologically informative neural ordinary differential equations (neural ODEs) to model cell state trajectories under perturbations and derive the causal GRN from the neural ODE's parameters. We demonstrate PerturbODE's efficacy in trajectory prediction and GRN inference across simulated and real over-expression datasets.

2025-01-05T01:04:23Z Zaikang Lin Sei Chang Aaron Zweig Minseo Kang Elham Azizi David A. Knowles http://arxiv.org/abs/2412.03883v2 Multi-Scale Hybrid Modeling to Predict Cell Culture Process with Metabolic Phase Transitions 2025-11-12T01:31:39Z

To advance understanding of cellular metabolism and reduce batch-to-batch variability in cell culture processes, this study introduces a multi-scale hybrid modeling framework designed to simulate and predict the dynamic behavior of CHO cell cultures undergoing metabolic phase transitions. The model captures dependencies across molecular, cellular, and macro-kinetic levels, accounting for variability in single-cell metabolic phases. It integrates three components: (i) a stochastic mechanistic model of single-cell metabolic networks, (ii) a probabilistic model of phase transitions, and (iii) a macro-kinetic model of heterogeneous population dynamics. This modular architecture enables flexible representation of process trajectories under diverse conditions and incorporates heterogeneous online (e.g., oxygen uptake, pH) and offline measurements (e.g., viable cell density, metabolite concentrations). Leveraging these data and single-cell insights, the framework predicts culture dynamics using only readily available online measurements and initial conditions, delivering accurate long-term forecasts of multivariate culture behavior and uncertainty-aware estimates of batch-to-batch variation. Overall, this work establishes a robust foundation for digital twin platforms and predictive bioprocess analytics, supporting systematic experimental design and process control to improve yield and production stability in biomanufacturing.

2024-12-05T05:30:04Z 35 pages, 18 figures Keqi Wang Sarah W. Harcum Wei Xie http://arxiv.org/abs/2511.04838v1 SPECTRA: Spectral Target-Aware Graph Augmentation for Imbalanced Molecular Property Regression 2025-11-06T21:57:21Z

In molecular property prediction, the most valuable compounds (e.g., high potency) often occupy sparse regions of the target space. Standard Graph Neural Networks (GNNs) commonly optimize for the average error, underperforming on these uncommon but critical cases, with existing oversampling methods often distorting molecular topology. In this paper, we introduce SPECTRA, a Spectral Target-Aware graph augmentation framework that generates realistic molecular graphs in the spectral domain. SPECTRA (i) reconstructs multi-attribute molecular graphs from SMILES; (ii) aligns molecule pairs via (Fused) Gromov-Wasserstein couplings to obtain node correspondences; (iii) interpolates Laplacian eigenvalues, eigenvectors and node features in a stable share-basis; and (iv) reconstructs edges to synthesize physically plausible intermediates with interpolated targets. A rarity-aware budgeting scheme, derived from a kernel density estimation of labels, concentrates augmentation where data are scarce. Coupled with a spectral GNN using edge-aware Chebyshev convolutions, SPECTRA densifies underrepresented regions without degrading global accuracy. On benchmarks, SPECTRA consistently improves error in relevant target ranges while maintaining competitive overall MAE, and yields interpretable synthetic molecules whose structure reflects the underlying spectral geometry. Our results demonstrate that spectral, geometry-aware augmentation is an effective and efficient strategy for imbalanced molecular property regression.
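
The rarity-aware budgeting idea can be sketched as follows: estimate the label density with a Gaussian kernel density estimate, then give each sample a share of the augmentation budget that grows as its label becomes rarer. The inverse-density weighting, the fixed bandwidth, and the names `gaussian_kde` and `rarity_budget` are assumptions for illustration; the abstract does not give the exact rule SPECTRA uses.

```python
import math

def gaussian_kde(labels, bandwidth):
    """Return a function that estimates label density with Gaussian kernels."""
    norm = 1.0 / (len(labels) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(
            math.exp(-0.5 * ((y - yi) / bandwidth) ** 2) for yi in labels
        )

    return density

def rarity_budget(labels, total_budget, bandwidth=0.5):
    """Split total_budget across samples in proportion to inverse label density.

    Assumption: inverse-density weighting is one simple choice, not
    necessarily the scheme used in the paper.
    """
    density = gaussian_kde(labels, bandwidth)
    weights = [1.0 / density(y) for y in labels]
    scale = total_budget / sum(weights)
    return [w * scale for w in weights]

# Hypothetical potency labels: four common values and one rare, high value.
labels = [1.0, 1.1, 0.9, 1.05, 5.0]
budget = rarity_budget(labels, total_budget=100.0)
```

In this toy data set the isolated label at 5.0 receives the largest share, so synthetic samples would be concentrated in the sparse region of the target space.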

2025-11-06T21:57:21Z Brenda Nogueira Meng Jiang Nitesh V. Chawla Nuno Moniz http://arxiv.org/abs/2511.03483v1 A Gene Ranking Framework Enhances the Design Efficiency of Genome-Scale Constraint-Based Metabolic Networks 2025-11-05T14:09:11Z

The design of genome-scale constraint-based metabolic networks has steadily advanced, with an increasing number of successful cases achieving growth-coupled production, in which the biosynthesis of key metabolites is linked to cell growth. However, a major cause of design failures is the inability to find solutions within realistic time limits. Therefore, it is essential to develop methods that achieve a high success rate within the specified computation time. In this study, we propose a framework for ranking the importance of individual genes to accelerate the solution of the original mixed-integer linear programming (MILP) problems in the design of constraint-based models. In the proposed method, after pre-assigning values to highly important genes, the MILPs are solved in parallel as a series of mutually exclusive subproblems. Our framework recovered most of the successful cases identified by the original approach and achieved a 37% to 186% increase in success rate compared to the original method within the same time limits. Analysis of the MILP solution process revealed that the proposed method reduced the sizes of subproblems and decreased the number of nodes in the branch-and-bound tree. This framework for ranking gene importance is directly applicable to a range of MILP-based algorithms for the design of constraint-based metabolic networks. The developed scripts are available on https://github.com/MetNetComp/Gene-Ranked-RatGene.
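
One way to picture the decomposition into mutually exclusive subproblems is to fix the binary variables of the top-ranked genes to every possible 0/1 combination, so that each combination defines a smaller MILP that could be handed to a separate worker. The sketch below only enumerates those assignments. Whether the authors enumerate all combinations is an assumption, and the gene names and the function `enumerate_subproblems` are hypothetical.

```python
from itertools import product

def enumerate_subproblems(ranked_genes, k):
    """Yield every 0/1 assignment of the k top-ranked genes.

    Each assignment fixes those binary variables in the MILP; the 2**k
    assignments are mutually exclusive and together cover the full search
    space. (Assumption: exhaustive enumeration, for illustration only.)
    """
    top = ranked_genes[:k]
    for values in product((0, 1), repeat=len(top)):
        yield dict(zip(top, values))

# Hypothetical ranking, most important gene first.
ranking = ["g1", "g2", "g3", "g4"]
subproblems = list(enumerate_subproblems(ranking, k=2))
```

Fixing k binary variables yields 2**k subproblems, each with fewer free integer variables. This is consistent with the smaller subproblems and branch-and-bound trees reported in the abstract.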

2025-11-05T14:09:11Z Yier Ma Takeyuki Tamura http://arxiv.org/abs/2507.06853v2 DiffSpectra: Molecular Structure Elucidation from Spectra using Diffusion Models 2025-11-05T11:17:38Z

Molecular structure elucidation from spectra is a fundamental challenge in molecular science. Conventional approaches rely heavily on expert interpretation and lack scalability, while retrieval-based machine learning approaches remain constrained by limited reference libraries. Generative models offer a promising alternative, yet most adopt autoregressive architectures that overlook 3D geometry and struggle to integrate diverse spectral modalities. In this work, we present DiffSpectra, a generative framework that formulates molecular structure elucidation as a conditional generation process, directly inferring 2D and 3D molecular structures from multi-modal spectra using diffusion models. Its denoising network is parameterized by the Diffusion Molecule Transformer, an SE(3)-equivariant architecture for geometric modeling, conditioned by SpecFormer, a Transformer-based spectral encoder capturing multi-modal spectral dependencies. Extensive experiments demonstrate that DiffSpectra accurately elucidates molecular structures, achieving 40.76% top-1 and 99.49% top-10 accuracy. Its performance benefits substantially from 3D geometric modeling, SpecFormer pre-training, and multi-modal conditioning. To our knowledge, DiffSpectra is the first framework that unifies multi-modal spectral reasoning and joint 2D/3D generative modeling for de novo molecular structure elucidation.

2025-07-09T13:57:20Z Liang Wang Yu Rong Tingyang Xu Zhenyi Zhong Zhiyuan Liu Pengju Wang Deli Zhao Qiang Liu Shu Wu Liang Wang Yang Zhang http://arxiv.org/abs/2408.08503v2 Computational strategies for cross-species knowledge transfer 2025-11-04T18:03:08Z

Research organisms provide invaluable insights into human biology and diseases, serving as essential tools for functional experiments, disease modeling, and drug testing. However, evolutionary divergence between humans and research organisms hinders effective knowledge transfer across species. Here, we review state-of-the-art methods for computationally transferring knowledge across species, primarily focusing on methods that utilize transcriptome data and/or molecular networks. Our review addresses four key areas: (1) transferring disease and gene annotation knowledge across species, (2) identifying functionally equivalent molecular components, (3) inferring equivalent perturbed genes or gene sets, and (4) identifying equivalent cell types. We conclude with an outlook on future directions and several key challenges that remain in cross-species knowledge transfer, including introducing the concept of "agnology" to describe functional equivalence of biological entities, regardless of their evolutionary origins. This concept is becoming pervasive in integrative data-driven models where evolutionary origins of functions can remain unresolved.

2024-08-16T03:01:35Z Hao Yuan Christopher A. Mancuso Kayla Johnson Ingo Braasch Arjun Krishnan http://arxiv.org/abs/2511.02418v1 Biomolecular LQR under Partial Observation 2025-11-04T09:47:45Z

This paper introduces a biomolecular Linear Quadratic Regulator (LQR) to investigate the design principles of gene regulatory networks. We show that for a fundamental gene regulation network, the bio-controller derived from LQR theory precisely recapitulates natural network motifs, such as auto-regulation and incoherent feedforward loops. This emulation arises from a fundamental principle: the LQR cost function mathematically encodes environmental survival demands, which subsequently drives the selection of both network topology and biochemical parameters. Our work thus establishes a theoretical basis for interpreting biological circuit design, directly linking evolutionary pressures to observable regulatory structures.
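
For readers unfamiliar with LQR, the textbook scalar case shows how the cost weights determine the feedback law. For dx/dt = a*x + b*u with cost integral of q*x^2 + r*u^2, the optimal control is u = -K*x, with K obtained from the algebraic Riccati equation. The sketch below is this standard full-state result only. It does not model partial observation or any biochemical realization, and the parameter values are hypothetical.

```python
import math

def scalar_lqr(a, b, q, r):
    """Solve the scalar continuous-time LQR problem.

    Returns (p, gain), where p is the positive root of the Riccati equation
    2*a*p - (b*p)**2 / r + q = 0 and gain = b*p/r, so that u = -gain * x.
    """
    if b == 0 or q <= 0 or r <= 0:
        raise ValueError("need b != 0, q > 0 and r > 0")
    root = math.sqrt(a * a + b * b * q / r)
    p = r * (a + root) / (b * b)
    gain = b * p / r
    return p, gain

# Hypothetical numbers: a = -1 plays the role of a degradation rate.
a, b, q, r = -1.0, 1.0, 3.0, 1.0
p, gain = scalar_lqr(a, b, q, r)
closed_loop_rate = a - b * gain  # negative, so the regulated state decays
```

The closed-loop rate equals -sqrt(a^2 + b^2*q/r), so raising the state penalty q relative to the control penalty r produces stronger negative feedback. This is the sense in which a cost function can select a regulatory structure such as negative auto-regulation.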

2025-11-04T09:47:45Z Xiaoyu Zhang Zhou Fang