http://arxiv.org/abs/2511.02332v1 Biological Regulatory Network Inference through Circular Causal Structure Learning 2025-11-04T07:38:02Z

Biological networks are pivotal in deciphering the complexity and functionality of biological systems. Causal inference, which focuses on determining the directionality and strength of interactions between variables rather than merely relying on correlations, is considered a logical approach for inferring biological networks. Existing methods for causal structure inference typically assume that causal relationships between variables can be represented by directed acyclic graphs (DAGs). However, this assumption is at odds with the reality of widespread feedback loops in biological systems, making these methods unsuitable for direct use in biological network inference. In this study, we propose a new framework named SCALD (Structural CAusal model for Loop Diagram), which employs a nonlinear structural equation model and a stable feedback loop conditional constraint through continuous optimization to infer causal regulatory relationships under feedback loops. We observe that SCALD outperforms state-of-the-art methods in inferring both transcriptional regulatory networks and signal transduction networks. SCALD has irreplaceable advantages in identifying feedback regulation. Through transcription factor (TF) perturbation data analysis, we further validate the accuracy and sensitivity of SCALD. Additionally, SCALD facilitates the discovery of previously unknown regulatory relationships, which we have subsequently confirmed through ChIP-seq data analysis. Furthermore, by utilizing SCALD, we infer the key driver genes that facilitate the transformation from colon inflammation to cancer by examining the dynamic changes within regulatory networks during the process.
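To make the DAG assumption concrete, the following minimal sketch checks whether a small directed regulatory graph contains a feedback loop. It is not SCALD and does not reproduce the paper's optimization; the function name `find_cycle` and the node names (`TF1`, `geneA`, `geneB`) are hypothetical and chosen only for illustration.

```python
def find_cycle(edges):
    """Return one directed cycle as a list of nodes, or None if the graph is a DAG."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}
    stack = []  # nodes on the current DFS path, in order

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                # Back edge: nxt is already on the path, so a loop is closed.
                return stack[stack.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical toy networks: a feed-forward chain, and the same chain
# with one extra edge that closes a feedback loop.
feedforward = [("TF1", "geneA"), ("geneA", "geneB")]
feedback = feedforward + [("geneB", "TF1")]

print(find_cycle(feedforward))  # None
print(find_cycle(feedback))     # ['TF1', 'geneA', 'geneB', 'TF1']
```

A method restricted to DAGs cannot return the second graph at all, which is the limitation the abstract describes for networks with feedback regulation.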

2025-11-04T07:38:02Z Hongyang Jiang Yuezhu Wang Ke Feng Chaoyi Yin Yi Chang Huiyan Sun http://arxiv.org/abs/2508.21006v2 Practical indistinguishability in a gene regulatory network inference problem, a case study 2025-11-03T15:30:58Z

Determining mechanistic models of gene regulation, especially underlying phenotypic variation, is a central goal of both mathematical biology and modern evolutionary biology. However, several challenges remain, involving both common characteristics of experimental data and the model development process, that limit the discovery of general principles. Even the highest-quality experimental data come with challenges. There are always sources of noise, a limit to how often we can measure the system in time, and it is impossible to measure all the relevant states that participate in the full underlying complexity. Additionally, there are usually sources of uncertainty in the underlying biological mechanisms, which give rise to multiple competing model structures. We walk through a case study involving inference of a regulatory network structure involved in a developmental decision in the nematode, \textit{Pristionchus pacificus}. In this study, we fit 13,824 distinct regulatory network models to gene expression data from three experimental conditions to determine which regulatory features are supported by the data. We discover \textit{model sets}, or collections of models with shared regulatory network features that best fit the data, for each of the three experiments we considered, and identify a regulatory network in the intersection of the three model sets. This model describes the data across the experimental conditions and exhibits a high degree of positive regulation and interconnectivity between the key regulators, \textit{eud-1}, \textit{sult-1}, and \textit{nhr-40}. While the biological results are specific to the molecular biology of development in \textit{Pristionchus pacificus}, the comparative modeling framework introduced here can be applied to other systems of gene regulation in an evolutionary developmental context.

2025-08-28T17:08:11Z Cody E. FitzGerald Shelley Reich Victor Agaba Arjun Mathur Michael S. Werner Niall M. Mangan http://arxiv.org/abs/2510.27600v1 Effects of Model Reduction on Coherence and Information Transfer in Stochastic Biochemical Systems 2025-10-31T16:26:48Z

Simplified stochastic models are widely used in the study of frequency-resolved noise propagation in biochemical reaction networks, a common measure being the coherence between random fluctuations in molecule number trajectories. Such models have also found widespread application in the quantification of how information is transmitted in reaction networks via the mutual information (MI) rate. A common assumption is that, under timescale separation, estimates for the coherence and MI rate obtained from simplified (reduced) models closely approximate those in the underlying full models. Here, we challenge that assumption by showing that, while reduced models can faithfully reproduce low-order statistics of molecular counts, they frequently incur substantial discrepancies in the coherence spectrum, especially at intermediate and high frequencies. These errors, in turn, lead to significant inaccuracies in the resulting estimates for the MI rates. We show that the observed discrepancies are due to the interplay between the structure of the underlying reaction networks, the specific model reduction method that is applied, and the asymptotic limits relating the full and the reduced models. We illustrate our results in canonical models of enzyme catalysis and gene expression, highlighting practical implications for quantifying information flow in cells.

2025-10-31T16:26:48Z 27 pages, 3 figures Juan David Marmolejo Lozano Nikola Popovic Ramon Grima http://arxiv.org/abs/2510.27268v1 Information geometry of perturbed gradient flow systems on hypergraphs: A perspective towards nonequilibrium physics 2025-10-31T08:12:20Z

This article serves to concisely review the link between gradient flow systems on hypergraphs and information geometry which has been established within the last five years. Gradient flow systems describe a wealth of physical phenomena and provide powerful analytical techniques which are based on the variational energy-dissipation principle. Modern nonequilibrium physics has complemented this classical principle with thermodynamic uncertainty relations, speed limits, entropy production rate decompositions, and many more. In this article, we formulate these modern principles within the framework of perturbed gradient flow systems on hypergraphs. In particular, we discuss the geometry induced by the Bregman divergence, the physical implications of dual foliations, as well as the corresponding infinitesimal Riemannian geometry for gradient flow systems. Through the geometrical perspective, we are naturally led to new concepts such as moduli spaces for perturbed gradient flow systems and thermodynamical area, which is crucial for understanding speed limits. We hope to encourage the readers working in either of the two fields to further expand on and foster the interaction between the two fields.

2025-10-31T08:12:20Z 26 pages, 2 figures Dimitri Loutchko Keisuke Sugie Tetsuya J Kobayashi http://arxiv.org/abs/2510.26556v1 On the number of non-degenerate canalizing Boolean functions 2025-10-30T14:47:51Z

Canalization is a key organizing principle in complex systems, particularly in gene regulatory networks. It describes how certain input variables exert dominant control over a function's output, thereby imposing hierarchical structure and conferring robustness to perturbations. Degeneracy, in contrast, captures redundancy among input variables and reflects the complete dominance of some variables by others. Both properties influence the stability and dynamics of discrete dynamical systems, yet their combinatorial underpinnings remain incompletely understood. Here, we derive recursive formulas for counting Boolean functions with prescribed numbers of essential variables and given canalizing properties. In particular, we determine the number of non-degenerate canalizing Boolean functions -- that is, functions for which all variables are essential and at least one variable is canalizing. Our approach extends earlier enumeration results on canalizing and nested canalizing functions. It provides a rigorous foundation for quantifying how frequently canalization occurs among random Boolean functions and for assessing its pronounced over-representation in biological network models, where it contributes both to robustness and to the emergence of distinct regulatory roles.
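The definition above (all variables essential, at least one variable canalizing) can be checked by exhaustive enumeration for very small numbers of inputs. The sketch below is a brute-force count, not the recursive formulas derived in the paper; the function name `count_nondegenerate_canalizing` is hypothetical, and the approach is only practical for roughly n <= 4 because there are 2^(2^n) truth tables.

```python
from itertools import product


def count_nondegenerate_canalizing(n):
    """Count n-input Boolean functions with all variables essential
    and at least one canalizing variable, by brute force."""
    inputs = list(product((0, 1), repeat=n))
    count = 0
    for table in product((0, 1), repeat=2 ** n):
        f = dict(zip(inputs, table))

        def essential(i):
            # x_i is essential if flipping it changes the output for some input.
            return any(
                f[x] != f[x[:i] + (1 - x[i],) + x[i + 1:]] for x in inputs
            )

        def canalizing(i):
            # x_i is canalizing if fixing it to some value a forces a constant output.
            return any(
                len({f[x] for x in inputs if x[i] == a}) == 1 for a in (0, 1)
            )

        if all(essential(i) for i in range(n)) and any(
            canalizing(i) for i in range(n)
        ):
            count += 1
    return count


for n in (1, 2, 3):
    print(n, count_nondegenerate_canalizing(n))
# 1 2
# 2 8
# 3 88
```

For n = 2, the eight functions counted are the AND/OR-type functions with optional negations; XOR and XNOR have both variables essential but no canalizing variable, so they are excluded.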

2025-10-30T14:47:51Z 11 pages, 3 figures Claus Kadelka http://arxiv.org/abs/2508.06576v2 GFlowNets for Learning Better Drug-Drug Interaction Representations 2025-10-30T13:59:28Z

Drug-drug interactions (DDIs) pose a significant challenge in clinical pharmacology, with severe class imbalance among interaction types limiting the effectiveness of predictive models. Common interactions dominate datasets, while rare but critical interactions remain underrepresented, leading to poor model performance on infrequent cases. Existing methods often treat DDI prediction as a binary problem, ignoring class-specific nuances and exacerbating bias toward frequent interactions. To address this, we propose a framework combining Generative Flow Networks (GFlowNet) with Variational Graph Autoencoders (VGAE) to generate synthetic samples for rare classes, improving model balance and generating effective and novel DDI pairs. Our approach enhances predictive performance across interaction types, ensuring better clinical reliability.

2025-08-07T14:03:23Z Accepted to ICANN 2025:AIDD and NeurIPS 2025 Workshop on Structured Probabilistic Inference & Generative Modeling (https://openreview.net/forum?id=LZW1jSgfCI) Azmine Toushik Wasi http://arxiv.org/abs/2506.23496v3 Thermodynamic ranking of pathways in reaction networks 2025-10-29T05:18:00Z

One of the puzzles left open by energetic analyses of irreversible stochastic processes is that boundary conditions that prevent the performance of work or the dissipation of heat make no contribution to an entropy-production budget; yet we see ubiquitously in both engineered and living systems that both transient and persistent energy costs are paid to create and maintain such boundaries. We wish to know whether there are inherent limits for the costs of such phenomena, and common units in which those can be traded off against more familiar costs measured in terms of heat dissipation. We give this problem a concrete framing in the context of CRNs, for the problem of extracting a topologically restricted pathway from a larger distributed network, through activation of some reactions and selective elimination of others. We define a thermodynamic cost function for pathways derived from large-deviation theory of stochastic CRNs, which decomposes into two components: an ongoing maintenance cost to sustain a NESS, and a restriction cost, quantifying the ongoing improbability of neutralizing reactions outside the specified pathway. Applying this formalism to detailed-balanced CRNs in the linear response regime, we make use of their formal equivalence to electrical circuits. We prove that the resistance of a CRN decreases as reactions are added that support the throughput current, and that the maintenance cost, the restriction cost, and the thermodynamic cost of nested pathways are bounded below by those of their hosting network. For small CRNs, we show how catalytic and inhibitory mechanisms can drastically alter pathway costs, enabling unfavorable pathways to become favorable and approach the cost of the hosting pathway. Our results provide insights into the thermodynamic principles governing open CRNs and offer a foundation for understanding the evolution of metabolic networks.

2025-06-30T03:40:46Z 57 pages, 11 figures Praful Gagrani Nino Lauber Eric Smith Christoph Flamm http://arxiv.org/abs/2510.22167v1 Graph Identification of Proteins in Tomograms (GRIP-Tomo) 2.0: Topologically Aware Classification for Proteins 2025-10-25T05:27:37Z

Cryo-electron tomography (cryo-ET) enables structural characterization of biomolecules under near-native conditions. Existing approaches for interpreting the resulting three-dimensional volumes are computationally expensive and have difficulty interpreting density associated with small proteins/complexes. To explore alternative approaches for identifying proteins in cryo-ET data, we pursued a Graph Network and topologically invariant approach. Here, we report on a fast algorithm that distinguishes volumes containing protein density from noise by searching for nuances of evolutionarily conserved motifs and the geometrical characteristics of protein structure. GRIP-Tomo 2.0 is a machine-learning pipeline that extracts interpretable topological features of protein structures within noisy experimental backgrounds. Compared to version 1.0, the new pipeline includes three upgrades that significantly improve performance: synthetic tomogram generation simulating realistic noise, graph-based persistent feature extraction as protein fingerprints, and high-performance computing acceleration. GRIP-Tomo 2.0 achieves over 90% accuracy in distinguishing proteins from noise for synthetic datasets and over 80% accuracy for real datasets, which represents a foundational step toward advancing cryo-ET workflows and empowering automated detection of both small and large proteins for visual proteomics.

2025-10-25T05:27:37Z Chengxuan Li August George Reece Neff Doo Nam Kim Trevor Moser Kate Baldwin Malio Nelson Arsam Firoozfar James E Evans Margaret S Cheung http://arxiv.org/abs/2510.22008v1 A Multimodal Human Protein Embeddings Database: DeepDrug Protein Embeddings Bank (DPEB) 2025-10-24T20:22:17Z

Computationally predicting protein-protein interactions (PPIs) is challenging due to the lack of integrated, multimodal protein representations. DPEB is a curated collection of 22,043 human proteins that integrates four embedding types: structural (AlphaFold2), transformer-based sequence (BioEmbeddings), contextual amino acid patterns (ESM-2: Evolutionary Scale Modeling), and sequence-based n-gram statistics (ProtVec). AlphaFold2 protein structures are available through public databases (e.g., AlphaFold2 Protein Structure Database), but the internal neural network embeddings are not. DPEB addresses this gap by providing AlphaFold2-derived embeddings for computational modeling. Our benchmark evaluations show GraphSAGE with BioEmbeddings achieved the highest PPI prediction performance (87.37% AUROC, 79.16% accuracy). The framework also achieved 77.42% accuracy for enzyme classification and 86.04% accuracy for protein family classification. DPEB supports multiple graph neural network methods for PPI prediction, enabling applications in systems biology, drug target identification, pathway analysis, and disease mechanism studies.

2025-10-24T20:22:17Z Md Saiful Islam Sajol Magesh Rajasekaran Hayden Gemeinhardt Adam Bess Chris Alvin Supratik Mukhopadhyay http://arxiv.org/abs/2507.05101v2 PRING: Rethinking Protein-Protein Interaction Prediction from Pairs to Graphs 2025-10-22T15:38:08Z

Deep learning-based computational methods have achieved promising results in predicting protein-protein interactions (PPIs). However, existing benchmarks predominantly focus on isolated pairwise evaluations, overlooking a model's capability to reconstruct biologically meaningful PPI networks, which is crucial for biology research. To address this gap, we introduce PRING, the first comprehensive benchmark that evaluates protein-protein interaction prediction from a graph-level perspective. PRING curates a high-quality, multi-species PPI network dataset comprising 21,484 proteins and 186,818 interactions, with well-designed strategies to address both data redundancy and leakage. Building on this gold-standard dataset, we establish two complementary evaluation paradigms: (1) topology-oriented tasks, which assess intra- and cross-species PPI network construction, and (2) function-oriented tasks, including protein complex pathway prediction, GO module analysis, and essential protein justification. These evaluations not only reflect the model's capability to understand the network topology but also facilitate protein function annotation, biological module detection, and even disease mechanism analysis. Extensive experiments on four representative model categories, consisting of sequence similarity-based, naive sequence-based, protein language model-based, and structure-based approaches, demonstrate that current PPI models have potential limitations in recovering both structural and functional properties of PPI networks, highlighting the gap in supporting real-world biological applications. We believe PRING provides a reliable platform to guide the development of more effective PPI prediction models for the community. The dataset and source code of PRING are available at https://github.com/SophieSarceau/PRING.

2025-07-07T15:21:05Z Xinzhe Zheng Hao Du Fanding Xu Jinzhe Li Zhiyuan Liu Wenkang Wang Tao Chen Wanli Ouyang Stan Z. Li Yan Lu Nanqing Dong Yang Zhang http://arxiv.org/abs/2409.17488v2 Optimal control of stochastic reaction networks with entropic control cost and emergence of mode-switching strategies 2025-10-20T05:04:37Z

Controlling the stochastic dynamics of biological populations is a challenge that arises across various biological contexts. However, these dynamics are inherently nonlinear and involve a discrete state space, i.e., the number of molecules, cells, or organisms. Additionally, the possibility of extinction has a significant impact on both dynamics and control strategies, particularly when the population size is small. These factors hamper the direct application of conventional control theories to biological systems. To address these challenges, we formulate the optimal control problem for stochastic population dynamics by utilizing control cost functions based on the f-divergence, which naturally accounts for population-specific factors. If Kullback-Leibler (KL) divergence is adopted for the cost function, the complex nonlinear Hamilton-Jacobi-Bellman equation is simplified into a linear form, facilitating efficient computation of optimal solutions. We demonstrate the effectiveness of our approach by applying it to the control of interacting random walkers, Moran processes, and SIR models, and observe the mode-switching phenomena in the control strategies. Our approach provides new opportunities for applying control theory to a wide range of biological problems.
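The linearization mentioned above can be sketched for a generic continuous-time jump process. The notation here is an assumption made for illustration and is not taken from the paper: $w(x \to x')$ are uncontrolled jump rates, $u(x \to x')$ controlled rates, $q(x)$ a state cost, and $V(x,t)$ the value function. With a KL-type control cost rate, the minimization over $u$ can be done in closed form, and the substitution $Z = e^{-V}$ turns the nonlinear equation into a linear one.

```latex
% Assumed notation (not from the paper): w = uncontrolled rates, u = controlled rates,
% q = state cost, V = value function, Z = exp(-V).
\begin{align}
-\partial_t V(x,t) &= \min_{u}\Big\{ q(x)
  + \sum_{x'}\Big[u\log\frac{u}{w} - u + w\Big]
  + \sum_{x'} u\,\big(V(x',t)-V(x,t)\big)\Big\}, \\
u^{*}(x\to x') &= w(x\to x')\,e^{V(x,t)-V(x',t)}, \\
-\partial_t Z(x,t) &= \sum_{x'} w(x\to x')\big(Z(x',t)-Z(x,t)\big) - q(x)\,Z(x,t),
\qquad Z = e^{-V}.
\end{align}
```

Setting the derivative of the bracketed expression with respect to $u$ to zero gives the second line; substituting $u^{*}$ back cancels the $u\log(u/w)$ term against the value-difference term, leaving $-\partial_t V = q + \sum_{x'} w\,(1 - e^{V(x)-V(x')})$, which is the third line after writing $e^{V(x)-V(x')} = Z(x')/Z(x)$.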

2024-09-26T02:50:32Z 12 pages, 4 figures PRX Life 3, 033027 (2025) Shuhei A. Horiguchi Tetsuya J. Kobayashi 10.1103/zttn-tpzq http://arxiv.org/abs/2510.16824v1 ProtoMol: Enhancing Molecular Property Prediction via Prototype-Guided Multimodal Learning 2025-10-19T13:19:37Z

Multimodal molecular representation learning, which jointly models molecular graphs and their textual descriptions, enhances predictive accuracy and interpretability by enabling more robust and reliable predictions of drug toxicity, bioactivity, and physicochemical properties through the integration of structural and semantic information. However, existing multimodal methods suffer from two key limitations: (1) they typically perform cross-modal interaction only at the final encoder layer, thus overlooking hierarchical semantic dependencies; (2) they lack a unified prototype space for robust alignment between modalities. To address these limitations, we propose ProtoMol, a prototype-guided multimodal framework that enables fine-grained integration and consistent semantic alignment between molecular graphs and textual descriptions. ProtoMol incorporates dual-branch hierarchical encoders, utilizing Graph Neural Networks to process structured molecular graphs and Transformers to encode unstructured texts, resulting in comprehensive layer-wise representations. Then, ProtoMol introduces a layer-wise bidirectional cross-modal attention mechanism that progressively aligns semantic features across layers. Furthermore, a shared prototype space with learnable, class-specific anchors is constructed to guide both modalities toward coherent and discriminative representations. Extensive experiments on multiple benchmark datasets demonstrate that ProtoMol consistently outperforms state-of-the-art baselines across a variety of molecular property prediction tasks.

2025-10-19T13:19:37Z Yingxu Wang Kunyu Zhang Jiaxin Huang Nan Yin Siwei Liu Eran Segal http://arxiv.org/abs/2409.06877v3 Positive equilibria in mass action networks: geometry and bounds 2025-10-15T18:01:59Z

We present results on the geometry of the positive equilibrium set of a mass action network. Any mass action network gives rise to a parameterised family of polynomial equations whose positive solutions are the positive equilibria of the network. Here, we start by deriving alternative systems of equations, whose solutions are in smooth, one-to-one correspondence with positive equilibria of the network, and capture degeneracy or nondegeneracy of the corresponding equilibria. The derivation leads us to consider partitions of networks in a natural sense, and we explore the implications of choosing different partitions. The alternative systems are often simpler than the original mass action equations, sometimes giving explicit parameterisations of positive equilibria, and allowing us to rapidly identify various algebraic and geometric properties of the positive equilibrium set, including toricity and local toricity. We can use the approaches we develop to bound the number of positive nondegenerate equilibria on stoichiometric classes; to derive semialgebraic descriptions of the parameter regions for multistationarity; and to study bifurcations. We present the main construction, various consequences for particular classes of networks, and numerous examples. We also develop additional techniques specifically for quadratic networks, the most common class of networks in applications, and use these techniques to derive strengthened results for quadratic networks.
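As a small illustration of how a mass action network gives rise to polynomial equations whose positive solutions are the equilibria, the sketch below evaluates the mass-action right-hand side for a toy network. The network, rate constants, and the function name `mass_action_rhs` are hypothetical and are not taken from the paper.

```python
from math import prod


def mass_action_rhs(reactions, x):
    """Return dx/dt under mass-action kinetics.

    reactions: list of (reactants, products, k), where reactants and products
               map species names to stoichiometric coefficients.
    x: dict mapping species names to concentrations.
    """
    dxdt = {s: 0.0 for s in x}
    for reactants, products, k in reactions:
        # Mass-action rate: k times the product of x_s ** (reactant coefficient).
        rate = k * prod(x[s] ** n for s, n in reactants.items())
        for s, n in reactants.items():
            dxdt[s] -= n * rate
        for s, n in products.items():
            dxdt[s] += n * rate
    return dxdt


# Hypothetical network: A + B -> 2B (k = 1), B -> A (k = 2).
# The resulting equations are dA/dt = -a*b + 2*b and dB/dt = a*b - 2*b,
# so every point with a = 2 and b > 0 is a positive equilibrium.
reactions = [
    ({"A": 1, "B": 1}, {"B": 2}, 1.0),
    ({"B": 1}, {"A": 1}, 2.0),
]

print(mass_action_rhs(reactions, {"A": 2.0, "B": 3.0}))  # {'A': 0.0, 'B': 0.0}
print(mass_action_rhs(reactions, {"A": 1.0, "B": 1.0}))  # {'A': 1.0, 'B': -1.0}
```

In this toy case the total a + b is conserved, so each stoichiometric class (fixed a + b > 2) contains exactly one positive equilibrium, which is the kind of count on stoichiometric classes that the abstract refers to bounding.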

2024-09-10T21:47:12Z Murad Banaji Elisenda Feliu http://arxiv.org/abs/2510.09594v1 MODE: Learning compositional representations of complex systems with Mixtures Of Dynamical Experts 2025-10-10T17:52:31Z

Dynamical systems in the life sciences are often composed of complex mixtures of overlapping behavioral regimes. Cellular subpopulations may shift from cycling to equilibrium dynamics or branch towards different developmental fates. The transitions between these regimes can appear noisy and irregular, posing a serious challenge to traditional, flow-based modeling techniques which assume locally smooth dynamics. To address this challenge, we propose MODE (Mixture Of Dynamical Experts), a graphical modeling framework whose neural gating mechanism decomposes complex dynamics into sparse, interpretable components, enabling both the unsupervised discovery of behavioral regimes and accurate long-term forecasting across regime transitions. Crucially, because agents in our framework can jump to different governing laws, MODE is especially tailored to the aforementioned noisy transitions. We evaluate our method on a battery of synthetic and real datasets from computational biology. First, we systematically benchmark MODE on an unsupervised classification task using synthetic dynamical snapshot data, including in noisy, few-sample settings. Next, we show how MODE succeeds on challenging forecasting tasks which simulate key cycling and branching processes in cell biology. Finally, we deploy our method on human, single-cell RNA sequencing data and show that it can not only distinguish proliferation from differentiation dynamics but also predict when cells will commit to their ultimate fate, a key outstanding challenge in computational biology.

2025-10-10T17:52:31Z 30 pages, 5 figures Nathan Quiblier Roy Friedman Matthew Ricci http://arxiv.org/abs/2510.09372v1 Design of DNA Strand Displacement Reactions 2025-10-10T13:28:10Z

DNA strand displacement (SD) reactions are central to the operation of many synthetic nucleic acid systems, including molecular circuits, sensors, and machines. Over the years, a broad set of design frameworks has emerged to accommodate various functional goals, initial configurations, and environmental conditions. Nevertheless, key challenges persist, particularly in reliably predicting reaction kinetics. This review examines recent approaches to SD reaction design, with emphasis on the properties of single reactions, including kinetics, structural factors, and limitations in current modelling practices. We identify promising innovations while analysing the factors that continue to hinder predictive accuracy. We conclude by outlining future directions for achieving more robust and programmable behaviour in DNA-based systems.

2025-10-10T13:28:10Z 16 pages, 3 figures. Invited review article Križan Jurinović Merry Mitra Rakesh Mukherjee Thomas E. Ouldridge