https://arxiv.org/api/gBwmwiMG0AK+qmAJpxhY9T7Ypd42026-03-24T09:47:14Z411210515http://arxiv.org/abs/2512.02478v1Simulation and inference methods for non-Markovian stochastic biochemical reaction networks2025-12-02T07:15:17ZStochastic models of biochemical reaction networks are widely used to capture intrinsic noise in cellular systems. The typical formulation of these models is based on Markov processes, for which there is extensive research on efficient simulation and inference. However, there are biological processes, such as gene transcription and translation, that introduce history-dependent dynamics requiring non-Markovian processes to accurately capture the stochastic dynamics of the system. This greater realism comes with additional computational challenges for simulation and parameter inference. We develop efficient stochastic simulation algorithms for well-mixed non-Markovian stochastic biochemical reaction networks with delays that depend on system state and time. Our methods generalize the next reaction method and $τ$-leaping method to support arbitrary inter-event time distributions while preserving computational scalability. We also introduce a coupling scheme to generate exact non-Markovian sample paths that are positively correlated to an approximate non-Markovian $τ$-leaping sample path. This enables substantial computational gains for Bayesian inference of model parameters through multifidelity simulation-based inference schemes. We demonstrate the effectiveness of our approach on a gene regulation model with delayed auto-inhibition, showing substantial gains in both simulation accuracy and inference efficiency of two orders of magnitude. These results extend the practical applicability of non-Markovian models in systems biology and beyond.2025-12-02T07:15:17ZThomas P. SteeleDavid J. 
Warnehttp://arxiv.org/abs/2512.02204v1MoRSAIK: Sequence Motif Reactor Simulation, Analysis and Inference Kit in Python2025-12-01T20:52:23ZOrigins of life research investigates how life could emerge from prebiotic chemistry only. One possible explanation is provided by the RNA world hypothesis. It states that life could emerge from RNA strands only, storing and transferring biological information, as well as catalyzing reactions as ribozymes. Before this state could have emerged, however, the prebiotic world was probably a purely chemical pool of short RNA strands with random sequences and without biological function, performing hybridization and dehybridization, as well as ligation and cleavage. In this context, the relevant questions are: what conditions allow longer RNA strands to be built, and how can information carried in RNA sequences emerge?
In order to investigate such RNA reactors, efficient simulations are needed because the space of possible RNA sequences increases exponentially with the length of the strands, as well as the number of reactions between two strands. In addition, simulations have to be compared to experimental data for validation and parameter calibration. Here, we present the MoRSAIK Python package for sequence motif (or k-mer) reactor simulation, analysis and inference. It enables users to simulate RNA sequence motif dynamics in the mean field approximation as well as to infer the reaction parameters from data with Bayesian methods and to analyze results by computing observables and plotting. MoRSAIK simulates an RNA reactor by following the reactions and the concentrations of all strands inside up to a certain length (of four nucleotides by default). Longer strands are followed indirectly, by tracking the concentrations of the sequence motifs of that maximum length that they contain.2025-12-01T20:52:23Z5 pages, 1 figureJohannes Harth-KitzerowUlrich GerlandTorsten A. Enßlinhttp://arxiv.org/abs/2512.01160v1From Regression to Classification: Exploring the Benefits of Categorical Representations of Energy in MLIPs2025-12-01T00:36:42ZDensity Functional Theory (DFT) is a widely used computational method for estimating the energy and behavior of molecules. Machine Learning Interatomic Potentials (MLIPs) are models trained to approximate DFT-level energies and forces at dramatically lower computational cost. Many modern MLIPs rely on a scalar regression formulation; given information about a molecule, they predict a single energy value and corresponding forces while minimizing absolute error with DFT's calculations. In this work, we explore a multi-class classification formulation that predicts a categorical distribution over energy/force values, providing richer supervision through multiple targets. Most importantly, this approach offers a principled way to quantify model uncertainty.
In particular, our method predicts a histogram of the energy/force distribution, converts scalar targets into histograms, and trains the model using cross-entropy loss. Our results demonstrate that this categorical formulation can achieve absolute error performance comparable to regression baselines. Furthermore, this representation enables the quantification of epistemic uncertainty through the entropy of the predicted distribution, offering a measure of model confidence absent in scalar regression approaches.2025-12-01T00:36:42Z11th Annual Conference on Vision and Intelligent Systems (CVIS 2025)Ahmad Alihttp://arxiv.org/abs/2511.19444v2Comment on "Direct Targeting and Regulation of RNA Polymerase II by Cell Signaling Kinases"2025-11-28T16:23:02ZDabas et al. in Science 2025 report that approximately 117 human kinases directly phosphorylate the C-terminal domain (CTD) of RNA polymerase II (Pol II), proposing an extensive, direct biochemical bridge between signal transduction and transcriptional control. Such a sweeping claim that one-fourth of the human kinome directly targets the CTD represents a profound revision of canonical transcriptional biology. However, the evidence presented relies primarily on in vitro kinase assays using short CTD peptides, sparse in-cell validation, and mechanistically incomplete models of nuclear trafficking, chromatin targeting, structural compatibility, and catalytic specificity. In this extended critique, we demonstrate that the conclusions of this study are not supported by current biochemical, structural, cell biological, or genomic data. We outline severe shortcomings in assay design, lack of quantitative kinetics, incompatibilities with known Pol II structural constraints, unsupported assumptions about nuclear localization, inappropriate extension to "direct-at-gene" mechanisms, absence of global transcriptional effects, failure to align with the essential role of canonical CDKs, and missing transparency in dataset reporting. 
We conclude that the central claims of the study are premature and contradicted by decades of established transcriptional research. Substantial new evidence is required before revising the mechanistic model of Pol II CTD regulation.2025-11-15T01:48:14ZarXiv admin note: This submission has been withdrawn due to violation of arXiv policies for acceptable submissionsJia LiCollege of Chemical Engineering, Huaqiao University, Xiamen, ChinaShu-Feng ZhouCollege of Chemical Engineering, Huaqiao University, Xiamen, Chinahttp://arxiv.org/abs/2511.23114v1A Spectral Koopman Approximation Framework for Stochastic Reaction Networks2025-11-28T11:57:17ZStochastic reaction networks (SRNs) are a general class of continuous-time Markov jump processes used to model a wide range of systems, including biochemical dynamics in single cells, ecological and epidemiological populations, and queueing or communication networks. Yet analyzing their dynamics remains challenging because these processes are high-dimensional and their transient behavior can vary substantially across different initial molecular or population states. Here we introduce a spectral framework for the stochastic Koopman operator that provides a tractable, low-dimensional representation of SRN dynamics over continuous time, together with computable error estimates. By exploiting the compactness of the Koopman operator, we recover dominant spectral modes directly from simulated or experimental data, enabling efficient prediction of moments, event probabilities, and other summary statistics across all initial states. We further derive continuous-time parameter sensitivities and cross-spectral densities, offering new tools for probing noise structure and frequency-domain behavior. We demonstrate the approach on biologically relevant systems, including synthetic intracellular feedback controllers, stochastic oscillators, and inference of initial-state distributions from high-temporal-resolution flow cytometry. 
Together, these results establish spectral Koopman analysis as a powerful and general framework for studying stochastic dynamical systems across the biological, ecological, and computational sciences.2025-11-28T11:57:17Z7 figuresAnkit GuptaMustafa Khammashhttp://arxiv.org/abs/2511.22252v1Stochastic Models of Resource Allocation in Chemical Reaction Networks2025-11-27T09:27:48ZThis paper analyses a stochastic model of a chemical reaction network with three types of chemical species ${\cal R}$, ${\cal M}$ and ${\cal U}$ that interact to transform a flow of external resources, the chemical species ${\cal Q}$, to produce a product, the chemical species ${\cal P}_r$. A regulation mechanism involving the sequestration of the chemical species ${\cal R}$ when the flow of resources is too low is investigated. The original motivation of the study is to analyze the qualitative properties of a key regulation mechanism of gene expression in biological cells, the {\em stringent response}.
A scaling analysis of a Markov process in $\N^5$ representing the state of the chemical reaction network is achieved. It is shown that, depending on the parameters of the model, there are, quite surprisingly, three possible asymptotic regimes. To each of them corresponds a stochastic averaging principle with a fast process expressed in terms of a network of $M/M/\infty$ queues. One of these regimes, the optimal sequestration regime, does not seem to have been identified up to now. Under this regime, the input flow of resources is low but the state of the network is still acceptable in terms of unused macro-molecules, showing the remarkable efficiency of this regulation mechanism. The technical proofs of the main convergence results rely on a combination of coupling arguments, technical estimates of the solutions of SDEs, of sample paths of fast processes in particular, and the stability properties of some dynamical systems in $\R^2$.2025-11-27T09:27:48ZVincent FromionPhilippe RobertJana Zaherddinehttp://arxiv.org/abs/2507.09272v2Degeneracy of Zero-one Reaction Networks2025-11-27T05:25:42ZZero-one biochemical reaction networks are widely recognized for their importance in analyzing signal transduction and cellular decision-making processes. Degenerate networks reveal non-standard behaviors and mark the boundary where classical methods fail. Their analysis is key to understanding exceptional dynamical phenomena in biochemical systems. Therefore, we focus on investigating the degeneracy of zero-one reaction networks. It is known that one-dimensional zero-one networks cannot degenerate. In this work, we identify all degenerate two-dimensional zero-one reaction networks with up to three species by an efficient algorithm. 
By analyzing the structure of these networks, we arrive at the following conclusion: if a two-dimensional zero-one reaction network with three species is degenerate, then its steady-state system is equivalent to a binomial system.2025-07-12T12:57:49ZXiaoxian TangYihan WangJiandong Zhanghttp://arxiv.org/abs/2305.13348v2On the reduction of stochastic chemical reaction networks2025-11-25T21:56:27ZThe linear noise approximation (LNA) describes the random fluctuations from the mean-field concentrations of a chemical reaction network due to intrinsic noise. It is also used as a test probe to determine the accuracy of reduced formulations of the chemical master equation and to understand the relationship between timescale disparity and model reduction in stochastic environments. Although several reduced LNAs have been proposed, they have not been placed into a general theory concerning the accuracy of reduced LNAs derived from center manifold and singular perturbation theory. This has made it difficult to understand why certain reductions of the master or Langevin equations fail or succeed. In this work, we develop a deeper understanding of slow manifold projection in the linear noise regime by answering a straightforward but open question: In the presence of eigenvalue disparity, does the appropriate oblique projection of the LNA onto the slow eigenspace accurately approximate the first and second moments of complete LNA, and if not, why? Although most studies concentrate on the role of eigenvalue disparity arising from the drift matrix, we go further and examine the interplay between disparate ``drift" eigenvalues and the eigenvalues of the diffusion matrix, the latter of which may or may not be disparate. 
Furthermore, we place the previously established reductions of the LNA into a more general framework and formulate the necessary and sufficient conditions for the projected LNA to accurately approximate the first and second moments of the complete LNA.2023-05-22T16:39:17Z9 Figures, 38 pagesJustin EilertsenWylie Stroberghttp://arxiv.org/abs/2511.19813v1Time-Varying Network Driver Estimation (TNDE) Quantifies Stage-Specific Regulatory Effects From Single-Cell Snapshots2025-11-25T00:51:55ZIdentifying key driver genes governing biological processes such as development and disease progression remains a challenge. While existing methods can reconstruct cellular trajectories or infer static gene regulatory networks (GRNs), they often fail to quantify time-resolved regulatory effects within specific temporal windows. Here, we present Time-varying Network Driver Estimation (TNDE), a computational framework quantifying dynamic gene driver effects from single-cell snapshot data under a linear Markov assumption. TNDE leverages a shared graph attention encoder to preserve the local topological structure of the data. Furthermore, by incorporating partial optimal transport, TNDE accounts for unmatched cells arising from proliferation or apoptosis, thereby enabling trajectory alignment in non-equilibrium processes. Benchmarking on simulated datasets demonstrates that TNDE outperforms existing baseline methods across diverse complex regulatory scenarios. Applied to mouse erythropoiesis data, TNDE identifies stage-specific driver genes, the functional relevance of which is corroborated by biological validation. 
TNDE offers an effective quantitative tool for dissecting dynamic regulatory mechanisms underlying complex biological processes.2025-11-25T00:51:55ZJiaxin LiShanjun Maohttp://arxiv.org/abs/2505.09664v2KINDLE: Knowledge-Guided Distillation for Prior-Free Gene Regulatory Network Inference2025-11-24T13:05:14ZGene regulatory network (GRN) inference serves as a cornerstone for deciphering cellular decision-making processes. Early approaches rely exclusively on gene expression data, so their predictive power remains fundamentally constrained by the vast combinatorial space of potential gene-gene interactions. Subsequent methods integrate prior knowledge to mitigate this challenge by restricting the solution space to biologically plausible interactions. However, we argue that the effectiveness of these approaches is contingent upon the precision of prior information, and that the reduction in the search space will circumscribe the models' potential for novel biological discoveries. To address these limitations, we introduce KINDLE, a three-stage framework that decouples GRN inference from prior knowledge dependencies. KINDLE trains a teacher model that integrates prior knowledge with temporal gene expression dynamics and subsequently distills this encoded knowledge to a student model, enabling accurate GRN inference solely from expression data without access to any prior. KINDLE achieves state-of-the-art performance across four benchmark datasets. Notably, it successfully identifies key transcription factors governing mouse embryonic development and precisely characterizes their functional roles. In mouse hematopoietic stem cell data, KINDLE accurately predicts fate transition outcomes following knockout of two critical regulators (Gata1 and Spi1). 
These biological validations demonstrate our framework's dual capability in maintaining topological inference precision while preserving discovery potential for novel biological mechanisms.2025-05-14T16:13:10Z39th Conference on Neural Information Processing Systems (NeurIPS 2025)Rui PengYuchen LuQichen SunYuxing LuChi ZhangZiru LiuJinzhuo Wanghttp://arxiv.org/abs/2511.18883v1Enumeration of Autocatalytic Subsystems in Large Chemical Reaction Networks2025-11-24T08:40:05ZAutocatalysis is an important feature of metabolic networks, contributing crucially to the self-maintenance of organisms. Autocatalytic subsystems of chemical reaction networks (CRNs) are characterized in terms of algebraic conditions on submatrices of the stoichiometric matrix. Here, we derive sufficient conditions for subgraphs supporting irreducible autocatalytic systems in the bipartite König representation of the CRN. On this basis, we develop an efficient algorithm to enumerate autocatalytic subnetworks and, as a special case, autocatalytic cores, i.e., minimal autocatalytic subnetworks, in full-size metabolic networks. The same algorithmic approach can also be used to determine autocatalytic cores only. As a showcase application, we provide a complete analysis of autocatalysis in the core metabolism of E. coli and enumerate irreducible autocatalytic subsystems of limited size in full-fledged metabolic networks of E. coli, human erythrocytes, and Methanosarcina barkeri (Archaea). The mathematical and algorithmic results are accompanied by software enabling the routine analysis of autocatalysis in large CRNs.2025-11-24T08:40:05Z64 Pages (40 main + 24 Supplementary Information), 15 figuresRichard GolnikThomas GatterPeter F. 
StadlerNicola Vassenahttp://arxiv.org/abs/2511.18626v1Learning the principles of T cell antigen discernment2025-11-23T21:50:58ZT cells are central to the adaptive immune response, capable of detecting pathogenic antigens while ignoring healthy tissues with remarkable specificity and sensitivity. Quantitatively understanding how T cell receptors (TCRs) discriminate among antigens requires biophysical models and theoretical analysis of signaling networks. Here, we review current theoretical frameworks of antigen recognition in the context of modern experimental and computational advances. Antigen potency spans a continuum and exhibits nonlinear effects within complex mixtures, challenging discrete classification and simple threshold-based models. This complexity motivates the development of models such as adaptive kinetic proofreading, which integrate both activating and inhibitory signals. Advances in high-throughput technologies now generate large-scale, quantitative datasets, enabling the refinement of such models through statistical and machine learning approaches. This convergence of theory, data, and computation promises deeper insights into immune decision-making and opens new avenues for rational immunotherapy design.2025-11-23T21:50:58ZFrançois X. P. BourassaSooraj AcharGrégoire Altan-BonnetPaul Françoishttp://arxiv.org/abs/2511.17905v1Canalization as a stabilizing principle of gene regulatory networks: a discrete dynamical systems perspective2025-11-22T04:01:46ZGene regulatory networks exhibit remarkable stability, maintaining functional phenotypes despite genetic and environmental perturbations. Discrete dynamical models, such as Boolean networks, provide systems biologists with a tractable framework to explore the mathematical underpinnings of this robustness. A key mechanism conferring stability is canalization. 
This perspective synthesizes historical insights, formal definitions of canalization in discrete dynamical models, quantitative measures of stability, illustrative applications, and emerging challenges at the interface of theory and experiment.2025-11-22T04:01:46Z23 pages, 6 figuresClaus Kadelkahttp://arxiv.org/abs/2511.15554v1Chemical systems with chaos2025-11-19T15:46:33ZThree-dimensional polynomial dynamical systems (DSs) can display chaos with various properties already in the quadratic case with only one or two quadratic monomials. In particular, one-wing chaos is reported in quadratic DSs with only one quadratic monomial, while two-wing and hidden chaos in quadratic DSs with only two quadratic monomials. However, none of the reported DSs can be realized with chemical reactions. To bridge this gap, in this paper, we investigate chaos in chemical dynamical systems (CDSs) - a subset of polynomial DSs that can model the dynamics of mass-action chemical reaction networks. To this end, we develop a fundamental theory for mapping polynomial DSs into CDSs of the same dimension and with a reduced number of non-linear terms. Applying this theory, we show that, under suitable robustness assumptions, quadratic CDSs, and cubic CDSs with only one cubic, can display a rich set of chaotic solutions already in three dimensions. Furthermore, we construct some relatively simple three-dimensional examples, including a quadratic CDS with one-wing chaos and three quadratics, a cubic CDS with two-wing chaos and one cubic, and a quadratic CDS with hidden chaos and five quadratics.2025-11-19T15:46:33ZTomislav Plesahttp://arxiv.org/abs/2511.14669v1Hyperbolic Graph Embeddings Reveal the Host-Pathogen Interactome2025-11-18T17:08:37ZInfections depend on interactions between pathogen and host proteins, but comprehensively mapping these interactions is challenging and labor intensive. 
Many biological networks have hierarchical, scale-free structure, so we developed a deep learning framework, ApexPPI, that represents protein networks in hyperbolic Riemannian space to capture these features. Our model integrates multimodal biological data (protein sequences, gene perturbation experiments, and complementary interaction networks) to predict likely interactions between pathogen and host proteins through multi-task hyperbolic graph neural networks. Mapping protein features into hyperbolic space led to much higher accuracy than previous methods in predicting host-pathogen interactions. From tens of millions of possible protein pairs, our model identified thousands of high-confidence interactions, including many involving human G-protein-coupled receptors (GPCRs). We validated dozens of these predicted complexes using AlphaFold 3 structural modeling, supporting the accuracy of our predictions. This comprehensive map of host-pathogen protein interactions provides a resource for discovering new treatments and illustrates how advanced AI can unravel complex biological systems.2025-11-18T17:08:37ZXiaoqiong XiaCesar de la Fuente-Nunez