https://arxiv.org/api/lvFrc2POZqc+I9R+WK0DWq56EgA
2026-06-13T17:20:31Z

http://arxiv.org/abs/2605.21859v1
PhylaFlow: Hybrid Flow Matching in Billera-Holmes-Vogtmann Tree Space for Phylogenetic Inference
2026-05-21T01:13:46Z
Phylogenetic trees are hybrid objects: branch lengths vary continuously, while topologies change discretely through edge contractions and expansions. Billera-Holmes-Vogtmann (BHV) tree space provides a canonical geometry for this structure, representing each resolved topology as a Euclidean orthant and topological changes as motion across shared lower-dimensional boundaries. We introduce PhylaFlow, a hybrid flow-matching model that learns posterior-basin transport in BHV tree space. PhylaFlow is trained on BHV geodesic paths from random starting trees to short-run posterior samples, coupling continuous branch-length motion within orthants with learned boundary events and discrete topology transitions. We evaluate the learned geometry operationally: if the flow reaches posterior-relevant regions, finite-budget Bayesian refinement initialized from, or guided by, its terminal trees should recover posterior-supported topologies more efficiently. Across DS1-DS8 phylogenetic posterior benchmarks, PhylaFlow substantially reduces initial Tree-KL relative to classical initializers. After finite-budget MrBayes refinement, direct PhylaFlow improves early and intermediate topology-recovery trajectories on most datasets, while split-guided PhylaFlow-MCMC obtains the strongest hard-case results. The best PhylaFlow variant outperforms short-warmup on seven of eight datasets and PhyloGFN on five of eight under the same refinement budget. In a joint sequence-conditioned experiment, sequence embeddings steer posterior split recovery, although exact posterior topology recovery remains preliminary.
These results show that hybrid flow matching can learn actionable transport in BHV tree space and provide a geometry-aware proposal mechanism for Bayesian phylogenetic inference.
2026-05-21T01:13:46Z
9 pages, 3 figures
Yasha Ektefaie, Leo Cui, Shrey Jain, Marinka Zitnik, Pardis Sabeti

http://arxiv.org/abs/2605.21787v1
Drivers of Transient Dynamics and Persistence in Dengue: Insights from Sensitivity and Stochastic Modeling
2026-05-20T22:27:48Z
We investigate how key epidemiological parameters shape both seasonal epidemics and the persistence of dengue transmission. Our findings confirm known mechanistic drivers of epidemic variability and introduce a ranking of parameter importance in our dengue model, which in turn informs the prioritization of public health policies. We propose a stochastic vector-host model with waning immunity, exogenous infection, and vertical transmission. To assess parameter influence, we first qualitatively analyze the macroscopic model. We then perform a multivariate Sobol sensitivity analysis of epidemic summary statistics, and examine the variance of the endemic equilibrium as a function of model parameters. We show that the macroscopic model is well posed, vertical transmission lowers the threshold for persistence, and low spatial coupling increases infectious endemic equilibria. The vector-host population ratio and host recovery rate have the largest first-order and total sensitivity indices, surpassing the contact rates; this implies that control measures during seasonal dengue should prioritize protecting infectious hosts from mosquito bites. Finally, we show that the covariance of hosts and vectors at the endemic equilibrium is asynchronous in the contact-rate plane. This robust pattern has epidemiological, ecological and evolutionary interpretations. A dengue strain has two niches to exploit during the endemic regime, and coexisting strains have two niches each.
Moreover, large fluctuations in a given strain during the endemic regime provide a mechanistic explanation for high vertical transmission, enabling viral reservoirs that can hatch and trigger outbreaks in the following season. We argue that our model and results can be adapted to address specific public health questions to guide dengue control using field data.
2026-05-20T22:27:48Z
Cesar Alberto Rosales-Alcantar, Marcos A. Capistrán

http://arxiv.org/abs/2405.17032v4
Exact phylodynamic likelihood via structured Markov genealogy processes
2026-05-20T21:34:15Z
We show that each member of a broad class of Markovian population models induces a unique stochastic process on the space of genealogies. We construct this genealogy process and derive exact expressions for the likelihood of an observed genealogy in terms of a filter equation, the structure of which is completely determined by the population model. We show that existing phylodynamic methods based on the coalescent and linear birth-death processes are special cases. We derive some properties of filter equations and describe a class of algorithms that can be used to numerically solve them. Importantly, because these algorithms rely only on simulation of the population model, they retain the plug-and-play property upon which simulation-based inference depends. Our results open the door to statistically efficient likelihood-based phylodynamic inference for a much wider class of models than is currently possible.
2024-05-27T10:39:18Z
Aaron A. King, Qianying Lin, Edward L. Ionides

http://arxiv.org/abs/2501.17622v2
Likelihood landscape of binary latent model on a tree
2026-05-20T21:29:21Z
We investigate the optimization landscape of maximum likelihood estimation (MLE) for the Cavender-Farris-Neyman (CFN) model, a two-state latent tree model fundamental to statistical phylogenetics and the ferromagnetic Ising model.
Although the log-likelihood function is non-concave and may admit many critical points, simple coordinate maximization algorithms are remarkably effective in practice. We provide the first theoretical justification for this success. We prove that sufficiently deep inside the reconstruction regime, the population log-likelihood is strongly concave and smooth within a box around the true parameter, whose size is independent of tree topology and number of leaves. This fundamental result implies that the empirical landscape shares these regularity properties with high probability given polynomial sample complexity and also that coordinate maximization converges exponentially fast to an $O(1/\sqrt{m})$-consistent MLE. Our analysis centers on a novel decay property of the population Hessian: diagonal entries remain large while off-diagonal entries decay exponentially with graph distance. These results provide rigorous theoretical evidence for the efficacy of likelihood-based tree inference and suggest broader principles for latent variable models.
2025-01-29T12:54:55Z
59 pages, 8 figures
David Clancy, Hanbaek Lyu, Sebastien Roch

http://arxiv.org/abs/2605.21725v1
Regularizing and Normalizing DAGs and Phylogenetic Networks
2026-05-20T20:33:54Z
Phylogenetic networks and, more generally, directed acyclic graphs (DAGs) represent hierarchical structure beyond trees, for instance in the presence of reticulate evolutionary events such as hybridization or horizontal gene transfer. A central question is which parts of such graphs are essential with respect to leaf-observable information, and which parts can be removed without changing this information. Resolving this question can lead to principled simplification methods for phylogenetic networks, such as the recent normalization approach of Francis et al.
In this paper, we study this question from three related perspectives: clusters displayed by a DAG $G$, least common ancestors (LCAs) of subsets of its leaf set, and visibility, a path-based property of vertices. We first introduce an LCA-based simplification procedure called $i$-regularization. For a DAG $G$ and $i\geq 1$, the DAG $\reg_i(G)$ retains precisely those vertices that occur as unique LCAs of leaf subsets of size at most $i$, removes the remaining non-leaf vertices by a graph-editing operation $\ominus$, and then deletes shortcuts. We show that $\reg_i(G)$ preserves all such LCAs, is $i$-lca-relevant, and admits a cluster-level description: it is regular, i.e., isomorphic to the Hasse diagram of the corresponding lca-clusters.
We then compare LCA-based regularization with normalization. Using the same $\ominus$-operator, we describe the cover construction underlying normalization, identify visible vertices that are nevertheless removed, and characterize when regularization and normalization coincide. Together, these results provide a unified framework for cluster-based, LCA-based, and visibility-based simplifications of DAGs and phylogenetic networks.
2026-05-20T20:33:54Z
Marc Hellmuth, Anna Lindeberg, Vincent Moulton

http://arxiv.org/abs/2605.21129v1
How hate spreads online and why it returns: Re-entrant phases driven by collective behavior
2026-05-20T13:01:56Z
The 2025 Bondi Beach mass-shooting was perpetrated by individuals inspired by ISIS (Islamic State) propaganda that increasingly featured anti-Semitic hate content following the October 2023 start of the Israel-Palestine war. Similar stories hold for other types of hate attacks, e.g. against Muslims on May 18, 2026. There is an urgent need to get ahead of future threats by understanding how and when a newly created piece of hate content will spread system-wide online. We present a two-species coalescence-fragmentation model with Susceptible-Infected-Recovered dynamics that incorporates the following published empirical features: (1) New pieces of hate content tend to be generated and promoted by a subset of in-built communities on less regulated platforms. (2) These `hate' communities create links (hyperlinks) with each other and with non-hate communities across all platforms to form dynamically evolving clusters (i.e. coalescence) across which new hate content can then spread. (3) These clusters can get broken up by moderator shutdowns (i.e. fragmentation). We present numerical solutions and derive two levels of approximate mean-field theory: Effective Medium Theory (EMT) and Beyond Effective Medium Theory (BEMT).
Both numerical and analytic solutions reveal that system-wide spreading is governed by re-entrant threshold phases: as the fraction of hate communities varies, the system can transition from spreading to no-spreading and back to spreading. The derived analytic formulae give explicit insight into how these phase boundaries might be manipulated to prevent system-wide spreading. More broadly, the re-entrant phase behavior warns that policies which steadily reduce the number of hate communities can initially succeed but then backfire if pushed further, suggesting that blanket requirements for platforms to simply do `more' are over-simplistic.
2026-05-20T13:01:56Z
earlier draft of published paper
Phys Rev E May 20 2026
Chen Xu, Pak Ming Hui, Chenkai Xia, Neil F. Johnson
10.1103/pghw-mmzz

http://arxiv.org/abs/2605.20692v1
Inferring infectiousness: a joint model of the within-host viral kinetics of SARS-CoV-2
2026-05-20T04:39:53Z
During an infectious disease outbreak, providing accurate answers to policy questions about transmission requires a detailed model of the natural history of infectiousness. Unfortunately, direct measures of infectiousness are generally unavailable. Instead, we often rely on indirect proxies, such as viral load measured by PCR or antigen tests, viral culture to detect replication-competent virus, or symptom onset, each of which reflects different aspects of viral dynamics or host response. However, these proxies vary in terms of the ease of collection, scalability, and their relationship to viral shedding and therefore underlying infectiousness. Here, we use data from five prospective, densely sampled cohorts with longitudinal data on multiple proxies of viral shedding for approximately 2,000 infections to develop a Bayesian joint model for the within-host viral kinetics of SARS-CoV-2 infection.
Modeling the joint distribution allows us to infer the trajectory of infectious virus shedding -- the most direct correlate of infectiousness -- for individuals who contribute only PCR data, and to compute derived quantities that are inaccessible from any single proxy alone. These include the population-level probability and expected duration of ongoing infectiousness as a function of time since diagnosis, stratified by variant, vaccination status, and infection history; the residual risk of releasing an individual from isolation; and personalized, real-time estimates of infectiousness that are sequentially updated as new test results become available.
2026-05-20T04:39:53Z
Christopher B. Boyer, Stephen M. Kissler, Seran Hakki, Jakob Jonnerby, Ajit Lalvani, Marc Lipsitch

http://arxiv.org/abs/2508.17599v2
Decoding species coexistence: A reinforcement learning perspective
2026-05-20T00:18:20Z
A central goal in ecology is to understand how biodiversity is maintained. Previous theoretical works have employed the rock-paper-scissors (RPS) game as a toy model, demonstrating that population mobility is crucial in determining the species' coexistence. One key prediction is that biodiversity is jeopardized and eventually lost when mobility exceeds a certain value--a conclusion at odds with empirical observations of highly mobile species coexisting in nature. To address this discrepancy, we introduce a reinforcement learning framework and study a spatial RPS model, where individual mobility is adaptively regulated via a Q-learning algorithm rather than held fixed. Our results show that all three species can coexist stably, with extinction probabilities remaining low across a broad range of baseline migration rates. Mechanistic analysis reveals that individuals develop two behavioral tendencies: survival priority (escaping from predators) and predation priority (remaining near prey).
While species coexistence emerges from the balance of the two tendencies, their imbalance jeopardizes biodiversity. Notably, there is a symmetry-breaking of action preference in a particular state that is responsible for the divergent species densities. Furthermore, when Q-learning species interact with fixed-mobility counterparts, those with adaptive mobility exhibit a significant evolutionary advantage. Our study suggests that reinforcement learning may offer a promising new perspective for uncovering the mechanisms of biodiversity and informing conservation strategies.
2025-08-25T01:54:11Z
13 pages, 11 figures
Phys. Rev. E 113, 054411 (2026), Editors' Suggestion
Kaiwen Jiang, Chenyang Zhao, Shengfeng Deng, Weiran Cai, Jiqiang Zhang, Li Chen

http://arxiv.org/abs/2602.04150v2
A brief review of evolutionary game dynamics in the reinforcement learning paradigm
2026-05-20T00:11:07Z
Cooperation, fairness, trust, and resource coordination are cornerstones of modern civilization, yet their emergence remains inadequately explained by the persistent discrepancies between theoretical predictions and behavioral experiments. Part of this gap may arise from the imitation learning paradigm commonly used in prior theoretical models, which assumes individuals merely copy successful neighbors according to predetermined, fixed rules. This review examines recent advances in evolutionary game dynamics that employ reinforcement learning (RL) as an alternative paradigm. In RL, individuals learn through trial and error and introspectively refine their strategies based on environmental feedback. We begin by introducing key concepts in evolutionary game theory and the two learning paradigms, then synthesize progress in applying RL to elucidate cooperation, trust, fairness, optimal resource coordination, and ecological dynamics.
Collectively, these studies indicate that RL offers a promising unified framework for understanding the diverse social and ecological phenomena observed in human and natural systems.
2026-02-04T02:37:25Z
27 pages, 7 figures, invited review
Communications in Theoretical Physics 78, 067601 (2026)
Guozhong Zheng, Xin Ou, Shengfeng Deng, Jiqiang Zhang, Li Chen

http://arxiv.org/abs/2605.20103v1
Face morphometric profiles of groups as early markers for certain diseases?
2026-05-19T16:55:32Z
Background: Face morphometry has been shown to work as a diagnostic tool for a set of syndromes. Face similarities are usually indications of more complete genetic similarities. Purpose: To show preliminary results on the face morphometry profile of the Cuban population and to argue that it could be used to define early markers for diseases such as Alzheimer's disease. Methods: A dataset composed of photos of 200,000 men is processed. Facial landmarks are extracted by means of the DLIB library and distances between them are computed. By clustering samples with similar facial traits, groups are formed and their densities inside the population are computed. Results: The face morphometry profiles for two age cohorts are obtained, showing the population dynamics. Genes involved in facial development are shown to be related to Alzheimer's disease. Conclusions: Late multifactorial diseases develop against the genetic background of each individual, which is expressed in that individual's face morphometry. The latter can thus be considered a risk marker.
2026-05-19T16:55:32Z
Int J Oral Craniofac Sci 9(2): 008-015 (2023)
Roberto Herrero, Yoanna Martinez-Diaz, Heydi Mendez-Vazquez, Joan Nieves, Augusto Gonzalez
10.17352/2455-4634.000060

http://arxiv.org/abs/2605.19962v1
Computing the Arc-Deletion Distance to Orchard Networks is NP-hard
2026-05-19T15:14:30Z
Phylogenetic networks generalize phylogenetic trees by allowing reticulate evolutionary events such as horizontal gene transfer and hybridization.
Among the many subclasses of phylogenetic networks, orchard networks have attracted increasing attention due to their structural and algorithmic properties. In this paper, we study the arc-deletion distance to orchard networks, defined as the minimum number of reticulate arcs whose deletion transforms a phylogenetic network into an orchard network. We prove that computing this distance is NP-hard via a polynomial-time reduction from the Degree-3 Vertex Cover problem. Our result establishes the computational intractability of this proximity measure and contributes to the complexity theory of phylogenetic network transformations.
2026-05-19T15:14:30Z
20 pages, 5 figures
Peng Li, Zhiwei Liu, Yangjing Long

http://arxiv.org/abs/2210.09286v2
An interacting particle system for the front of an epidemic advancing through a susceptible population
2026-05-19T10:26:58Z
We introduce an interacting particle system that models the spread of an epidemic in terms of heterogeneous diffusive dynamics, rather than exogenous contact and transmission rates at the population level as in classical compartmental models. Each individual has a one-dimensional level of shielding that evolves according to a stochastic differential equation reflected at the advancing front of the epidemic. The front is driven by cumulative infections, and collisions with it represent at-risk situations which may lead to infection depending on a non-Markovian mechanism that involves the local time, the intrinsic transmissibility, and the current contagiousness within the population. We give a rigorous construction of the system and develop two key technical tools: a compensated martingale property for the infected proportion and a general result on how local time transforms under a random time-dependent bijection of the state space. The former yields a decomposition of the expected number of new infections that parallels a corresponding decomposition in the SIR model.
The latter allows us to represent the law of each particle, after suitable conditioning, as a generalised elastic Brownian motion with drift.
2022-10-17T17:45:50Z
38 pages, 3 figures
Eliana Fausti, Andreas Sojmark

http://arxiv.org/abs/2605.19333v1
Deep-time consistency in proteome elemental composition across cellular and viral life
2026-05-19T04:14:11Z
Proteins are constructed from a limited alphabet of ~20 amino acids, yet the origins and selection of this specific alphabet are unresolved. One largely overlooked aspect is whether elemental composition constrains the range of viable proteomes. Here, we analyze the elemental composition of thousands of proteomes spanning cellular domains and viral realms. Despite evolutionary divergence and orders-of-magnitude variation in proteome size and gene content, proteomes exhibit strikingly consistent elemental composition. This consistency is substantially more constrained than amino acid frequencies or physicochemical properties and is not explained by evolutionary relatedness, biological function, or amino acid usage alone. Viral proteomes occupy the same elemental composition space observed in cellular organisms despite the absence of a single viral common ancestor, suggesting common biochemical constraints shape proteome organization across life. To investigate the evolutionary origins of this pattern, we compare modern proteomes with multiple independent reconstructions of the Last Universal Common Ancestor (LUCA) and with synthetic reduced-alphabet proteomes generated from primordial amino acid alphabets. LUCA proteomes occupy the same constrained elemental composition space observed in modern Bacteria and Archaea, whereas reduced primordial-like alphabets systematically generate alternative elemental regimes outside the modern range despite retaining high sequence similarity to extant proteins.
Reduced alphabets disrupt fold space and reorganize relationships between elemental composition and predicted protein structural organization. Our results suggest that constrained elemental composition represents a fundamental organizational property of proteomes, which emerged early in evolution and may have contributed to the selection and stabilization of the modern amino acid alphabet.
2026-05-19T04:14:11Z
L. Felipe Benites, Louie Slocombe, Sara I. Walker

http://arxiv.org/abs/2605.18571v1
Incorporating vaccine effects into epidemiological models: common pitfalls and solutions
2026-05-18T15:50:40Z
Incorporating vaccination into mathematical models appears deceptively simple: models integrate vaccine-derived protections, such as reduced susceptibility to infection, using parameters informed by empirical estimates of vaccine efficacy or effectiveness (VE). In practice, however, empirical VE estimates often do not correspond directly to the parameters of epidemiological models. Here, we extend previous work to demonstrate that in order to accurately parameterize a model, one must consider both a vaccine's mechanism of action and the statistic used to infer VE from empirical data. When a vaccine confers leaky protection -- that is, vaccination partially rather than completely reduces individual infection risk -- we show that common empirical VE estimation methods do not provide directly applicable values for model parameters. Naive (i.e. direct) incorporation of these VE estimates into models results in an underestimate of population-level vaccine impact. To make progress when these estimates are the only available sources for VE, we introduce a parameterization approach which more accurately aligns the modeled effect of vaccination with empirical estimates. Under this adjusted parameterization approach, models predict fewer total infections and lower herd immunity thresholds for leaky vaccines than would be predicted under current parameterization practices.
Our parameterization guidelines and adjustment approach can be used to improve accuracy in models that are used in vaccine decision making and public health planning.
2026-05-18T15:50:40Z
Casey E. Middleton, Oliver Eales, James M. McCaw, Freya M. Shearer

http://arxiv.org/abs/2605.17975v1
M-SDT: A modelling framework for dengue transmission, forecasting, and intervention strategies in Ahmedabad Municipal Corporation
2026-05-18T07:30:00Z
Dengue fever poses a persistent public health challenge in rapidly urbanizing Indian cities such as Ahmedabad, where spatial heterogeneity and seasonal variability complicate forecasting and control. In this study, we develop a data-driven compartmental framework to simulate transmission dynamics, generate forecasts, and evaluate intervention strategies across the Ahmedabad Municipal Corporation (AMC). We employ a Mechanistic Seasonal Dengue Transmission (M-SDT) model that incorporates symptomatic and asymptomatic infections. We calibrate the proposed model using zone-wise dengue case data from 2020--2024. Parameter uncertainty is rigorously quantified using a bootstrap sampling framework with negative binomial noise. The calibrated model reveals pronounced spatial heterogeneity across AMC zones, with persistent hotspots and distinct transmission regimes. Forecasts for 2026--2028 indicate continued endemic circulation with moderate inter-annual variability. Sensitivity analysis identifies the mosquito biting rate and vector mortality as dominant drivers of long-term disease burden, highlighting the central role of vector ecology in shaping epidemic outcomes. Evaluating seasonal vector control strategies shows a notable difference in how they operate: periodic fogging has a cumulative effect over the years, while sustained residual spraying can quickly curb outbreaks and decrease incidence by over 80%.
The zone-wise analysis reveals that the mosquito-to-human ratio governs not only the baseline outbreak potential but also each zone's responsiveness to control strategies. Overall, the M-SDT modelling framework enables reconstruction of unobserved dynamics, rigorous uncertainty quantification, and evaluation of targeted, zone-specific interventions, underscoring the importance of integrating fine-scale surveillance data with mechanistic modelling for adaptive urban dengue control.
2026-05-18T07:30:00Z
38 pages, 17 figures
Sourav Roy, Rajendra Gadhavi, Bhavin Solanki, Chirag Shah, Raj C. Sharma, Indrajit Ghosh