https://arxiv.org/api/VK/WujKI2feJJLSLsGEkaqROPis
2026-06-21T12:02:52Z
1302996015

http://arxiv.org/abs/2508.07239v1
Title: BIGBOY1.2: Generating Realistic Synthetic Data for Disease Outbreak Modelling and Analytics
Updated: 2025-08-10T08:34:05Z
Abstract: Modelling disease outbreaks remains challenging due to incomplete surveillance data, noise, and limited access to standardized datasets. We have created BIGBOY1.2, an open synthetic dataset generator that creates configurable epidemic time series and population-level trajectories suitable for benchmarking modelling, forecasting, and visualisation. The framework supports SEIR and SIR-like compartmental logic, custom seasonality, and noise injection to mimic real reporting artifacts. BIGBOY1.2 can produce datasets with diverse characteristics, making it suitable for comparing traditional epidemiological models (e.g., SIR, SEIR) with modern machine learning approaches (e.g., SVM, neural networks).
Published: 2025-08-10T08:34:05Z
Authors: Raunak Narwal, Syed Abbas

http://arxiv.org/abs/2505.01727v2
Title: A mathematical model of human population reproduction through marriage
Updated: 2025-08-10T06:57:01Z
Abstract: We develop a linear one-sex dynamical model of human population reproduction through marriage. In our model, a woman may marry and divorce multiple times; however, only women who are currently married are assumed to bear children. The iterative marriage process is formulated as a three-state compartmental model, which is described by a system of McKendrick equations with a marital birth rate function that depends on the duration of marriage and the age at marriage. To examine the impact of changing nuptiality on fertility, we derive new formulas for the reproduction indices. In particular, the total fertility rate (TFR) is expressed as the product of the total marriage number and the average total marital fertility.
Using Japanese vital statistics, we show that our model provides a reasonable estimate of the current TFR and its future trajectory.
Published: 2025-05-03T07:43:30Z
Authors: Hisashi Inaba, Shoko Konishi

http://arxiv.org/abs/2508.07081v1
Title: Treemble: A Graphical Tool to Generate Newick Strings from Phylogenetic Tree Images
Updated: 2025-08-09T19:23:49Z
Abstract: Phylogenetic trees are ubiquitous and central to biology, but most published trees are available only as visual diagrams and not in the machine-readable Newick format. There are thus thousands of published trees in the scientific literature that are unavailable for follow-up analyses, comparisons, supertree construction, etc. Experts can easily read such diagrams, but the manual construction of a Newick string is prohibitively laborious. Previous attempts to semi-automate the reading of tree images relied on image processing techniques. These quickly encounter difficulties with typical published tree diagrams that contain various graphical elements that overlap the branches, such as error bars on internal nodes. Here we introduce Treemble, a user-friendly desktop application for generating Newick strings from tree images. The user simply clicks to mark node locations, and Treemble algorithmically assembles the tree from the node coordinates alone. Tip nodes can be automatically detected and marked. Treemble also facilitates the automatic reading of tip name labels and can handle both rectangular and circular trees. Treemble is a native desktop application for both macOS and Windows, and is freely available and fully documented at treemble.org.
Published: 2025-08-09T19:23:49Z
Authors: John B. Allard, Sudhir Kumar

http://arxiv.org/abs/2503.14625v2
Title: Costs and benefits of phytoplankton motility
Updated: 2025-08-09T04:04:06Z
Abstract: The motility skills of phytoplankton have evolved and persisted over millions of years, primarily in response to factors such as nutrient and light availability, temperature and viscosity gradients, turbulence, and predation pressure.
Phytoplankton motility is broadly categorized into swimming and buoyancy regulation. Despite studies in the literature exploring the motility costs and benefits of phytoplankton, there remains a gap in our integrative understanding of direct and indirect energy expenditures, starting from when an organism initiates movement due to any biophysical motive, to when the organism encounters intracellular and environmental challenges. Here we gather available pieces of this puzzle from literature in biology, physics, and oceanography to paint an overarching picture of our current knowledge. The characterization of sinking and rising behavior as passive motility has resulted in the concept of sinking and rising internal efficiency being overlooked. We define this efficiency based on any energy dissipation associated with processes of mass density adjustment, as exemplified in structures like frustules and vacuoles. We propose that sinking and rising are active motility processes involving non-visible mechanisms, as species demonstrate active and rapid strategies in response to turbulence, predation risk, and gradients of nutrients, light, temperature, and viscosity. Identifying intracellular buoyancy-regulating dissipative processes offers deeper insight into the motility costs relative to the organism's total metabolic rate.
Published: 2025-03-18T18:23:17Z
Comment: 51 pages, 2 figures, 2 tables, 15 equations
Authors: Peyman Fahimi, Andrew J. Irwin, Michael Lynch
DOI: 10.13140/RG.2.2.30118.43844

http://arxiv.org/abs/2508.06747v1
Title: Geometry of the space of phylogenetic trees with non-identical leaves
Updated: 2025-08-08T23:15:00Z
Abstract: Phylogenetic trees summarize evolutionary relationships. The Billera-Holmes-Vogtmann (BHV) space for comparing phylogenetic trees has many elegant mathematical properties, but it does not encompass trees with differing leaf sets. To overcome this, we introduce Towering space: a complete metric space that extends BHV space to trees with non-identical leaf sets.
Towering space is a structured collection of BHV spaces connected via pruning and regrafting operations. We study the geometry of paths in Towering space and present an algorithm for computing metric distances. By addressing a major limitation of BHV space, Towering space facilitates the analysis of modern phylogenetic datasets such as multi-domain gene trees.
Published: 2025-08-08T23:15:00Z
Authors: Maria Alejandra Valdez Cabrera, Amy D Willis

http://arxiv.org/abs/2502.13426v2
Title: Stability of difference equations with interspecific density dependence, competition, and maturation delays
Updated: 2025-08-08T06:08:16Z
Abstract: A general system of difference equations is presented for multispecies communities with density dependent population growth and delayed maturity. Interspecific competition, mutualism, predation, commensalism, and amensalism are accommodated. A sufficient condition for the local asymptotic stability of a coexistence equilibrium in this system is then proven. Using this system, the generalisation of the Beverton-Holt and Leslie-Gower models of competition to multispecies systems with possible maturation delays is presented and shown to yield interesting stability properties. The stability of coexistence depends on the relative abundances of the species at the unique interior equilibrium. A sufficient condition for local stability is derived that only requires intraspecific competition to outweigh interspecific competition. The condition does not depend on maturation delays. The derived stability properties are used to develop a novel estimation approach for the coefficients of interspecific competition. This approach finds an optimal configuration given two conjectures. First, coexisting species strive to outcompete competitors. Second, persisting species are more likely in stable systems with strong dampening of perturbations and high ecological resilience.
The optimal solution is compared to estimates of niche overlap using an empirical example of malaria mosquito vectors with delayed maturity in the Anopheles gambiae sensu lato species complex.
Published: 2025-02-19T04:52:30Z
Comment: 18 pages, 1 figure
Authors: Geoffrey R. Hosack, Maud El-Hachem, Nicholas J. Beeton
DOI: 10.1007/s11538-025-01515-0

http://arxiv.org/abs/2508.05896v1
Title: Optimal trap cropping investments to maximize agricultural yield
Updated: 2025-08-07T23:04:07Z
Abstract: Trap cropping is a pest management strategy where a grower plants an attractive "trap crop" alongside the primary crop to divert pests away from it. We propose a simple framework for optimizing the proportion of a grower's field or greenhouse allocated to a main crop and a trap crop to maximize agricultural yield. We implement this framework using a model of pest movement governed by trap crop attractiveness, the potential yield threatened by pests, and functional relationships between yield loss and pest density drawn from the literature. Focusing on a simple case in which pests move freely across the field and are attracted to traps solely by their relative attractiveness, we find that allocating 5-20 percent of the landscape to trap plants is typically required to maximize yield and achieve effective pest control in the absence of pesticides. For highly attractive trap plants, growers can devote less space because they are more effective; less attractive plants are ineffective even in large numbers. Intermediate attractiveness warrants the greatest investment in trap cropping.
Our framework offers a transparent and tractable approach for exploring trade-offs in pest management and can be extended to incorporate more complex pest behaviors, crop spatial configurations, and economic considerations.
Published: 2025-08-07T23:04:07Z
Authors: Matthew H Holden

http://arxiv.org/abs/2508.05832v1
Title: Identifiability of Large Phylogenetic Mixtures for Many Phylogenetic Model Structures
Updated: 2025-08-07T20:19:48Z
Abstract: Identifiability of phylogenetic models is a necessary condition to ensure that the model parameters can be uniquely determined from data. Mixture models are phylogenetic models where the probability distributions in the model are convex combinations of distributions in simpler phylogenetic models. Mixture models are used to model heterogeneity in the substitution process in DNA sequences. While many basic phylogenetic models are known to be identifiable, mixture models in generality have only been shown to be identifiable in certain cases. We expand the main theorem of [Rhodes, Sullivant 2012] to prove identifiability of mixture models in equivariant phylogenetic models, specifically the Jukes-Cantor, Kimura 2-parameter model, Kimura 3-parameter model and the Strand Symmetric model.
Published: 2025-08-07T20:19:48Z
Authors: Bryson Kagy, Seth Sullivant

http://arxiv.org/abs/2508.06580v1
Title: Actuarial Analysis of an Infectious Disease Insurance based on an SEIARD Epidemiological Model
Updated: 2025-08-07T17:24:14Z
Abstract: The growing number of infectious disease outbreaks, like the one caused by the SARS-CoV-2 virus, underscores the necessity of actuarial models that can adapt to epidemic-driven risks. Traditional life insurance frameworks often rely on static mortality assumptions that fail to capture the temporal and behavioral complexity of disease transmission. In this paper, we propose an integrated actuarial framework based on the SEIARD epidemiological model. This framework enables the explicit modeling of incubation periods and disease-induced mortality.
We derive key actuarial quantities, including the present value of annuity benefits, payment streams, and net premiums, based on SEIARD dynamics. We formulate a prospective reserve function and analyze its evolution throughout the course of an epidemic. Additionally, we examine the forces of infection, mortality, and removal to assess their impact on epidemic-adjusted survival probabilities. Numerical simulations implemented via a nonstandard finite difference (NSFD) scheme illustrate the model's applicability under various parameter settings and insurance policy assumptions.
Published: 2025-08-07T17:24:14Z
Authors: Achraf Zinihi, Matthias Ehrhardt, Moulay Rchid Sidi Ammi
DOI: 10.1080/10920277.2026.2664588

http://arxiv.org/abs/2503.03540v3
Title: An SIRS model with hospitalizations: economic impact by disease severity
Updated: 2025-08-07T13:28:50Z
Abstract: We introduce a two-timescale SIRS-type model in which a fraction $θ$ of infected individuals experiences a severe course of the disease, requiring hospitalization. During hospitalization, these individuals do not contribute to further infections. We analyze the model's equilibria, perform a bifurcation analysis, and explore its two-timescale nature (using techniques from Geometric Singular Perturbation Theory). Our main result provides an explicit expression for the value of $θ$ that maximizes the total number of hospitalized individuals for long times, revealing that this fraction can be lower than 1. This highlights the interesting effect that a severe disease, by necessitating widespread hospitalization, can indirectly suppress contagions and, consequently, reduce hospitalizations. Numerical simulations illustrate the growth in the number of hospitalizations for short times.
The model can also be interpreted as a scenario where only a fraction $θ$ of infected individuals develops symptoms and self-quarantines.
Published: 2025-03-05T14:25:46Z
Authors: Jacopo Borsotti

http://arxiv.org/abs/2508.08302v1
Title: Non-participant externalities reshape the evolution of altruistic punishment
Updated: 2025-08-07T10:10:54Z
Abstract: While voluntary participation is a key mechanism that enables altruistic punishment to emerge, its explanatory power typically rests on the common assumption that non-participants have no impact on the public good. Yet, given the decentralized nature of voluntary participation, opting out does not necessarily preclude individuals from influencing the public good. Here, we revisit the role of voluntary participation by allowing non-participants to exert either positive or negative impacts on the public good. Using evolutionary analysis in a well-mixed finite population, we find that positive externalities from non-participants lower the synergy threshold required for altruistic punishment to dominate. In contrast, negative externalities raise this threshold, making altruistic punishment harder to sustain. Notably, when non-participants have positive impacts, altruistic punishment thrives only if non-participation is incentivized, whereas under negative impacts, it can persist even when non-participation is discouraged. Our findings reveal that efforts to promote altruistic punishment must account for the active role of non-participants, whose influence can make or break collective outcomes.
Published: 2025-08-07T10:10:54Z
Authors: Zhao Song, Chen Shen, Valerio Capraro, The Anh Han

http://arxiv.org/abs/2508.08301v1
Title: Coordinating cooperation in stag-hunt game: Emergence of evolutionarily stable procedural rationality
Updated: 2025-08-07T04:37:16Z
Abstract: Humans are boundedly rational at best and this, we argue, has worked in their favour in the hunter-gatherer society where emergence of a coordinated action, leading to cooperation, is otherwise the standard stag-hunt dilemma (when individuals are rational).
In line with the fact that humans strive to develop self-reputation by having less propensity to cheat than to be cheated, we observe that the payoff structure of the stag-hunt game appropriately modifies to that of the coordination-II game. Subsequently, within the paradigm of evolutionary game theory, we establish that a population -- consisting of procedurally rational players (a type of bounded rationality) -- is unequivocally evolutionarily stable against emergence of more rational strategies in the coordination-II game. The cooperation is, thus, shown to have been established by evolutionary forces picking less rational individuals.
Published: 2025-08-07T04:37:16Z
Journal reference: J. Phys. Complex. 6, 035004 (2025)
Authors: Joy Das Bairagya, Sagar Chakraborty
DOI: 10.1088/2632-072X/adf2ee

http://arxiv.org/abs/2508.04649v1
Title: Estimating breast cancer recurrence in a population-based registry in Georgia, US
Updated: 2025-08-06T17:17:45Z
Abstract: Although the descriptive epidemiology of primary breast cancer is well characterized in the US, breast cancer recurrence rates have not been measured in an unselected population. The number of breast cancer survivors at risk for recurrence is growing each year, so recurrence surveillance is a pressing need. We used missing data methods to impute breast cancer recurrence and estimate the risk of recurrence in the Cancer Recurrence Information and Surveillance Program (CRISP) cohort in the Georgia Cancer Registry. The imputation model was based on an internal validation substudy and indicators recorded in the registry (e.g., pathology reports, imaging claims), prognostic variables (e.g., stage at diagnosis), and characteristics associated with missing data (e.g., insurance coverage). We pooled hazard ratios (HR) and 95% Confidence Intervals (CI) across 1000 imputed datasets, adjusted for age, stage, grade, subtype, race and ethnicity, marital status, and urban/rural county at diagnosis.
There were 1,606 patients with a validated outcome (75% with breast cancer recurrence) and we imputed the outcome for the remaining 23,439 patients. We estimated an overall 7.2% incidence of recurrence between at least 1 year after diagnosis and up to 5 years of follow up. When comparing the hazards pooled across imputations, we found that some patterns differed from established patterns in mortality or survival, notably by race and ethnicity, underscoring the need for continued research on the descriptive epidemiology of breast cancer recurrence. These results provide new insights into surveillance for breast cancer survivors in Georgia, especially those with higher stage and grade tumors, of Hispanic ethnicity, and who may be lacking social support.
Published: 2025-08-06T17:17:45Z
Authors: Chrystelle Kiang, Micah Streiff, Rebecca Nash, Robert H. Lyles, Deirdre Cronin-Fenton, Anke Huels, Timothy L. Lash, Kevin C. Ward

http://arxiv.org/abs/2508.04187v1
Title: Tweets vs Pathogen Spread: A Case Study of COVID-19 in American States
Updated: 2025-08-06T08:15:37Z
Abstract: The concept of the mutual influence that awareness and disease may exert on each other has recently presented significant challenges. The actions individuals take to prevent contracting a disease and their level of awareness can profoundly affect the dynamics of its spread. Simultaneously, disease outbreaks impact how people become aware. In response, we initially propose a null model that couples two Susceptible-Infectious-Recovered (SIR) dynamics and analyze it using a mean-field approach. Subsequently, we explore the parameter space to quantify the effects of this mutual influence on various observables. Finally, based on this null model, we conduct an empirical analysis of Twitter data related to COVID-19 and confirmed cases within American states. Our findings indicate that in specific regions of the parameter space, it is possible to suppress the epidemic by increasing awareness, and we investigate phase transitions.
Furthermore, our model demonstrates the ability to alter the dominant population group by adjusting parameters throughout the course of the outbreak. Additionally, using the model, we assign a set of parameters to each state, revealing that these parameters change at different pandemic peaks. Notably, a robust correlation emerges between the ranking of states' Twitter activity, as gathered from empirical data, and the immunity parameters assigned to each state using our model. This observation underscores the pivotal role of sustained awareness transitioning from the initial to the subsequent peaks in the disease progression.
Published: 2025-08-06T08:15:37Z
Comment: 16 pages, 9 figures
Authors: Sara Shabani, Sahar Jafarbegloo, Sadegh Raeisi, Fakhteh Ghanbarnejad

http://arxiv.org/abs/2508.04085v1
Title: Generalising the Central Dogma as a cross-hierarchical principle of biology
Updated: 2025-08-06T05:03:31Z
Abstract: The Central Dogma of molecular biology, as originally proposed by Crick, asserts that information passed into protein cannot flow back out. This principle has been interpreted as underpinning modern understandings of heredity and evolution, implying the unidirectionality of information flow from nucleic acids to proteins. Here, we propose a generalisation of the Central Dogma as a division of labour between the transmission and expression of information: the transmitter (nucleic acids) perpetuates information across generations, whereas the expressor (protein) enacts this information to facilitate the transmitter's function without itself perpetuating information. We argue that this generalisation offers two benefits. First, it provides a unifying perspective for comparing the Central Dogma to analogous divisions of labour observed at vastly different biological scales, including multicellular organisms, eukaryotic cells, organelles, and bacteria. Second, it offers a theoretical framework to explain the Central Dogma as an outcome of evolution.
Specifically, we review a mathematical model suggesting that the Central Dogma originates through spontaneous symmetry breaking driven by evolutionary conflicts between different levels of selection. By reframing the Central Dogma as an informational relationship between components of a system, this generalisation underscores its broader relevance across the biological hierarchy and sheds light on its evolutionary origin.
Published: 2025-08-06T05:03:31Z
Comment: 38 pages, 2 figures
Authors: Nobuto Takeuchi, Kunihiko Kaneko