http://arxiv.org/abs/2508.07239v1 BIGBOY1.2: Generating Realistic Synthetic Data for Disease Outbreak Modelling and Analytics 2025-08-10T08:34:05Z

Modelling disease outbreaks remains challenging due to incomplete surveillance data, noise, and limited access to standardized datasets. We have created BIGBOY1.2, an open synthetic dataset generator that creates configurable epidemic time series and population-level trajectories suitable for benchmarking modelling, forecasting, and visualisation. The framework supports SEIR and SIR-like compartmental logic, custom seasonality, and noise injection to mimic real reporting artifacts. BIGBOY1.2 can produce datasets with diverse characteristics, making it suitable for comparing traditional epidemiological models (e.g., SIR, SEIR) with modern machine learning approaches (e.g., SVM, neural networks).
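
As a rough illustration of the kind of generator described above, the following minimal sketch produces a daily SEIR series with a seasonal transmission rate and multiplicative reporting noise. It is not BIGBOY1.2's code or API: the function name `simulate_seir`, every parameter name and default value, the cosine form of the seasonality, and the Gaussian noise model are assumptions made for this example.

```python
import math
import random


def simulate_seir(days=120, population=10_000, beta0=0.4, sigma=0.2, gamma=0.1,
                  season_amp=0.2, season_period=365.0, report_noise=0.1,
                  initial_infected=10, seed=0):
    """Return a list of (day, true_new_cases, reported_new_cases) tuples.

    All names and defaults are hypothetical; the update is a plain
    forward-Euler step of one day on the standard SEIR equations.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    s = float(population - initial_infected)
    e, i, r = 0.0, float(initial_infected), 0.0
    rows = []
    for day in range(days):
        # Seasonal forcing: transmission rate oscillates around beta0.
        beta = beta0 * (1.0 + season_amp * math.cos(2.0 * math.pi * day / season_period))
        new_exposed = beta * s * i / population   # S -> E
        new_infectious = sigma * e                # E -> I (counted as new cases)
        new_recovered = gamma * i                 # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        # Reporting artifact: multiplicative Gaussian noise, clipped at zero.
        reported = max(0.0, new_infectious * (1.0 + rng.gauss(0.0, report_noise)))
        rows.append((day, new_infectious, reported))
    return rows


if __name__ == "__main__":
    for day, true_cases, reported_cases in simulate_seir(days=5):
        print(day, round(true_cases, 2), round(reported_cases, 2))
```

Keeping the true and reported series side by side is a deliberate choice here: a benchmark can then score a model against the noise-free trajectory while training it only on the noisy one.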

2025-08-10T08:34:05Z Raunak Narwal Syed Abbas http://arxiv.org/abs/2505.01727v2 A mathematical model of human population reproduction through marriage 2025-08-10T06:57:01Z

We develop a linear one-sex dynamical model of human population reproduction through marriage. In our model, a woman may marry and divorce multiple times; however, only women who are currently married are assumed to bear children. The iterative marriage process is formulated as a three-state compartmental model, which is described by a system of McKendrick equations with a marital birth rate function that depends on the duration of marriage and the age at marriage. To examine the impact of changing nuptiality on fertility, we derive new formulas for the reproduction indices. In particular, the total fertility rate (TFR) is expressed as the product of the total marriage number and the average total marital fertility. Using Japanese vital statistics, we show that our model provides a reasonable estimate of the current TFR and its future trajectory.

2025-05-03T07:43:30Z Hisashi Inaba Shoko Konishi http://arxiv.org/abs/2508.07081v1 Treemble: A Graphical Tool to Generate Newick Strings from Phylogenetic Tree Images 2025-08-09T19:23:49Z

Phylogenetic trees are ubiquitous and central to biology, but most published trees are available only as visual diagrams and not in the machine-readable Newick format. There are thus thousands of published trees in the scientific literature that are unavailable for follow-up analyses, comparisons, supertree construction, etc. Experts can easily read such diagrams, but the manual construction of a Newick string is prohibitively laborious. Previous attempts to semi-automate the reading of tree images relied on image processing techniques. These quickly encounter difficulties with typical published tree diagrams that contain various graphical elements that overlap the branches, such as error bars on internal nodes. Here we introduce Treemble, a user-friendly desktop application for generating Newick strings from tree images. The user simply clicks to mark node locations, and Treemble algorithmically assembles the tree from the node coordinates alone. Tip nodes can be automatically detected and marked. Treemble also facilitates the automatic reading of tip name labels and can handle both rectangular and circular trees. Treemble is a native desktop application for both macOS and Windows, and is freely available and fully documented at treemble.org.
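
To make the target format concrete, here is a minimal sketch of writing a Newick string once the tree topology and node positions are known. It does not reproduce Treemble's assembly algorithm, which the abstract does not describe. The `Node` class, the `to_newick` and `newick_string` helpers, and the rule that a branch length equals the horizontal distance between a node and its parent (as in a rectangular tree drawn left to right) are all assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node: x is its horizontal position in the image."""
    x: float
    name: str = ""
    children: List["Node"] = field(default_factory=list)


def to_newick(node: Node, parent_x: Optional[float] = None) -> str:
    """Serialize a subtree; branch length = horizontal distance to the parent."""
    label = node.name
    if node.children:
        inner = ",".join(to_newick(child, node.x) for child in node.children)
        label = f"({inner}){node.name}"
    if parent_x is not None:
        label += f":{node.x - parent_x:g}"
    return label


def newick_string(root: Node) -> str:
    """Full Newick string for the tree rooted at `root`, with the closing semicolon."""
    return to_newick(root) + ";"


if __name__ == "__main__":
    tree = Node(0.0, children=[
        Node(2.0, "A"),
        Node(1.0, children=[Node(3.0, "B"), Node(2.5, "C")]),
    ])
    print(newick_string(tree))  # (A:2,(B:2,C:1.5):1);
```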

2025-08-09T19:23:49Z John B. Allard Sudhir Kumar http://arxiv.org/abs/2503.14625v2 Costs and benefits of phytoplankton motility 2025-08-09T04:04:06Z

The motility skills of phytoplankton have evolved and persisted over millions of years, primarily in response to factors such as nutrient and light availability, temperature and viscosity gradients, turbulence, and predation pressure. Phytoplankton motility is broadly categorized into swimming and buoyancy regulation. Despite studies in the literature exploring the motility costs and benefits of phytoplankton, there remains a gap in our integrative understanding of direct and indirect energy expenditures, starting from when an organism initiates movement due to any biophysical motive, to when the organism encounters intracellular and environmental challenges. Here we gather available pieces of this puzzle from literature in biology, physics, and oceanography to paint an overarching picture of our current knowledge. The characterization of sinking and rising behavior as passive motility has resulted in the concept of sinking and rising internal efficiency being overlooked. We define this efficiency based on any energy dissipation associated with processes of mass density adjustment, as exemplified in structures like frustules and vacuoles. We propose that sinking and rising are active motility processes involving non-visible mechanisms, as species demonstrate active and rapid strategies in response to turbulence, predation risk, and gradients of nutrients, light, temperature, and viscosity. Identifying intracellular buoyancy-regulating dissipative processes offers deeper insight into the motility costs relative to the organism's total metabolic rate.

2025-03-18T18:23:17Z 51 pages, 2 figures, 2 tables, 15 equations Peyman Fahimi Andrew J. Irwin Michael Lynch 10.13140/RG.2.2.30118.43844 http://arxiv.org/abs/2508.06747v1 Geometry of the space of phylogenetic trees with non-identical leaves 2025-08-08T23:15:00Z

Phylogenetic trees summarize evolutionary relationships. The Billera-Holmes-Vogtmann (BHV) space for comparing phylogenetic trees has many elegant mathematical properties, but it does not encompass trees with differing leaf sets. To overcome this, we introduce Towering space: a complete metric space that extends BHV space to trees with non-identical leaf sets. Towering space is a structured collection of BHV spaces connected via pruning and regrafting operations. We study the geometry of paths in Towering space and present an algorithm for computing metric distances. By addressing a major limitation of BHV space, Towering space facilitates the analysis of modern phylogenetic datasets such as multi-domain gene trees.

2025-08-08T23:15:00Z Maria Alejandra Valdez Cabrera Amy D Willis http://arxiv.org/abs/2502.13426v2 Stability of difference equations with interspecific density dependence, competition, and maturation delays 2025-08-08T06:08:16Z

A general system of difference equations is presented for multispecies communities with density dependent population growth and delayed maturity. Interspecific competition, mutualism, predation, commensalism, and amensalism are accommodated. A sufficient condition for the local asymptotic stability of a coexistence equilibrium in this system is then proven. Using this system, the generalisation of the Beverton-Holt and Leslie-Gower models of competition to multispecies systems with possible maturation delays is presented and shown to yield interesting stability properties. The stability of coexistence depends on the relative abundances of the species at the unique interior equilibrium. A sufficient condition for local stability is derived that only requires intraspecific competition to outweigh interspecific competition. The condition does not depend on maturation delays. The derived stability properties are used to develop a novel estimation approach for the coefficients of interspecific competition. This approach finds an optimal configuration given two conjectures. First, coexisting species strive to outcompete competitors. Second, persisting species are more likely in stable systems with strong dampening of perturbations and high ecological resilience. The optimal solution is compared to estimates of niche overlap using an empirical example of malaria mosquito vectors with delayed maturity in the Anopheles gambiae sensu lato species complex.
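
For orientation, the sketch below iterates the standard multispecies Leslie-Gower competition map, N_i(t+1) = lambda_i N_i(t) / (1 + sum_j a_ij N_j(t)), without maturation delays. It is not the paper's general system: the function names, the two-species parameter values, and the omission of delays are assumptions chosen so that intraspecific competition (the diagonal of `comp`) outweighs interspecific competition.

```python
def leslie_gower_step(n, growth, comp):
    """One generation of the Leslie-Gower map for all species at once.

    n      -- current abundances, one entry per species
    growth -- per-capita growth factors lambda_i
    comp   -- competition coefficients a_ij (row i: effects on species i)
    """
    size = len(n)
    return [
        growth[i] * n[i] / (1.0 + sum(comp[i][j] * n[j] for j in range(size)))
        for i in range(size)
    ]


def simulate(n0, growth, comp, steps):
    """Iterate the map `steps` times from the initial abundances n0."""
    n = list(n0)
    for _ in range(steps):
        n = leslie_gower_step(n, growth, comp)
    return n


if __name__ == "__main__":
    growth = [2.0, 2.0]
    # Diagonal (intraspecific) terms are larger than off-diagonal ones.
    comp = [[0.01, 0.005],
            [0.005, 0.01]]
    print(simulate([10.0, 20.0], growth, comp, steps=200))
```

With these symmetric values the coexistence equilibrium solves 1 + 0.015 N = 2, so both abundances should settle near N = 200/3, which gives a quick sanity check on the iteration.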

2025-02-19T04:52:30Z 18 pages, 1 figure Geoffrey R. Hosack Maud El-Hachem Nicholas J. Beeton 10.1007/s11538-025-01515-0 http://arxiv.org/abs/2508.05896v1 Optimal trap cropping investments to maximize agricultural yield 2025-08-07T23:04:07Z

Trap cropping is a pest management strategy where a grower plants an attractive "trap crop" alongside the primary crop to divert pests away from it. We propose a simple framework for optimizing the proportion of a grower's field or greenhouse allocated to a main crop and a trap crop to maximize agricultural yield. We implement this framework using a model of pest movement governed by trap crop attractiveness, the potential yield threatened by pests, and functional relationships between yield loss and pest density drawn from the literature. Focusing on a simple case in which pests move freely across the field and are attracted to traps solely by their relative attractiveness, we find that allocating 5-20 percent of the landscape to trap plants is typically required to maximize yield and achieve effective pest control in the absence of pesticides. For highly attractive trap plants, growers can devote less space because they are more effective; less attractive plants are ineffective even in large numbers. Intermediate attractiveness warrants the greatest investment in trap cropping. Our framework offers a transparent and tractable approach for exploring trade-offs in pest management and can be extended to incorporate more complex pest behaviors, crop spatial configurations, and economic considerations.

2025-08-07T23:04:07Z Matthew H Holden http://arxiv.org/abs/2508.05832v1 Identifiability of Large Phylogenetic Mixtures for Many Phylogenetic Model Structures 2025-08-07T20:19:48Z

Identifiability of phylogenetic models is a necessary condition to ensure that the model parameters can be uniquely determined from data. Mixture models are phylogenetic models where the probability distributions in the model are convex combinations of distributions in simpler phylogenetic models. Mixture models are used to model heterogeneity in the substitution process in DNA sequences. While many basic phylogenetic models are known to be identifiable, mixture models in generality have only been shown to be identifiable in certain cases. We expand the main theorem of [Rhodes, Sullivant 2012] to prove identifiability of mixture models in equivariant phylogenetic models, specifically the Jukes-Cantor, Kimura 2-parameter model, Kimura 3-parameter model and the Strand Symmetric model.

2025-08-07T20:19:48Z Bryson Kagy Seth Sullivant http://arxiv.org/abs/2508.06580v1 Actuarial Analysis of an Infectious Disease Insurance based on an SEIARD Epidemiological Model 2025-08-07T17:24:14Z

The growing number of infectious disease outbreaks, like the one caused by the SARS-CoV-2 virus, underscores the necessity of actuarial models that can adapt to epidemic-driven risks. Traditional life insurance frameworks often rely on static mortality assumptions that fail to capture the temporal and behavioral complexity of disease transmission. In this paper, we propose an integrated actuarial framework based on the SEIARD epidemiological model. This framework enables the explicit modeling of incubation periods and disease-induced mortality. We derive key actuarial quantities, including the present value of annuity benefits, payment streams, and net premiums, based on SEIARD dynamics. We formulate a prospective reserve function and analyze its evolution throughout the course of an epidemic. Additionally, we examine the forces of infection, mortality, and removal to assess their impact on epidemic-adjusted survival probabilities. Numerical simulations implemented via a nonstandard finite difference (NSFD) scheme illustrate the model's applicability under various parameter settings and insurance policy assumptions.
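
The idea behind a nonstandard finite difference scheme can be shown on a much simpler model. The sketch below applies a nonlocal, NSFD-style discretization to a plain SIR model in population fractions; it is not the authors' SEIARD scheme, and the function name `nsfd_sir`, the parameter values, and the choice of the trivial denominator function phi(h) = h are assumptions for illustration. Evaluating the infection term with the new susceptible value keeps every compartment non-negative and conserves the total for any step size h, which an explicit Euler step does not guarantee.

```python
def nsfd_sir(s0, i0, r0, beta, gamma, h, steps):
    """NSFD-style time stepping for SIR in population fractions.

    Discretization (assumed for this sketch):
        S_next = S / (1 + h*beta*I)
        I_next = (I + h*beta*S_next*I) / (1 + h*gamma)
        R_next = R + h*gamma*I_next
    Returns the list of (S, I, R) states, including the initial one.
    """
    s, i, r = s0, i0, r0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        s_next = s / (1.0 + h * beta * i)                       # infection term uses S_next
        i_next = (i + h * beta * s_next * i) / (1.0 + h * gamma)
        r_next = r + h * gamma * i_next                         # removal uses I_next
        s, i, r = s_next, i_next, r_next
        trajectory.append((s, i, r))
    return trajectory


if __name__ == "__main__":
    # A deliberately large step size to show that the scheme stays well behaved.
    for state in nsfd_sir(0.99, 0.01, 0.0, beta=0.5, gamma=0.1, h=5.0, steps=5):
        print(tuple(round(v, 4) for v in state))
```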

2025-08-07T17:24:14Z Achraf Zinihi Matthias Ehrhardt Moulay Rchid Sidi Ammi 10.1080/10920277.2026.2664588 http://arxiv.org/abs/2503.03540v3 An SIRS model with hospitalizations: economic impact by disease severity 2025-08-07T13:28:50Z

We introduce a two-timescale SIRS-type model in which a fraction $θ$ of infected individuals experiences a severe course of the disease, requiring hospitalization. During hospitalization, these individuals do not contribute to further infections. We analyze the model's equilibria, perform a bifurcation analysis, and explore its two-timescale nature (using techniques from Geometric Singular Perturbation Theory). Our main result provides an explicit expression for the value of $θ$ that maximizes the total number of hospitalized individuals for long times, revealing that this fraction can be lower than 1. This highlights the interesting effect that a severe disease, by necessitating widespread hospitalization, can indirectly suppress contagions and, consequently, reduce hospitalizations. Numerical simulations illustrate the growth in the number of hospitalizations for short times. The model can also be interpreted as a scenario where only a fraction $θ$ of infected individuals develops symptoms and self-quarantines.

2025-03-05T14:25:46Z Jacopo Borsotti http://arxiv.org/abs/2508.08302v1 Non-participant externalities reshape the evolution of altruistic punishment 2025-08-07T10:10:54Z

While voluntary participation is a key mechanism that enables altruistic punishment to emerge, its explanatory power typically rests on the common assumption that non-participants have no impact on the public good. Yet, given the decentralized nature of voluntary participation, opting out does not necessarily preclude individuals from influencing the public good. Here, we revisit the role of voluntary participation by allowing non-participants to exert either positive or negative impacts on the public good. Using evolutionary analysis in a well-mixed finite population, we find that positive externalities from non-participants lower the synergy threshold required for altruistic punishment to dominate. In contrast, negative externalities raise this threshold, making altruistic punishment harder to sustain. Notably, when non-participants have positive impacts, altruistic punishment thrives only if non-participation is incentivized, whereas under negative impacts, it can persist even when non-participation is discouraged. Our findings reveal that efforts to promote altruistic punishment must account for the active role of non-participants, whose influence can make or break collective outcomes.

2025-08-07T10:10:54Z Zhao Song Chen Shen Valerio Capraro The Anh Han http://arxiv.org/abs/2508.08301v1 Coordinating cooperation in stag-hunt game: Emergence of evolutionarily stable procedural rationality 2025-08-07T04:37:16Z

Humans are boundedly rational at best and this, we argue, has worked in their favour in hunter-gatherer society, where the emergence of coordinated action leading to cooperation is otherwise the standard stag-hunt dilemma (when individuals are rational). In line with the fact that humans strive to develop self-reputation by having less propensity to cheat than to be cheated, we observe that the payoff structure of the stag-hunt game is appropriately modified to that of the coordination-II game. Subsequently, within the paradigm of evolutionary game theory, we establish that a population consisting of procedurally rational players (a type of bounded rationality) is unequivocally evolutionarily stable against the emergence of more rational strategies in the coordination-II game. Cooperation is thus shown to have been established by evolutionary forces picking less rational individuals.

2025-08-07T04:37:16Z J. Phys. Complex. 6, 035004 (2025) Joy Das Bairagya Sagar Chakraborty 10.1088/2632-072X/adf2ee http://arxiv.org/abs/2508.04649v1 Estimating breast cancer recurrence in a population-based registry in Georgia, US 2025-08-06T17:17:45Z

Although the descriptive epidemiology of primary breast cancer is well characterized in the US, breast cancer recurrence rates have not been measured in an unselected population. The number of breast cancer survivors at risk for recurrence is growing each year, so recurrence surveillance is a pressing need. We used missing-data methods to impute breast cancer recurrence and estimate the risk of recurrence in the Cancer Recurrence Information and Surveillance Program (CRISP) cohort in the Georgia Cancer Registry. The imputation model was based on an internal validation substudy and indicators recorded in the registry (e.g., pathology reports, imaging claims), prognostic variables (e.g., stage at diagnosis), and characteristics associated with missing data (e.g., insurance coverage). We pooled hazard ratios (HR) and 95% confidence intervals (CI) across 1000 imputed datasets, adjusted for age, stage, grade, subtype, race and ethnicity, marital status, and urban/rural county at diagnosis. There were 1,606 patients with a validated outcome (75% with breast cancer recurrence) and we imputed the outcome for the remaining 23,439 patients. We estimated an overall 7.2% incidence of recurrence from at least 1 year after diagnosis up to 5 years of follow-up. When comparing the hazards pooled across imputations, we found that some patterns differed from established patterns in mortality or survival, notably by race and ethnicity, underscoring the need for continued research on the descriptive epidemiology of breast cancer recurrence. These results provide new insights into surveillance for breast cancer survivors in Georgia, especially those with higher-stage and higher-grade tumors, of Hispanic ethnicity, and who may be lacking social support.

2025-08-06T17:17:45Z Chrystelle Kiang Micah Streiff Rebecca Nash Robert H. Lyles Deirdre Cronin-Fenton Anke Huels Timothy L. Lash Kevin C. Ward http://arxiv.org/abs/2508.04187v1 Tweets vs Pathogen Spread: A Case Study of COVID-19 in American States 2025-08-06T08:15:37Z

The concept of the mutual influence that awareness and disease may exert on each other has recently presented significant challenges. The actions individuals take to prevent contracting a disease and their level of awareness can profoundly affect the dynamics of its spread. Simultaneously, disease outbreaks impact how people become aware. In response, we initially propose a null model that couples two Susceptible-Infectious-Recovered (SIR) dynamics and analyze it using a mean-field approach. Subsequently, we explore the parameter space to quantify the effects of this mutual influence on various observables. Finally, based on this null model, we conduct an empirical analysis of Twitter data related to COVID-19 and confirmed cases within American states. Our findings indicate that in specific regions of the parameter space, it is possible to suppress the epidemic by increasing awareness, and we investigate phase transitions. Furthermore, our model demonstrates the ability to alter the dominant population group by adjusting parameters throughout the course of the outbreak. Additionally, using the model, we assign a set of parameters to each state, revealing that these parameters change at different pandemic peaks. Notably, a robust correlation emerges between the ranking of states' Twitter activity, as gathered from empirical data, and the immunity parameters assigned to each state using our model. This observation underscores the pivotal role of sustained awareness transitioning from the initial to the subsequent peaks in the disease progression.

2025-08-06T08:15:37Z 16 pages, 9 figures Sara Shabani Sahar Jafarbegloo Sadegh Raeisi Fakhteh Ghanbarnejad http://arxiv.org/abs/2508.04085v1 Generalising the Central Dogma as a cross-hierarchical principle of biology 2025-08-06T05:03:31Z

The Central Dogma of molecular biology, as originally proposed by Crick, asserts that information passed into protein cannot flow back out. This principle has been interpreted as underpinning modern understandings of heredity and evolution, implying the unidirectionality of information flow from nucleic acids to proteins. Here, we propose a generalisation of the Central Dogma as a division of labour between the transmission and expression of information: the transmitter (nucleic acids) perpetuates information across generations, whereas the expressor (protein) enacts this information to facilitate the transmitter's function without itself perpetuating information. We argue that this generalisation offers two benefits. First, it provides a unifying perspective for comparing the Central Dogma to analogous divisions of labour observed at vastly different biological scales, including multicellular organisms, eukaryotic cells, organelles, and bacteria. Second, it offers a theoretical framework to explain the Central Dogma as an outcome of evolution. Specifically, we review a mathematical model suggesting that the Central Dogma originates through spontaneous symmetry breaking driven by evolutionary conflicts between different levels of selection. By reframing the Central Dogma as an informational relationship between components of a system, this generalisation underscores its broader relevance across the biological hierarchy and sheds light on its evolutionary origin.

2025-08-06T05:03:31Z 38 pages, 2 figures Nobuto Takeuchi Kunihiko Kaneko