http://arxiv.org/abs/2601.20981v2 Diversifying Toxicity Search in Large Language Models Through Speciation 2026-04-21T09:20:29Z

Evolutionary prompt search is a practical black-box approach for red teaming large language models; however, existing methods often collapse onto a small family of high-performing prompts, limiting coverage of distinct failure modes. We present a speciated quality-diversity extension of \textit{ToxSearch} that maintains multiple high-toxicity prompt niches in parallel rather than optimizing a single best prompt. \textit{ToxSearch-S} introduces unsupervised prompt speciation via a search methodology that maintains capacity-limited species with exemplar leaders, a reserve pool for emerging niches, and species-aware parent selection that trades off within-niche exploitation and cross-niche exploration. Preliminary results show \textit{ToxSearch-S} reaching higher peak toxicity ($\approx 0.73$ vs.\ $\approx 0.47$) with a heavier tail (top-10 median $0.66$ vs.\ $0.45$) than the baseline. Speciation also yields broader semantic coverage under a topics-as-species analysis (higher effective topic diversity and larger unique topic coverage). Finally, the species formed are well-separated in embedding space (mean separation ratio $\approx 1.93$) and exhibit distinct toxicity distributions, indicating that speciation partitions the adversarial space into behaviorally differentiated niches rather than superficial lexical variants.

2026-01-28T19:29:54Z Preprint. 4 pages, Accepted at GECCO as short paper Onkar Shelar Travis Desell http://arxiv.org/abs/2604.18872v1 Meeting times on graphs in near-cubic time 2026-04-20T21:53:06Z

The expected meeting time of two random walkers on an undirected graph of size $N$, where at each time step one walker moves and the process stops when they collide, satisfies a system of $\binom{N}{2}$ linear equations. Naïvely, solving this system takes $O\left(N^{6}\right)$ operations. However, this system of linear equations has nice structure in that it is almost a Sylvester equation, with the obstruction being a diagonal absorption constraint. We give a simple algorithm for solving this system that exploits this structure, leading to $O\left(N^{4}\right)$ operations and $\Theta\left(N^{2}\right)$ space for exact computation of all $\binom{N}{2}$ meeting times. While this practical method uses only standard dense linear algebra, it can be improved (in theory) to $O\left(N^{3}\log^{2}N\right)$ operations by exploiting the Cauchy structure of the diagonal correction. We generalize this result slightly to cover the Poisson equation for the absorbing "lazy" pair walk with an arbitrary source, which can be solved at the same cost, with $O\left(N^{3}\right)$ per additional source on the same graph. We conclude with applications to evolutionary dynamics, giving improved algorithms for calculating fixation probabilities and mean trait frequencies.
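
The naive $\binom{N}{2}$-equation system mentioned in the abstract can be written out directly for a small graph. The sketch below is only the brute-force baseline, not the paper's structured algorithm. It assumes that the walker who moves is chosen uniformly at random at each step and that the mover jumps to a uniformly random neighbour; the function names `solve_linear` and `meeting_times` are illustrative.

```python
from itertools import combinations


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def meeting_times(adj):
    """Expected meeting times for all unordered pairs of distinct nodes.

    adj maps each node to a list of its neighbours (undirected, connected).
    Assumption: each step, one of the two walkers is picked with
    probability 1/2 and moves to a uniformly random neighbour.
    """
    nodes = sorted(adj)
    pairs = list(combinations(nodes, 2))
    index = {p: k for k, p in enumerate(pairs)}
    size = len(pairs)
    a = [[0.0] * size for _ in range(size)]
    b = [1.0] * size  # every step costs one unit of time
    for (i, j), row in index.items():
        a[row][row] += 1.0
        for mover, other in ((i, j), (j, i)):
            for k in adj[mover]:
                if k == other:
                    continue  # collision: remaining meeting time is zero
                key = (k, other) if k < other else (other, k)
                a[row][index[key]] -= 0.5 / len(adj[mover])
    tau = solve_linear(a, b)
    return {p: tau[index[p]] for p in pairs}


# Triangle graph: from any pair, the mover hits the other walker with
# probability 1/2, so tau = 1 + tau / 2.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
times = meeting_times(triangle)
```

Dense elimination on $\binom{N}{2}$ unknowns is what produces the $O(N^{6})$ cost quoted in the abstract, so this sketch is only practical for small graphs.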

2026-04-20T21:53:06Z 11 pages Alex McAvoy http://arxiv.org/abs/2604.18345v1 Effect of antibiotic spectrum on the abundance of resistant bacteria in multispecies communities 2026-04-20T14:42:05Z

Antibiotic resistance is a major threat to global health. It emerges in multispecies microbial communities under antibiotic exposure. This makes antibiotic spectrum -- a drug's distribution of effects across species -- a potential key parameter in resistance management. However, we currently lack evolutionary theory for resistance dynamics in a multispecies setting. Analysing established community ecology theory, we develop a simple mathematical measure for how one taxon (strain or species) affects another taxon through all direct and indirect interactions in a complex interaction network. Using this, we derive the expected effects of different antibiotic spectra on the abundance of resistant taxa in microbial communities. This furthers our understanding of microbial evolutionary ecology in multispecies communities, and provides a formal theoretical basis for empirical work on optimal antibiotic choice.
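
As a point of orientation, one standard way in which community ecology theory captures net effects through all direct and indirect interactions is the press-perturbation result for generalized Lotka-Volterra dynamics. The block below is a sketch under that assumption; the symbols $x_i$, $r_i$, $a_{ij}$ are introduced here for illustration, and the measure derived in the paper may differ.

```latex
% Assumed model: generalized Lotka-Volterra dynamics for taxon abundances x_i
\frac{dx_i}{dt} = x_i \Bigl( r_i + \sum_{j} a_{ij} x_j \Bigr)

% Interior equilibrium (all x_i^* > 0), with A = (a_{ij}) invertible
r + A x^{*} = 0 \quad\Longrightarrow\quad x^{*} = -A^{-1} r

% Net effect on taxon i of a sustained change in the growth rate of taxon j,
% summing direct and indirect pathways
\frac{\partial x_i^{*}}{\partial r_j} = -\bigl(A^{-1}\bigr)_{ij}
```

Under this reading, an antibiotic that changes growth rates by $\Delta r$ shifts equilibrium abundances by $\Delta x^{*} = -A^{-1}\,\Delta r$, so a drug's spectrum enters through which components of $\Delta r$ are nonzero.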

2026-04-20T14:42:05Z 5 figures Magnus Aspenberg Erik Andreas Martens Kristofer Wollein Waldetoft http://arxiv.org/abs/2604.17926v1 Information on hidden birth events restores identifiability in phylodynamic inference 2026-04-20T08:06:48Z

The parameters of many classes of birth-death processes cannot be inferred uniquely from phylogenetic trees: infinitely many parameter combinations yield the same distribution of phylogenetic trees. Here, we show that parameter identifiability can be recovered even for the most general cases of time-dependent rates when additional information on hidden birth events along branches of the reconstructed tree is available. This holds both for models in which individuals are sampled at a single point in time or through time at a time-dependent rate. Moreover, we prove that when mutations occur at birth - assuming two different models for the accumulation of mutations at a birth event - then information about hidden birth events is available in the sequences and thus all parameters of time-dependent birth-death models become identifiable. Thus, phylodynamic inference is identifiable whenever evolutionary models with mutation accumulation at birth (such as at speciation, transmission, or cell division) are plausible.

2026-04-20T08:06:48Z Tobias Dieselhorst Tanja Stadler http://arxiv.org/abs/2506.22178v2 Vegetation Patterning Can Both Impede and Trigger Critical Transitions from Savanna to Grassland 2026-04-20T07:40:27Z

Tree-grass coexistence is a defining feature of savanna ecosystems, which play an important role in supporting biodiversity and human populations worldwide. While recent advances have clarified many of the underlying processes, how these mechanisms interact to shape ecosystem dynamics under environmental stress is not yet understood. Here, we present and analyze a minimalistic spatially extended model of tree-grass dynamics in dry savannas. We incorporate tree facilitation of grasses through shading and grass competing with trees for water, both varying with tree life stage. Our model shows that these mechanisms lead to grass-tree coexistence and bistability between savanna and grassland states. Moreover, the model predicts vegetation patterns consisting of trees and grasses, particularly under harsh environmental conditions, which can persist in situations where a non-spatial version of the model predicts ecosystem collapse from savanna to grassland instead (a phenomenon called ``Turing-evades-tipping''). Additionally, we identify a novel ``Turing-triggers-tipping'' mechanism, where unstable pattern formation drives tipping events that are overlooked when spatial dynamics are not included. These transient patterns act as early warning signals for ecosystem transitions, offering a critical window for intervention. Further theoretical and empirical research is needed to determine when spatial patterns prevent tipping or drive collapse.

2025-06-27T12:43:41Z 24 pages, 8 figures Environmental Research Letters, 2025, Volume 20, Number 9 Jelle van der Voort Mara Baudena Ehud Meron Max Rietkerk Arjen Doelman 10.1088/1748-9326/adc3ab http://arxiv.org/abs/2507.10257v2 Epidemic spread: limiting contacts to regular circles is not necessarily the safest option 2026-04-18T19:04:27Z

When a new infectious disease (or a new strain of an existing one) emerges, as in the recent COVID-19 pandemic, different types of mobility restrictions are considered to slow down or mitigate the spread of the disease. The measures to be adopted require carefully weighing the social cost against their impact on disease control. In this work, we analyze, in a context of mobility restrictions, the role of frequent versus occasional contacts in epidemic spread. We develop an individual-based mathematical model where frequent contacts among individuals (at home, work, schools) and occasional contacts (at stores, transport, etc.) are considered. We define several contact structures by varying the relative weight of frequent and occasional contacts while keeping the same initial growth rate of the epidemic. We find the remarkable result that the more frequent contacts prevail over occasional ones, the higher the epidemic peak, the sooner it occurs, and the greater the final number of individuals affected by the epidemic. We conduct our study using an SIR model, considering both exponential and deterministic recovery from infection, and find that this effect is more pronounced under deterministic recovery. We find that the impact of relaxation measures depends on the relative importance of frequent and occasional contacts within the considered social structures. Finally, we assess in which of the considered scenarios the homogeneous mixing approximation provides a reasonable description of the epidemic dynamics.
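
For reference, the homogeneous mixing approximation mentioned at the end of the abstract corresponds to the classical deterministic SIR equations. The sketch below integrates them with a plain Euler scheme and reports the epidemic peak and final size. It is not the paper's individual-based model: it assumes exponential recovery only, and `beta`, `gamma`, `i0`, `dt` and `t_end` are illustrative parameters.

```python
def simulate_sir(beta, gamma, i0=1e-4, dt=0.001, t_end=60.0):
    """Euler integration of ds/dt = -beta*s*i, di/dt = beta*s*i - gamma*i.

    s, i, r are population fractions. Returns (peak prevalence, final size).
    """
    s, i, r = 1.0 - i0, i0, 0.0
    peak = i
    for _ in range(int(t_end / dt)):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak, r


# Basic reproduction number beta / gamma = 2
peak, final_size = simulate_sir(beta=2.0, gamma=1.0)
```

With a reproduction number of 2, the result can be checked against the textbook final-size relation $z = 1 - e^{-R_0 z}$, which gives $z \approx 0.80$.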

2025-07-11T16:35:39Z Section 2.2.1 added and minor corrections made. Results unchanged João Gabriel Simões Delboni Gabriel Fabricius http://arxiv.org/abs/2604.17036v1 Evolution as fitness landscape navigation: Concepts, Measures, and Emerging Questions 2026-04-18T15:35:30Z

Fitness landscapes are mappings between genotypes, phenotypes, and fitness that shape evolution. In recent years, empirical work and theoretical models have greatly advanced our understanding of how populations navigate rugged fitness landscapes. Here, we provide a timely review of this field. Its rapidly growing literature employs a wide range of terms, which are sometimes used ambiguously or inconsistently. We therefore begin by defining the major concepts and the field's vocabulary, highlighting our own terminology choices wherever needed. We then review key results on the relationships between epistasis, ruggedness, accessibility, and navigability for genotype-fitness maps, highlighting several complex and sometimes counterintuitive connections that have emerged. Further, we review how the conserved structural properties of the underlying genotype-phenotype map -- which lead to the formation of large connected neutral networks of genotypes -- influence dynamics on fitness landscapes. We then compare the two levels at which landscape navigation can be studied -- the level of genotype-phenotype maps and the level of genotype-fitness maps. Our review leads us to propose a new measure of navigability, based on evolutionary outcomes, that is broadly applicable and overcomes limitations of existing measures. Finally, we review the smaller body of work that relaxes the common assumption of fitness-monotonic paths on static landscapes, and discuss how this can fundamentally change the nature of fitness landscape navigation. Throughout the review, we identify directions for future work to fill existing gaps and to synthesize the disparate strands of research within the field.
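
To make the notion of fitness-monotonic paths concrete, the sketch below counts the direct mutational paths from the all-zero genotype to the all-one genotype along which fitness strictly increases at every step. This is one common operationalisation of accessibility, used here as an assumption; it is not the new navigability measure proposed in the review, and the function name and toy landscapes are illustrative.

```python
from itertools import permutations


def accessible_direct_paths(fitness, n_loci):
    """Count direct paths 00..0 -> 11..1 with strictly increasing fitness.

    fitness maps genotype tuples of 0/1 to fitness values. A direct path
    mutates each locus exactly once, in some order.
    """
    count = 0
    for order in permutations(range(n_loci)):
        genotype = [0] * n_loci
        monotonic = True
        for locus in order:
            previous = fitness[tuple(genotype)]
            genotype[locus] = 1
            if fitness[tuple(genotype)] <= previous:
                monotonic = False
                break
        count += monotonic
    return count


# Toy two-locus landscapes (hypothetical fitness values)
additive = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 1.5, (1, 1): 3.0}
sign_epistasis = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 0.5, (1, 1): 3.0}
reciprocal_sign = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.5, (1, 1): 2.0}
```

In these toy cases the additive landscape keeps both direct paths open, sign epistasis closes one, and reciprocal sign epistasis closes both, even though the double mutant is the fittest genotype in all three.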

2026-04-18T15:35:30Z 27 pages, 2 figures Malvika Srivastava Claudia Bank Joachim Krug Suman G. Das http://arxiv.org/abs/2604.16065v1 Phase transitions in microbial lineage trees 2026-04-17T13:46:34Z

Statistical physics can describe the behavior of microbial populations consisting of many heterogeneous individuals. A direct consequence is the existence of phase transitions, where the behavior of a population changes discontinuously upon a small perturbation. While such phase transitions have often been proposed in biology, connecting observed behavior to the underlying physics has remained challenging. We show how phase transitions naturally arise in microbial population dynamics and highlight their connection with genealogies. We rigorously demonstrate the existence of a first-order phase transition in a model of bacterial plasmid engineering and find a strict lower bound on the number of plasmids that can be stably maintained in a population.

2026-04-17T13:46:34Z 11 pages, 3 figures Kaan Öcal Syrine Ghrabli Michael P. H. Stumpf http://arxiv.org/abs/2502.12831v3 The gene's-eye view of quantitative genetics 2026-04-17T08:08:27Z

Modelling the evolution of a continuous trait in a biological population is one of the oldest problems in evolutionary biology, which led to the birth of quantitative genetics. With the recent development of GWAS methods, it has become essential to link the evolution of the trait distribution to the underlying evolution of allelic frequencies at many loci, co-contributing to the trait value. The way most articles go about this is to make assumptions on the trait distribution, and use Wright's formula to model how the evolution of the trait translates to each individual locus. Here, we take a gene's-eye view of the system, starting from an explicit finite-loci model with selection, drift, recombination and mutation, in which the trait value is a direct product of the genome. We let the number of loci go to infinity under the assumption of strong recombination, and characterize the limit behavior of a given locus with a McKean-Vlasov SDE and the corresponding Fokker-Planck IPDE. In words, the selection on a typical locus depends on the mean behaviour of the other loci, which can be approximated with the law of the focal locus. Results include the independence of two loci and an explicit stationary distribution for allelic frequencies at a given locus (under some assumptions on the fitness function).

2025-02-18T12:52:53Z (40 pages, 2 figures) Philibert Courau Amaury Lambert Emmanuel Schertzer http://arxiv.org/abs/2604.20885v1 From Physical Difference to Meaning: A Constructor-Theoretic Framework for Prebiotic Information in Casimir-Lifshitz-Coupled Protocell Clusters 2026-04-17T07:58:08Z

This paper develops a physical framework for the prebiotic emergence of information and meaning. Building on Constructor Theory, we define information as a reproducible physical difference and meaning as a difference with stable functional consequences. Casimir-Lifshitz-coupled protocell clusters serve as a minimal model that exhibits reproducible attractors, ordered transitions, and autonomous task structures. We show that such clusters carry both informational states (e.g., distances, geometries, gradients) and meaningful states that regulate prebiotic tasks such as approach, exchange, or stabilization. This approach integrates physical mechanisms, computational mechanics, and early proto-semantic functions into a coherent account of information formation before biology.

2026-04-17T07:58:08Z 8 pages, 3 figures, The Eighteenth International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies, BIOTECHNO 2026, Valencia, Spain Michael Massoth http://arxiv.org/abs/2509.04995v2 Revealing the building blocks of tree balance: fundamental units of the Sackin and Colless Indices 2026-04-17T07:14:16Z

(Im)balance indices can be used to quantify the (im)balance of trees by assigning numerical scores to them. An easy way to generate a new index is to construct a compound index, e.g., a linear combination of established indices. Two of the most prominent and widely used imbalance indices are the Sackin index and the Colless index. In this study, we show that these classic indices are themselves compound in nature: they can be decomposed into more elementary components that independently satisfy the defining properties of a tree (im)balance index. We further show that the difference Colless minus Sackin results in another imbalance index that is minimized (amongst others) by all Colless minimal trees. Conversely, the difference Sackin minus Colless forms a balance index. Finally, we compare the building blocks of which the Sackin and the Colless indices consist to these indices as well as to the stairs2 index, which is another index from the literature. Our results suggest that the elementary building blocks we identify are not only foundational to established indices but also valuable tools for analyzing disagreement among indices when comparing the balance of different trees. Along the way, we investigate the so-called echelon tree, which plays an important role for several (im)balance indices, and present the first non-recursive algorithm to construct it.
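
For readers unfamiliar with the two indices, the sketch below computes them for rooted binary trees using their standard definitions: the Sackin index as the sum of descendant-leaf counts over all internal nodes (equivalently, the sum of leaf depths), and the Colless index as the sum over internal nodes of the absolute difference in leaf counts between the two child subtrees. The nested-tuple tree encoding is an assumption made for illustration, and the decomposition into building blocks described in the abstract is not reproduced here.

```python
def leaf_count(tree):
    """Number of leaves; a leaf is any non-tuple, an internal node a pair."""
    if not isinstance(tree, tuple):
        return 1
    left, right = tree
    return leaf_count(left) + leaf_count(right)


def sackin(tree):
    """Sum over internal nodes of the number of descendant leaves."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    return leaf_count(tree) + sackin(left) + sackin(right)


def colless(tree):
    """Sum over internal nodes of |leaves(left) - leaves(right)|."""
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    imbalance = abs(leaf_count(left) - leaf_count(right))
    return imbalance + colless(left) + colless(right)


# Four-leaf examples: the caterpillar and the fully balanced tree
caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
```

On these two trees the difference Colless minus Sackin is $3 - 9 = -6$ for the caterpillar and $0 - 8 = -8$ for the balanced tree, consistent with the abstract's statement that this difference is minimized by Colless minimal trees.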

2025-09-05T10:41:19Z Linda Knüver Mareike Fischer http://arxiv.org/abs/2511.22736v2 Bounds on the sequence length sufficient to reconstruct binary level-$1$ phylogenetic networks under the CFN model 2026-04-16T11:33:57Z

Phylogenetic trees and networks are graphs used to model evolutionary relationships, with trees representing strictly branching histories and networks allowing for events in which lineages merge, called reticulation events. While the question of data sufficiency has been studied extensively in the context of trees, it remains largely unexplored for networks. In this work we take a first step in this direction by establishing bounds on the amount of genomic data required to reconstruct binary level-$1$ semi-directed phylogenetic networks, which are binary networks in which reticulation events are indicated by directed edges, all other edges are undirected, and cycles are vertex-disjoint. For this class, methods have been developed recently that are statistically consistent. Roughly speaking, such methods are guaranteed to reconstruct the correct network assuming infinitely long genomic sequences. Here we consider the question whether networks from this class can be uniquely and correctly reconstructed from finite sequences. Specifically, we present an inference algorithm that takes as input genetic sequence data, and demonstrate that the sequence length sufficient to reconstruct the correct network with high probability, under the CFN model of evolution, scales logarithmically, polynomially, or polylogarithmically with the number of taxa, depending on the parameter regime. As part of our contribution, we also present novel inference rules for quartet data in the semi-directed phylogenetic network setting.

2025-11-27T20:03:32Z Martin Frohn Niels Holtgrefe Leo van Iersel Mark Jones Steven Kelk http://arxiv.org/abs/2604.14483v1 Synchronized disease and behavioural dynamics in weakly coupled populations 2026-04-15T23:48:43Z

The spread of infectious disease is strongly influenced by social dynamics. In addition to infection risk, individuals' vaccination decisions depend on prevailing social behavior: high infection levels and widespread vaccination can increase vaccine uptake, which in turn suppresses infection. This feedback can generate sustained oscillations in disease prevalence and vaccination behavior. Here, we study two such populations undergoing the same behavioral-epidemiological limit cycle and introduce weak coupling between them through social influence. We show that coupling leads to synchronization of disease dynamics between the two groups. Moreover, we find that different payoff sensitivities may lead to synchronization or anti-synchronization.

2026-04-15T23:48:43Z Xinxuan Wang Youngmin Park Bryce Morsky http://arxiv.org/abs/2604.13963v1 A generative model for bipartite gene-sharing networks 2026-04-15T15:14:31Z

Gene-sharing networks provide a powerful framework to study the evolution of viruses and mobile genetic elements. These bipartite networks, which link genes to the genomes that contain them, exhibit characteristic degree distributions: a scale-free distribution for genes and an exponential-like decay for genomes. Here, we propose a mechanistic model that explains these patterns through fundamental evolutionary processes including horizontal gene transfer, capture of new genes, emergence of new genomes, and gene loss. Using a mean-field approximation, we derive analytical expressions for the asymptotic gene and genome degree distributions, recapitulating a power-law distribution for genes and an exponential distribution for genomes. Numerical simulations validate these predictions and yield parameter values that closely fit empirical data from dsDNA viruses, RNA viruses, and prokaryotic pangenomes. This simple model with only two parameters provides a generative framework for bipartite gene-sharing networks, offering qualitative and quantitative insights into the main evolutionary forces driving genome plasticity. Setting the gene loss rate to zero, the gene and genome degree distributions of the model closely fit the empirically observed distributions. Thus, evolution of viruses appears to be dominated by gene gain, in agreement with the results of independent reconstructions of viral evolution.

2026-04-15T15:14:31Z 12 pages, 5 figures, uses RevTeX4.2 Jaime Iranzo Pedro Jódar Eugene V. Koonin Susanna Manrubia José A. Cuesta http://arxiv.org/abs/2601.00515v3 The Physics of Causation 2026-04-15T12:42:09Z

Assembly theory (AT) introduces causation as a material property and establishes a metrology for objects produced by evolution and selection. The physical scale of causation is quantified by the assembly index, defined as the minimum number of recursive steps necessary to make an object. Observing countable copies of high assembly index objects indicates that a mechanism producing them is persistent, such that the object's environment constructs a memory that traps causation within a contingent chain. Copy number and assembly index together underlie a standardized metrology for detecting causation (assembly index) and contingency (copy number). These allow a precise definition of an assembly threshold that demarcates life (and its derivative agential, intelligent, and technological forms and artifacts) as structures with persistent copies in regimes of deep causal possibility. In introducing a fundamental concept of material causation to quantify and measure life, AT represents a departure from prior theories of causation, such as interventional ones, which have so far proven incompatible with fundamental physics. We discuss how AT's concept of causation provides the foundation for a theory of physics that allows a precise and testable concept of "life", and in which novelty, contingency and the potential for open-endedness are fundamental, and determinism is emergent from selection along assembled lineages.

2026-01-02T00:20:53Z 65 pages, 8 Figures, 83 references Leroy Cronin Sara I. Walker