https://arxiv.org/api/iUOnp2QQ1dOTZfNpm1VZv50++ds
2026-06-14T01:57:55Z
1301619515

http://arxiv.org/abs/2601.20981v2
Diversifying Toxicity Search in Large Language Models Through Speciation
2026-04-21T09:20:29Z
Evolutionary prompt search is a practical black-box approach for red teaming large language models; however, existing methods often collapse onto a small family of high-performing prompts, limiting coverage of distinct failure modes. We present a speciated quality-diversity extension of \textit{ToxSearch} that maintains multiple high-toxicity prompt niches in parallel rather than optimizing a single best prompt. \textit{ToxSearch-S} introduces unsupervised prompt speciation via a search methodology that maintains capacity-limited species with exemplar leaders, a reserve pool for emerging niches, and species-aware parent selection that trades off within-niche exploitation against cross-niche exploration. Preliminary results show \textit{ToxSearch-S} reaching higher peak toxicity ($\approx 0.73$ vs.\ $\approx 0.47$) with a heavier tail (top-10 median $0.66$ vs.\ $0.45$) than the baseline. Speciation also yields broader semantic coverage under a topics-as-species analysis (higher effective topic diversity and larger unique topic coverage). Finally, the species formed are well separated in embedding space (mean separation ratio $\approx 1.93$) and exhibit distinct toxicity distributions, indicating that speciation partitions the adversarial space into behaviorally differentiated niches rather than superficial lexical variants.
2026-01-28T19:29:54Z
Preprint. 4 pages, Accepted at GECCO as short paper
Onkar Shelar, Travis Desell

http://arxiv.org/abs/2604.18872v1
Meeting times on graphs in near-cubic time
2026-04-20T21:53:06Z
The expected meeting time of two random walkers on an undirected graph of size $N$, where at each time step one walker moves and the process stops when they collide, satisfies a system of $\binom{N}{2}$ linear equations.
Naïvely, solving this system takes $O(N^{6})$ operations. However, this system of linear equations has a useful structure: it is almost a Sylvester equation, the only obstruction being a diagonal absorption constraint. We give a simple algorithm that exploits this structure, requiring $O(N^{4})$ operations and $\Theta(N^{2})$ space for exact computation of all $\binom{N}{2}$ meeting times. While this practical method uses only standard dense linear algebra, it can be improved (in theory) to $O(N^{3}\log^{2}N)$ operations by exploiting the Cauchy structure of the diagonal correction. We generalize this result slightly to cover the Poisson equation for the absorbing "lazy" pair walk with an arbitrary source, which can be solved at the same cost, with $O(N^{3})$ operations per additional source on the same graph. We conclude with applications to evolutionary dynamics, giving improved algorithms for calculating fixation probabilities and mean trait frequencies.
2026-04-20T21:53:06Z
11 pages
Alex McAvoy

http://arxiv.org/abs/2604.18345v1
Effect of antibiotic spectrum on the abundance of resistant bacteria in multispecies communities
2026-04-20T14:42:05Z
Antibiotic resistance is a major threat to global health. It emerges in multispecies microbial communities under antibiotic exposure. This makes antibiotic spectrum -- a drug's distribution of effects across species -- a potential key parameter in resistance management. However, we currently lack evolutionary theory for resistance dynamics in a multispecies setting. Analysing established community ecology theory, we develop a simple mathematical measure of how one taxon (strain or species) affects another taxon through all direct and indirect interactions in a complex interaction network. Using this, we derive the expected effects of different antibiotic spectra on the abundance of resistant taxa in microbial communities.
This furthers our understanding of microbial evolutionary ecology in multispecies communities, and provides a formal theoretical basis for empirical work on optimal antibiotic choice.
2026-04-20T14:42:05Z
5 figures
Magnus Aspenberg, Erik Andreas Martens, Kristofer Wollein Waldetoft

http://arxiv.org/abs/2604.17926v1
Information on hidden birth events restores identifiability in phylodynamic inference
2026-04-20T08:06:48Z
The parameters of many classes of birth-death processes cannot be inferred uniquely from phylogenetic trees: infinitely many parameter combinations yield the same distribution of phylogenetic trees. Here, we show that parameter identifiability can be recovered even for the most general cases of time-dependent rates when additional information on hidden birth events along branches of the reconstructed tree is available. This holds both for models in which individuals are sampled at a single point in time and for models in which they are sampled through time at a time-dependent rate. Moreover, we prove that when mutations occur at birth (under either of two models for the accumulation of mutations at a birth event), information about hidden birth events is available in the sequences, and thus all parameters of time-dependent birth-death models become identifiable. Thus, phylodynamic inference is identifiable whenever evolutionary models with mutation accumulation at birth (such as at speciation, transmission, or cell division) are plausible.
2026-04-20T08:06:48Z
Tobias Dieselhorst, Tanja Stadler

http://arxiv.org/abs/2506.22178v2
Vegetation Patterning Can Both Impede and Trigger Critical Transitions from Savanna to Grassland
2026-04-20T07:40:27Z
Tree-grass coexistence is a defining feature of savanna ecosystems, which play an important role in supporting biodiversity and human populations worldwide. While recent advances have clarified many of the underlying processes, how these mechanisms interact to shape ecosystem dynamics under environmental stress is not yet understood.
Here, we present and analyze a minimalistic spatially extended model of tree-grass dynamics in dry savannas. We incorporate tree facilitation of grasses through shading and grass competition with trees for water, both varying with tree life stage. Our model shows that these mechanisms lead to tree-grass coexistence and bistability between savanna and grassland states. Moreover, the model predicts vegetation patterns consisting of trees and grasses, particularly under harsh environmental conditions, which can persist in situations where a non-spatial version of the model instead predicts ecosystem collapse from savanna to grassland (a phenomenon called ``Turing-evades-tipping''). Additionally, we identify a novel ``Turing-triggers-tipping'' mechanism, where unstable pattern formation drives tipping events that are overlooked when spatial dynamics are not included. These transient patterns act as early warning signals for ecosystem transitions, offering a critical window for intervention. Further theoretical and empirical research is needed to determine when spatial patterns prevent tipping or drive collapse.
2025-06-27T12:43:41Z
24 pages, 8 figures
Environmental Research Letters, 2025, Volume 20, Number 9
Jelle van der Voort, Mara Baudena, Ehud Meron, Max Rietkerk, Arjen Doelman
10.1088/1748-9326/adc3ab

http://arxiv.org/abs/2507.10257v2
Epidemic spread: limiting contacts to regular circles is not necessarily the safest option
2026-04-18T19:04:27Z
When a new infectious disease (or a new strain of an existing one) emerges, as in the recent COVID-19 pandemic, different types of mobility restrictions are considered to slow down or mitigate the spread of the disease. The measures to be adopted require carefully weighing the social cost against their impact on disease control. In this work, we analyze, in a context of mobility restrictions, the role of frequent versus occasional contacts in epidemic spread.
We develop an individual-based mathematical model that considers both frequent contacts among individuals (at home, work, and school) and occasional contacts (at stores, on transport, etc.). We define several contact structures by varying the relative weight of frequent and occasional contacts while keeping the same initial growth rate of the epidemic. We find the remarkable result that the more frequent contacts prevail over occasional ones, the higher the epidemic peak, the sooner it occurs, and the greater the final number of individuals affected by the epidemic. We conduct our study using an SIR model, considering both exponential and deterministic recovery from infection, and find that this effect is more pronounced under deterministic recovery. We find that the impact of relaxation measures depends on the relative importance of frequent and occasional contacts within the considered social structures. Finally, we assess in which of the considered scenarios the homogeneous mixing approximation provides a reasonable description of the epidemic dynamics.
2025-07-11T16:35:39Z
Section 2.2.1 added and minor corrections made. Results unchanged
Jõao Gabriel Simões Delboni, Gabriel Fabricius

http://arxiv.org/abs/2604.17036v1
Evolution as fitness landscape navigation: Concepts, Measures, and Emerging Questions
2026-04-18T15:35:30Z
Fitness landscapes are mappings between genotypes, phenotypes, and fitness that shape evolution. In recent years, empirical work and theoretical models have greatly advanced our understanding of how populations navigate rugged fitness landscapes. Here, we provide a timely review of this field. Its rapidly growing literature employs a wide range of terms, which are sometimes used ambiguously or inconsistently. We therefore begin by defining the major concepts and the field's vocabulary, highlighting our own terminology choices wherever needed.
We then review key results on the relationships between epistasis, ruggedness, accessibility, and navigability for genotype-fitness maps, highlighting several complex and sometimes counterintuitive connections that have emerged. Further, we review how the conserved structural properties of the underlying genotype-phenotype map, which lead to the formation of large connected neutral networks of genotypes, influence dynamics on fitness landscapes. We then compare the two levels at which landscape navigation can be studied: the level of genotype-phenotype maps and the level of genotype-fitness maps. Our review leads us to propose a new measure of navigability, based on evolutionary outcomes, that is broadly applicable and overcomes limitations of existing measures. Finally, we review the smaller body of work that relaxes the common assumption of fitness-monotonic paths on static landscapes, and discuss how this can fundamentally change the nature of fitness landscape navigation. Throughout the review, we identify directions for future work to fill existing gaps and to synthesize the disparate strands of research within the field.
2026-04-18T15:35:30Z
27 pages, 2 figures
Malvika Srivastava, Claudia Bank, Joachim Krug, Suman G. Das

http://arxiv.org/abs/2604.16065v1
Phase transitions in microbial lineage trees
2026-04-17T13:46:34Z
Statistical physics can describe the behavior of microbial populations consisting of many heterogeneous individuals. A direct consequence is the existence of phase transitions, where the behavior of a population changes discontinuously upon a small perturbation. While such phase transitions have often been proposed in biology, connecting observed behavior to the underlying physics has remained challenging. We show how phase transitions naturally arise in microbial population dynamics and highlight their connection with genealogies.
We rigorously demonstrate the existence of a first-order phase transition in a model of bacterial plasmid engineering and find a strict lower bound on the number of plasmids that can be stably maintained in a population.
2026-04-17T13:46:34Z
11 pages, 3 figures
Kaan Öcal, Syrine Ghrabli, Michael P. H. Stumpf

http://arxiv.org/abs/2502.12831v3
The gene's-eye view of quantitative genetics
2026-04-17T08:08:27Z
Modelling the evolution of a continuous trait in a biological population is one of the oldest problems in evolutionary biology, and it led to the birth of quantitative genetics. With the recent development of GWAS methods, it has become essential to link the evolution of the trait distribution to the underlying evolution of allelic frequencies at the many loci that co-contribute to the trait value. Most articles do this by making assumptions on the trait distribution and using Wright's formula to model how the evolution of the trait translates to each individual locus. Here, we take a gene's-eye view of the system, starting from an explicit finite-loci model with selection, drift, recombination and mutation, in which the trait value is a direct product of the genome. We let the number of loci go to infinity under the assumption of strong recombination, and characterize the limit behavior of a given locus with a McKean-Vlasov SDE and the corresponding Fokker-Planck IPDE. In words, the selection on a typical locus depends on the mean behaviour of the other loci, which can be approximated with the law of the focal locus.
Results include the independence of two loci and an explicit stationary distribution for allelic frequencies at a given locus (under some assumptions on the fitness function).
2025-02-18T12:52:53Z
(40 pages, 2 figures)
Philibert Courau, Amaury Lambert, Emmanuel Schertzer

http://arxiv.org/abs/2604.20885v1
From Physical Difference to Meaning: A Constructor-Theoretic Framework for Prebiotic Information in Casimir-Lifshitz-Coupled Protocell Clusters
2026-04-17T07:58:08Z
This paper develops a physical framework for the prebiotic emergence of information and meaning. Building on Constructor Theory, we define information as a reproducible physical difference and meaning as a difference with stable functional consequences. Casimir-Lifshitz-coupled protocell clusters serve as a minimal model that exhibits reproducible attractors, ordered transitions, and autonomous task structures. We show that such clusters carry both informational states (e.g., distances, geometries, gradients) and meaningful states that regulate prebiotic tasks such as approach, exchange, or stabilization. This approach integrates physical mechanisms, computational mechanics, and early proto-semantic functions into a coherent account of information formation before biology.
2026-04-17T07:58:08Z
8 pages, 3 figures, The Eighteenth International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies, BIOTECHNO 2026, Valencia, Spain
Michael Massoth

http://arxiv.org/abs/2509.04995v2
Revealing the building blocks of tree balance: fundamental units of the Sackin and Colless Indices
2026-04-17T07:14:16Z
(Im)balance indices can be used to quantify the (im)balance of trees by assigning numerical scores to them. An easy way to generate a new index is to construct a compound index, e.g., a linear combination of established indices. Two of the most prominent and widely used imbalance indices are the Sackin index and the Colless index.
In this study, we show that these classic indices are themselves compound in nature: they can be decomposed into more elementary components that independently satisfy the defining properties of a tree (im)balance index. We further show that the difference Colless minus Sackin yields another imbalance index, which is minimized (among others) by all Colless-minimal trees. Conversely, the difference Sackin minus Colless forms a balance index. Finally, we compare the building blocks of the Sackin and Colless indices with these indices themselves and with the stairs2 index, another index from the literature. Our results suggest that the elementary building blocks we identify are not only foundational to established indices but also valuable tools for analyzing disagreement among indices when comparing the balance of different trees. Along the way, we investigate the so-called echelon tree, which plays an important role for several (im)balance indices, and present the first non-recursive algorithm to construct it.
2025-09-05T10:41:19Z
Linda Knüver, Mareike Fischer

http://arxiv.org/abs/2511.22736v2
Bounds on the sequence length sufficient to reconstruct binary level-$1$ phylogenetic networks under the CFN model
2026-04-16T11:33:57Z
Phylogenetic trees and networks are graphs used to model evolutionary relationships, with trees representing strictly branching histories and networks allowing for events in which lineages merge, called reticulation events. While the question of data sufficiency has been studied extensively in the context of trees, it remains largely unexplored for networks. In this work we take a first step in this direction by establishing bounds on the amount of genomic data required to reconstruct binary level-$1$ semi-directed phylogenetic networks, which are binary networks in which reticulation events are indicated by directed edges, all other edges are undirected, and cycles are vertex-disjoint.
For this class, statistically consistent methods have recently been developed. Roughly speaking, such methods are guaranteed to reconstruct the correct network given infinitely long genomic sequences. Here we consider whether networks from this class can be uniquely and correctly reconstructed from finite sequences. Specifically, we present an inference algorithm that takes genetic sequence data as input, and demonstrate that the sequence length sufficient to reconstruct the correct network with high probability under the CFN model of evolution scales logarithmically, polynomially, or polylogarithmically with the number of taxa, depending on the parameter regime. As part of our contribution, we also present novel inference rules for quartet data in the semi-directed phylogenetic network setting.
2025-11-27T20:03:32Z
Martin Frohn, Niels Holtgrefe, Leo van Iersel, Mark Jones, Steven Kelk

http://arxiv.org/abs/2604.14483v1
Synchronized disease and behavioural dynamics in weakly coupled populations
2026-04-15T23:48:43Z
The spread of infectious disease is strongly influenced by social dynamics. In addition to infection risk, individuals' vaccination decisions depend on prevailing social behavior: high infection levels and widespread vaccination can increase vaccine uptake, which in turn suppresses infection. This feedback can generate sustained oscillations in disease prevalence and vaccination behavior. Here, we study two such populations undergoing the same behavioral-epidemiological limit cycle and introduce weak coupling between them through social influence. We show that coupling leads to synchronization of disease dynamics between the two groups.
Moreover, we find that differences in payoff sensitivity may lead to either synchronization or anti-synchronization.
2026-04-15T23:48:43Z
Xinxuan Wang, Youngmin Park, Bryce Morsky

http://arxiv.org/abs/2604.13963v1
A generative model for bipartite gene-sharing networks
2026-04-15T15:14:31Z
Gene-sharing networks provide a powerful framework to study the evolution of viruses and mobile genetic elements. These bipartite networks, which link genes to the genomes that contain them, exhibit characteristic degree distributions: a scale-free distribution for genes and an exponential-like decay for genomes. Here, we propose a mechanistic model that explains these patterns through fundamental evolutionary processes, including horizontal gene transfer, capture of new genes, emergence of new genomes, and gene loss. Using a mean-field approximation, we derive analytical expressions for the asymptotic gene and genome degree distributions, recapitulating a power-law distribution for genes and an exponential distribution for genomes. Numerical simulations validate these predictions and yield parameter values that closely fit empirical data from dsDNA viruses, RNA viruses, and prokaryotic pangenomes. This simple model with only two parameters provides a generative framework for bipartite gene-sharing networks, offering qualitative and quantitative insights into the main evolutionary forces driving genome plasticity. With the gene loss rate set to zero, the gene and genome degree distributions of the model closely fit the empirically observed distributions. Thus, the evolution of viruses appears to be dominated by gene gain, in agreement with the results of independent reconstructions of viral evolution.
2026-04-15T15:14:31Z
12 pages, 5 figures, uses RevTeX4.2
Jaime Iranzo, Pedro Jódar, Eugene V. Koonin, Susanna Manrubia, José A. Cuesta

http://arxiv.org/abs/2601.00515v3
The Physics of Causation
2026-04-15T12:42:09Z
Assembly theory (AT) introduces causation as a material property and establishes a metrology for objects produced by evolution and selection. The physical scale of causation is quantified by the assembly index, defined as the minimum number of recursive steps necessary to make an object. Observing countable copies of high-assembly-index objects indicates that a mechanism producing them is persistent, such that the object's environment constructs a memory that traps causation within a contingent chain. Copy number and assembly index together underlie a standardized metrology for detecting causation (assembly index) and contingency (copy number). These allow a precise definition of an assembly threshold that demarcates life (and its derivative agential, intelligent, and technological forms and artifacts) as structures with persistent copies in regimes of deep causal possibility. In introducing a fundamental concept of material causation to quantify and measure life, AT represents a departure from prior theories of causation, such as interventional ones, which have so far proven incompatible with fundamental physics. We discuss how AT's concept of causation provides the foundation for a theory of physics that allows a precise and testable concept of "life", and in which novelty, contingency, and the potential for open-endedness are fundamental, and determinism is emergent from selection along assembled lineages.
2026-01-02T00:20:53Z
65 pages, 8 Figures, 83 references
Leroy Cronin, Sara I. Walker
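The Physics of Causation entry defines the assembly index as the minimum number of recursive steps needed to make an object. The following is a minimal sketch of that definition under several assumptions. Objects are strings, and the basic building blocks are single characters. One step joins (concatenates) two objects that are already available, and every product can be reused. The function name `assembly_index` is hypothetical. This toy string version is an illustration, not the authors' measurement procedure. The iterative-deepening search is exponential, so it is only practical for very short strings.

```python
def assembly_index(target: str) -> int:
    """Minimum number of joins needed to build `target` from its characters.

    Toy string model (an assumption): one step concatenates two available
    strings, and every string built so far may be reused in later steps.
    """
    if len(target) <= 1:
        return 0  # a basic building block needs no joining steps
    units = frozenset(target)
    # Every intermediate worth building is a substring of the target.
    subs = {target[i:j] for i in range(len(target))
            for j in range(i + 1, len(target) + 1)}

    def reachable(pool: frozenset, steps: int) -> bool:
        """True if `target` can be built from `pool` in at most `steps` joins."""
        if steps == 0:
            return False
        for x in pool:
            for y in pool:
                z = x + y
                if z in pool or z not in subs:
                    continue  # skip duplicates and useless products
                if z == target:
                    return True
                if reachable(pool | {z}, steps - 1):
                    return True
        return False

    # Iterative deepening: the first depth that succeeds is the minimum.
    # len(target) - 1 joins always suffice, so this loop terminates.
    depth = 1
    while not reachable(units, depth):
        depth += 1
    return depth


print(assembly_index("ABAB"))  # 2: A+B -> AB, then AB+AB -> ABAB
print(assembly_index("ABCD"))  # 3: no substring is reused
```

The two calls show how reuse lowers the index. "ABAB" needs only two joins because "AB" is built once and used twice. "ABCD" needs three joins because it has no repeated part.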
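For the meeting-times entry (arXiv 2604.18872), the $\binom{N}{2}$ linear system can be written down directly. This minimal sketch assumes that the walker that moves is chosen with probability $1/2$ and steps to a uniformly random neighbour. Under that assumption, $\tau_{ij} = 1 + \tfrac{1}{2}\sum_{k} P_{ik}\,\tau_{kj} + \tfrac{1}{2}\sum_{k} P_{jk}\,\tau_{ik}$ for $i \neq j$, with $\tau_{ii} = 0$ and $P_{ik} = 1/\deg(i)$ for each neighbour $k$ of $i$. The code performs the naive dense solve, i.e. the $O(N^{6})$ baseline the abstract improves on, and not the paper's Sylvester-based algorithm. The name `meeting_times` and the adjacency-list input are hypothetical choices for illustration.

```python
from itertools import combinations


def meeting_times(adj):
    """Expected meeting times tau[(i, j)], i < j, on a connected undirected graph.

    adj[v] lists the neighbours of vertex v. At each step one walker, chosen
    with probability 1/2 (an assumption), moves to a uniform random neighbour.
    """
    n = len(adj)
    pairs = list(combinations(range(n), 2))
    index = {p: r for r, p in enumerate(pairs)}
    m = len(pairs)
    # Augmented matrix [A | b] for A tau = 1, one row per unordered pair.
    a = [[0.0] * (m + 1) for _ in range(m)]
    for (i, j), r in index.items():
        a[r][r] += 1.0
        a[r][m] = 1.0
        for mover, other in ((i, j), (j, i)):
            deg = len(adj[mover])
            for k in adj[mover]:
                if k == other:
                    continue  # collision: tau = 0, so no term is added
                c = index[(min(k, other), max(k, other))]
                a[r][c] -= 0.5 / deg
    # Gauss-Jordan elimination with partial pivoting (dense, O(m^3)).
    for col in range(m):
        piv = max(range(col, m), key=lambda row: abs(a[row][col]))
        a[col], a[piv] = a[piv], a[col]
        for row in range(m):
            if row != col and a[row][col] != 0.0:
                f = a[row][col] / a[col][col]
                for c in range(col, m + 1):
                    a[row][c] -= f * a[col][c]
    return {p: a[r][m] / a[r][r] for p, r in index.items()}


# Triangle graph: each step collides with probability 1/2, so tau = 2.
print(meeting_times([[1, 2], [0, 2], [0, 1]]))
```

Because $m = \binom{N}{2}$, the elimination costs $O(N^{6})$, which is exactly the cost the structured method in the abstract avoids.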
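The tree-balance entry (arXiv 2509.04995) builds on the two classic indices for rooted binary trees. The Sackin index is the sum of leaf depths, equivalently the sum over inner vertices of the number of leaves below each one. The Colless index is the sum over inner vertices of the absolute difference between the leaf counts of the two child subtrees. This minimal sketch computes both in one post-order pass. It assumes a hypothetical encoding in which inner vertices are 2-tuples and any non-tuple value is a leaf. It does not implement the paper's decomposition into building blocks.

```python
def sackin_colless(tree):
    """Return (leaf_count, sackin, colless) for a rooted binary tree.

    Encoding (an assumption): an inner vertex is a 2-tuple (left, right);
    anything that is not a tuple is a leaf.
    """
    if not isinstance(tree, tuple):
        return 1, 0, 0  # a single leaf has no inner vertices
    left, right = tree
    n_l, s_l, c_l = sackin_colless(left)
    n_r, s_r, c_r = sackin_colless(right)
    n = n_l + n_r
    # Sackin: each inner vertex contributes the number of leaves below it.
    sackin = s_l + s_r + n
    # Colless: each inner vertex contributes its children's leaf-count imbalance.
    colless = c_l + c_r + abs(n_l - n_r)
    return n, sackin, colless


print(sackin_colless(((("a", "b"), "c"), "d")))  # caterpillar: (4, 9, 3)
print(sackin_colless((("a", "b"), ("c", "d"))))  # fully balanced: (4, 8, 0)
```

On four leaves, the caterpillar tree scores higher than the fully balanced tree on both indices, as expected of imbalance indices.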