Overview
- Incomplete lineage sorting (ILS) occurs when ancestral genetic polymorphisms persist through rapid speciation events, causing individual gene trees to differ from the true species tree — a phenomenon that is expected under coalescent theory whenever the time between successive speciation events is short relative to effective population size.
- The human-chimpanzee-gorilla trichotomy is the most extensively studied case: although humans are most closely related to chimpanzees at the species level, approximately 30 percent of the genome yields gene trees that group humans with gorillas or chimpanzees with gorillas, reflecting ancestral polymorphism that persisted through the short interval between the divergence of the gorilla lineage and the later human-chimpanzee split.
- Multispecies coalescent methods, which model gene trees as random draws from a species-tree distribution, now provide the standard statistical framework for inferring species relationships in the presence of ILS, enabling accurate phylogenetic reconstruction even when the majority of individual gene trees conflict with the species tree.
Incomplete lineage sorting (ILS) is a population-genetic phenomenon in which ancestral polymorphisms persist through speciation events, causing the genealogical history of individual genes to differ from the branching order of the species that carry them. When the gene lineages sampled from two sister species coalesce in their immediate common ancestor, before reaching the next-older speciation event, the gene tree matches the species tree. But when ancestral populations are large and speciation events are closely spaced in time, allelic lineages may fail to coalesce within the internodal population, and the random sorting of ancestral variants into daughter species can produce gene trees that conflict with the true phylogenetic relationships among those species.1, 2 Understanding ILS has transformed [molecular phylogenetics](/evolution/molecular-phylogenetics-methods), forcing a shift from concatenation-based approaches that treat all genes as sharing a single history to coalescent-based methods that explicitly model gene tree variation within a species-tree framework.5
Coalescent theory and gene tree variation
The conceptual foundation for understanding ILS comes from coalescent theory, which models the genealogical history of gene copies backward in time. In a population of effective size Ne, two randomly sampled alleles coalesce — trace back to their most recent common ancestor — at a rate inversely proportional to population size. For diploid organisms, the expected coalescence time for two alleles is 2Ne generations, but there is substantial variance around this expectation, and coalescence can take much longer.4, 8
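The mean and spread of pairwise coalescence times can be illustrated with a short simulation. This is a minimal sketch, assuming the usual continuous-time approximation in which the waiting time for two lineages to coalesce is exponentially distributed with mean 2Ne generations; the function name and the value Ne = 10,000 are illustrative choices, not taken from the cited sources.

```python
import random
import statistics

def simulate_pairwise_coalescence(ne, n_reps, seed=0):
    """Draw pairwise coalescence times (in generations) for a diploid
    population of effective size `ne`.

    Assumption: waiting time ~ Exponential(rate = 1 / (2 * ne)).
    """
    rng = random.Random(seed)
    rate = 1.0 / (2.0 * ne)
    return [rng.expovariate(rate) for _ in range(n_reps)]

ne = 10_000  # illustrative effective population size
times = simulate_pairwise_coalescence(ne=ne, n_reps=100_000)

mean_time = statistics.fmean(times)   # should sit close to 2 * ne
sd_time = statistics.pstdev(times)    # for an exponential, SD equals the mean
# Fraction of pairs that take more than twice the expected time to coalesce
frac_long = sum(t > 2 * (2 * ne) for t in times) / len(times)

print(f"mean = {mean_time:.0f}, sd = {sd_time:.0f}, P(t > 4Ne) = {frac_long:.3f}")
```

Under this approximation the standard deviation is as large as the mean, which is why a substantial minority of lineage pairs remain uncoalesced long after 2Ne generations have passed.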
When a species splits into two daughter lineages, the alleles within each daughter population continue to coalesce backward in time. If the time separating two successive speciation events (the internodal interval) is long relative to the ancestral effective population size, then most gene lineages will have coalesced within the internodal population, and the resulting gene trees will match the species tree. But if the internodal interval is short relative to Ne, many gene lineages will fail to coalesce during that interval and will instead sort randomly across the deeper speciation node. This random sorting produces gene trees that can have any of the possible topologies, with probabilities governed by the ratio of internodal time to population size.1, 2
Pamilo and Nei demonstrated in 1988 that for three taxa, the probability of a gene tree matching the species tree depends on the length of the internal branch in coalescent units (the internodal time in generations divided by 2Ne for diploid organisms). When this ratio is small, as occurs during rapid radiations, the probability that any single gene tree matches the species tree approaches one-third, the value expected if the three possible topologies were equally likely, so most individual gene histories may conflict with the species phylogeny.4 Degnan and Rosenberg extended this result in 2006, proving the existence of "anomalous gene trees": for some species trees with four or more taxa, a discordant gene tree topology can be more probable than the topology that matches the species tree, a counterintuitive result with profound implications for phylogenetic inference.11
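The three-taxon relationship can be written down directly. The sketch below uses the standard result for one sampled lineage per species: with internal branch length T in coalescent units, the concordant topology has probability 1 - (2/3)e^(-T) and each discordant topology has probability (1/3)e^(-T). The function name and the taxon labels A, B and C are illustrative.

```python
import math

def three_taxon_topology_probs(T):
    """Gene tree topology probabilities for the species tree ((A,B),C).

    T is the internal branch length in coalescent units
    (generations / 2Ne for diploids); one lineage is sampled per species.
    """
    if T < 0:
        raise ValueError("branch length must be non-negative")
    p_no_coalescence = math.exp(-T)            # A and B lineages both survive the internode
    p_discordant_each = p_no_coalescence / 3   # each of the two minority topologies
    p_concordant = 1.0 - 2.0 * p_discordant_each
    return {
        "((A,B),C)": p_concordant,
        "((A,C),B)": p_discordant_each,
        "((B,C),A)": p_discordant_each,
    }

for T in (0.0, 0.5, 1.0, 3.0):
    probs = three_taxon_topology_probs(T)
    print(f"T = {T:.1f}  concordant = {probs['((A,B),C)']:.3f}")
```

At T = 0 all three topologies are equally likely, and the concordant topology only becomes near-certain once the internal branch spans several coalescent units.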
The human-chimpanzee-gorilla case
The most extensively studied example of ILS involves the evolutionary relationships among humans, chimpanzees, and gorillas. Molecular and morphological evidence overwhelmingly supports a species tree in which humans and chimpanzees are each other's closest relatives, with gorillas as the outgroup. However, the African great ape radiation was rapid: the gorilla lineage diverged approximately 10 million years ago, and the human-chimpanzee split occurred approximately 6–7 million years ago, leaving an internodal interval of only 3–4 million years. The ancestral effective population size during this interval was large, estimated at roughly 50,000–80,000 individuals.3, 12
These parameters predict substantial ILS, and genomic analyses have confirmed this prediction in striking detail. The gorilla genome sequencing project led by Scally and colleagues found that for approximately 30 percent of the genome, the gene tree topology differs from the species tree. About 15 percent of the human genome is more closely related to the gorilla genome than to the chimpanzee genome, and another 15 percent groups chimpanzees with gorillas to the exclusion of humans.3 These discordant regions are not errors or artefacts but precisely the pattern expected from ILS when ancestral polymorphism is incompletely sorted across a rapid speciation sequence. The [human-chimpanzee DNA](/evolution/human-chimpanzee-dna) comparison thus provides both a confirmation of the close evolutionary relationship among the African apes and a textbook demonstration of how gene trees can differ from species trees.
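As a rough plausibility check, the divergence times and population sizes quoted above can be converted to coalescent units and passed through the three-taxon expectation. This is a back-of-the-envelope sketch, not the analysis performed by Scally and colleagues. The generation times of 20 and 25 years are assumptions introduced here for illustration, and the model ignores selection, recombination and variation in Ne across the genome.

```python
import math

def expected_discordant_fraction(interval_years, generation_years, ne):
    """Expected fraction of discordant gene trees for three taxa.

    Converts the internodal interval to coalescent units
    (generations / 2Ne) and returns (2/3) * exp(-T).
    """
    generations = interval_years / generation_years
    T = generations / (2.0 * ne)
    return (2.0 / 3.0) * math.exp(-T)

# Long interval, short generations, small Ne: least ILS expected
low = expected_discordant_fraction(4e6, generation_years=20, ne=50_000)
# Short interval, long generations, large Ne: most ILS expected
high = expected_discordant_fraction(3e6, generation_years=25, ne=80_000)

print(f"expected discordance: {low:.2f} to {high:.2f}")
```

Under these assumptions the expected discordant fraction ranges from roughly 9 to 31 percent, so the observed figure of about 30 percent lies at the high-ILS end of what this simple model allows.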
The human-chimp-gorilla case also illustrates the concept of hemiplasy, introduced by Avise and Robinson. Hemiplasy refers to character-state changes that appear homoplasious (the product of convergence or reversal) on the species tree but actually result from ILS: the character evolved once on a gene tree but maps discordantly onto the species tree because the two trees differ. Hemiplasy can inflate apparent rates of convergent evolution and mislead phylogenetic analyses that assume gene trees match species trees.14
Distinguishing ILS from hybridisation
Gene tree discordance can arise from two fundamentally different processes: ILS, which reflects the stochastic sorting of ancestral polymorphism, and hybridisation (introgression), which involves the transfer of genetic material between lineages after speciation. Distinguishing between these sources of discordance is one of the central challenges in modern [phylogenetics](/evolution/phylogenetics), because they produce superficially similar patterns of gene tree conflict but have very different biological implications.2, 5
Several features help differentiate ILS from introgression. Under pure ILS, the two minority gene tree topologies (the discordant trees) are expected to occur at equal frequencies, because once lineages fail to coalesce in the internodal population, any pair of the three lineages is equally likely to coalesce first in the deeper ancestral population. Introgression, by contrast, transfers genetic material between specific lineages and therefore produces an asymmetric excess of one discordant topology over the other. Statistical tests such as the ABBA-BABA test (Patterson's D-statistic) exploit this asymmetry to detect introgression in the presence of background ILS.3
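The asymmetry test can be sketched for a single sequence per taxon. The example below counts ABBA and BABA site patterns across four aligned sequences (P1, P2, P3 and an outgroup that defines the ancestral allele) and computes D = (ABBA - BABA) / (ABBA + BABA). It is a simplified illustration: the function name and the toy sequences are invented for this example, and real analyses typically work with allele frequencies and assess significance with a block jackknife, which is omitted here.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) from four aligned sequences.

    The outgroup allele is treated as ancestral ("A"); the other
    allele at a biallelic site is derived ("B").
    """
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to equal length")
    n_abba = n_baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        # Skip gaps and ambiguous bases
        if any(base not in "ACGT" for base in (a, b, c, o)):
            continue
        if a == o and b == c and b != o:      # P1 ancestral, P2 and P3 derived
            n_abba += 1
        elif b == o and a == c and a != o:    # P2 ancestral, P1 and P3 derived
            n_baba += 1
    total = n_abba + n_baba
    d = (n_abba - n_baba) / total if total else float("nan")
    return d, n_abba, n_baba

# Toy alignment (invented): three ABBA sites, one BABA site
d, n_abba, n_baba = patterson_d(
    p1="ACTAGC",
    p2="GTCAAC",
    p3="GCCAAT",
    outgroup="ATTAGT",
)
print(d, n_abba, n_baba)  # 0.5 3 1
```

Under ILS alone, ABBA and BABA sites are expected in equal numbers, so D should not differ significantly from zero; a consistent excess of one pattern points toward gene flow between P3 and either P1 or P2.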
Geographic and genomic patterns also provide discriminating evidence. ILS-generated discordance should be distributed randomly across the genome and unrelated to geographic contact between species, whereas introgressed regions tend to cluster in specific genomic locations (often in regions of low recombination or near adaptively significant loci) and correlate with geographic proximity between hybridising populations. In the gorilla genome analysis, Scally and colleagues found that the pattern of gene tree discordance was largely consistent with ILS, though a small fraction of loci showed signatures suggestive of ancestral introgression between the gorilla and human-chimpanzee lineages.3
The multispecies coalescent
The recognition that ILS can cause pervasive conflict between gene trees and species trees prompted a fundamental rethinking of phylogenetic methodology. Traditional concatenation approaches — in which sequences from multiple loci are combined into a single supermatrix and analysed as if they share a common genealogy — can be positively misleading when ILS is prevalent, because the concatenated signal can converge on an incorrect species tree with increasing statistical support as more data are added.2, 5
The multispecies coalescent (MSC) model provides the alternative framework. Under the MSC, gene trees are treated as random variables drawn from a probability distribution defined by the species tree and its associated parameters (branch lengths in coalescent units and effective population sizes). Rather than assuming that all genes share the same tree, the MSC explicitly models the discordance among gene trees as a function of the population-genetic processes occurring along the species tree branches.5, 6
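The idea of gene trees as random draws can be made concrete with a minimal simulation for three species. This sketch assumes a species tree ((A,B),C) with one sampled lineage per species and an internal branch of length T in coalescent units; the function names are illustrative, and only topologies are simulated, not branch lengths or sequences.

```python
import random
from collections import Counter

TOPOLOGIES = ["((A,B),C)", "((A,C),B)", "((B,C),A)"]

def draw_gene_tree_topology(T, rng):
    """Draw one gene tree topology under the species tree ((A,B),C).

    Two lineages in the internodal population coalesce after an
    Exponential(1) waiting time in coalescent units. If that time
    exceeds T, all three lineages reach the root population, where
    each pair is equally likely to coalesce first.
    """
    if rng.expovariate(1.0) < T:
        return "((A,B),C)"
    return rng.choice(TOPOLOGIES)

def topology_frequencies(T, n_loci, seed=0):
    """Simulate `n_loci` independent loci and return topology frequencies."""
    rng = random.Random(seed)
    counts = Counter(draw_gene_tree_topology(T, rng) for _ in range(n_loci))
    return {topo: counts[topo] / n_loci for topo in TOPOLOGIES}

freqs = topology_frequencies(T=0.5, n_loci=200_000)
for topo, freq in freqs.items():
    print(f"{topo}: {freq:.3f}")
```

With a short internal branch such as T = 0.5, a large share of simulated loci carry one of the two discordant topologies, and those two occur at nearly equal frequency, which is the signature that species tree methods under the MSC rely on.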
Several computational methods implement the MSC for species tree estimation. Bayesian methods such as *BEAST and BEST co-estimate gene trees and the species tree simultaneously under a full probabilistic model, providing robust inference but at high computational cost for large datasets.6 Summary methods such as ASTRAL take a two-step approach: individual gene trees are first estimated from sequence data, and the species tree is then inferred by finding the topology that is most consistent with the collection of estimated gene trees. ASTRAL operates by maximising the number of shared quartet topologies between the gene trees and the species tree, and it has been proven statistically consistent under the MSC — meaning that it converges on the correct species tree as the number of loci increases, even when the majority of individual gene trees are discordant.7
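The quartet criterion behind summary methods can be illustrated with a toy scorer. The sketch below is not ASTRAL's algorithm, which searches tree space with dynamic programming. It simply scores a handful of supplied candidate species trees by the number of quartet topologies they share with a set of gene trees. Trees are written as nested tuples, and the gene tree counts are invented for illustration.

```python
from itertools import combinations

def clades(tree):
    """Return (leaf_set, list_of_clades) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found

def quartets(tree):
    """Map each resolved 4-taxon subset to its unrooted split, e.g. HC|GO."""
    leaves, all_clades = clades(tree)
    result = {}
    for q in combinations(sorted(leaves), 4):
        qset = frozenset(q)
        for clade in all_clades:
            inside = clade & qset
            if len(inside) == 2:  # this clade separates two taxa from the other two
                result[qset] = frozenset([inside, qset - inside])
                break
    return result

def quartet_score(candidate, gene_trees):
    """Count quartet topologies shared between a candidate and all gene trees."""
    cand = quartets(candidate)
    return sum(
        1
        for gt in gene_trees
        for q, split in quartets(gt).items()
        if cand.get(q) == split
    )

hc = ((("H", "C"), "G"), "O")   # human + chimpanzee
hg = ((("H", "G"), "C"), "O")   # human + gorilla
cg = ((("C", "G"), "H"), "O")   # chimpanzee + gorilla
gene_trees = [hc] * 7 + [hg] * 2 + [cg] * 1  # invented counts

scores = {name: quartet_score(t, gene_trees) for name, t in [("hc", hc), ("hg", hg), ("cg", cg)]}
best = max(scores, key=scores.get)
print(scores, best)
```

With only four taxa each tree contains a single quartet, so the score reduces to a vote among gene trees; with more taxa the same function sums agreement over every four-taxon subset.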
The SVDQuartets method takes yet another approach, using algebraic properties of site-pattern frequencies under the MSC to infer species trees directly from sequence alignments without first estimating individual gene trees, avoiding potential errors introduced by gene tree estimation.13
ILS and rapid radiations
ILS is most severe in phylogenetic contexts involving rapid evolutionary radiations, where multiple speciation events occur in quick succession with large ancestral populations. These conditions produce short internodal branches in coalescent units, maximising the probability of discordant gene trees. Many of the most contentious nodes in the [tree of life](/evolution/tree-of-life) — the early radiation of placental mammals, the base of the bird phylogeny, the radiation of ray-finned fishes, the relationships among basal angiosperms — involve precisely this type of rapid diversification, and ILS has been implicated as a major source of phylogenetic uncertainty in each case.10, 15
The early diversification of neoavian birds provides an illustrative example. The major orders of modern birds appear to have radiated explosively following the end-Cretaceous mass extinction approximately 66 million years ago. Phylogenomic studies using thousands of loci have found that while certain higher-level relationships are well supported, the basal branching order remains difficult to resolve, with different gene subsets supporting conflicting topologies. The pattern is consistent with severe ILS caused by the combination of rapid speciation and large ancestral population sizes during the initial radiation.15
Similarly, the radiation of ray-finned fishes — the largest vertebrate clade, comprising more than 30,000 species — presents extensive gene tree discordance at many nodes. Faircloth and colleagues' analysis of ultraconserved elements found that while the major lineages of ray-finned fishes could be resolved using coalescent methods, traditional concatenation produced different and likely incorrect relationships at several nodes, underscoring the practical importance of accounting for ILS in phylogenomic inference.10
Broader significance
The study of incomplete lineage sorting has had a transformative effect on systematic biology. It has made clear that the history of a species is not a single branching tree but a complex network of gene genealogies, most of which do not precisely match the species divergence pattern. This insight has practical consequences beyond pure phylogenetics: ILS affects the reconstruction of ancestral character states, the inference of divergence times using [molecular clocks](/evolution/molecular-clocks), the detection of natural selection from comparative genomic data, and the delimitation of species boundaries in recently diverged groups.1, 2, 14
Perhaps most importantly, the recognition of ILS as a pervasive and predictable phenomenon has unified population genetics and phylogenetics into a single conceptual framework. The multispecies coalescent treats species trees and gene trees as different levels of the same hierarchical process, bridging the traditional divide between microevolutionary models of allele frequency change within populations and macroevolutionary models of lineage diversification across the [tree of life](/evolution/tree-of-life). In doing so, it has provided the field with both a deeper understanding of why phylogenetic inference is difficult and the statistical tools to do it correctly.5, 9
References
A phylogenomic perspective on the radiation of ray-finned fishes based upon targeted sequencing of ultraconserved elements
Maximum likelihood estimation of a migration matrix and effective population sizes in n subpopulations by using a coalescent approach