Genome duplication and polyploidy


Overview

  • Whole-genome duplication (WGD) is a major evolutionary mechanism in which an organism's entire genetic complement is doubled, creating a polyploid with twice the normal chromosome number. Phylogenomic analyses show that WGD events have occurred repeatedly across the tree of life, including at least two rounds (the 2R hypothesis) at the base of vertebrate evolution approximately 500 million years ago.
  • Polyploidy is especially prevalent in flowering plants, where an estimated 30 to 80 percent of living species are polyploid or descend from polyploid ancestors, and major crop species including wheat, cotton, potato, and canola are polyploids whose duplicated genomes have been central to agricultural domestication and breeding.
  • Following genome duplication, redundant gene copies typically undergo one of three fates: loss of one copy (nonfunctionalisation), partitioning of the ancestral functions between the two copies (subfunctionalisation), or evolution of a novel function in one copy (neofunctionalisation). The balance among these outcomes determines whether WGD ultimately promotes evolutionary innovation, genomic streamlining, or both.

Whole-genome duplication (WGD) is an evolutionary process in which the entire genetic complement of an organism is duplicated, producing a polyploid with twice the normal number of chromosomes. First proposed as a major driver of evolutionary innovation by Susumu Ohno in his landmark 1970 book Evolution by Gene Duplication, WGD has since been recognised as a pervasive force across the tree of life, documented in plants, animals, fungi, and protists.1, 3 Phylogenomic analyses have revealed that genome duplication events have occurred repeatedly throughout evolutionary history, with particularly profound consequences for the diversification of flowering plants and the origin of vertebrate complexity. The duplicated genes produced by WGD provide raw material for evolutionary experimentation: freed from the constraint of maintaining a single essential function, duplicate copies can diverge to acquire new roles, partition ancestral functions between them, or be lost entirely.1, 7

Figure: Close-up of wheat ears. Bread wheat (Triticum aestivum) is an allohexaploid combining the genomes of three ancestral grass species, one of the most economically important products of whole-genome duplication. Image: Bluemoose, Wikimedia Commons, CC BY-SA 3.0.

Mechanisms of genome duplication

Genome duplication can occur through two fundamentally different mechanisms. Autopolyploidy arises when the genome of a single species is duplicated, typically through errors in cell division that produce unreduced gametes (eggs or sperm with the full diploid chromosome number rather than the normal haploid complement). When two unreduced gametes fuse, the resulting offspring is a tetraploid with four copies of each chromosome. Autopolyploidy produces organisms that are genetically very similar to their diploid progenitors but reproductively isolated from them because crosses between tetraploids and diploids produce triploid offspring that are largely sterile.4, 16
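The chromosome-set arithmetic behind this reproductive barrier can be sketched in a few lines of Python. This is only an illustrative model: the function names are hypothetical, it counts whole chromosome sets only, and it ignores aneuploid gametes and other real-world complications.

```python
def gamete_sets(parent_ploidy: int, unreduced: bool = False) -> int:
    """Number of chromosome sets carried by a gamete (simplified model)."""
    if parent_ploidy < 2 or parent_ploidy % 2:
        raise ValueError("this sketch only handles even ploidy levels of 2 or more")
    # A normal (reduced) gamete carries half the parental sets;
    # an unreduced gamete carries all of them.
    return parent_ploidy if unreduced else parent_ploidy // 2


def offspring_ploidy(egg_sets: int, sperm_sets: int) -> int:
    """Ploidy of the zygote is the sum of the sets in the two gametes."""
    return egg_sets + sperm_sets


diploid = 2

# Two unreduced gametes from diploid parents fuse into a tetraploid.
tetraploid = offspring_ploidy(
    gamete_sets(diploid, unreduced=True),
    gamete_sets(diploid, unreduced=True),
)

# A tetraploid crossed back to a diploid yields a triploid.
backcross = offspring_ploidy(gamete_sets(tetraploid), gamete_sets(diploid))

print(tetraploid, backcross)
```

The backcross result has an odd number of chromosome sets, which is why such offspring have difficulty pairing chromosomes evenly at meiosis.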

Allopolyploidy, by contrast, combines genome duplication with hybridisation between two different species. When two related species hybridise and the hybrid undergoes genome doubling, the result is an allopolyploid that carries the complete genome of both parental species. Allopolyploidy is especially common in plants and has produced many of the world's most important crop species, including bread wheat (Triticum aestivum), which is a hexaploid combining the genomes of three ancestral grass species, and cotton (Gossypium), which is an allotetraploid formed approximately 1 to 2 million years ago from the merger of an African and an American diploid cotton species.4, 11

Whole-genome duplications in vertebrate evolution

One of the most consequential applications of WGD theory concerns the origin of vertebrate complexity. The 2R hypothesis, first proposed by Ohno and subsequently supported by genomic evidence, holds that two successive rounds of whole-genome duplication occurred in the stem lineage of vertebrates approximately 500 million years ago, before the divergence of jawed and jawless vertebrates. These ancient duplications initially quadrupled the vertebrate gene complement, although many of the extra copies were later lost, and they are thought to have provided the genetic raw material for the evolution of key vertebrate innovations including the adaptive immune system, the neural crest, and the complex vertebrate brain.1, 2

Evidence for the 2R hypothesis comes from comparative genomic studies showing that vertebrate genomes contain four paralogous copies (ohnologs) of many genes that exist in single copy in invertebrate outgroups such as the amphioxus (Branchiostoma). Dehal and Boore analysed the human genome and identified extensive paralogous regions consistent with two rounds of duplication followed by extensive gene loss. Putnam and colleagues confirmed these findings by sequencing the amphioxus genome, which preserves a pre-duplication state and shows clear four-to-one relationships with vertebrate gene families across large chromosomal regions.2, 15

Additional whole-genome duplications have occurred in specific vertebrate lineages. Teleost fishes experienced a third round of genome duplication (the teleost-specific genome duplication, or TSGD) approximately 320 million years ago, and salmonid fishes underwent a fourth duplication approximately 80 million years ago. The salmonid genome, sequenced by Lien and colleagues, retains approximately half of its duplicated genes in functional pairs, providing a snapshot of the early stages of post-WGD gene fate resolution.8, 3

Major whole-genome duplication events across the tree of life (ref. 3)

| Event | Lineage | Approximate timing | Key evidence |
|---|---|---|---|
| 1R + 2R | Ancestral vertebrates | ~500 Ma | 4:1 paralogy with amphioxus |
| TSGD (3R) | Teleost fishes | ~320 Ma | Duplicated Hox clusters |
| Salmonid (4R) | Salmonidae | ~80 Ma | Residual tetrasomic inheritance |
| WGD in yeast | Saccharomyces | ~100 Ma | Duplicated gene blocks |
| Gamma event | Core eudicots | ~130 Ma | Synteny across angiosperms |
| Alpha + Beta | Brassicaceae | ~40–65 Ma | Arabidopsis duplicated blocks |

Polyploidy in plants

Polyploidy is far more prevalent in plants than in animals, and it has played a central role in the diversification of flowering plants (angiosperms). Estimates suggest that 30 to 80 percent of living angiosperm species are polyploid or have polyploid ancestry, and the One Thousand Plant Transcriptomes Initiative identified multiple independent WGD events across virtually every major plant lineage, suggesting that genome duplication has been a recurring feature of plant evolution over the past 200 million years.4, 5

Several ancient polyploidy events have been linked to major radiations and ecological transitions. The gamma whole-genome triplication, shared by all core eudicots, occurred approximately 130 million years ago and preceded the explosive diversification of the largest clade of flowering plants. Additional WGD events within specific lineages have been correlated with the evolution of novel chemical defences, shifts in pollinator relationships, and colonisation of new habitats. Edger and colleagues demonstrated that a WGD in the ancestor of the Brassicales was followed by the evolution of glucosinolate defence compounds, triggering a coevolutionary arms race with butterfly herbivores that drove diversification in both lineages.5, 12

The prevalence of polyploidy in plants is thought to reflect several factors that make genome duplication more tolerable in plants than in animals: plants frequently reproduce asexually (allowing newly formed polyploids to persist without requiring a compatible mate), they are generally more tolerant of changes in gene dosage, and they lack the chromosomal sex determination systems that are disrupted by polyploidy in many animal groups.4, 16

Polyploidy in animals and fungi

Although less common than in plants, polyploidy has been documented in several animal lineages. Among vertebrates, polyploid species are known in frogs (particularly the genus Xenopus), fish (salmonids, sturgeons, and some cyprinids), and lizards. Polyploid animals are disproportionately represented among parthenogenetic (asexually reproducing) species, because parthenogenesis avoids the complications of meiotic chromosome pairing in organisms with odd or unbalanced chromosome sets.13

In invertebrates, polyploidy has been documented in insects, crustaceans, molluscs, and annelids, though it remains considerably rarer than in plants. The relative scarcity of polyploidy in animals is generally attributed to the disruption of sex determination mechanisms (particularly XY and ZW systems), the greater sensitivity of animal development to gene dosage changes, and the obligate sexual reproduction of most animal species, which makes the establishment of new polyploid lineages more difficult.13

Among fungi, the best-studied case of WGD occurred in the lineage leading to the baker's yeast Saccharomyces cerevisiae. Wolfe and Shields identified extensive duplicated blocks across the yeast genome and proposed that the ancestor of Saccharomyces underwent a whole-genome duplication approximately 100 million years ago. Subsequent studies have shown that most duplicated genes were lost, but the retained duplicates include genes involved in glucose metabolism and fermentation, suggesting that the WGD facilitated the evolution of the distinctive anaerobic fermentation capacity that characterises modern Saccharomyces species.14

Fates of duplicated genes

Following a WGD event, each gene in the genome exists in two copies (paralogs, or ohnologs when produced specifically by WGD). These duplicate copies are initially redundant, and their subsequent evolutionary fates determine the long-term consequences of the duplication. Three principal outcomes have been identified. Nonfunctionalisation (also called pseudogenisation) occurs when one copy accumulates deleterious mutations and becomes a nonfunctional pseudogene; this is the most common outcome, and the majority of duplicated genes are lost within millions of years of a WGD event.1, 7

Neofunctionalisation occurs when one copy retains the original function while the other acquires a new function through the accumulation of beneficial mutations. Because one copy maintains the essential ancestral role, the other is free to explore mutational space without deleterious consequences, and any mutations that confer a new advantageous function will be preserved by natural selection. Ohno originally proposed neofunctionalisation as the primary mechanism by which WGD drives evolutionary innovation.1

Subfunctionalisation, proposed by Force and colleagues in their duplication-degeneration-complementation (DDC) model, occurs when each copy loses a subset of the ancestral gene's functions through degenerative mutations, so that both copies are required to perform the full range of functions originally carried out by the single ancestral gene. Unlike neofunctionalisation, subfunctionalisation does not require positive selection and can occur through neutral processes alone. He and Zhang proposed that subfunctionalisation may serve as a transition state toward eventual neofunctionalisation, with initial partitioning of functions followed by the independent elaboration of each copy's retained function.6, 7
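The three fates can be made concrete with a toy classifier that treats a gene as a set of discrete subfunctions (for example, expression domains). This is a deliberately simplified sketch, not the DDC model itself: the function name, the labels, and the set-based representation are all assumptions made for illustration.

```python
def classify_fate(ancestral, copy_a, copy_b):
    """Classify a duplicate pair by the subfunctions each copy still performs."""
    ancestral = frozenset(ancestral)
    copy_a = frozenset(copy_a)
    copy_b = frozenset(copy_b)

    if not copy_a or not copy_b:
        # At least one copy has lost every function (pseudogene).
        return "nonfunctionalisation"
    if copy_a - ancestral or copy_b - ancestral:
        # A copy performs something the ancestral gene did not.
        return "neofunctionalisation"
    if copy_a == ancestral and copy_b == ancestral:
        # No divergence yet: the copies are still fully redundant.
        return "redundant"
    if copy_a != ancestral and copy_b != ancestral and copy_a | copy_b == ancestral:
        # Each copy lost something, and both are needed to cover the ancestral set.
        return "subfunctionalisation"
    return "unresolved"


# Hypothetical subfunctions of an ancestral gene.
ancestor = {"expressed_in_brain", "expressed_in_fin"}

print(classify_fate(ancestor, {"expressed_in_brain"}, {"expressed_in_fin"}))
print(classify_fate(ancestor, ancestor, set()))
print(classify_fate(ancestor, ancestor, {"expressed_in_brain", "novel_role"}))
```

Real duplicate pairs rarely fall so cleanly into one category, and a pair classified as subfunctionalised at one time may later gain new functions, as in the transition proposed by He and Zhang.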

Polyploidy and agriculture

Many of the world's most important crop species are polyploids, and genome duplication has played a central role in the history of agricultural domestication. Bread wheat (Triticum aestivum) is an allohexaploid (6x) that arose through two successive hybridisation and genome-doubling events, combining the genomes of three ancestral grass species over the past half-million years. The hexaploid genome of wheat, approximately five times larger than the human genome, provides extensive genetic redundancy that has facilitated adaptation to diverse growing conditions and human selection for yield, disease resistance, and grain quality.4

Cotton (Gossypium hirsutum and G. barbadense) is an allotetraploid formed approximately 1 to 2 million years ago from the hybridisation of an Old World A-genome species and a New World D-genome species. The polyploid event combined genomes with different fibre characteristics, and the resulting tetraploid produced longer, finer fibres than either diploid parent, a property subsequently enhanced by human selection during domestication. Other major polyploid crops include potato (Solanum tuberosum, tetraploid), canola (Brassica napus, allotetraploid), oat (Avena sativa, hexaploid), sugarcane (complex polyploid), and strawberry (Fragaria x ananassa, octoploid).9, 11

Genome duplication and evolutionary diversification

A recurring question in evolutionary biology is whether WGD events are causally linked to subsequent evolutionary radiations or whether the apparent correlation between WGD and diversification is coincidental. Van de Peer, Maere, and Meyer analysed the timing of ancient WGD events across plant and animal phylogenies and found that many well-documented polyploidy events cluster near major extinction boundaries, including the Cretaceous-Paleogene (K-Pg) boundary at 66 million years ago. They proposed that polyploid lineages may have a selective advantage during periods of environmental upheaval because their duplicated genomes provide greater genetic redundancy and buffering capacity against deleterious mutations.3

However, the relationship between WGD and diversification is not straightforward. Most newly formed polyploid lineages go extinct rapidly, and only a small fraction undergo the extensive gene loss, diploidisation, and functional divergence necessary to establish themselves as successful new lineages. The lag between WGD and subsequent radiation, often tens of millions of years, suggests that genome duplication is not sufficient on its own to drive diversification but rather creates latent evolutionary potential that is realised only when combined with ecological opportunity or environmental change.3, 5

Detecting ancient genome duplications

Identifying WGD events that occurred hundreds of millions of years ago requires multiple lines of genomic evidence, as the signal of duplication is progressively eroded by gene loss, chromosomal rearrangement, and sequence divergence. The primary evidence comes from synteny analysis, which identifies blocks of genes that occur in the same order on different chromosomes, indicating that they were produced by the duplication and subsequent rearrangement of an ancestral chromosomal segment. When such syntenic blocks are found throughout the genome, they constitute strong evidence for WGD.2, 14
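The core idea of synteny detection can be sketched as a search for runs of homologous gene pairs that keep the same order on two chromosomes. The sketch below is a greedy, forward-orientation-only simplification with hypothetical names and parameters; dedicated synteny tools also handle inverted blocks, scoring, and statistical significance.

```python
def collinear_blocks(anchors, max_gap=2, min_size=3):
    """Group homologous gene pairs into collinear blocks.

    anchors:  iterable of (i, j) pairs, where gene number i on chromosome A
              is homologous to gene number j on chromosome B.
    max_gap:  largest allowed step in gene index between consecutive anchors
              (a step of 2 means one intervening gene is tolerated).
    min_size: minimum number of anchors for a run to count as a block.
    """
    blocks = []
    current = []
    for i, j in sorted(anchors):
        if current:
            prev_i, prev_j = current[-1]
            # The run continues only if both positions move forward by a small step.
            if 0 < i - prev_i <= max_gap and 0 < j - prev_j <= max_gap:
                current.append((i, j))
                continue
            if len(current) >= min_size:
                blocks.append(current)
        current = [(i, j)]
    if len(current) >= min_size:
        blocks.append(current)
    return blocks


# Hypothetical anchors: one clear block, one isolated hit, one run that is too short.
example = [(0, 10), (1, 11), (2, 12), (4, 13), (9, 3), (20, 40), (21, 41)]
for block in collinear_blocks(example):
    print(block)
```

A single block like this could arise from a segmental duplication; it is the presence of many such blocks spread across the whole genome that points to WGD.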

A complementary approach analyses the age distribution of duplicated gene pairs (paralogs) across a genome. The age of each pair is approximated by Ks, the number of synonymous substitutions per synonymous site separating the two copies. If a WGD occurred at a specific time in the past, the Ks values of the resulting ohnolog pairs should cluster around a single peak corresponding to that event, distinguishing WGD from the continuous background of small-scale tandem duplications. The One Thousand Plant Transcriptomes Initiative used this Ks-based approach to identify dozens of previously unknown WGD events across the plant tree of life, revealing that genome duplication has been even more pervasive than previously recognised.5, 15
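A minimal sketch of the Ks approach is shown below: it bins Ks values, reports the most populated bin as the candidate WGD peak, and converts that peak to an approximate age with the commonly used relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The function names, the Ks values, and the rate are all placeholders chosen for illustration. Published analyses typically fit mixture models instead of taking a simple mode, and they use rates calibrated for the lineage in question.

```python
from collections import Counter


def ks_peak(ks_values, bin_width=0.1, ks_min=0.05, ks_max=3.0):
    """Return the centre of the most populated Ks bin between ks_min and ks_max.

    Very small Ks values are excluded because recent small-scale duplicates
    pile up near zero; very large values are excluded because of saturation.
    """
    counts = Counter()
    for ks in ks_values:
        if ks_min <= ks <= ks_max:
            counts[int(ks / bin_width)] += 1
    if not counts:
        raise ValueError("no Ks values inside the requested range")
    best_bin = max(counts, key=counts.get)
    return (best_bin + 0.5) * bin_width


def ks_to_age(ks, rate_per_site_per_year):
    """Approximate time since duplication in years: T = Ks / (2 * r).

    The factor of 2 reflects that both copies accumulate substitutions.
    """
    return ks / (2.0 * rate_per_site_per_year)


# Hypothetical Ks values: scattered background pairs plus a cluster near 0.75.
background = [0.01, 0.02, 0.03, 0.04, 0.15, 0.33, 0.52, 1.43, 2.12]
wgd_cluster = [0.71, 0.72, 0.74, 0.76, 0.78, 0.79]
peak = ks_peak(background + wgd_cluster)

# Placeholder rate used only for illustration; real rates are lineage-specific.
ASSUMED_RATE = 1.5e-8
age_years = ks_to_age(peak, ASSUMED_RATE)

print(f"peak Ks ~ {peak:.2f}, implied age ~ {age_years / 1e6:.0f} million years")
```

Because the conversion depends linearly on the assumed rate, uncertainty in r translates directly into uncertainty in the estimated age of the duplication.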

References

1

Evolution by gene duplication

Ohno, S. · Springer-Verlag, Berlin, 1970

2

Two rounds of whole genome duplication in the ancestral vertebrate

Dehal, P. & Boore, J. L. · PLoS Biology 3(10): e314, 2005

3

The evolutionary significance of ancient genome duplications

Van de Peer, Y., Maere, S. & Meyer, A. · Nature Reviews Genetics 10: 725–732, 2009

4

Polyploidy and genome evolution in plants

Soltis, P. S. & Soltis, D. E. · Current Opinion in Genetics & Development 35: 119–125, 2015

5

One thousand plant transcriptomes and the phylogenomics of green plants

One Thousand Plant Transcriptomes Initiative · Nature 574: 679–685, 2019

6

Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution

He, X. & Zhang, J. · Genetics 169: 1157–1164, 2005

7

Preservation of duplicate genes by complementary, degenerative mutations

Force, A. et al. · Genetics 151: 1531–1545, 1999

8

The Atlantic salmon genome provides insights into rediploidization

Lien, S. et al. · Nature 533: 200–205, 2016

9

The genome of the mesopolyploid crop species Brassica rapa

Wang, X. et al. · Nature Genetics 43: 1035–1039, 2011

11

Polyploidy and the evolutionary history of cotton

Wendel, J. F. & Cronn, R. C. · Advances in Agronomy 78: 139–186, 2003

12

The butterfly plant arms-race escalated by gene and genome duplications

Edger, P. P. et al. · Proceedings of the National Academy of Sciences 112: 8362–8366, 2015

13

Polyploid incidence and evolution

Otto, S. P. & Whitton, J. · Annual Review of Genetics 34: 401–437, 2000

14

Molecular evidence for an ancient duplication of the entire yeast genome

Wolfe, K. H. & Shields, D. C. · Nature 387: 708–713, 1997

15

The amphioxus genome and the evolution of the chordate karyotype

Putnam, N. H. et al. · Nature 453: 1064–1071, 2008

16

Polyploidy in an evolutionary context

Ramsey, J. & Schemske, D. W. · Journal of Botany 89: 1079–1091, 2002
