
The tree of life


Overview

  • All life on Earth shares a single common ancestor, a conclusion supported by the universal genetic code, conserved ribosomal RNA sequences, shared core biochemistry, and a formal statistical test published in 2010 that favored universal common ancestry over independent origins by an overwhelming margin.
  • Carl Woese’s ribosomal RNA phylogenies revealed a three-domain architecture of life (Bacteria, Archaea, and Eukarya), overturning the long-standing division of cellular life into prokaryotes and eukaryotes and placing the deepest divisions of life not between animals and microbes but within the microbial world.
  • Horizontal gene transfer, in which genes move laterally between lineages rather than vertically from parent to offspring, complicates the strictly branching tree metaphor and has led some biologists to describe early microbial evolution as a “web of life,” though universal common ancestry itself remains secure.

The concept that all living things on Earth are connected through a single branching genealogy — the tree of life — is one of the most profound and well-supported ideas in the history of science. It holds that every organism alive today, from the simplest bacterium to the blue whale, shares a common ancestor that lived billions of years ago, and that the diversity of life represents the accumulated divergence of lineages over geological time. The idea was articulated most powerfully by Charles Darwin in On the Origin of Species in 1859, where the only illustration in the entire book was a branching diagram showing how a single ancestral form might diversify into many descendant species over time.1 In the century and a half since, the tree of life has been confirmed and enriched by an avalanche of independent evidence from molecular biology, genomics, and comparative biochemistry, and the project of mapping its full structure has become one of the central endeavors of modern biology.

Darwin’s diagram and the hierarchical classification of life

Darwin’s branching diagram in On the Origin of Species was not merely illustrative. It embodied a specific and testable claim: that the nested, hierarchical pattern of biological classification (varieties within species, species within genera, genera within families, families within orders, and so on) is not an arbitrary human sorting scheme but a direct reflection of actual genealogical history.1 All cats share more features with each other than any of them share with dogs because cats share a more recent common ancestor with each other than any of them does with a dog. Likewise, mammals share more features with one another than with reptiles because all mammals descend from a common ancestor more recent than any they share with a reptile. Similarity tracks recency of common ancestry, and the nested hierarchy of the Linnaean classification system follows naturally as a consequence.

Before Darwin, naturalists classified organisms into hierarchical groups but had no principled explanation for why such a hierarchy existed. Special creationism offered no prediction: a creator could in principle have made organisms in any pattern whatsoever. Darwin’s great insight was that common descent with modification both explains and predicts the hierarchical structure of life, because diverging lineages accumulate independent changes after they split, generating the distinctive and diagnosable clusters that taxonomists recognize as higher taxa. The prediction is strong: if common descent is true, no organism should ever be found that violates the nested hierarchy, such as a mammal more closely related to a fish than to another mammal, or a plant more closely related to an insect than to another plant. More than half a century of molecular data, added to the morphological evidence that preceded it, has borne out this expectation: no anomalous case has been found that undermines the fundamental hierarchy, even as horizontal gene transfer has complicated its fine structure among microbes.5
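The logic of that prediction can be checked with a toy simulation. The sketch below is illustrative only: it assumes a hypothetical six-taxon tree written as nested Python tuples and an idealized model in which each new trait arises exactly once, on one branch, and is inherited by every descendant, with no loss, convergence, or horizontal transfer. Under those assumptions any two traits pick out groups that are either nested or non-overlapping, which is what a nested hierarchy means; a trait shared only by a cat and a trout would be exactly the kind of conflict the prediction forbids.

```python
import random

# Hypothetical rooted tree written as nested tuples; leaves are taxa.
TREE = ((("cat", "lion"), ("dog", "wolf")), ("trout", "salmon"))


def leaves(node):
    """Set of leaf names below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def branches(node):
    """Yield the leaf set below every branch (every node except the root)."""
    if isinstance(node, str):
        return
    for child in node:
        yield frozenset(leaves(child))
        yield from branches(child)


def simulate_traits(tree, n_traits, seed=0):
    """Idealized model: each derived trait appears once, on a random branch,
    and is inherited unchanged by every descendant of that branch."""
    rng = random.Random(seed)
    options = list(branches(tree))
    return [rng.choice(options) for _ in range(n_traits)]


def compatible(a, b):
    """Two traits fit a single nested hierarchy if their taxon sets
    are nested or disjoint."""
    return a <= b or b <= a or not (a & b)


traits = simulate_traits(TREE, 50)
print(all(compatible(a, b) for a in traits for b in traits))  # True
print(compatible(frozenset({"cat", "trout"}), frozenset({"cat", "lion"})))  # False
```

Real characters do undergo convergence and loss, so real datasets contain some conflict; the sketch isolates only the idealized expectation.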

Molecular evidence for universal common ancestry

The development of molecular biology in the mid-twentieth century provided an entirely independent source of evidence for the tree of life, one that Darwin could not have anticipated. Three lines of molecular evidence stand out as particularly decisive: the universality of the genetic code, the conservation of core biochemical machinery across all domains of life, and the concordance of phylogenies reconstructed from different molecular markers.

The genetic code, the mapping of nucleotide triplets (codons) to amino acids, is nearly identical across all known life forms. The codon UUU specifies the amino acid phenylalanine in bacteria, in redwood trees, and in humans. The codon UAA is a stop codon in Escherichia coli and in Homo sapiens. With only minor exceptions in a handful of lineages (such as certain mitochondrial genomes and some protists), the code is universal.5 This universality is extraordinarily difficult to explain under any hypothesis of independent origins. The code is largely arbitrary: no known chemical necessity requires UUU to code for phenylalanine rather than some other amino acid. The fact that the same arbitrary assignments are used by every known organism on Earth is a near-certain signature of shared ancestry. Life that arose independently several times would be expected to produce different codes, just as independently invented human languages use different words for the same objects.
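Because the code is a fixed codon-to-amino-acid mapping, it can be written down as a simple lookup table. The sketch below builds two such tables from the compact 64-letter strings used in NCBI's published translation tables (table 1, the standard code, and table 2, the vertebrate mitochondrial code), listing codons in UCAG order with "*" for stop; the helper names are illustrative. Comparing the two shows how small one of the known departures from the standard code is.

```python
from itertools import product

BASES = "UCAG"
# One letter per codon in the order UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def codon_table(letters):
    """Map each of the 64 RNA codons to a one-letter amino acid code."""
    codons = ("".join(bases) for bases in product(BASES, repeat=3))
    return dict(zip(codons, letters))


def translate(rna, table):
    """Translate an RNA reading frame until the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = table[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


standard = codon_table(STANDARD)
mito = codon_table(VERT_MITO)
print(standard["UUU"], standard["UAA"])   # F *
print(translate("AUGUUUUAA", standard))   # MF
print([c for c in standard if standard[c] != mito[c]])
# ['UGA', 'AUA', 'AGA', 'AGG']
```

Only four of the 64 assignments differ between these two tables, which is the sense in which such variant codes are minor departures from one shared code.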

Similarly, the core biochemical machinery of life (DNA replication, transcription, translation, the ATP synthase complex, the citric acid cycle) is shared across the three domains of life in forms clearly derived from a common ancestral set of molecules.5 The ribosomal RNAs and proteins that make up the ribosome, the machinery that reads genetic information, are recognizably homologous across Bacteria, Archaea, and Eukarya. The information-processing systems of all life share a common logic and common molecular components that would be extraordinarily improbable under any scenario of independent origins.

Perhaps most compellingly, phylogenetic trees built from different molecular markers (ribosomal RNA genes, protein-coding genes, transfer RNA sequences) converge on largely the same branching pattern for organisms across the tree of life.4 If each gene had an independent evolutionary history, different genes would not be expected to produce the same tree topology. Their convergence is a direct prediction of common descent: most genes in an organism share the same genealogical history because the organism itself has a single genealogical history. The concordance of hundreds of independently analyzed genes and gene families constitutes overwhelming evidence for the reality of the tree of life.
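Agreement between gene trees can be quantified. One widely used measure is the Robinson-Foulds distance, which counts the groupings found in one tree but not the other, so identical topologies score zero. The sketch below uses its rooted form (comparing clades), with trees written as nested Python tuples; the taxa and the "gene trees" are hypothetical examples, not results from any study.

```python
def clades(tree):
    """All clades of a rooted nested-tuple tree, as frozensets of leaf names
    (single leaves are omitted because every tree contains them)."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical trees inferred from different genes for the same taxa.
rrna_tree = ((("human", "mouse"), "yeast"), "E_coli")
protein_tree = ("E_coli", ("yeast", ("mouse", "human")))  # same topology
conflicting = ((("human", "yeast"), "mouse"), "E_coli")   # groups human with yeast

print(robinson_foulds(rrna_tree, protein_tree))  # 0
print(robinson_foulds(rrna_tree, conflicting))   # 2
```

The order in which children are written does not matter, only which taxa are grouped together, which is why the first two trees count as identical.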

Woese and the three-domain system

For much of the twentieth century, biologists divided cellular life into two fundamental categories: prokaryotes (organisms lacking a nucleus) and eukaryotes (organisms with a nucleus). This dichotomy, which grouped all microbes lacking a nucleus together as bacteria, was overturned in 1977 by Carl Woese and George Fox, who used comparisons of ribosomal RNA to infer the relationships of a broad range of microorganisms and discovered something unexpected: the microbes then known as “archaebacteria” were not, in fact, a subgroup of bacteria. They were as different from bacteria in their rRNA sequences as either group was from eukaryotes.2

Woese and Fox had stumbled upon a previously hidden domain of life. Ribosomal RNA was an inspired choice of molecular marker: it is present in all cellular life, performs the same essential function in all organisms (protein synthesis), and evolves slowly enough to preserve deep evolutionary signal across billions of years of divergence. By comparing rRNA sequences across a broad sample of organisms, Woese and Fox showed that life on Earth is organized into three primary lineages, not two. In 1990, Woese, along with Otto Kandler and Mark Wheelis, formally proposed the three-domain classification system: Bacteria, Archaea, and Eukarya.3
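The basic idea of building a tree from pairwise molecular differences can be sketched with UPGMA, a simple average-linkage clustering method. This is not a reconstruction of Woese and Fox's procedure (their 1977 comparisons used catalogs of rRNA fragments rather than complete sequences), and the distance values below are hypothetical, chosen only to mimic the qualitative pattern they reported, in which the archaea do not cluster with the bacteria.

```python
def upgma(dist):
    """UPGMA (average-linkage) clustering.

    dist maps each pair of leaf names (a, b) to a distance; every pair must
    be present. Returns a rooted tree written as nested tuples.
    """
    size = {}  # cluster (leaf name or nested tuple) -> number of leaves
    d = {}     # frozenset({cluster1, cluster2}) -> distance
    for (a, b), value in dist.items():
        size[a] = size[b] = 1
        d[frozenset((a, b))] = value

    while len(size) > 1:
        # Merge the closest pair (sorted by text for a deterministic layout).
        a, b = sorted(min(d, key=d.get), key=str)
        merged = (a, b)
        size[merged] = size[a] + size[b]
        for c in size:
            if c in (a, b, merged):
                continue
            # New distance = size-weighted mean of the two old distances.
            d[frozenset((merged, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[merged]
        del size[a], size[b]
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
    return next(iter(size))


# Hypothetical distances (fraction of differing aligned positions).
toy = {
    ("bacterium_1", "bacterium_2"): 0.10,
    ("archaeon_1", "archaeon_2"): 0.12,
    ("bacterium_1", "archaeon_1"): 0.40,
    ("bacterium_1", "archaeon_2"): 0.40,
    ("bacterium_2", "archaeon_1"): 0.40,
    ("bacterium_2", "archaeon_2"): 0.40,
    ("bacterium_1", "eukaryote"): 0.42,
    ("bacterium_2", "eukaryote"): 0.42,
    ("archaeon_1", "eukaryote"): 0.38,
    ("archaeon_2", "eukaryote"): 0.38,
}
print(upgma(toy))
# (('bacterium_1', 'bacterium_2'), (('archaeon_1', 'archaeon_2'), 'eukaryote'))
```

UPGMA assumes roughly equal rates of change along all lineages, an assumption that often fails for real sequence data; modern analyses generally rely on model-based methods such as maximum likelihood.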

The Archaea, while superficially similar to bacteria in lacking a nucleus, differ from them in profound ways. Their cell membrane lipids are built from branched isoprenoid chains ether-linked to glycerol, a combination not found in the typical membranes of bacteria or eukaryotes, whose lipids are usually fatty acids ester-linked to glycerol. Their transcription and translation machinery is more similar to that of eukaryotes than to that of bacteria. Many archaea thrive in extreme environments (hypersaline lakes, boiling hot springs, anaerobic sediments), though archaea are now known to be abundant in ordinary environments as well, including ocean water and soil. The three-domain system restructured the entire conceptual framework of microbiology and demonstrated that the deepest division in the history of life runs not between animals and microbes, but between the bacterial domain and the archaeal-eukaryotic lineage.3

Subsequent genomic analyses have complicated and refined the three-domain picture. Many phylogenomic studies based on large sets of conserved genes place eukaryotes within the Archaea, as relatives of a group called the Asgard archaea, rather than as a sister group to the entire archaeal domain.8 This finding supports a two-domain topology at the deepest level of the tree, in which the eukaryotic cell is itself a chimera: the product of a merger between an archaeal host cell and an engulfed alpha-proteobacterium that eventually became the mitochondrion, contributing its own genome to the eukaryotic lineage.7, 8, 14
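The difference between the two topologies is easiest to see by asking whether the archaea, without eukaryotes, form a single clade. The sketch below encodes simplified, rooted versions of both trees as nested Python tuples; the group labels are coarse, and the branching order within the Archaea is illustrative rather than taken from any particular study.

```python
def clade_sets(tree):
    """Leaf sets of every internal node of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


ARCHAEA = frozenset({"Euryarchaeota", "TACK", "Asgard"})

# Simplified, illustrative topologies (rooted between Bacteria and the rest).
three_domain = ("Bacteria", (("Euryarchaeota", ("TACK", "Asgard")), "Eukarya"))
two_domain = ("Bacteria", ("Euryarchaeota", ("TACK", ("Asgard", "Eukarya"))))

for name, tree in [("three-domain", three_domain), ("two-domain", two_domain)]:
    found = clade_sets(tree)
    print(name, ARCHAEA in found, (ARCHAEA | {"Eukarya"}) in found)
# three-domain True True
# two-domain False True
```

In the two-domain tree, the archaea form a clade only when eukaryotes are included, which is what it means for eukaryotes to branch from within the Archaea.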

Horizontal gene transfer and the web of life

The tree of life metaphor implies that genetic information flows vertically, from parent to offspring, and that the history of life can be represented as a strictly bifurcating tree. This picture is complicated by horizontal gene transfer (HGT), the movement of genetic material between organisms that are not in a direct ancestor-descendant relationship. HGT is now known to be pervasive among prokaryotes, with some estimates suggesting that a substantial fraction of a bacterial genome, in some lineages perhaps 20 percent or more of its genes, has been acquired horizontally at some point in that lineage’s evolutionary history.6

HGT occurs through several mechanisms: transformation (uptake of DNA from the environment), transduction (transfer by bacteriophages), and conjugation (direct cell-to-cell transfer, commonly of plasmids). The transferred genes can encode almost anything, including antibiotic resistance, metabolic pathways, and virulence factors, and they can spread rapidly through bacterial populations, sometimes crossing domain boundaries. The discovery of widespread HGT led some biologists, most notably W. Ford Doolittle, to argue that the tree of life should be replaced by a “web of life” or “net of life” metaphor that better captures the reticulate (network-like) nature of microbial evolution.15

However, the implications of HGT for universal common ancestry are less radical than they might appear. Even amid extensive horizontal gene transfer, a core set of informational genes (those involved in DNA replication, transcription, and translation) is largely refractory to transfer, because their products interact with dozens of partners and rarely function well in a new cellular context.5 These core genes retain a tree-like signal that traces back to LUCA (the Last Universal Common Ancestor), even when the overall history of the genome is reticulate. Universal common ancestry, as a claim about the origin of cellular life, is not undermined by HGT: the ancestral lineage was still singular, and HGT represents lateral exchange among its descendants, not independent origin events. The “tree” remains the best representation of the history of cell lineages, even if it is overlaid by a web of transferred genes.15

LUCA: the Last Universal Common Ancestor

The Last Universal Common Ancestor, or LUCA, is the hypothetical organism (or population of organisms) from which all currently living things on Earth descend. LUCA does not imply that life originated only once; life may have arisen more than once, with all but one lineage going extinct. LUCA is simply the point in the evolutionary tree where all surviving lineages converge when traced backward in time.10 It is generally thought to have lived roughly 3.5 to 4 billion years ago: the oldest known fossil evidence for cellular life sets a minimum age, and molecular clock analyses provide further estimates, though its precise date and nature remain subjects of active research.

What can be inferred about LUCA from comparative genomics? In a landmark 2016 study, Madeline Weiss and colleagues systematically identified genes shared across Bacteria and Archaea that showed a tree-like pattern of inheritance consistent with vertical descent from a common ancestor, reasoning that these genes most likely date back to LUCA itself.10 They identified 355 protein families that met this criterion. The profile that emerged suggests LUCA was an anaerobic organism that obtained energy by coupling the reduction of carbon dioxide with the oxidation of hydrogen, using a Wood-Ljungdahl pathway for carbon fixation. It lived in a hot, metal-rich environment — possibly a hydrothermal vent system — and relied heavily on iron-sulfur clusters as cofactors.10 Notably, LUCA already possessed a sophisticated genetic system with ribosomes, transfer RNAs, and the genetic code as we know it, meaning that the deepest evolutionary events, including the origin of life itself and the establishment of the genetic code, predate LUCA and are not directly accessible through comparative genomics of modern organisms.

LUCA was therefore already a complex organism by any measure. The most fundamental innovations of life, the emergence of self-replicating, heritable information and its translation into functional proteins, had taken place before it lived. The origin of life and the existence of LUCA are thus distinct events, separated by an unknown but substantial interval of pre-LUCA evolution.
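The filtering idea behind the Weiss study can be sketched as a simple test applied to each gene family's tree. This is a toy illustration under stated assumptions, not the study's pipeline: trees are nested Python tuples, every leaf is a bacterial or archaeal sequence mapped to a hypothetical (domain, phylum) pair, "shared across Bacteria and Archaea" is approximated by presence in at least two phyla of each domain (an assumed threshold), and "vertical descent" by requiring that a single branch separate all bacterial from all archaeal sequences. The function name `looks_vertical` is illustrative.

```python
def clades_with_leaves(tree):
    """Leaf sets of every node (including single leaves) in a nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            below = frozenset([node])
        else:
            below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def looks_vertical(tree, lineage, min_phyla=2):
    """Keep a gene family if (1) it occurs in at least `min_phyla` phyla of
    both Bacteria and Archaea and (2) one branch separates the bacterial
    leaves from the archaeal ones. Assumes every leaf is one or the other."""
    clades = clades_with_leaves(tree)
    all_leaves = max(clades, key=len)  # the root clade
    bacteria = frozenset(x for x in all_leaves if lineage[x][0] == "Bacteria")
    archaea = frozenset(x for x in all_leaves if lineage[x][0] == "Archaea")
    widespread = all(
        len({lineage[x][1] for x in group}) >= min_phyla
        for group in (bacteria, archaea)
    )
    # In a rooted tree, an unrooted split appears as a clade or its complement.
    return widespread and (bacteria in clades or archaea in clades)


# Hypothetical sequences labeled with (domain, phylum).
LINEAGE = {
    "b1": ("Bacteria", "Firmicutes"),
    "b2": ("Bacteria", "Proteobacteria"),
    "b3": ("Bacteria", "Firmicutes"),
    "a1": ("Archaea", "Euryarchaeota"),
    "a2": ("Archaea", "Crenarchaeota"),
}
print(looks_vertical((("b1", "b2"), ("a1", "a2")), LINEAGE))  # True
print(looks_vertical((("b1", "a1"), ("b2", "a2")), LINEAGE))  # False: domains interleaved
print(looks_vertical((("b1", "b3"), ("a1", "a2")), LINEAGE))  # False: one bacterial phylum
```

A family whose tree interleaves bacterial and archaeal sequences is the signature of transfer between domains, which is why such families are set aside when reconstructing LUCA.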

Theobald’s formal statistical test

In 2010, Douglas Theobald published a formal statistical test of the theory of universal common ancestry in the journal Nature, representing the most rigorous quantitative assessment of the hypothesis to that point.4 Theobald used a likelihood-ratio framework to compare several competing hypotheses: universal common ancestry (all life shares a single root), multiple independent origins with subsequent horizontal gene transfer, and completely independent origins without any horizontal gene transfer. The test used a dataset of conserved protein sequences from diverse organisms spanning all three domains of life.

The result was decisive. Universal common ancestry outperformed every alternative by a margin so large that the odds in its favor could only be expressed on a logarithmic scale; the evidence for a single common ancestor over any multiple-origins hypothesis exceeded conventional statistical thresholds by many orders of magnitude.4 Theobald’s analysis also tested whether the data could be explained by multiple independent lineages that subsequently shared genes through horizontal gene transfer, and found that even this scenario was far less probable than true universal common ancestry. The paper’s central conclusion has been broadly accepted, although some critics questioned whether sequence similarity alone can distinguish common ancestry from convergence; a commentary accompanying the article described it as providing “an extraordinarily strong statistical case” for common ancestry.9
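Why such results are reported on a logarithmic scale becomes clear from a little arithmetic. Model comparisons of this kind work with log-likelihoods, and the sketch below uses two hypothetical log-likelihood values (not Theobald’s figures) to show that a difference of a few thousand log-units corresponds to a likelihood ratio too large to represent as an ordinary floating-point number. It also shows the same comparison under the Akaike information criterion (AIC), which penalizes models with more free parameters.

```python
import math


def log10_likelihood_ratio(lnl_a, lnl_b):
    """Base-10 exponent of the likelihood ratio L_a / L_b, computed from
    natural-log likelihoods so that huge ratios never need to be formed."""
    return (lnl_a - lnl_b) / math.log(10)


def aic(lnl, n_params):
    """Akaike information criterion; the lower value is preferred."""
    return 2 * n_params - 2 * lnl


# Hypothetical fits of two models to the same alignment (not real results).
lnl_single, k_single = -150_000.0, 40      # one common ancestor
lnl_separate, k_separate = -155_000.0, 80  # independent origins

try:
    math.exp(lnl_single - lnl_separate)
except OverflowError:
    print("ratio too large for a float")

print(round(log10_likelihood_ratio(lnl_single, lnl_separate), 1))  # 2171.5
print(aic(lnl_single, k_single), aic(lnl_separate, k_separate))    # 300080.0 310160.0
```

In this made-up example the single-ancestor model both fits better and uses fewer parameters, so every criterion favors it; the likelihood ratio itself is roughly 10 raised to the 2,171st power.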

The significance of Theobald’s test is not that biologists needed additional convincing, since the evidence for universal common ancestry from the genetic code, conserved biochemistry, and congruent phylogenies was already overwhelming, but that it demonstrated the hypothesis could be subjected to formal statistical hypothesis testing, and that it passes by an enormous margin. Universal common ancestry is thus not only a compelling inference from many lines of evidence; it is a quantitatively tested scientific conclusion.

Modern phylogenomics and the Open Tree of Life

The advent of high-throughput genome sequencing has transformed the practice of phylogenetics by making it possible to reconstruct evolutionary relationships from hundreds or thousands of genes simultaneously rather than a handful of markers. This field, called phylogenomics, produces phylogenetic trees of substantially greater resolution and statistical support than single-gene analyses, and has resolved many longstanding controversies in the history of life. A 2016 study by Hug and colleagues built a tree from concatenated sequences of 16 ribosomal proteins drawn from the genomes of 3,083 organisms spanning all three domains of life, including many newly discovered microbial lineages known only from environmental DNA sequences.11 Their tree showed that the majority of life’s diversity is microbial and that the branches representing animals, plants, and other macroscopic eukaryotes are a comparatively narrow sliver of the full diversity of life on Earth.

The Open Tree of Life project, described in a landmark 2015 paper by Hinchliff and colleagues, is an ambitious attempt to synthesize published phylogenetic information into a single comprehensive tree of life that aims to cover all named species.12 The project combined published phylogenetic trees with taxonomic data for species lacking phylogenetic placement, producing a synthetic tree covering approximately 2.3 million named species. The Open Tree of Life is openly accessible, is updated as new phylogenetic data are published, and provides a resource for comparative biology, conservation planning, and ecological research that was inconceivable before the genomic era.
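Large phylogenies, including the synthetic tree that the Open Tree of Life makes available for download, are commonly exchanged as text in Newick format, in which nested parentheses encode clades, commas separate siblings, and an optional ":number" after a label gives a branch length. A minimal parser sketch follows; it deliberately discards branch lengths and internal node labels, does not handle quoted labels or comments, and the example taxon names are illustrative.

```python
def parse_newick(text):
    """Parse a simple Newick string such as '((A,B),C);' into nested tuples.
    Branch lengths and internal-node labels are read but discarded."""
    text = "".join(text.split())  # Newick labels use '_' instead of spaces
    pos = 0

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        return text[start:pos].split(":")[0]  # drop ':branch_length'

    def read_node():
        nonlocal pos
        if pos >= len(text) or text[pos] != "(":
            return read_label()  # a leaf
        pos += 1  # consume '('
        children = [read_node()]
        while pos < len(text) and text[pos] == ",":
            pos += 1
            children.append(read_node())
        if pos >= len(text) or text[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1  # consume ')'
        read_label()  # skip internal label / branch length
        return tuple(children)

    tree = read_node()
    if pos >= len(text) or text[pos] != ";":
        raise ValueError("Newick string must end with ';'")
    return tree


print(parse_newick("((Homo_sapiens,Pan_troglodytes),Gorilla_gorilla);"))
# (('Homo_sapiens', 'Pan_troglodytes'), 'Gorilla_gorilla')
print(parse_newick("((A:0.1,B:0.2)AB:0.3,C:0.4);"))
# (('A', 'B'), 'C')
```

Production work would use an established phylogenetics library instead; the point here is only that the nested-parenthesis text maps directly onto the nested groups of the tree.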

Phylogenomics has also continued to refine the deepest branches of the tree. The discovery of the Asgard archaea — a group identified initially from environmental metagenomes taken from marine sediments — has fundamentally altered understanding of the origin of eukaryotes.8 Asgard archaea possess genes previously thought to be eukaryote-specific, including homologs of actin, tubulin, and components of the endosomal sorting machinery. Their phylogenomic position as the closest known relatives of eukaryotes supports a model in which the eukaryotic cell arose when an archaeal host closely related to the Asgard group engulfed an alpha-proteobacterium, giving rise to the mitochondrion and the chimeric character of the eukaryotic genome.8, 14 The Asgard archaea thus represent a living, if distant, connection to the archaeal ancestor of all complex life.

Despite the complications introduced by horizontal gene transfer, the tree-within-a-web character of microbial evolution, and the chimeric origins of the eukaryotic cell, the fundamental picture that Darwin sketched in his branching diagram has been confirmed and enriched at every scale. The tree of life is real. Its deepest roots reach back to a singular common ancestor, and its branches, comprising the full diversity of known life, are being mapped with increasing resolution by the methods of modern molecular biology and comparative genomics. As the Open Tree of Life project and its successors continue to integrate new data, the tree will become ever more complete — a monument to the unity of all life on Earth and to the power of the evolutionary process to generate, from a single ancestor, the extraordinary diversity of living things.

References

1

On the origin of species by means of natural selection

Darwin, C. · John Murray, London, 1859

2

Phylogenetic structure of the prokaryotic domain: the primary kingdoms

Woese, C. R. & Fox, G. E. · Proceedings of the National Academy of Sciences 74(11): 5088–5090, 1977

3

Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya

Woese, C. R., Kandler, O. & Wheelis, M. L. · Proceedings of the National Academy of Sciences 87(12): 4576–4579, 1990

4

Formal test of the theory of universal common ancestry

Theobald, D. L. · Nature 465: 219–222, 2010

5

Darwinian evolution in the light of genomics

Koonin, E. V. · Nucleic Acids Research 37(4): 1011–1034, 2009

6

Horizontal gene transfer: the buzz word in microbial evolution

Ochman, H., Lawrence, J. G. & Groisman, E. A. · Nature 405: 299–304, 2000

7

The tree of life and the origin of eukaryotes

Williams, T. A. et al. · Science 337: 816–820, 2012

8

Archaeal origin of the eukaryotic proteome supports a two-domain tree of life

Williams, T. A. et al. · Science 360: 1040–1043, 2018

9

Synthesis of a universal common ancestry hypothesis

Steel, M. & Penny, D. · Nature 465: 168–169, 2010

10

Reconstructing the last universal common ancestor

Weiss, M. C. et al. · Nature Microbiology 1: 16116, 2016

11

A new view of the tree of life

Hug, L. A. et al. · Nature Microbiology 1: 16048, 2016

12

Synthesis of phylogeny and taxonomy into a comprehensive tree of life

Hinchliff, C. E. et al. · Proceedings of the National Academy of Sciences 112(41): 12764–12769, 2015

14

Eukaryotic origins: how and when was the mitochondrion acquired?

Gray, M. W. · Cold Spring Harbor Perspectives in Biology 4(9): a011403, 2012

15

Is the tree of life the best metaphor, model, or heuristic for phylogenetics?

Morrison, D. A. · Systematic Biology 63(4): 628–638, 2014
