Pseudogenes


Overview

  • Pseudogenes are broken, non-functional copies of genes that persist in the genome—relics of genes that once encoded functional proteins but were disabled by mutations, deletions, or faulty copying events.
  • Humans and chimpanzees share thousands of pseudogenes at identical genomic locations with identical inactivating mutations, a pattern that is explained by inheritance from a common ancestor but has no explanation under independent design.
  • Specific examples—including the GULO pseudogene for vitamin C synthesis, hundreds of broken olfactory receptor genes, and the vitellogenin pseudogene from egg-laying ancestors—trace the history of gene loss across vertebrate evolution.

Scattered throughout the genomes of all complex organisms are the remnants of genes that once worked but no longer do. These pseudogenes—literally "false genes"—retain recognizable sequence similarity to their functional counterparts but have been disabled by mutations that disrupt their ability to produce a working protein. They are molecular fossils: records of genes that served a purpose in ancestral organisms but were subsequently broken by the accumulation of stop codons, frameshift mutations, or regulatory damage.1, 2 The human genome contains an estimated 13,000 to 20,000 pseudogenes, a number roughly comparable to the approximately 20,000 protein-coding genes that remain functional.1, 13

Pseudogenes are far more than genomic clutter. Their distribution, structure, and shared patterns of damage across species constitute some of the most powerful molecular evidence for common descent. When two species share the same pseudogene at the same chromosomal location, bearing the same inactivating mutations, the most parsimonious explanation is that both inherited the broken gene from a common ancestor in which the original disabling mutation occurred.5, 14

Types of pseudogenes

Pseudogenes arise through three principal mechanisms, each leaving a distinctive molecular signature. Processed pseudogenes are created when an mRNA transcript is reverse-transcribed into DNA by the enzymatic machinery of a retrotransposon (typically a LINE-1 element) and inserted into the genome at a new location. Because the copy is made from processed mRNA, these pseudogenes lack introns and carry a poly-A tail, hallmarks that distinguish them from the original gene. They also typically lack the upstream regulatory sequences needed for transcription, rendering them dead on arrival. The human genome contains thousands of processed pseudogenes, including over 2,000 derived from ribosomal protein genes alone.3, 4

Duplicated pseudogenes arise when a segment of DNA containing a gene is duplicated through unequal crossing over or segmental duplication. One copy continues to function while the other, freed from selective constraint, accumulates mutations until it can no longer encode a functional protein. Unlike processed pseudogenes, duplicated pseudogenes retain their intron-exon structure and often reside near the original gene on the same chromosome.1, 2

Unitary pseudogenes represent a third and particularly informative category. These are genes that have no functional counterpart anywhere in the genome: the only copy has been disabled. Unitary pseudogenes arise when a previously functional gene accumulates inactivating mutations in a lineage where the gene's function is no longer under strong selective pressure. Zhang et al. identified approximately 80 unitary pseudogenes in the human genome whose orthologs remain functional in the mouse, representing genes lost along the primate lineage leading to humans.5 Because unitary pseudogenes mark the loss of a function that was once present, they are especially useful for reconstructing evolutionary history.
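The three signatures described above can be sketched as a simple decision rule. This is a minimal illustration, not an annotation pipeline: the `PseudogeneCandidate` record and its three boolean fields are hypothetical simplifications, and real pseudogene annotation relies on sequence alignment and curated evidence rather than three flags.

```python
from dataclasses import dataclass


@dataclass
class PseudogeneCandidate:
    """Hypothetical summary of a disabled gene copy (illustrative fields only)."""
    has_introns: bool                # is the intron-exon structure preserved?
    has_poly_a_tail: bool            # is there a genomic poly-A tract at the 3' end?
    functional_copy_in_genome: bool  # does a working parent or paralog still exist?


def classify(candidate: PseudogeneCandidate) -> str:
    """Assign one of the three categories described in the text."""
    # Unitary: the only copy of the gene has been disabled.
    if not candidate.functional_copy_in_genome:
        return "unitary"
    # Processed: copied from spliced mRNA, so no introns and a poly-A tail.
    if not candidate.has_introns and candidate.has_poly_a_tail:
        return "processed"
    # Duplicated: copied as genomic DNA, so the introns are retained.
    if candidate.has_introns:
        return "duplicated"
    # Anything else does not match a clean signature.
    return "ambiguous"


print(classify(PseudogeneCandidate(False, True, True)))   # processed
print(classify(PseudogeneCandidate(True, False, True)))   # duplicated
print(classify(PseudogeneCandidate(True, False, False)))  # unitary
```

The unitary test comes first because the defining feature of that category is the absence of any working copy, regardless of how the sequence itself looks.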

Shared pseudogenes as evidence for common descent

The evidential power of pseudogenes for common ancestry lies not merely in their existence but in the precise pattern of their sharing across species. When comparing the genomes of humans and chimpanzees, researchers find thousands of pseudogenes at orthologous (corresponding) chromosomal positions, carrying the same disabling mutations.5, 15 This pattern requires explanation. If each species were independently designed, there is no reason a designer would place identical broken genes at identical locations with identical patterns of damage. Under common descent, however, the pattern is expected: the gene was broken once in a shared ancestor, and every descendant inherited the same broken copy at the same genomic address.

The logic is analogous to catching two students with identical wrong answers on an exam—not merely the same questions missed, but the same specific incorrect responses. One or two shared mistakes might be coincidence; thousands of shared mistakes, each with matching details, point unmistakably to a common source.

The GULO pseudogene

Perhaps the most widely cited example of an informative pseudogene is the broken vitamin C gene, GULOP. Most mammals synthesize their own vitamin C through a four-step enzymatic pathway, the final step of which is catalyzed by the enzyme L-gulonolactone oxidase, encoded by the GULO gene. Humans cannot perform this step because our copy of GULO has been disabled by the deletion of several exons and the accumulation of numerous point mutations.6, 7

When researchers sequenced the GULO pseudogene in chimpanzees, gorillas, orangutans, and macaques, they found the same exon deletions and the same accumulated mutations in every species, indicating that the gene was broken once in a common ancestor of the haplorhine primates, approximately 40–65 million years ago.6, 7 Guinea pigs also cannot synthesize vitamin C, but their GULO pseudogene carries entirely different mutations. This points to an independent loss event in the rodent lineage and shows that separate inactivations of the same gene leave different lesions, which makes it implausible that the shared primate pattern arose from some inherent fragility of the gene itself.7
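The contrast between a shared loss and an independent loss can be made concrete by comparing sets of inactivating lesions at aligned positions. The sketch below is an illustration only: the positions and lesion labels are invented toy data, not real GULOP coordinates, and `shared_fraction` is a hypothetical helper name.

```python
def shared_fraction(a: set, b: set) -> float:
    """Jaccard index: lesions found in both species divided by lesions found in either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Invented toy records of (alignment position, lesion type); NOT real GULOP data.
human = {(120, "frameshift"), (388, "stop"), (902, "exon_deleted")}
chimp = {(120, "frameshift"), (388, "stop"), (902, "exon_deleted"), (1010, "substitution")}
guinea_pig = {(57, "stop"), (640, "exon_deleted")}

print(shared_fraction(human, chimp))       # 0.75: most lesions are identical
print(shared_fraction(human, guinea_pig))  # 0.0: no lesions in common
```

In this toy setup, a high overlap is what a single ancestral inactivation predicts, while zero overlap is what two independent inactivations of the same gene predict.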

Olfactory receptor pseudogenes

The olfactory receptor (OR) gene family is the largest gene family in the mammalian genome, encoding the receptors that detect odor molecules in the nasal epithelium. Mice possess approximately 1,200 OR genes, of which roughly 20% are pseudogenes. Humans, by contrast, have approximately 800 OR genes, but roughly 60% of them—nearly 500—are pseudogenes incapable of producing functional receptors.8, 9, 10
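The arithmetic behind these figures is easy to check. The short sketch below uses only the approximate totals and fractions quoted in the paragraph above; the function name is a hypothetical helper.

```python
def pseudogene_count(total_genes: int, pseudogene_fraction: float) -> int:
    """Number of pseudogenes implied by a family size and a pseudogene fraction."""
    return round(total_genes * pseudogene_fraction)


mouse_total, human_total = 1200, 800

mouse_pseudo = pseudogene_count(mouse_total, 0.20)  # 240
human_pseudo = pseudogene_count(human_total, 0.60)  # 480, i.e. "nearly 500"

mouse_functional = mouse_total - mouse_pseudo       # 960
human_functional = human_total - human_pseudo       # 320

print(mouse_pseudo, human_pseudo)
print(mouse_functional, human_functional)
```

On these rounded figures, mice retain about three times as many functional olfactory receptor genes as humans (roughly 960 versus 320).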

This dramatic difference reflects the reduced importance of olfaction in primates, which rely more heavily on vision. The pattern of OR pseudogenization broadly tracks primate phylogeny: the fraction of broken OR genes increases from prosimians to monkeys to apes, correlating with the increasing development of trichromatic color vision.8 Many of the same OR genes are broken in both humans and chimpanzees, carrying identical inactivating mutations at the same positions, while remaining functional in more distantly related mammals.9 This shared degradation is consistent with a common ancestor in which selection on olfactory receptor genes was relaxed as visual processing became dominant.

The vitellogenin pseudogene

Among the most striking pseudogenes in the human genome are the remnants of vitellogenin genes. Vitellogenin is the major egg yolk precursor protein in egg-laying vertebrates—birds, reptiles, amphibians, and fish all possess functional vitellogenin genes that produce the nutrient-rich yolk sustaining embryonic development. Mammals, which nourish their young through placentation and lactation, have no need for egg yolk production.12

Brawand, Wahli, and Kaessmann demonstrated that the human genome retains degraded remnants of vitellogenin genes—identifiable fragments of the ancestral VIT1, VIT2, and VIT3 genes that have been progressively pseudogenized. The chicken retains all three functional vitellogenin genes, while the platypus, which still lays eggs, retains one functional copy (VIT2) alongside pseudogenized copies of VIT1 and VIT3. In placental mammals including humans, all three copies are pseudogenes.12 The stepwise loss of vitellogenin function correlates with the evolutionary transition from egg-laying to placental reproduction: as mammals evolved lactation and placentation to nourish their young, selection pressure to maintain vitellogenin genes was progressively relaxed, and the genes accumulated disabling mutations.

The vitellogenin pseudogenes are particularly powerful evidence for evolution because they record a specific historical transition. Humans carry the broken remains of egg yolk genes because our distant ancestors were egg-laying vertebrates. Under a framework of independent design, there is no reason to equip mammals with degraded copies of genes whose sole function is producing egg yolk for a reproductive strategy that mammals do not use.12

Broader significance

Pseudogenes collectively represent one of the strongest classes of molecular evidence for common descent. Each shared pseudogene is an independent data point: a broken gene inherited from a common ancestor, carrying the molecular scars of a specific historical event. The thousands of shared pseudogenes between humans and other primates, the hundreds of shared olfactory receptor pseudogenes, the vitellogenin remnants from egg-laying ancestors, and the broken GULO gene from a primate ancestor that lost vitamin C synthesis—all converge on the same conclusion.1, 5, 12 They are joined by parallel lines of evidence from shared endogenous retroviruses and shared transposable element insertions, each providing an independent molecular record of shared ancestry.

The argument from pseudogenes is not merely that organisms share similar sequences—functional similarities might conceivably reflect similar design requirements. The argument is that organisms share identical patterns of damage in non-functional sequences, at the same genomic locations, with the same specific mutations. This pattern has one natural explanation—inheritance from a common ancestor—and no coherent alternative.5, 14

The CMAH pseudogene

Another instructive unitary pseudogene in the human genome is CMAH (cytidine monophosphate-N-acetylneuraminic acid hydroxylase). In most mammals, CMAH encodes an enzyme that converts the cell-surface sugar Neu5Ac into Neu5Gc, a modification that decorates cell membranes and plays roles in cell-cell recognition and pathogen interaction. In humans, CMAH has been inactivated by a 92-base-pair deletion in exon 6 that introduces a frameshift, rendering the gene non-functional. Chimpanzees, gorillas, and other great apes retain a functional CMAH gene and express Neu5Gc on their cell surfaces, while humans produce only Neu5Ac.18
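Why a 92-base-pair deletion is so damaging follows from the triplet nature of the genetic code: any insertion or deletion whose length is not a multiple of three shifts the reading frame for everything downstream. The sketch below illustrates that check, along with a scan for premature in-frame stop codons. The function names and the short test sequences are hypothetical and are not taken from the CMAH gene.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_frameshift(indel_length: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def first_premature_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon that
    occurs before the final codon, or None if there is none."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


print(is_frameshift(92))                        # True: 92 = 3 * 30 + 2
print(is_frameshift(90))                        # False: frame is preserved
print(first_premature_stop("ATGAAATAGGGCTAA"))  # 2 (the TAG codon)
print(first_premature_stop("ATGAAAGGCTAA"))     # None
```

Because 92 leaves a remainder of 2 when divided by 3, every codon after the deletion is read out of register, which typically brings a premature stop codon into frame soon afterwards.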

Chou and colleagues dated the inactivation of human CMAH to approximately 2.5–3 million years ago by comparing the rate of neutral substitution accumulation in the pseudogene with functional copies in other primates. This timing is notable because it falls shortly before the period of rapid brain expansion in the hominin lineage, leading to speculation that the loss of Neu5Gc may have influenced susceptibility to certain pathogens or played a role in the evolution of human-specific neural development, though these functional implications remain under investigation.18
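The general idea of dating an inactivation from neutral substitutions can be sketched with a simple molecular-clock calculation. This is a deliberately simplified model and should not be read as the procedure used by Chou and colleagues: it assumes a constant neutral rate, and both input values below are illustrative assumptions chosen to show the arithmetic, not measurements from the CMAH locus.

```python
def years_since_inactivation(neutral_subs_per_site: float,
                             neutral_rate_per_site_per_year: float) -> float:
    """Simple clock: time = accumulated neutral divergence / neutral rate.

    Assumes that, once the gene is disabled, substitutions accumulate on the
    lineage at a constant neutral rate.
    """
    if neutral_rate_per_site_per_year <= 0:
        raise ValueError("neutral rate must be positive")
    return neutral_subs_per_site / neutral_rate_per_site_per_year


# Illustrative inputs only (assumptions, not data from the paper):
assumed_rate = 1e-9          # substitutions per site per year
assumed_divergence = 0.0027  # neutral substitutions per site since inactivation

estimate = years_since_inactivation(assumed_divergence, assumed_rate)
print(f"{estimate / 1e6:.1f} million years")
```

With these assumed inputs the estimate comes out at about 2.7 million years. The result scales inversely with the assumed rate, so the uncertainty in the neutral rate carries directly into the date.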

The question of pseudogene function

A common objection to the use of pseudogenes as evidence for common descent is the claim that pseudogenes may not truly be non-functional—that they might serve regulatory or other roles that have not yet been discovered. It is true that a small minority of pseudogenes have been found to produce non-coding RNA transcripts that can regulate the expression of their functional parent genes, acting as competing endogenous RNAs (ceRNAs) or natural antisense transcripts.17 However, this observation does not undermine the evolutionary argument. The evidential force of shared pseudogenes rests on the pattern of shared inactivating mutations at orthologous positions across species, not on the assumption of complete non-functionality. Even if a pseudogene has acquired a secondary regulatory role, the shared pattern of its original disabling mutations still traces directly to a common ancestor in which the coding function was lost.1, 17

Furthermore, the vast majority of pseudogenes show signatures of neutral evolution: their sequences diverge between species at rates consistent with the absence of selective constraint, and they accumulate deletions and insertions that would be incompatible with protein-coding function. Genome-wide surveys find that only a minority of pseudogenes show evidence of transcription (about a fifth in the ENCODE pilot regions), and fewer still show evidence of functional constraint on their sequences.2, 16 The existence of a handful of co-opted pseudogenes does not alter the conclusion that the overwhelming majority are genuinely non-functional relics, and the shared patterns of their degradation remain powerful evidence of shared ancestry.

References

1. Poliseno, L. et al. · Pseudogenes: pseudo-functional or key regulators in health and disease? · Nature Reviews Genetics 16: 339–349, 2015

2. Zheng, D. et al. · Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution · Genome Research 17: 839–851, 2007

3. Kaessmann, H., Vinckenbosch, N. & Long, M. · The life of a dead gene: the role of processed pseudogenes in gene evolution · PNAS 106: 11154–11159, 2009

4. Zhang, Z. et al. · Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome · Genome Research 12: 1466–1482, 2002

5. Zhang, Z. D. et al. · Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates · Genome Biology 11: R26, 2010

6. Ohta, Y. & Nishikimi, M. · Random nucleotide substitutions in primate nonfunctional gene for L-gulono-γ-lactone oxidase, the missing enzyme in L-ascorbic acid biosynthesis · Biochimica et Biophysica Acta 1472: 408–411, 1999

7. Drouin, G., Godin, J.-R. & Pagé, B. · The genetics of vitamin C loss in vertebrates · Current Genomics 12(5): 371–378, 2011

8. Gilad, Y., Man, O., Pääbo, S. & Lancet, D. · Human specific loss of olfactory receptor genes · PNAS 100(6): 3324–3327, 2003

9. Go, Y. & Niimura, Y. · A comparison of the human and chimpanzee olfactory receptor gene repertoires · Genome Research 15(2): 224–230, 2005

10. Niimura, Y. & Nei, M. · Evolution of the olfactory receptor gene family in vertebrates · PNAS 100(21): 12235–12240, 2003

12. Brawand, D., Wahli, W. & Kaessmann, H. · Loss of egg yolk genes in mammals and the origin of lactation and placentation · PLoS Biology 6(3): e63, 2008

13. Lander, E. S. et al. (International Human Genome Sequencing Consortium) · Initial sequencing and analysis of the human genome · Nature 409: 860–921, 2001

14. Petrov, D. A. & Hartl, D. L. · Pseudogene evolution and natural selection for a compact genome · Journal of Heredity 91(3): 221–227, 2000

15. Chimpanzee Sequencing and Analysis Consortium · Initial sequence of the chimpanzee genome and comparison with the human genome · Nature 437: 69–87, 2005

16. Meyer, M. et al. · A high-coverage genome sequence from an archaic Denisovan individual · Science 338: 222–226, 2012

17. An, Y. et al. · Pseudogene-derived lncRNAs: emerging regulators of gene expression · Frontiers in Cell and Developmental Biology 7: 242, 2019

18. Chou, H.-H. et al. · Inactivation of CMP-N-acetylneuraminic acid hydroxylase occurred prior to brain expansion during human evolution · PNAS 99: 11736–11741, 2002