Overview
- Pseudogenes are broken, non-functional copies of genes that persist in the genome—relics of genes that once encoded functional proteins but were disabled by mutations, deletions, or faulty copying events.
- Humans and chimpanzees share thousands of pseudogenes at identical genomic locations with identical inactivating mutations, a pattern that is explained by inheritance from a common ancestor but has no explanation under independent design.
- Specific examples—including the GULO pseudogene for vitamin C synthesis, hundreds of broken olfactory receptor genes, and the vitellogenin pseudogene from egg-laying ancestors—trace the history of gene loss across vertebrate evolution.
Scattered throughout the genomes of all complex organisms are the remnants of genes that once worked but no longer do. These pseudogenes—literally "false genes"—retain recognizable sequence similarity to their functional counterparts but have been disabled by mutations that disrupt their ability to produce a working protein. They are molecular fossils: records of genes that served a purpose in ancestral organisms but were subsequently broken by the accumulation of stop codons, frameshift mutations, or regulatory damage.1, 2 The human genome contains an estimated 13,000 to 20,000 pseudogenes, a number roughly comparable to the approximately 20,000 protein-coding genes that remain functional.1, 13
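Two of the kinds of damage named above, premature stop codons and frameshifts, can be checked mechanically. The Python sketch below is illustrative only. The function names and the three toy sequences are invented for this example and are not real gene sequences. Real pseudogene annotation aligns a candidate against a functional reference gene rather than scanning it in isolation.

```python
# Minimal sketch: flag two classic signs of a disabled coding sequence.
# All sequences here are hypothetical toy examples, not real genes.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def first_premature_stop(cds: str):
    """Return the 0-based codon index of the first in-frame stop codon that
    appears before the final complete codon, or None if there is none."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return index
    return None


def length_breaks_frame(cds: str) -> bool:
    """True if the length is not a multiple of three, which is what a net
    insertion or deletion of one or two bases produces."""
    return len(cds) % 3 != 0


intact = "ATGGCTGAAAAGTAA"       # ATG GCT GAA AAG TAA
nonsense = "ATGGCTTAAAAGTAA"     # third codon mutated to the stop codon TAA
frameshifted = "ATGGCGAAAAGTAA"  # one base deleted from the intact sequence

for name, seq in [("intact", intact), ("nonsense", nonsense),
                  ("frameshifted", frameshifted)]:
    print(name, first_premature_stop(seq), length_breaks_frame(seq))
```

A length check only catches net insertions or deletions that are not multiples of three; compensating indels, and the regulatory damage also mentioned above, would need a comparison against the functional gene.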
Pseudogenes are far more than genomic clutter. Their distribution, structure, and shared patterns of damage across species constitute some of the most powerful molecular evidence for common descent. When two species share the same pseudogene at the same chromosomal location, bearing the same inactivating mutations, the most parsimonious explanation is that both inherited the broken gene from a common ancestor in which the original disabling mutation occurred.5, 14
Types of pseudogenes
Pseudogenes arise through three principal mechanisms, each leaving a distinctive molecular signature. Processed pseudogenes are created when an mRNA transcript is reverse-transcribed back into DNA by the cellular machinery of a retrotransposon (typically a LINE-1 element) and reinserted into the genome at a new location. Because the copy is made from processed mRNA, these pseudogenes lack introns and carry a poly-A tail—hallmarks that distinguish them from the original gene. They also typically lack the upstream regulatory sequences needed for transcription, rendering them dead on arrival. The human genome contains thousands of processed pseudogenes, including over 2,000 derived from ribosomal protein genes alone.3, 4
Duplicated pseudogenes arise when a segment of DNA containing a gene is duplicated through unequal crossing over or segmental duplication. One copy continues to function while the other, freed from selective constraint, accumulates mutations until it can no longer encode a functional protein. Unlike processed pseudogenes, duplicated pseudogenes retain their intron-exon structure and often lie near the original gene on the same chromosome.1, 2
Unitary pseudogenes represent a third and particularly informative category. These are genes with no functional counterpart anywhere in the genome: the only copy has been disabled. Unitary pseudogenes arise when a previously functional gene accumulates inactivating mutations in a lineage where the gene's function is no longer under strong selective pressure. Zhang et al. identified 76 unitary pseudogenes in the human genome whose orthologs remain functional in other mammals such as the mouse, representing genes lost along the lineage leading to humans.5 Because unitary pseudogenes mark the loss of a function that was once present, they are especially useful for reconstructing evolutionary history.
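The three categories can be told apart by the structural hallmarks described above. The following Python sketch encodes those hallmarks as a simple decision rule. The function name and its boolean parameters are hypothetical simplifications chosen for illustration; actual annotation pipelines weigh many more lines of evidence.

```python
# Minimal sketch of the classification logic described in the text.
# The function and its parameters are hypothetical, for illustration only.

def classify_pseudogene(has_introns: bool,
                        has_poly_a_tail: bool,
                        has_functional_paralog: bool) -> str:
    """Assign a pseudogene to one of the three classes from its hallmarks."""
    if not has_functional_paralog:
        # No working copy anywhere in the genome: the only copy was disabled.
        return "unitary"
    if not has_introns and has_poly_a_tail:
        # Copied from processed mRNA: introns spliced out, poly-A tail kept.
        return "processed"
    if has_introns:
        # Copied as genomic DNA: the intron-exon structure survives.
        return "duplicated"
    return "ambiguous"


print(classify_pseudogene(has_introns=False, has_poly_a_tail=True,
                          has_functional_paralog=True))
print(classify_pseudogene(has_introns=True, has_poly_a_tail=False,
                          has_functional_paralog=True))
print(classify_pseudogene(has_introns=True, has_poly_a_tail=False,
                          has_functional_paralog=False))
```

The check for a missing functional counterpart comes first because a unitary pseudogene otherwise looks structurally like a duplicated one.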
Shared pseudogenes as evidence for common descent
The evidential power of pseudogenes for common ancestry lies not merely in their existence but in the precise pattern of their sharing across species. When comparing the genomes of humans and chimpanzees, researchers find thousands of pseudogenes at orthologous (corresponding) chromosomal positions, carrying the same disabling mutations.5, 15 This pattern requires explanation. If each species were independently designed, there is no reason a designer would place identical broken genes at identical locations with identical patterns of damage. Under common descent, however, the pattern is expected: the gene was broken once in a shared ancestor, and every descendant inherited the same broken copy at the same genomic address.
The logic is analogous to catching two students with identical wrong answers on an exam—not merely the same questions missed, but the same specific incorrect responses. One or two shared mistakes might be coincidence; thousands of shared mistakes, each with matching details, point unmistakably to a common source.
The GULO pseudogene
Perhaps the most widely cited example of an informative pseudogene is the broken vitamin C gene, GULOP. Most mammals synthesize their own vitamin C through a four-step enzymatic pathway, the final step of which is catalyzed by the enzyme L-gulonolactone oxidase, encoded by the GULO gene. Humans cannot perform this step because our copy of GULO has been disabled by the deletion of several exons and the accumulation of numerous point mutations.6, 7
When researchers sequenced the GULO pseudogene in chimpanzees, gorillas, orangutans, and macaques, they found the same exon deletions and the same accumulated mutations in every species, indicating that the gene was broken once in a common ancestor of the haplorrhine primates, approximately 40–65 million years ago.6, 7 Guinea pigs also cannot synthesize vitamin C, but their GULO pseudogene carries entirely different mutations—demonstrating an independent loss event in the rodent lineage and ruling out the possibility that the shared primate pattern arose by some inherent fragility of the gene itself.7
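The comparison behind this inference can be sketched as a set operation: lesions present in every primate are candidates for inheritance from a shared ancestor, while an empty overlap with the guinea pig points to a separate loss. In the Python sketch below, the mutation catalogs are invented placeholders, not real GULO coordinates or substitutions, and the function name is hypothetical.

```python
# Minimal sketch of the shared-lesion comparison.
# The catalogs below are HYPOTHETICAL placeholders, not real GULO data.

catalogs = {
    "human":      {("exon_A", "deleted"), ("site_1", "G>A"), ("site_2", "C>T"),
                   ("site_9", "A>G")},   # last entry: lineage-specific change
    "chimpanzee": {("exon_A", "deleted"), ("site_1", "G>A"), ("site_2", "C>T")},
    "macaque":    {("exon_A", "deleted"), ("site_1", "G>A"), ("site_2", "C>T"),
                   ("site_7", "T>A")},   # last entry: lineage-specific change
    "guinea_pig": {("exon_B", "deleted"), ("site_3", "T>C")},
}


def shared_lesions(catalogs: dict, species: list) -> set:
    """Return the lesions found in every listed species."""
    return set.intersection(*(catalogs[name] for name in species))


primates = ["human", "chimpanzee", "macaque"]
primate_shared = shared_lesions(catalogs, primates)
overlap_with_guinea_pig = primate_shared & catalogs["guinea_pig"]

print(sorted(primate_shared))
print(overlap_with_guinea_pig)
```

Lesions found in only one species, such as the placeholder `site_9` entry, are what would be expected from mutations that accumulated after the lineages split.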
Olfactory receptor pseudogenes
The olfactory receptor (OR) gene family is the largest gene family in the mammalian genome, encoding the receptors that detect odor molecules in the nasal epithelium. Mice possess approximately 1,200 OR genes, of which roughly 20% are pseudogenes. Humans, by contrast, have approximately 800 OR genes, but roughly 60% of them—nearly 500—are pseudogenes incapable of producing functional receptors.8, 9, 10
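The approximate figures above imply the following counts. This short Python calculation uses only the rounded totals and percentages quoted in the text, so its results are rough estimates rather than annotated gene counts; the function name is hypothetical.

```python
# Worked arithmetic from the rounded figures quoted in the text.

def split_repertoire(total_genes: int, pseudogene_percent: int):
    """Return (pseudogenes, functional genes) using integer arithmetic."""
    pseudogenes = total_genes * pseudogene_percent // 100
    return pseudogenes, total_genes - pseudogenes


mouse_pseudo, mouse_functional = split_repertoire(1200, 20)   # 240, 960
human_pseudo, human_functional = split_repertoire(800, 60)    # 480, 320

print("mouse:", mouse_pseudo, "pseudogenes,", mouse_functional, "functional")
print("human:", human_pseudo, "pseudogenes,", human_functional, "functional")
```

On these rounded figures, mice retain about three times as many working olfactory receptor genes as humans.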
This dramatic difference reflects the reduced importance of olfaction in primates, which rely more heavily on vision. The pattern of OR pseudogenization tracks primate phylogeny: the fraction of broken OR genes increases from prosimians to monkeys to apes, correlating with the increasing development of trichromatic color vision.8 Many of the same OR genes are broken in both humans and chimpanzees, carrying identical inactivating mutations at the same positions, while remaining functional in more distantly related mammals.9 This shared degradation is what common descent predicts if selection on olfactory receptor genes was relaxed in a shared ancestor as vision became the dominant sense.
The vitellogenin pseudogene
Among the most striking pseudogenes in the human genome are the remnants of vitellogenin genes. Vitellogenin is the major egg yolk precursor protein in egg-laying vertebrates—birds, reptiles, amphibians, and fish all possess functional vitellogenin genes that produce the nutrient-rich yolk sustaining embryonic development. Mammals, which nourish their young through placentation and lactation, have no need for egg yolk production.12
Brawand, Wahli, and Kaessmann demonstrated that the human genome retains degraded remnants of vitellogenin genes—identifiable fragments of the ancestral VIT1, VIT2, and VIT3 genes that have been progressively pseudogenized. The chicken retains all three functional vitellogenin genes, while the platypus, which still lays eggs, retains one functional copy (VIT2) alongside pseudogenized copies of VIT1 and VIT3. In placental mammals including humans, all three copies are pseudogenes.12 The stepwise loss of vitellogenin function correlates with the evolutionary transition from egg-laying to placental reproduction: as mammals evolved lactation and placentation to nourish their young, selection pressure to maintain vitellogenin genes was progressively relaxed, and the genes accumulated disabling mutations.
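The stepwise pattern is easier to see when the gene states described above are tabulated. The Python sketch below records only what this article states for the three species; the dictionary layout and function name are illustrative choices.

```python
# Vitellogenin gene status as described in the text for three species.

VIT_STATUS = {
    "chicken":  {"VIT1": "functional", "VIT2": "functional", "VIT3": "functional"},
    "platypus": {"VIT1": "pseudogene", "VIT2": "functional", "VIT3": "pseudogene"},
    "human":    {"VIT1": "pseudogene", "VIT2": "pseudogene", "VIT3": "pseudogene"},
}


def functional_copies(species: str) -> int:
    """Count the vitellogenin genes still functional in a species."""
    return sum(1 for state in VIT_STATUS[species].values()
               if state == "functional")


for species in ("chicken", "platypus", "human"):
    print(f"{species}: {functional_copies(species)} of 3 functional")
```

The counts fall from three to one to zero, matching the order from egg-laying birds to the egg-laying platypus to placental mammals.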
The vitellogenin pseudogenes are particularly powerful evidence for evolution because they record a specific historical transition. Humans carry the broken remains of egg yolk genes because our distant ancestors were egg-laying vertebrates. Under a framework of independent design, there is no reason to equip mammals with degraded copies of genes whose sole function is producing egg yolk for a reproductive strategy that mammals do not use.12
Broader significance
Pseudogenes collectively represent one of the strongest classes of molecular evidence for common descent. Each shared pseudogene is an independent data point: a broken gene inherited from a common ancestor, carrying the molecular scars of a specific historical event. The thousands of shared pseudogenes between humans and other primates, the hundreds of shared olfactory receptor pseudogenes, the vitellogenin remnants from egg-laying ancestors, and the broken GULO gene from a primate ancestor that lost vitamin C synthesis—all converge on the same conclusion.1, 5, 12 They are joined by parallel lines of evidence from shared endogenous retroviruses and shared transposable element insertions, each providing an independent molecular record of shared ancestry.
The argument from pseudogenes is not merely that organisms share similar sequences—functional similarities might conceivably reflect similar design requirements. The argument is that organisms share identical patterns of damage in non-functional sequences, at the same genomic locations, with the same specific mutations. This pattern has one natural explanation—inheritance from a common ancestor—and no coherent alternative.5, 14
The CMAH pseudogene
Another instructive unitary pseudogene in the human genome is CMAH (cytidine monophosphate-N-acetylneuraminic acid hydroxylase). In most mammals, CMAH encodes an enzyme that converts the cell-surface sugar Neu5Ac into Neu5Gc, a modification that decorates cell membranes and plays roles in cell-cell recognition and pathogen interaction. In humans, CMAH has been inactivated by a 92-base-pair deletion in exon 6 that introduces a frameshift, rendering the gene non-functional. Chimpanzees, gorillas, and other great apes retain a functional CMAH gene and express Neu5Gc on their cell surfaces, while humans produce only Neu5Ac.18
Chou and colleagues dated the inactivation of human CMAH to approximately 2.5–3 million years ago by comparing the substitutions accumulated in the pseudogene with those in the functional copies of other primates. This timing is notable because it falls shortly before the period of rapid brain expansion in the hominin lineage, leading to speculation that the loss of Neu5Gc may have influenced susceptibility to certain pathogens or played a role in the evolution of human-specific neural development, though these functional implications remain under investigation.18
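The general idea behind this kind of dating can be sketched with a simple molecular-clock calculation: once a gene is disabled, substitutions that selection would previously have removed accumulate at roughly the neutral rate, so the excess divided by that rate estimates the time elapsed. The Python sketch below is not the procedure used by Chou and colleagues. The function name is hypothetical, and both input values are placeholder assumptions picked so that the result lands in the range quoted above, not measurements.

```python
# Minimal molecular-clock sketch. NOT the published method; the inputs
# below are placeholder assumptions, not measured values.

def years_since_inactivation(excess_substitutions_per_site: float,
                             neutral_rate_per_site_per_year: float) -> float:
    """Estimate elapsed time as excess neutral divergence divided by rate."""
    if neutral_rate_per_site_per_year <= 0:
        raise ValueError("neutral rate must be positive")
    return excess_substitutions_per_site / neutral_rate_per_site_per_year


ASSUMED_NEUTRAL_RATE = 1e-9   # substitutions per site per year (assumption)
ASSUMED_EXCESS = 0.0027       # substitutions per site (assumption)

estimate = years_since_inactivation(ASSUMED_EXCESS, ASSUMED_NEUTRAL_RATE)
print(f"about {estimate / 1e6:.1f} million years")
```

The estimate scales inversely with the assumed rate, so uncertainty in the neutral rate carries over directly into the date.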
The question of pseudogene function
A common objection to the use of pseudogenes as evidence for common descent is the claim that pseudogenes may not truly be non-functional—that they might serve regulatory or other roles that have not yet been discovered. It is true that a small minority of pseudogenes have been found to produce non-coding RNA transcripts that can regulate the expression of their functional parent genes, acting as competing endogenous RNAs (ceRNAs) or natural antisense transcripts.17 However, this observation does not undermine the evolutionary argument. The evidential force of shared pseudogenes rests on the pattern of shared inactivating mutations at orthologous positions across species, not on the assumption of complete non-functionality. Even if a pseudogene has acquired a secondary regulatory role, the shared pattern of its original disabling mutations still traces directly to a common ancestor in which the coding function was lost.1, 17
Furthermore, the vast majority of pseudogenes show signatures of neutral evolution: their sequences diverge between species at rates consistent with the absence of selective constraint, and they accumulate deletions and insertions that would be incompatible with protein-coding function. Genome-wide surveys find that only a minority of pseudogenes show evidence of transcription, and fewer still show evidence of functional constraint on their sequences.2, 16 The existence of a handful of co-opted pseudogenes does not alter the conclusion that the overwhelming majority are genuinely non-functional relics, and the shared patterns of their degradation remain powerful evidence of shared ancestry.
References
Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution
Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome
Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates
Random nucleotide substitutions in primate nonfunctional gene for L-gulono-γ-lactone oxidase, the missing enzyme in L-ascorbic acid biosynthesis
Inactivation of CMP-N-acetylneuraminic acid hydroxylase occurred prior to brain expansion during human evolution