Evolution of language


Overview

  • Language is a uniquely complex human trait underpinned by genetic factors such as the FOXP2 gene, specialized neural circuitry including Broca's and Wernicke's areas, and anatomical adaptations of the vocal tract, all of which evolved incrementally over millions of years.
  • Archaeological proxies for symbolic cognition — including ochre use, shell beads, and composite tool manufacture — suggest that the cognitive prerequisites for language were present by at least 100,000 years ago, with some evidence pushing precursors back to Homo heidelbergensis or earlier.
  • Language likely coevolved with increasing social complexity: larger group sizes would have selected for more efficient communication, which makes the social brain hypothesis a key framework for understanding why humans alone developed fully syntactic language.

Language is often regarded as the defining characteristic of the human species. No other organism communicates using a system of arbitrary symbols combined through recursive syntactic rules to produce an essentially infinite range of meaningful expressions.16 The evolution of this capacity involved the convergence of genetic, neurological, anatomical, cognitive, and social factors over millions of years, and understanding how and when language emerged remains one of the most challenging problems in the study of human evolution. Because language leaves no direct fossil trace, researchers must reconstruct its history through indirect evidence: the genetics of speech-related disorders, the comparative anatomy of the vocal tract in fossil hominins, the neural architecture visible in endocasts and modern brain imaging, the archaeological record of symbolic behavior, and the communicative abilities of living primates.8, 16

Genetic foundations: FOXP2 and beyond

The discovery that mutations in a single gene could profoundly disrupt speech and language production transformed the study of language evolution. In 2001, Cecilia Lai and colleagues identified the gene FOXP2 on chromosome 7 as the locus responsible for a severe speech and language disorder affecting three generations of a British family known as the KE family. Affected members exhibited difficulties with the fine motor control of the lips, tongue, and jaw required for articulate speech, as well as broader deficits in grammatical processing.1 FOXP2 encodes a transcription factor (a protein that regulates the expression of other genes), and its disruption impairs the development of neural circuits in the basal ganglia and cerebellum that coordinate the rapid, sequential orofacial movements underlying speech.

Comparative genomic analyses published by Wolfgang Enard and colleagues in 2002 revealed that the human version of FOXP2 differs from the chimpanzee version by two amino acid substitutions, and that these changes occurred after the human lineage diverged from the last common ancestor shared with chimpanzees approximately six to seven million years ago. The pattern of variation around the FOXP2 locus in the human genome showed signatures of a selective sweep, suggesting that the human-specific variant was strongly favored by natural selection.2 Subsequent ancient DNA analyses found that Neanderthals shared the same two human-specific amino acid substitutions, indicating that the modern FOXP2 variant was already present in the common ancestor of Neanderthals and Homo sapiens, roughly 500,000 to 800,000 years ago.2 FOXP2 is not a "gene for language" in any simple sense — it is expressed in many tissues and plays roles in lung, heart, and gut development — but its identification provided the first molecular handle on the biological basis of a uniquely human cognitive capacity.

Neural architecture of language

The neural substrates of language are concentrated in the left hemisphere of the human brain, a pattern of lateralization that is evident in the vast majority of right-handed individuals and the majority of left-handed individuals as well.4 Two regions identified in the nineteenth century remain central to modern models of language processing. Broca's area, located in the left inferior frontal gyrus (Brodmann areas 44 and 45), is involved in speech production, syntactic processing, and the hierarchical organization of sequential behavior.3 Wernicke's area, located in the posterior portion of the left superior temporal gyrus, is critical for speech comprehension and the retrieval of word meanings. Damage to Broca's area produces nonfluent aphasia, in which patients speak haltingly and with simplified grammar but preserve comprehension, whereas damage to Wernicke's area produces fluent but largely incomprehensible speech with impaired comprehension.4

The left hemisphere of the human brain showing Broca's area (frontal lobe) and Wernicke's area (temporal lobe), the two classical cortical regions involved in speech production and comprehension. Wikimedia Commons, public domain

Modern neuroimaging has substantially complicated this classical model. Language processing engages a distributed network that extends well beyond Broca's and Wernicke's areas, including regions of the middle temporal gyrus, the angular gyrus, the supplementary motor area, and subcortical structures such as the thalamus and basal ganglia.4 The arcuate fasciculus, a white-matter tract connecting temporal and frontal language regions, is substantially larger and more developed in humans than in other primates, and its expansion may have been critical to the integration of phonological, semantic, and syntactic processing that characterizes human language.3, 4 Endocasts of fossil hominin skulls show that the reorganization of the frontal and temporal lobes was already underway in Homo erectus approximately 1.5 to 1.8 million years ago, though the degree to which these anatomical changes correspond to language-like capacities remains debated.19

Vocal tract anatomy and the descended larynx

The human vocal tract differs from that of all other primates in ways that are directly relevant to speech production. In adult humans, the larynx sits lower in the throat than in any other primate, creating an elongated pharyngeal cavity above the vocal folds. This descended larynx, combined with a shortened oral cavity and a tongue root that extends deep into the pharynx, gives humans a supralaryngeal vocal tract with a roughly 1:1 ratio between the horizontal (oral) and vertical (pharyngeal) tubes. This geometry enables the production of the quantal vowels /i/, /a/, and /u/ — the extreme vowels that form the acoustic anchors of virtually all human languages — and greatly expands the range of discriminable speech sounds.8
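
The acoustic reasoning behind the 1:1 geometry can be illustrated with the textbook quarter-wavelength tube approximation. The sketch below is a rough idealisation and not a model taken from the cited sources: the speed of sound (350 m/s), the 17.5 cm tract length, and the treatment of the pharyngeal and oral tubes as two decoupled resonators of equal length are all assumptions chosen for illustration, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air in the vocal tract


def quarter_wave_resonances(length_m, count=3, c=SPEED_OF_SOUND_M_S):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Uses the standard quarter-wavelength formula F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


# Assumed adult tract length of 17.5 cm, treated as one uniform tube
# (an idealised neutral, schwa-like configuration).
neutral = quarter_wave_resonances(0.175)

# Idealised 1:1 tract for an /a/-like vowel: a narrow pharyngeal tube and a wide
# oral tube of equal length, each treated as its own quarter-wave resonator.
pharyngeal_lowest = quarter_wave_resonances(0.0875, count=1)[0]
oral_lowest = quarter_wave_resonances(0.0875, count=1)[0]

print([round(f) for f in neutral])
print(round(pharyngeal_lowest), round(oral_lowest))
```

Under these assumptions the uniform tube resonates near 500, 1500, and 2500 Hz, while the two equal-length half-tubes each resonate near 1000 Hz. In a real, acoustically coupled tract the two resonances would separate somewhat, but they stay close together, which is the kind of stable formant pattern associated with a quantal vowel such as /a/.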

Philip Lieberman argued for decades that the descended larynx was a uniquely human adaptation for speech. Subsequent research by Tecumseh Fitch and others, however, demonstrated that laryngeal descent also occurs in some non-human mammals, including red deer, where it appears to exaggerate perceived body size through vocalization rather than to enable speech.7 This finding complicates the narrative that the descended larynx evolved exclusively for language, though the human configuration remains unique in the degree of pharyngeal expansion and the precision of supralaryngeal articulation it permits. The descended larynx also carries a significant cost: it increases the risk of choking by bringing the pathways for food and air into closer proximity, a vulnerability that natural selection would presumably tolerate only if the communicative benefits were substantial.7, 8

The hyoid bone and Neanderthal speech

The hyoid is a small, horseshoe-shaped bone in the throat that anchors the muscles of the tongue, larynx, and pharynx and plays an essential role in swallowing and speech production. Unlike most bones in the body, the hyoid does not articulate with any other bone, which means it is rarely preserved in the fossil record and, when found, provides unusually direct evidence about the soft-tissue anatomy of the throat.

Cast of the Neanderthal 1 (Feldhofer 1) skullcap. Neanderthal hyoid anatomy and the Neanderthal FOXP2 variant both inform debates about language evolution. Gunnar Creutz, Wikimedia Commons, CC BY-SA 4.0

The most important fossil hyoid in paleoanthropology is the specimen recovered from the Kebara 2 Neanderthal burial in Israel, dated to approximately 60,000 years ago. When Baruch Arensburg and colleagues published their analysis in 1989, they reported that the Kebara hyoid was virtually indistinguishable in size and morphology from that of modern humans, suggesting that the Neanderthal vocal tract was anatomically capable of producing a range of speech sounds comparable to our own.5 A more detailed micro-CT analysis by Ruggero D'Anastasio and colleagues in 2013 examined the internal architecture of the Kebara hyoid and confirmed that its pattern of trabecular bone was consistent with the biomechanical loading patterns produced by human speech musculature, strengthening the case that Neanderthals possessed at least the anatomical prerequisites for spoken language.6

Whether Neanderthals actually spoke in a manner comparable to modern humans remains an open question. Anatomical capacity for speech does not by itself demonstrate the presence of syntactically structured language, which depends on cognitive and neural organization that cannot be directly inferred from a single bone. Nevertheless, the Kebara hyoid, combined with the presence of the human-derived FOXP2 variant in Neanderthal DNA, suggests that some form of vocal communication was present in this closely related species.2, 5, 6

The gestural origins hypothesis

An influential alternative to vocal-first models of language evolution proposes that language originated in manual gesture rather than vocalization. The gestural origins hypothesis, developed most fully by Michael Corballis, holds that early hominins communicated through a system of iconic and conventional hand and arm movements, and that spoken language emerged later as a secondary channel that gradually supplanted gesture as the primary medium of linguistic communication.9

Several lines of evidence support this proposal. First, the great apes show far greater voluntary control over their hands than over their vocalizations: chimpanzees and bonobos can be taught to use manual signs and visual symbols with some flexibility, whereas their vocal repertoires are largely fixed and involuntary.15 Second, human sign languages are fully developed linguistic systems with their own phonology, morphology, and syntax, demonstrating that the language faculty is not inherently tied to the vocal-auditory channel.9 Third, the discovery of mirror neurons in the premotor cortex of macaques — neurons that fire both when the monkey performs a grasping action and when it observes the same action performed by another — led Giacomo Rizzolatti and Michael Arbib to propose that a mirror system for hand movements could have provided the neural substrate for the earliest forms of communicative reference, a "matching" of action between sender and receiver that would be the precursor to symbolic meaning.10

Critics of the gestural hypothesis note that gesture requires visual attention and line of sight, making it impractical in the dark, at a distance, or while the hands are occupied with tool use. The transition from a gestural to a vocal system also requires explanation, and the mechanisms proposed for this shift remain speculative. Nevertheless, the gestural hypothesis has been productive in highlighting the multimodal nature of human communication, in which speech and gesture remain tightly integrated in everyday language use.9, 10

Archaeological proxies for language

Because spoken language leaves no direct material trace, researchers rely on archaeological proxies — artifacts and behaviors that imply the cognitive capacities associated with language. The most commonly cited proxies are evidence of symbolic behavior: the use of pigments for body decoration, the manufacture of personal ornaments such as shell beads, the creation of abstract engravings, and the production of representational art. The logic is that if a population used arbitrary symbols to communicate meaning through material culture, it likely possessed the cognitive architecture to do the same through vocalization.11, 17

The earliest clear evidence for symbolic behavior comes from sites in southern Africa dating to the Middle Stone Age. At Blombos Cave in South Africa, Christopher Henshilwood and colleagues documented engraved ochre pieces with geometric cross-hatched patterns dating to approximately 77,000 years ago, and a 100,000-year-old ochre-processing workshop that implies deliberate, planned pigment production.12 Perforated Nassarius shell beads from Blombos, dating to approximately 75,000 years ago, represent some of the earliest known personal ornaments.21 Similar shell beads from sites in North Africa and the Levant extend the evidence for symbolic behavior back to at least 100,000 years ago.17

Composite tool technology — the manufacture of implements composed of multiple parts joined together, such as hafted stone points — represents another important proxy. The production of a hafted tool requires planning, the mental representation of a finished form not yet in existence, and the teaching of a multi-step manufacturing sequence, all of which plausibly require or are greatly facilitated by language. Hafted stone points appear in the archaeological record by approximately 200,000 to 300,000 years ago, associated with Middle Stone Age and Middle Paleolithic assemblages in Africa and Europe.11

Archaeological proxies for symbolic cognition and language11, 12, 17, 21

Proxy                       | Earliest date (ka) | Key site             | Region
----------------------------|--------------------|----------------------|----------------------
Hafted stone tools          | ~300               | Kathu Pan 1          | South Africa
Ochre processing            | ~100               | Blombos Cave         | South Africa
Shell beads                 | ~100               | Skhul, Oued Djebbana | Levant, North Africa
Engraved geometric patterns | ~77                | Blombos Cave         | South Africa
Figurative cave art         | ~45                | Leang Tedongnge      | Sulawesi, Indonesia

Brain size, encephalization, and language capacity

The human brain is approximately three times larger than expected for a primate of our body size, and much of this expansion occurred in the neocortex, particularly the prefrontal and temporal association areas most directly implicated in language.19 The trajectory of hominin brain expansion is well documented in the fossil record: from approximately 400 to 500 cubic centimetres in the australopithecines, to roughly 600 to 900 cubic centimetres in early Homo, to 1,200 to 1,500 cubic centimetres in Homo sapiens and Neanderthals.19 However, brain size alone is a poor predictor of linguistic capacity. The critical factor appears to be not overall volume but the internal reorganization of the brain — the expansion and rewiring of specific circuits, particularly in the prefrontal cortex, the temporal lobe, and the white-matter pathways connecting them.

A landmark 2018 study by Simon Neubauer, Jean-Jacques Hublin, and Philipp Gunz used CT scans of fossil and modern human crania to show that the globular brain shape characteristic of living Homo sapiens, with expanded parietal and cerebellar regions, was not present in the earliest known Homo sapiens fossils from sites like Jebel Irhoud (approximately 300,000 years ago) but evolved gradually, reaching its modern configuration only between roughly 100,000 and 35,000 years ago.20 This finding suggests that important neural reorganization continued well after the appearance of our species' skeletal anatomy, and that the cognitive capacities underlying fully modern language may have emerged relatively late in human evolution.

Comparative evidence from primate communication

The communicative abilities of non-human primates provide a baseline against which to measure the distinctiveness of human language. Decades of field research on vervet monkeys, chimpanzees, and other species have revealed that primate vocalizations can be functionally referential — that is, specific calls are reliably associated with specific external events, such as the presence of a particular type of predator. Robert Seyfarth and Dorothy Cheney's classic studies of vervet alarm calls showed that distinct calls for eagles, leopards, and snakes elicited distinct and appropriate evasive behaviors from listening vervets, demonstrating a degree of referential specificity in non-human primate communication.15

A vervet monkey (Chlorocebus pygerythrus) at Letaba Camp, Kruger National Park, South Africa. Seyfarth and Cheney's field studies of this species demonstrated that vervets produce functionally referential alarm calls (distinct vocalizations for eagles, leopards, and snakes that elicit distinct and appropriate evasive responses in listeners), showing that non-human primate communication can achieve a degree of referential specificity while still falling far short of the recursive, syntactically structured signal system that defines human language. Bernard Dupont, Wikimedia Commons, CC BY-SA 2.0

However, primate vocal communication differs from human language in fundamental ways. Primate call repertoires are small, typically comprising 15 to 40 distinct call types, and are largely innate rather than learned. Crucially, primates do not combine calls into structured sequences with compositional meaning — they lack syntax, the recursive combinatorial system that allows human languages to generate an unbounded number of sentences from a finite set of elements.15, 16 In their influential 2002 paper, Marc Hauser, Noam Chomsky, and Tecumseh Fitch proposed a distinction between the "faculty of language in the broad sense" (FLB), which includes sensorimotor and conceptual-intentional systems shared with other animals, and the "faculty of language in the narrow sense" (FLN), which they argued consists solely of recursion and may be unique to humans.16 This proposal remains debated, but the absence of anything resembling recursive syntax in natural primate communication underscores the magnitude of the evolutionary gap between animal signaling and human language.

Protolanguage and the path to syntax

Many researchers posit an intermediate stage — a protolanguage — between the holistic, affective vocalizations of ancestral hominins and the fully syntactic language of modern humans. Protolanguage, as typically conceived, would have consisted of individual meaningful units (proto-words) used in isolation or in simple, unstructured combinations, without the grammatical rules that allow modern languages to express relations of agent, action, and patient, or to embed one proposition within another.16 A protolanguage speaker might have been able to say the equivalent of "meat," "fire," or "danger," and perhaps even "big cat" or "give meat," but not "the man who was hunting yesterday told me that the river is flooding."
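
The contrast between unstructured proto-word strings and recursive embedding can be made concrete with a toy example. This is only an illustrative sketch: the function names and the single embedding rule are hypothetical, and nothing here is meant as a model of any attested protolanguage or of a real grammar.

```python
def flat_utterance(words):
    """Protolanguage-style utterance: proto-words strung together with no structure."""
    return " ".join(words)


def embed(clause, depth):
    """Wrap a clause in `depth` layers of reported speech.

    Each recursive call applies the same rule once (sentence -> "the hunter
    said that" + sentence), so one finite rule yields unboundedly many sentences.
    """
    if depth == 0:
        return clause
    return "the hunter said that " + embed(clause, depth - 1)


print(flat_utterance(["give", "meat"]))
for depth in range(3):
    print(embed("the river is flooding", depth))
```

The flat function can only concatenate items, whereas the recursive function places one proposition inside another to any depth, which is the property that protolanguage is assumed to lack.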

The transition from protolanguage to full language would have required the evolution of the computational capacity for recursion and hierarchical structure — what Hauser, Chomsky, and Fitch termed the faculty of language in the narrow sense.16 When and how this transition occurred is unknown. Some researchers argue for a gradual, incremental emergence driven by cumulative selection pressures over hundreds of thousands of years, while others propose a relatively sudden genetic or cognitive reorganization, perhaps associated with the globularization of the brain documented by Neubauer and colleagues.18, 20 Michael Tomasello has argued that the key transition was not primarily computational but social: the evolution of shared intentionality and cooperative communication, which provided the pragmatic framework within which grammatical conventions could emerge through cultural processes.18

Language and the coevolution of social complexity

Robin Dunbar's social brain hypothesis provides one of the most influential frameworks for understanding why language evolved. Dunbar observed a strong positive correlation across primate species between neocortex size (relative to total brain volume) and the typical size of social groups. Extrapolating this relationship to humans, he predicted a natural group size of approximately 150 individuals — a number that recurs in ethnographic, historical, and organizational data and is now known as "Dunbar's number."13

In smaller-brained primates, social bonds are maintained primarily through grooming, a time-intensive, one-on-one activity. As group size increases beyond a threshold, the time required for grooming becomes unsustainable — Dunbar estimated that a group of 150 would require individuals to devote roughly 40 percent of their waking hours to grooming, which would be incompatible with foraging and other essential activities.14 Language, Dunbar proposed, evolved as a more efficient form of social bonding — a kind of "vocal grooming" that allowed individuals to maintain social relationships with multiple partners simultaneously. Conversation permits the exchange of social information about third parties (gossip), the reinforcement of alliances, and the negotiation of status within groups of a size that would be unmanageable through physical grooming alone.14
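
The two extrapolations described above can be reproduced with simple arithmetic. The sketch below is hedged accordingly: the regression coefficients and the human neocortex ratio are assumptions, given here as they are commonly quoted from Dunbar's early-1990s papers rather than taken from the sources cited in this article, and they should be checked against the primary literature before being relied on. The function names are hypothetical.

```python
import math

# Assumed coefficients (commonly quoted from Dunbar's early-1990s regressions;
# not verified against the sources cited in this article).
GROUP_INTERCEPT, GROUP_SLOPE = 0.093, 3.389    # log10(N) = a + b * log10(neocortex ratio)
GROOM_INTERCEPT, GROOM_SLOPE = -0.772, 0.287   # grooming time (% of day) = a + b * N
HUMAN_NEOCORTEX_RATIO = 4.1                    # assumed value for modern humans


def predicted_group_size(neocortex_ratio):
    """Group size predicted from neocortex ratio by the log-log regression."""
    return 10 ** (GROUP_INTERCEPT + GROUP_SLOPE * math.log10(neocortex_ratio))


def predicted_grooming_percent(group_size):
    """Percentage of the day spent grooming predicted from group size."""
    return GROOM_INTERCEPT + GROOM_SLOPE * group_size


human_group = predicted_group_size(HUMAN_NEOCORTEX_RATIO)
human_grooming = predicted_grooming_percent(human_group)
print(f"predicted group size: {human_group:.1f}")
print(f"predicted grooming time: {human_grooming:.1f}% of the day")
```

With these assumed inputs the predicted group size comes out just under 150 and the grooming requirement slightly above 40 percent of the day, in line with the figures quoted in the text.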

This framework predicts that the selective pressures driving language evolution were fundamentally social rather than ecological or technological. On this view, language did not evolve primarily to coordinate hunting, transmit technical knowledge, or describe the external world, but rather to manage the increasingly complex social relationships that accompanied the expansion of hominin group sizes over the past two million years.13, 14 The social brain hypothesis does not explain the mechanism by which syntactic language emerged, but it provides a compelling account of the selective environment that would have favored increasingly sophisticated communicative abilities, creating the conditions under which language could evolve from gestural and vocal precursors into the recursive, symbolic system that defines the human species.13, 18

References

1. A forkhead-domain gene is mutated in a severe speech and language disorder. Lai, C. S. L. et al. · Nature 413: 519–523, 2001
2. Molecular evolution of FOXP2, a gene involved in speech and language. Enard, W. et al. · Nature 418: 869–872, 2002
3. Broca's area and the hierarchical organization of human behavior. Koechlin, E. & Jubault, T. · Neuron 50: 963–974, 2006
4. The functional neuroanatomy of language. Price, C. J. · Annals of the New York Academy of Sciences 1191: 62–88, 2010
5. A Neanderthal hyoid bone. Arensburg, B. et al. · American Journal of Physical Anthropology 73: 137–146, 1989
6. Morphology of the Kebara 2 hyoid and the origin of human speech. D'Anastasio, R. et al. · PLoS ONE 8: e82261, 2013
7. The descent of the larynx in man and other mammals. Fitch, W. T. & Reby, D. · Proceedings of the Royal Society B 268: 1669–1675, 2001
8. The evolution of the vocal tract and the origins of language. Lieberman, P. · Cambridge University Press, 2006
9. From hand to mouth: the origins of language. Corballis, M. C. · Princeton University Press, 2002
10. Mirror neurons and the evolution of language. Rizzolatti, G. & Arbib, M. A. · Trends in Neurosciences 21: 188–194, 1998
11. The origin of modern human behavior: critique of the models and their test implications. Henshilwood, C. S. & Marean, C. W. · Current Anthropology 44(5): 627–651, 2003
12. A 100,000-year-old ochre-processing workshop at Blombos Cave, South Africa. Henshilwood, C. S. et al. · Science 334: 219–222, 2011
13. The social brain hypothesis. Dunbar, R. I. M. · Evolutionary Anthropology 6: 178–190, 1998
14. Grooming, gossip, and the evolution of language. Dunbar, R. I. M. · Harvard University Press, 1996
15. Primate vocal communication: a useful tool for understanding human speech and language evolution? Seyfarth, R. M. & Cheney, D. L. · Philosophical Transactions of the Royal Society B 367: 1785–1801, 2012
16. The faculty of language: what is it, who has it, and how did it evolve? Hauser, M. D., Chomsky, N. & Fitch, W. T. · Science 298: 1569–1579, 2002
17. Archaeological evidence for the emergence of language, symbolism, and music — an alternative multidisciplinary perspective. d'Errico, F. et al. · Journal of World Prehistory 17(1): 1–70, 2003
18. The evolution of language from social cognition. Tomasello, M. · Current Opinion in Neurobiology 28: 5–8, 2014
19. Brain size, cranial morphology, climate, and time machines. Rightmire, G. P. · Current Anthropology 45: 653–660, 2004
20. The evolution of modern human brain shape. Neubauer, S., Hublin, J.-J. & Gunz, P. · Science Advances 4: eaao5961, 2018
21. Nassarius kraussianus shell beads from Blombos Cave: evidence for symbolic behaviour in the Middle Stone Age. d'Errico, F. et al. · Journal of Human Evolution 48: 3–24, 2005