The Origins of Language

Language is not a singular event.

Key points

  • Attempts to explain the evolution of language overlooked that it evolved in verbal and non-verbal stages.
  • Non-verbal stages of language include hyper-cooperation, intersubjectivity and joint attention. Verbal stages include words and grammar.
  • The best way to make progress in explaining the evolution of language is to focus on the origins of words, not grammar.

For more than 150 years, the assumption that language is a singular event has hampered progress in explaining its evolution. Another obstacle was the failure to recognize that certain uniquely human social interactions are necessary for the evolution of language.

These problems have been recently remedied by recognizing that words had to evolve before grammar and discovering non-verbal emotional and cognitive relations between an infant and caregiver. As I elaborate below, those relations are known as intersubjectivity and joint attention.

Charles Darwin
Source: Julia Margaret Cameron / Wikipedia

Darwin argued that the theory of evolution could account for the transition from animal communication to language by the principle of natural selection. The idea was that “language differed in degree and not kind” from animal communication. What remained to be discovered was the degree, the “innumerable gradations” that separated them.

Some of those gradations have been discovered in recent years. But their nature suggests that language differs in kind from animal communication. Alfred Wallace, who with Darwin published the first article on the theory of natural selection, wondered how natural selection, which assumes the survival value of a new ability, could account for man’s “superior intelligence.” Wallace couldn’t understand why natural selection would produce anything more than a slight increment in mental ability over that of apes. Language, not to mention numerical knowledge or music, is hardly necessary for survival.

Alfred Wallace
Source: Wikipedia

Because Wallace assumed that language was a singular event, he didn’t realize that words had to evolve before grammar. If he had, he might have recognized how a theory of the evolution of words would be consistent with the principle of natural selection.

Before words could evolve, some of our ancestors had to become more cooperative than apes. That increment in cooperation was necessary for intersubjectivity and joint attention to evolve. To see how language's verbal and non-verbal components relate to one another, it is helpful to review why chimpanzees, our nearest living relatives, can’t learn language.

For chimpanzees, competition is the norm, not cooperation. Chimpanzee mothers (and those of other apes) don’t allow anyone else to interact with their infants for about six months. By contrast, human mothers allow others (relatives and non-kin, so-called “allomothers”) to interact with a newborn right after birth. That practice, known as collective breeding, appears to have begun with Homo erectus, an ancestor who lived about 1.8 million years ago.

Infants raised by collective breeding are faced with two problems. In addition to discerning their mother's emotions and learning how to relate to her, collectively bred infants encounter the same problem when interacting with alloparents. Therefore, infants reared in this manner are challenged socially in ways that apes are not.

Chimpanzees are not only more competitive than humans, but they also seldom share or exchange rewards, for example, by trading a banana for some grapes. Collective breeding changed that and made cooperation, rather than competition, the norm.

Human infants not only exchange physical rewards but also participate in exchanges in which the reward is social, for example, when informing another about the location of a missing object by pointing to it.

Infants who were good at relating to alloparents were more likely to survive than those who were not. That selection pressure helped foster the high degree of cooperation that is crucial for developing intersubjectivity and joint attention.

Intersubjectivity refers to exchanges of affect between an infant and caregiver that often manifest themselves in games. Peek-a-boo, a game observed in all cultures, is a good example. Joint attention refers to a relation between an infant and caregiver in which they share attention to external objects, for example, an infant pointing to a dog.

Intersubjectivity begins at birth, a consequence of cradling and the proximity of an infant’s eyes to her mother’s. The bond they form is subsequently amplified by joint attention between an infant and her caregiver to objects of mutual interest.

The dynamics of intersubjectivity and joint attention are invisible to the untutored eye. What greater joy for parents than playing peek-a-boo with their infant or seeing their infant point to something and then smile? Such play is necessary for producing the infant’s first words around her first birthday.

Hyper-cooperation, intersubjectivity, and joint attention collectively created a perfect storm for the transition from animal communication to words. Linguists have neglected that transition in favor of the transition from words to grammar, language's most celebrated feature. It is easy to show that the transition from animal communication to words required more structural change than the transition from words to grammar, specifically, the shift from analog primate calls to discrete digital speech. And grammar could not evolve without words.

The analog signals that animals use to communicate vary in intensity and frequency. Moreover, the average number of signals a given species uses rarely exceeds two dozen. By contrast, variations of meaning in language are conveyed by the choice of discrete words, whose upper bound is enormous. A reader of this blog knows more than 50,000.

The shift from the analog emotional signals that animals communicate to discrete words was a dramatic evolutionary change. Aside from the nature of the signal, emotional signals also differ fundamentally from words in that they are involuntary, unlearned, and uni-directional. Their sole function is to influence another’s behavior: to assert dominance, stake out a territory, express an interest in mating, alert others to a predator, find food, and the like.

Emotional signals are also unchangeable. Dogs can only bark, cats can only purr, birds can only sing, and lions can only roar.

Words are voluntary, learned, and arbitrary. Typically, words are also conversational. A speaker and a listener alternate roles while sharing information. In contrast to emotional signals, whose form is fixed, the form of a word is arbitrary. A person can say tree, l’arbre, der Baum, el árbol, l’albero, or their equivalent in any of the more than 6,000 languages people speak, or in the gestures used in dozens of sign languages.

In sum, the transition from the emotional signals of animals to words involved a more significant change in the form of expression than the transition from words to grammar. The latter only involves their organization, order, inflection, etc. The transition from animal communication to words marks the first occasion our ancestors communicated conversationally, in an arbitrary manner. This is not to minimize the significance of the transition from words to grammar but only to clarify that it didn’t require a new form of expression.

Despite these apparent facts about words, they remain stepchildren in discussions of the evolution of language. As mentioned earlier, the vast literature on the evolution of language has focused on grammar, not words.

Noam Chomsky
Source: Wugapodes / Wikimedia

That imbalance may be attributed to Chomsky and his students. For more than 70 years, they have sought to discover the nature and the origins of grammar, possibly at the expense of words. As can be seen in a recent comment, Chomsky seems to be aware of this problem:

The minimal meaning-bearing elements of human languages…are radically different from anything known in animal communication systems. Their origin is entirely obscure, posing a serious problem for the evolution of human cognitive capacities, particularly language. [1]

I recognize the importance of understanding grammar and why a theory of grammar would be the ultimate step in explaining the evolution of language. But neglecting the origin of words in the quest to understand the origin of grammar strikes me as putting the cart before the horse. It’s like attempting to understand molecules without understanding the nature of elements and the atoms that define them. Just as alchemists’ efforts to transmute lead into gold impeded our understanding of chemistry, ignorance about the origin of words impedes our understanding of language and its functions.

Focusing on words instead of grammar, however, reveals an interesting problem. Linguists have yet to agree on a definition of a word. Culturally, linguists regard all individual utterances as words. That is true of both people and animals. Children’s utterances such as hi, no, up, ouch, more, bye, and so on are regarded as words. So are the utterances that apes have been taught in experiments on “language” and the signals that animals use to communicate. For example, the alarm calls that vervet monkeys give for eagles, leopards, and snakes have erroneously been referred to as words.

What is needed is a definition that will distinguish between those utterances and the referential properties of words. That is why I define words as arbitrary utterances that are used conversationally. Speakers use words to refer to objects or events for the benefit of a listener and vice versa.

This definition provides an important evolutionary boundary that preserves the essence of human language. It may violate deeply felt cultural biases by excluding a minuscule number of utterances that infants make, utterances that are not referential. But language as we know it would never develop if such utterances were all that a child could learn.

To recapitulate, I have argued that the best way to make progress in language evolution is to focus on the origins of words, not grammar. That effort should be both phylogenetic and ontogenetic.

Phylogenetically, it’s important to ask, what psychological and environmental factors facilitated the transition from animal communication to words?

Ontogenetically, we must ask, how do the utterances of human infants become referential words?


[1] Berwick, R. C., & Chomsky, N. (2016). Why Only Us. Cambridge, MA: MIT Press, pp. 90-91.