Herbert S. Terrace Ph.D.

The Origin of Words


How Infants Learn to Use Words

The contribution of joint attention to the growth of vocabulary.

Posted Aug 24, 2020

Why is language uniquely human? As mentioned in previous posts, chimpanzees can’t learn language because they can’t learn to name things. Only humans can. We’ve also argued that an infant’s interpersonal world in her first year is crucial. An infant and her parent must engage in two types of nonverbal relationships before an infant produces her first words (at approximately one year): intersubjectivity (2-4 months) and joint attention (beginning at 9 months). In this post, we will describe how joint attention facilitates the onset of words and the growth of vocabulary.

To do that we will address the infant’s linguistic understanding of words, rather than her emotional understanding of words. We note, however, that there is evidence that, just on the basis of their vocal contours, very young infants can detect approval or disapproval in their parents’ utterances in English and other languages (Fernald, 1993).

Linguists describe two sources of knowledge about words: how we understand them and how we use them. They refer to the former as comprehension and to the latter as production. Infants comprehend the linguistic meaning of words before they begin to produce them. That’s because comprehension makes fewer cognitive demands than production. Production requires the ability to refer to objects; comprehension doesn’t.

That difference is evident in tests of comprehension and production. To provide evidence of comprehension, all an infant has to do is behave appropriately. Consider, for example, a test of an infant’s comprehension of the noun doll. Acceptable responses include pointing to one in a picture book (as in the photo on the left) or to one in her collection of toys. Evidence of understanding other types of words is equally easy to obtain. To show that she understands bye-bye, an infant simply has to wave. To show that she understands peek-a-boo, she simply has to cover or uncover her eyes.

Tests of production are more difficult. For a positive outcome, the infant must coordinate her own attention to an object with that of her parent. She must also refer to that object, which can only be done by naming it. For example, when the parent holds up a doll and asks, “What’s that?”, the infant would have to say doll. Pointing to a doll is not acceptable. Infants pass tests of production at about one year.

Animals can respond correctly on tests of comprehension, as when a dog fetches a ball in response to the command fetch the ball. Such behavior, however, has no linguistic implications, and it differs from an infant’s response on tests of comprehension. Whereas a dog only hears a sound, an infant parses a parent’s utterance into a stream of phonemes, the minimal linguistic units of words. The ability to parse utterances into phonemes is critical for the comprehension of words. But the infant still has to learn to use phonemes to produce words.

There are two theories about how an infant learns to produce words: associationism and joint attention. Associationists argue that an infant learns to produce a word, say, ball, because of the temporal co-variation of two events: hearing the sound of the word ball and seeing a ball. There is much evidence, however, that the principle of association is too weak to explain production.

Assuming that the infant is looking at the parent, is she attending to the sound of the parent’s utterance or to the parent’s face? And what if the infant is looking elsewhere? What’s missing from the associationist explanation are the targets of the infant’s and parent’s attention and the overt social sharing of that attention, that is, joint attention.

But the problem is deeper. Suppose that an actor on a TV show said doll while the infant was watching his show. An infant is as unlikely to learn to say doll under those circumstances as she would be to learn to say the words sky, rain, and so on. Without direct social engagement from an adult, there’s no possibility of the infant coordinating her attention with the actor’s.

Joint attention solves that problem in two stages. First, the infant and the parent must intentionally coordinate their attention to an object. Second, they have to engage socially to produce evidence of a “meeting of the minds.” As described in our last post, that can be done by various types of interaction, such as mutual gazing, pointing, and smiling.

There is much evidence to suggest that the size of an infant’s vocabulary and the kinds of words infants learn depend on the nature of their interactions with an adult. Consider two experiments performed by Michael Tomasello and his colleagues on 1-year-old infants.

They showed that the more time an infant and adult engaged in joint attention, the larger the vocabulary at 18 and 24 months. They also showed that an infant's vocabulary was influenced by the way in which joint attention was initiated and maintained.

Vocabulary size was larger when the adult followed the infant's attentional focus than when the adult attempted to direct the infant's attention to a new object. Following an infant’s focus also influences the type of words she learns. Infants learned more names than personal-social words when they initiated attention to an object than when the adult attempted to direct their attention to a new object.

These results are not surprising. When an adult follows the infant's attentional focus, the infant’s attention is more likely to remain focused on a particular object. The adult is not asking the infant to shift attention to the adult’s target of interest. There’s also a greater opportunity for the adult to name the object for the infant. When adults call attention to a new object, their utterances are more open-ended. They’re also more likely to use personal pronouns, as in, would you like to play? as opposed to naming an object as in, look at the toy car!

In this and previous posts, we described how an infant’s first words are the first step in creating a new type of communication. Animal communication, which originated millions of years ago, consists of at most two dozen unlearned, unidirectional signals that are involuntary. Their only function is to influence another’s behavior.

Words, which are arbitrary, learned, and voluntary, allow infants to express their thoughts and to have conversations with others about objects that are immediately present. But it is still a big step for infants to learn to use words to refer to objects that are not immediately present, to combine words and to learn grammatical rules. These developments will be described in future posts.

Beatrice Beebe, Ph.D., is a clinical professor of psychology (in psychiatry), College of Physicians & Surgeons, Columbia University; Department of Child and Adolescent Psychiatry, New York State Psychiatric Institute. Her most recent book is The Mother-Infant Interaction Picture Book: Origins of Attachment (Beebe, Cohen & Lachmann, Norton, 2016).


Fernald, A. (1993). Approval and disapproval: Infant responsiveness to vocal affect in familiar and unfamiliar languages. Child Development, 64, 657-674.

Tomasello, M., & Todd, J. (1983). Joint attention and lexical acquisition style. First Language, 4, 197-211.

Tomasello, M., & Farrar, M. J. (1986). Joint attention and early language. Child Development, 57, 1454-1463.