
The Speech-to-Song Illusion
Crossing the borderline between speech and song.
Posted February 26, 2020 Reviewed by Lybi Ma
In general, it would appear obvious that speech and song are distinct and separate forms of communication: We hear speech when people are speaking, and song when they are singing. I was awestruck by an incident that occurred when I was putting the finishing touches on my CD Musical Illusions and Paradoxes. The first track on the CD consists of spoken commentary, and to detect flaws in my recorded speech, I looped phrases so that I could hear them several times over. The commentary includes the sentence:
‘The sounds as they appear to you are not only different from those that are really present, but they sometimes behave so strangely as to seem quite impossible.’
I had one of these phrases—sometimes behave so strangely—on a loop, and began working on something else. Suddenly it seemed to me that an unknown woman had entered the room, and was singing! I looked around, and finding that I was still alone, realized that I was hearing my own voice repeatedly producing this phrase—but now, instead of speech, it appeared that a sung melody was spilling out of the loudspeaker! In other words, the phrase had morphed perceptually from speech to song through the simple process of repetition.
I named this the ‘Speech-to-Song Illusion’, and it is indeed bizarre. It occurs without altering the sound in any way, and without any context provided by other sounds, but simply as a result of repeating the phrase several times over.
Here is the full sentence followed by the repeating phrase:
Here is the phrase in musical notation as most people hear it as song.

As a further surprise, when you listen to the full sentence again, it begins by sounding like speech (as indeed it is), but when you come to the phrase that had been repeated—sometimes behave so strangely—it suddenly appears to burst into song:
And once you’ve heard this phrase as song, you continue to hear it as song even after months, or even years, have elapsed. The speech-to-song illusion provides an example of very rapid and yet very long-lasting neural plasticity.
This transformation doesn’t only occur in adults with musical training. Walt Boyer, a music teacher at Atwater School in Shorewood, Wisconsin, played the Speech-to-Song Illusion to his class of fifth graders without first telling them what they might hear. As shown in his video, he began with the full sentence, and then played the spoken phrase 10 times over. As the children listened to the repetitions they became intrigued; so Boyer said ‘Try it, go ahead!’, whereupon they sang along with the phrase, first tentatively and then with gusto, and in tune. You can view the video here.
This curious illusion has no obvious explanation in terms of current scientific thinking about the relationship between speech and song. The two forms of communication in general differ in their physical characteristics: Speech consists largely of pitch glides that are often steep, and of rapid changes in loudness and sound quality. Song, in contrast, consists largely of well-defined musical notes, and so of more stable pitches, and these form melodies and rhythms. Given the differences in their physical features, neuroscientists have assumed that speech and song are subserved by entirely independent neural pathways, or modules: that the sounds of speech are processed in a module that excludes from analysis other sounds such as music, and that the sounds of music are processed in a different module that excludes from analysis other sounds such as speech. This view can’t explain the speech-to-song illusion, since here a phrase is perceptually transformed without changing its features in any way.
The argument that speech and song should be regarded as distinct and separate runs contrary to the many types of vocalization that stand at the boundary between these two forms of communication; these include incantations, religious chants, opera recitative, whistle languages, and rap or hip-hop music. Indeed, philosophers and musicians have argued for centuries that strong linkages must exist between speech and music.

The 19th-century British philosopher Herbert Spencer proposed that a continuum extends from conversational speech at one end to song at the other, with emotional and heavily intoned speech in between. He wrote:
"What we regard as the distinctive traits of song, are simply the traits of emotional speech intensified and systematized. In respect of its general characteristics, we think it has been made clear that vocal music, and by consequence all music, is an idealization of the natural language of passion…vocal music originally diverged from emotional speech in a gradual, unobtrusive manner."
Composers have also argued that expressivity in music is derived from inflections in speech, and have incorporated into their compositions characteristics of emotional expression in spoken utterances. The 19th-century Russian composer Modest Mussorgsky felt strongly that song was heavily intoned speech, and in his music he drew on overheard conversations, so employing the musical intervals, timing, and loudness variations that occur in natural speech.

Mussorgsky expressed this view forcibly in a letter to his friend Nikolai Rimsky-Korsakov:
"Whatever speech I hear, no matter who is speaking . . . my brain immediately sets to working out a musical exposition for this speech."
How can we explain the illusion? An important difference between music and conversational speech is that repetition is a powerful feature of music; in normal conversation, however, a spoken phrase that’s repeated several times in succession sounds incongruous. Put another way, when we recall a conversation, we don’t usually remember the precise words we heard; instead we remember the gist, or general meaning, of what was said. But when we remember a piece of music, we don’t summarize it; rather, the sounds and sound patterns stand for themselves. It’s therefore understandable that music should contain a substantial amount of repetition, while conversational speech does not.
In the illusion, repetition provides a cue that the phrase being heard might be music rather than speech. But why does my spoken phrase ‘sometimes behave so strangely’, as recorded on my CDs, morph so convincingly? My favorite explanation: The basic pitch pattern forming this phrase is very close to that of a phrase in the famous Westminster chimes. Further, the rhythm of the phrase is identical to that in the well-known Christmas song ‘Rudolph, the Red-Nosed Reindeer’. We must have recorded in our brains a database of well-remembered pitch patterns, and another database of well-remembered rhythms, and we recognize songs by accessing these databases. So we can suppose that the brain circuitry underlying memory for melodies recognizes the Westminster chimes, and that the brain circuitry that underlies memory for rhythms recognizes ‘Rudolph, the Red-Nosed Reindeer’. So when these two memories are combined with the cue provided by repetition, our perceptual system concludes that song is being produced rather than speech, and so invokes the brain mechanisms that are responsible for analyzing this pattern as song. Of course, other factors must also be involved, but accessing long-term memories for music is likely to be part of the picture.
References
Deutsch, Diana (2019). Musical Illusions and Phantom Words: How Music and Speech Unlock Mysteries of the Brain. New York, NY: Oxford University Press.
Deutsch, D., Henthorn, T., and Lapidis, R. (2011). Illusory transformation from speech to song. Journal of the Acoustical Society of America, 129, 2245-2252.
Deutsch, Diana (2003). Phantom Words, and Other Curiosities (CD). La Jolla: Philomel Records.