Let's Visualize Your Voice

Published: July 30, 2015

Yukiko Sugiyama (Associate Professor, Foreign Languages and General Education)

I specialize in phonetics, the field that studies the physical properties of speech sounds and the mechanisms by which humans perceive speech. For example, suppose a friend says "ohayō" (good morning) to you. You hear "o," then "ha," then "yo," then "u": four distinct sounds in succession. Physically, however, speech is a continuous stream with no breaks. The four separate sounds you perceive are merely an interpretation that takes place inside your head.

Let's look at the physical characteristics of the speech sound produced when someone says "ame o nameru" (to lick candy). First, if we break "ame o nameru" down into consonants and vowels, we find that it consists of six vowels and four consonants:

/a/, /m/, /e/, /o/, /n/, /a/, /m/, /e/, /r/, /u/
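The count above can be checked with a few lines of Python. This is only a minimal sketch: the names `segments` and `VOWELS` are made up for illustration, and treating the five letters a, e, i, o, u as the vowel set is a simplifying assumption that happens to fit this phrase.

```python
# Segments of "ame o nameru", in the order listed above.
segments = ["a", "m", "e", "o", "n", "a", "m", "e", "r", "u"]

# Assumption for this sketch: these five letters stand for the vowels.
VOWELS = set("aeiou")

vowels = [s for s in segments if s in VOWELS]
consonants = [s for s in segments if s not in VOWELS]

print(len(vowels), "vowels:", vowels)
print(len(consonants), "consonants:", consonants)
```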

Now, let's visualize it. Figure 1 is the waveform of my voice saying "ame o nameru"; it shows the changes in air pressure produced as the sound left my mouth. The amplitude varies, but no clear boundaries between consonants and vowels are visible. Figure 2 is a spectrogram of "ame o nameru" (commonly known as a voiceprint), in which darker shading indicates a higher concentration of energy at that frequency. Here, too, there are no breaks in the sound. In other words, physical speech sounds change gradually from one to the next; they do not switch abruptly from one sound to another.

Our brains unconsciously perform the sophisticated feat of perceiving this constantly changing, continuous stream of sound by dividing it into linguistically meaningful chunks. In technical terms, this is called segmentation. Why call segmentation a "feat"? Because without it, we would have to think consciously, "This is a sound between /a/ and /m/," or "This is close to /m/, but it still has a bit of /a/ mixed in and isn't quite /m/ yet," and communication would be impossible.

Figure 2. Spectrogram of "ame o nameru" (speaker: Sugiyama)
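For readers curious about how a spectrogram is computed, here is a rough sketch in plain Python. It is not how Praat works internally, and it does not use a real recording. As a stand-in for speech it synthesizes a tone whose pitch glides smoothly from one frequency to another, cuts the signal into short overlapping frames, and measures how much energy each frame contains at each frequency. The sample rate, frame length, frequencies, and all function names are assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def synth_glide(duration=0.2, f_start=300.0, f_end=900.0, rate=SAMPLE_RATE):
    """Synthesize a tone whose frequency glides linearly from f_start to f_end."""
    n = int(duration * rate)
    samples = []
    phase = 0.0
    for i in range(n):
        freq = f_start + (f_end - f_start) * i / (n - 1)
        phase += 2 * math.pi * freq / rate  # accumulate phase sample by sample
        samples.append(math.sin(phase))
    return samples


def spectrogram(samples, frame_len=256, hop=128):
    """Return one magnitude spectrum per overlapping frame (naive DFT)."""
    # Hann window: fades each frame in and out to reduce edge effects.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * k / (frame_len - 1))
              for k in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + k] * window[k] for k in range(frame_len)]
        spectrum = []
        for b in range(frame_len // 2 + 1):  # bins from 0 Hz up to rate / 2
            acc = sum(frame[k] * cmath.exp(-2j * math.pi * b * k / frame_len)
                      for k in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def peak_frequencies(frames, frame_len=256, rate=SAMPLE_RATE):
    """Frequency (Hz) of the strongest bin in each frame."""
    return [max(range(len(s)), key=s.__getitem__) * rate / frame_len
            for s in frames]


if __name__ == "__main__":
    frames = spectrogram(synth_glide())
    for i, f in enumerate(peak_frequencies(frames)):
        print(f"frame {i:2d}: strongest energy near {f:.0f} Hz")
```

Each row of `frames` corresponds to one vertical slice of a spectrogram image, and a larger magnitude corresponds to darker shading. The strongest frequency drifts upward a little from frame to frame rather than jumping, which mirrors the gradual change described above.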

Not long ago, speech analysis required large, expensive equipment. Today, however, free speech analysis software called Praat (which apparently means "speak" in Dutch) is available for download. Why not try visualizing your own voice?

Praat website: "Praat: doing Phonetics by Computer"

Gakumon no susume (An Encouragement of Learning) (Research Introduction)
