Keio University

Distinguishing Sounds to Determine Their Direction

Participant Profile

  • Nozomu Hamada

The human auditory system perceives surrounding sounds and engages in voice communication by capturing the vibrations of the air that envelops us. Even in a noisy environment, humans can distinguish specific sounds. By determining the direction of a sound, we can recognize what is happening there. In our laboratory, we are conducting research to equip computers and robots with these functions of human hearing. These are challenges known as "sound source separation" and "sound source localization."

The ability to pick out one voice from a mixture of sounds relies on a property of the voice itself, which takes a little explanation. Sound is a fluctuation in air pressure made up of components at different frequencies. The set of these components, with the magnitude of each, is called a spectrum. The spectrum of a voice changes over time. Figure 1 shows a voice divided into fine regions of time and frequency, with the magnitude of the component in each region represented by color. Red areas mark the times and frequencies at which the voice has large components. The key property of speech is that only a few regions contain such large components. Therefore, even when two people speak at the same time, the time-frequency regions with large components rarely coincide between the two voices (Fig. 1).

[Figure 1]

Separating a mixture into its original components can be extremely difficult, depending on the physical phenomenon. However, even when multiple voices are mixed, they rarely overlap within any single time-frequency region. Mixed voices can be separated by taking advantage of this fact.
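The sparsity described above can be checked numerically. The following is a minimal sketch, not the laboratory's own code: two synthetic tone sequences stand in for voices (real speech is far richer), and the frame length, hop size, bin choices, and the 0.7 dominance threshold are all assumptions made for illustration. It computes a short-time spectrum for each signal and counts how many large time-frequency cells the two share.

```python
import cmath
import math

N = 64     # frame length in samples (assumed)
HOP = 32   # hop between frames in samples (assumed)
SEG = 256  # samples spent on each tone (assumed)

def tone_hop(bins):
    """Hypothetical stand-in for a voice: a tone hopping between DFT bins."""
    return [math.sin(2 * math.pi * b * n / N) for b in bins for n in range(SEG)]

def stft(signal):
    """Short-time spectrum with a Hann window (naive DFT, fine for tiny inputs)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    frames = []
    for start in range(0, len(signal) - N + 1, HOP):
        seg = [signal[start + n] * window[n] for n in range(N)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N // 2 + 1)
        ])
    return frames  # frames[t][k]: time index t, frequency bin k

def dominant_cells(frames, ratio=0.7):
    """Time-frequency cells whose magnitude is at least ratio * peak."""
    peak = max(abs(v) for frame in frames for v in frame)
    return {(t, k) for t, frame in enumerate(frames)
            for k, v in enumerate(frame) if abs(v) >= ratio * peak}

# The two "voices" share a frequency only during the third segment.
voice_a = tone_hop([4, 10, 7, 12])
voice_b = tone_hop([15, 20, 7, 24])

spec_a, spec_b = stft(voice_a), stft(voice_b)
cells_a, cells_b = dominant_cells(spec_a), dominant_cells(spec_b)
total = len(spec_a) * len(spec_a[0])
overlap = cells_a & cells_b

print(f"dominant cells in A: {len(cells_a)} of {total}")
print(f"dominant cells in B: {len(cells_b)} of {total}")
print(f"cells dominant in both: {len(overlap)}")
```

Only a small fraction of the cells is dominant in either signal, and the shared cells are confined to the segment in which both signals were deliberately given the same frequency.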

The key to distinguishing each sound is its direction of arrival. Humans use their left and right ears to determine the direction of a sound, even when many sounds are heard simultaneously. In the same way, multiple microphones allow us to determine the direction of arrival of each sound. The direction of arrival is estimated for each time-frequency region of the mixed voice. Collecting the regions estimated to come from the same direction then completes the separation, and the target sound can be restored (Fig. 2).

[Figure 2]
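The direction-based grouping described above can be sketched with two microphones. This is an illustrative simplification, not the laboratory's time-varying frequency-phase difference method: it estimates the inter-microphone delay in each time-frequency cell from the phase difference, converts it to an angle, and labels the cell by the sign of the delay. The sampling rate, microphone spacing, synthetic tone sources, one-sample delays, and the 0.7 magnitude threshold are all assumptions.

```python
import cmath
import math

N, HOP, SEG = 64, 32, 256   # frame length, hop, samples per tone (assumed)
FS = 16000.0                # sampling rate in Hz (assumed)
SPACING = 0.04              # microphone spacing in metres (assumed)
C = 343.0                   # speed of sound in m/s

def tone_hop(bins, n):
    """Hypothetical source sample at index n: a tone hopping between DFT bins."""
    seg = min(max(n // SEG, 0), len(bins) - 1)
    return math.sin(2 * math.pi * bins[seg] * n / N)

def stft(signal):
    """Short-time spectrum with a Hann window (naive DFT, fine for tiny inputs)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    frames = []
    for start in range(0, len(signal) - N + 1, HOP):
        seg = [signal[start + n] * window[n] for n in range(N)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N // 2 + 1)
        ])
    return frames

src_a, src_b = [4, 10, 7], [13, 20, 16]
length = 3 * SEG
# Source A reaches microphone 2 one sample late, source B one sample early.
mic1 = [tone_hop(src_a, n) + tone_hop(src_b, n) for n in range(length)]
mic2 = [tone_hop(src_a, n - 1) + tone_hop(src_b, n + 1) for n in range(length)]

X1, X2 = stft(mic1), stft(mic2)
peak = max(abs(v) for frame in X1 for v in frame)

labels, angles = {}, {}
for t, frame in enumerate(X1):
    for k in range(1, N // 2):
        if abs(frame[k]) < 0.7 * peak:
            continue  # skip weak cells, where the phase is unreliable
        phase = cmath.phase(X2[t][k] / frame[k])
        delay = -phase * N / (2 * math.pi * k)  # delay at mic 2, in samples
        s = max(-1.0, min(1.0, delay * C / (FS * SPACING)))
        angles[(t, k)] = math.degrees(math.asin(s))
        labels[(t, k)] = "A" if delay > 0 else "B"

mask_a = {cell for cell, name in labels.items() if name == "A"}
mask_b = {cell for cell, name in labels.items() if name == "B"}
print(f"cells assigned to A: {len(mask_a)}, to B: {len(mask_b)}")
```

To restore a waveform, each mask would be applied to the spectrum of one microphone and the result inverted by overlap-add; that step is omitted here for brevity. With one-sample delays the phase difference stays below half a cycle in every bin used, so no phase unwrapping is needed in this sketch.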

In our laboratory, we have proposed a time-varying frequency-phase difference pattern as a new method for sound source separation. We have also developed a method for estimating the direction of a sound source using a microphone array with an arbitrary arrangement. Furthermore, we are conducting research on robot audition to use voice for communication between robots and humans, in collaboration with the Kazuo Nakazawa Laboratory (Fig. 3).

[Figure 3]

The auditory functions of humans and animals remain a subject of ongoing interest in brain science. Research on sound separation and direction estimation fascinates us because it draws together physics, mathematics, information science, and biology.

Please watch two videos by the Hamada Laboratory related to this content.

Gakumon no susume (An Encouragement of Learning) (Research Introduction)
