Sound localization

  • Two potential types of info for determining the source of a sound:
    • Even though sound travels fast, the pressure wave will not arrive at both ears at the same time → sounds arrive sooner at the ear closer to the sound source
    • The intensity of a sound is greater in the ear closer to the source

Inter-aural time difference (ITD)

  • Inter-aural time difference: the difference in time between a sound arriving at one ear versus the other (figure 10.2 page 277 & figure 10.4 page 278)
    • ITDs are the largest when sound comes directly from the left or directly from the right
    • A sound coming from directly in front or directly behind produces an ITD of zero → the sound reaches both ears simultaneously
    • Only usable for low frequencies: at high frequencies the period is so short that peaks from two different periods can reach the two ears at the same time, making it seem like the sound comes from straight ahead (aliasing)
  • Azimuth: the angle of a sound source on the horizontal plane relative to a point in the centre of the head between the ears
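The relation between azimuth and ITD described above can be sketched numerically. This is a minimal illustration, not the textbook's method: it assumes Woodworth's spherical-head approximation, ITD = (r/c)(θ + sin θ), and the head radius, the speed of sound, and the function name `itd_seconds` are all assumptions that do not come from these notes.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius (not from the notes)
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 degrees C


def itd_seconds(azimuth_deg: float) -> float:
    """Approximate ITD for a distant source on the horizontal plane.

    Azimuth is measured from straight ahead (0 degrees), positive to the
    right, so 90 is directly right and 180 is directly behind.
    A positive result means the sound reaches the right ear first.
    """
    theta = math.radians(azimuth_deg)
    # Fold rear angles onto the front: a source at 150 degrees produces
    # the same ITD as one at 30 degrees in this simple model.
    lateral = math.asin(math.sin(theta))
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (lateral + math.sin(lateral))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90, 150, 180, -90):
        print(f"{azimuth:>4} deg -> {itd_seconds(azimuth) * 1e6:8.1f} microseconds")
```

Under these assumptions the ITD is zero straight ahead and straight behind and largest (roughly 650 microseconds) directly to one side. The sketch also shows that a front position and its mirrored rear position give the same ITD, which is the ambiguity discussed under cones of confusion.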

The physiology of ITDs

  • Medial superior olive (MSO): the first place in the auditory system where inputs from both ears converge, and the site of the ITD detectors (which work similarly to motion detectors)

  • The interpretation of ITDs is critically dependent on the size of the head → the development of the ability to use ITDs to localize sounds depends on having experience with separate sounds coming from different places in space

Inter-aural level difference (ILD)

  • Inter-aural level difference: the difference in level (intensity) between a sound arriving at one ear versus the other → sounds are more intense at the ear closer to the sound source because the head partially blocks the sound pressure wave from reaching the opposite ear
    • ILDs are the largest when sound comes directly from the left or directly from the right
    • A sound coming from directly in front or directly behind produces an ILD of zero → the sound is equally intense at both ears
    • Only useable for high frequencies: the head blocks high-frequency sounds much more effectively than it does low-frequency sounds. This is because the long wavelengths of low-frequency sounds bend around the head, making it impossible to detect a difference in sound intensity
    • Provides the best info about sound source location

The physiology of ILDs

  • Lateral superior olive (LSO): receives both excitatory, from the ipsilateral ear, and inhibitory, from the contralateral ear, inputs
  • The LSOs are sensitive to differences in intensity across the two ears due to the competition between excitatory inputs from the ipsilateral ear and inhibitory inputs from the contralateral ear. When the sound is more intense in one ear, connections from that ear are better both at exciting LSO neurons on that side and inhibiting LSO neurons on the other side
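The competition between ipsilateral excitation and contralateral inhibition can be illustrated with a toy model. This is only a sketch under strong assumptions: each LSO is reduced to a single unit whose response is the rectified difference between the levels at the two ears, the units are arbitrary, and the names `lso_response` and `ild_readout` are hypothetical.

```python
def lso_response(ipsi_db: float, contra_db: float) -> float:
    """Toy LSO unit: ipsilateral excitation minus contralateral inhibition.

    The output is clipped at zero because a firing rate cannot be negative.
    This is an illustration, not a fitted physiological model.
    """
    return max(0.0, ipsi_db - contra_db)


def ild_readout(left_db: float, right_db: float) -> dict:
    """Responses of hypothetical left and right LSO units for the given ear levels."""
    return {
        "left_lso": lso_response(ipsi_db=left_db, contra_db=right_db),
        "right_lso": lso_response(ipsi_db=right_db, contra_db=left_db),
    }


if __name__ == "__main__":
    print(ild_readout(left_db=70.0, right_db=60.0))  # source towards the left
    print(ild_readout(left_db=65.0, right_db=65.0))  # source straight ahead
```

In this sketch a sound that is more intense at the left ear drives the left unit and silences the right one, while equal levels leave both units at zero, which mirrors the zero ILD for sounds from directly in front or behind.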

Cones of confusion

  • Cone of confusion: a region of positions in space where all sounds produce the same time and level differences (ITDs and ILDs)
  • However, as soon as you move your head, the ITD and ILD of a sound source shift, and only one location will be consistent with the ITDs and ILDs perceived at both head positions

Pinna and head cues

  • Directional transfer function (DTF): a measure that describes how the pinna, ear canal, head and torso change the intensity of sounds with different frequencies that arrive at each ear from different locations in space (used to determine elevation)
    • Works mainly for high-frequency sounds
    • Learned during development: the ear changes as a result of growth but also due to piercing the ears etc
  • When the form of the pinnae is changed by inserting plastic plugs into folds of the pinna, performance on localizing sounds decreases dramatically. After 6 weeks of living with these plugs, the localization abilities greatly improved, indicating that a new auditory representation of the world is formed. When the plugs are removed, the old auditory representation of the world still appears to be present

Auditory distance perception

  • Relative intensity of a sound is used as a cue for judging the distance to the sound source: the sound becomes less intense with greater distance
  • Inverse-square law: a principle stating that intensity falls off with the square of the distance from the source, so intensity decreases much faster than distance increases → when sound sources are close to the listener, a small difference in distance can produce a relatively large difference in intensity
    • Listeners are fairly good at using intensity differences to determine distance when sounds are presented within 1 meter of the head, but tend to consistently underestimate the distance of sound sources farther away
  • Intensity works best as a distance cue when the sound source or the listener is moving: sounds that are farther away do not seem to change direction in relation to the listener as much as nearer sounds do
  • Spectral composition of sounds: the sound-absorbing qualities of air dampen high frequencies more than low frequencies, so when sound sources are farther away, higher frequencies decrease more in energy than lower frequencies as the sound waves travel from the source to the ear (noticeable only for fairly large distances, greater than 1000m)
  • The relative amount of direct versus reverberant energy: when a sound source is close to the listener, most of the energy reaching the ear is direct, whereas reverberant energy provides a greater proportion of the total when the sound source is farther away
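The inverse-square law in the list above can be made concrete with a little arithmetic. The sketch below assumes an idealized point source in a free field (no reflections, no absorption by the air); the function names and the 1 m reference distance are illustrative choices, not part of the notes.

```python
import math


def relative_intensity(distance_m: float, reference_m: float = 1.0) -> float:
    """Intensity at distance_m relative to the intensity at reference_m."""
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    return (reference_m / distance_m) ** 2


def level_change_db(distance_m: float, reference_m: float = 1.0) -> float:
    """Change in sound level (dB) when moving from reference_m to distance_m."""
    return 10.0 * math.log10(relative_intensity(distance_m, reference_m))


if __name__ == "__main__":
    # The same 1 m step matters far more close to the listener than far away.
    print(f"1 m -> 2 m:   {level_change_db(2.0, 1.0):.2f} dB")
    print(f"10 m -> 11 m: {level_change_db(11.0, 10.0):.2f} dB")
```

Under these assumptions, doubling the distance cuts the intensity to a quarter (a drop of about 6 dB), whereas the same one-meter step at 10 m changes the level by less than 1 dB. This is why intensity is a more informative distance cue for nearby sources.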

Complex sounds

Harmonics

  • Fundamental frequency: the lowest frequency component of a complex periodic sound
  • The perceived pitch of a complex sound is determined by the fundamental frequency; the harmonics add to the perceived richness of the sound
    • The pitch of a complex sound is perceived even if the fundamental frequency is not part of the sound → missing-fundamental effect
  • All harmonics of a fundamental have fluctuations in sound pressure at regular intervals corresponding to the fundamental frequency → temporal code (fig 10.15 page 288)
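The shared periodicity behind the missing-fundamental effect can be shown with simple arithmetic. The sketch below is not a model of pitch perception: it only assumes integer-valued harmonic frequencies and uses their greatest common divisor as the repetition rate they have in common. The function name `implied_fundamental_hz` is hypothetical.

```python
from functools import reduce
from math import gcd


def implied_fundamental_hz(harmonics_hz: list[int]) -> int:
    """Largest frequency (Hz) of which every listed component is an integer multiple."""
    if not harmonics_hz:
        raise ValueError("at least one harmonic is required")
    return reduce(gcd, harmonics_hz)


if __name__ == "__main__":
    # The 200 Hz component itself is absent from this set of harmonics.
    harmonics = [400, 600, 800, 1000]
    f0 = implied_fundamental_hz(harmonics)
    print(f"implied fundamental: {f0} Hz, common period: {1000 / f0} ms")
```

For the harmonics 400, 600, 800 and 1000 Hz the summed waveform repeats every 5 ms, the period of a 200 Hz fundamental, even though no energy is present at 200 Hz.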

Timbre

  • Perception of timbre is related to the relative energy of different acoustic spectral components: two different instruments playing the same note, with exactly the same fundamental frequency at exactly the same loudness can be discerned by the difference in harmonics

Auditory “colour” constancy

  • Surfaces in the environment reflect and absorb energy at different frequencies in ways that change the spectral shape that finally arrives at your ears
  • Kiefte & Kluender: figure 10.17 page 290
    • Listeners use both spectral tilt and the frequency of spectral peaks to identify vowels.
    • K & K created stimuli that enabled them to separately measure the contributions of tilt and frequency of the second peak when perceiving these vowels. They created some sentences so that the overall tilt of the sentence was the same as the tilt of the following vowel. And they created some sentences where they added a peak in the spectrum all the way through the sentence at the same frequency as the second peak in the vowel that listeners would identify.

  • When tilt stayed the same for both the preceding sentence and vowel, listeners used only the frequency of the second peak to identify the vowel.
  • When the second peak was present all the way through the preceding sentence, listeners relied mostly on tilt to identify the vowel

Attack and decay

  • Attack: the part of a sound during which the amplitude increases (onset)
  • Decay: the part of a sound during which the amplitude decreases (offset)

Auditory scene analysis

  • Source segregation/auditory scene analysis: processing an auditory scene consisting of multiple sound sources into separate sound images

Spatial, spectral and temporal segregation

  • Spatial segregation: sounds that emanate from the same location in space can typically be treated as if they arose from the same source
    • A sound that is perceived to move in space can more easily be separated from background sounds that are relatively stationary
  • Spectral and temporal segregation: sounds can also be segregated by their spectral or temporal qualities, e.g. sounds with the same or similar pitch are more likely to be treated as coming from the same source and to be segregated from other sounds
  • Auditory stream: sounds that are perceived to emanate from the same source are often described as being part of the same auditory stream
  • Auditory stream segregation: the division of the auditory world into separate auditory objects (consistent with the Gestalt law of similarity)

Grouping by timbre

  • If you overlap two streams of simple sine wave tones, one increasing and then decreasing in pitch, the other first decreasing and then increasing, the two streams of sound are heard without overlapping pitches: one stream includes all the high tones and one includes all the low tones. However, if harmonics are added to one of the sequences, thus creating a richer timbre, the two overlapping patterns are heard as distinct
  • Grouping by timbre is robust because sounds with similar timbre usually arise from the same sound source

Grouping by onset

  • Sound components that begin at the same time, or nearly the same time, such as the harmonics of a sound, will also tend to be heard as coming from the same sound source → frequency components with different onset times are less likely to be grouped (consistent with the Gestalt law of common fate)

Continuity and restoration effects

  • Continuity effect/perceptual restoration effect: a sound that is interrupted by a masking sound is heard as continuing behind it (consistent with the Gestalt law of good continuation)
    • The same brain activity is present in A1 when a short burst of noise replaces part of the tone as when the true tone is present