Sensory-cognitive interaction in the neural encoding of speech in noise: a review

Samira Anderson, Nina Kraus

Abstract

Background: Speech-in-noise (SIN) perception is one of the most complex tasks faced by listeners on a daily basis. Although listening in noise presents challenges for all listeners, background noise inordinately affects speech perception in older adults and in children with learning disabilities. Hearing thresholds are an important factor in SIN perception, but they are not the only factor. For successful comprehension, the listener must perceive and attend to relevant speech features, such as the pitch, timing, and timbre of the target speaker's voice. Here, we review recent studies linking SIN and brainstem processing of speech sounds.

Purpose: To review recent work examining whether the auditory brainstem response to complex sounds (cABR), which reflects the nervous system's transcription of pitch, timing, and timbre, can serve as an objective neural index of hearing-in-noise abilities.

Study sample: We examined speech-evoked brainstem responses in a variety of populations, including children who are typically developing, children with language-based learning impairment, young adults, older adults, and auditory experts (i.e., musicians).

Data collection and analysis: In a number of studies, we recorded brainstem responses in quiet and babble noise conditions to the speech syllable /da/ in all age groups, as well as in a variable condition in children in which /da/ was presented in the context of seven other speech sounds. We also measured speech-in-noise perception using the Hearing-in-Noise Test (HINT) and the Quick Speech-in-Noise Test (QuickSIN).

Results: Children and adults with poor SIN perception have deficits in the subcortical spectrotemporal representation of speech, including low-frequency spectral magnitudes and the timing of transient response peaks. Furthermore, auditory expertise, as engendered by musical training, provides both behavioral and neural advantages for processing speech in noise.

Conclusions: These results have implications for future assessment and management strategies for young and old populations whose primary complaint is difficulty hearing in background noise. The cABR provides a clinically applicable metric for objective assessment of individuals with SIN deficits, for determination of the biologic nature of disorders affecting SIN perception, for evaluation of appropriate hearing aid algorithms, and for monitoring the efficacy of auditory remediation and training.

American Academy of Audiology.

Figures

Figure 1
In the left panel, the time domains of a 40 msec stimulus /da/ (gray) and auditory brainstem response (black) are pictured. The stimulus evokes characteristic peaks in the response, labeled as V, A, C, D, E, F, and O. The stimulus waveform has been shifted to account for neural lag and to allow visual alignment between peaks in the response and the stimulus, which are indicated by arrows. Two responses from the same individual are shown to demonstrate replicability. In the right panel are the spectra of the stimulus and response. Adapted from Skoe and Kraus, 2010.
Figure 2
Grand average response waveforms of typically developing children (N=21) in response to repetitive (gray) versus variable (black) presentation of a 170 msec speech syllable /da/ (top panel). Brainstem responses in regularly occurring (gray) versus variable (black) presentations of the /da/ syllable differ in their frequency spectra, with enhanced representation of H2 and H4 (over 10 Hz bins represented by vertical lines) noted in the regular presentation (bottom left). The differences in spectral amplitude of H2 and H4 (7–60 msec) between the two conditions (repetitive context minus variable context) were calculated for each child and normalized to the group mean by converting to a z-score. The normalized difference in H2 magnitude between the regularly occurring and variable conditions is related to SIN performance as measured by the Hearing-in-Noise Test (HINT) (bottom right). Adapted from Chandrasekaran, Hornickel, et al, 2009.
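The normalization described in the Figure 2 caption (a per-child difference, repetitive context minus variable context, converted to a z-score against the group mean) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the use of the sample standard deviation, and the input values are all assumptions made for the example, and the numbers are not data from the study.

```python
from statistics import mean, stdev


def normalized_differences(repetitive, variable):
    """Return z-scored (repetitive - variable) differences, one per child.

    Hypothetical helper: each list holds one spectral amplitude per child,
    in the same child order for both conditions.
    """
    if len(repetitive) != len(variable):
        raise ValueError("need one value per child in each condition")

    # Per-child difference: repetitive context minus variable context.
    diffs = [r - v for r, v in zip(repetitive, variable)]

    group_mean = mean(diffs)
    # Assumption: sample standard deviation; the caption does not say
    # whether the sample or population form was used.
    group_sd = stdev(diffs)
    if group_sd == 0:
        raise ValueError("all differences are identical; z-score undefined")

    return [(d - group_mean) / group_sd for d in diffs]


# Illustrative amplitudes only (arbitrary units), not values from the study.
repetitive_h2 = [0.30, 0.25, 0.40, 0.35]
variable_h2 = [0.20, 0.22, 0.25, 0.33]
z_scores = normalized_differences(repetitive_h2, variable_h2)
```

A positive z-score here marks a child whose repetitive-context enhancement is larger than the group average; these normalized values are the kind of quantity the caption relates to HINT performance.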
Figure 3
Subcortical differentiation of stop consonants (/ba/, /da/, and /ga/) is related to SIN performance on the HINT. Children with better subcortical differentiation scores have higher HINT scores (p < 0.01). Adapted from Hornickel et al, 2009.
Figure 4
Effects of noise on brainstem responses in children with good and poor SIN perception. The effects are most evident in the transition region (A, boxed) of the response from 30 to 60 msec in the grand average waveforms of 66 children (B and C). Greater noise-induced latency shifts were noted in the children with poor SIN perception compared to children with good SIN perception (p < 0.01) (D). Adapted from Anderson et al, 2010.
Figure 5
Stimulus timelines and audiovisual grand averages. (A) Auditory and visual components of speech and music stimuli. Acoustic onsets for both speech and music occurred 350 msec after the first video frame and simultaneously with the release of consonant closure and onset of string vibration, respectively. Speech and music sounds were 350 msec in duration and similar to each other in envelope and spectral characteristics. (B) Grand average brainstem responses to audiovisual speech (upper) and cello (lower) stimuli. Amplitude differences in the responses between musicians and controls are evident over the entire response waveforms (p < 0.05). Adapted from Musacchia et al, 2007.
Figure 6
Stimulus (infant cry) and grand average response waveforms from musicians (gray) and nonmusicians (black). Response waveforms have been shifted back in time (7 msec) to align the stimulus and response onsets. Boxes delineate two stimulus subsections and the corresponding brainstem responses. The first subsection (112–142 msec) corresponds to the most periodic portion of the stimulus and the corresponding region in the ABR. The second subsection (145–212 msec) corresponds to the more acoustically complex portion of the stimulus, characterized by transient amplitude bursts and rapid spectral changes. Musicians’ responses demonstrate greater amplitudes than nonmusicians’ responses throughout the complex region of the response (peak 1: p < 0.003; peak 2: p < 0.03) but not for the periodic region. Adapted from Strait et al, 2009a.
Figure 7
Comparison of brainstem responses to the speech syllable /da/ in quiet and babble noise conditions in musicians vs. nonmusicians. The selected peaks (onset and transition) are circled (A). Noise delays peak latencies (B), particularly in the onset and transition portions of the response. The musicians (gray) show significantly shorter latency delays in noise than nonmusicians (black) for the onset (C, p < 0.01) and transition peaks (D, p < 0.01). The latencies of the onset (E) and transition peaks (F) are correlated with SIN perception (onset: r=0.551, p < 0.002; transition: r=0.481, p=0.006). Adapted from Parbery-Clark, Skoe, Kraus, 2009.

Source: PubMed
