Auditory brain stem response to complex sounds: a tutorial

Erika Skoe, Nina Kraus

Abstract

This tutorial provides a comprehensive overview of the methodological approach to collecting and analyzing auditory brain stem responses to complex sounds (cABRs). cABRs provide a window into how behaviorally relevant sounds such as speech and music are processed in the brain. Because temporal and spectral characteristics of sounds are preserved in this subcortical response, cABRs can be used to assess specific impairments and enhancements in auditory processing. Notably, subcortical auditory function is neither passive nor hardwired but dynamically interacts with higher-level cognitive processes to refine how sounds are transcribed into neural code. This experience-dependent plasticity, which can occur on a number of time scales (e.g., life-long experience with speech or music, short-term auditory training, on-line auditory processing), helps shape sensory perception. Thus, as an objective and noninvasive means of examining cognitive function and experience-dependent processes in sensory activity, cABRs have considerable utility in the study of populations where auditory function is of interest (e.g., auditory experts such as musicians, and persons with hearing loss, auditory processing disorders, or language disorders). This tutorial is intended for clinicians and researchers seeking to integrate cABRs into their clinical or research programs.

Figures

Figure 1. Transient and sustained features in the cABR to /dɑ/
Time-domain representation of a 40 ms stimulus /dɑ/ (gray) and response (black). The cABR to /dɑ/ includes both transient and sustained response features. This stimulus evokes seven characteristic response peaks that we have termed V, A, C, D, E, F and O. As can be seen in this figure, these peaks relate to major acoustic landmarks in the stimulus. Peaks occur approximately 7 to 8 ms after the corresponding stimulus landmark, which is consistent with neural transmission time between the cochlea and rostral brainstem. In this figure, the stimulus waveform is shifted in time to account for this transmission time and maximize the visual coherence between the two signals. Along with V and A, C and O are considered transient responses because they correspond to transient stimulus features: C to the beginning of voicing and O to the end of voicing. The V-A complex, often referred to as the onset response, is analogous to the click-evoked wave V-Vn complex. This sharp onset response arises from the broadband stop burst associated with /d/. The region between D and F forms the frequency-following response (FFR). Peaks D, E, F and the small voltage fluctuations between them correspond to sustained stimulus features, namely the fundamental frequency (F0) and harmonics within the consonant-vowel formant transition. The D-E-F inter-peak interval (~8-9 ms duration, arrows) occurs at the period of the F0 of the stimulus, which ramps from 103 to 125 Hz. We have developed a systematic approach for identifying these peaks and have established normative data for 3- to 4-year-olds, 5- to 12-year-olds and young adults (Johnson et al., 2008b; Dhar et al., 2009). Here, and in all figures showing a stimulus waveform, the stimulus plot is scaled to match the size of the response. Hence, the microvolt bar refers only to the response.
Figure 2. Transient responses
To maximize the visual coherence between the stimulus (gray) and response (black), stimulus waveforms are shifted in time to align the stimulus with the response onset. Arrows indicate major transient features. In the response, these transient features are represented as large peaks. Top: The brainstem response to a cello note with a low pitch (G2, 100 Hz). The sound onset occurs when the bow contacts the string and causes a brief transient before the string starts to vibrate in a periodic manner. This leads to a strong onset response, followed by a more sustained response. Because of the gradual decay of this sound, a strong offset response is not apparent. Adapted from Musacchia, Sams, Skoe and Kraus, 2007. [to listen to stimulus see Audio file, Supplemental Digital Content 1.wav] [to listen to response see Audio file, Supplemental Digital Content 2.wav] Middle: Percussive instruments, like the piano, have fast attacks and rapid decays. These features are evident in this 5-note piano melody. Large response peaks coincide with the onset of each piano note. The stimulus amplitude envelope is also preserved in the response. [to listen to stimulus see Audio file, Supplemental Digital Content 3.wav] [to listen to response see Audio file, Supplemental Digital Content 4.wav] Bottom: Sounds with abrupt changes in the amplitude envelope also trigger multiple onset-like transient responses. This is illustrated here using the sound of a crying baby. Adapted from Strait, Skoe, Ashley, and Kraus, 2009. In the top and bottom plots, the stimulus was presented binaurally; in the middle plot, it was presented monaurally.
Figure 3. Sustained phaselocked responses
Low frequencies, including those associated with pitch and timbre perception, are preserved in the cABR. For complex sounds the pitch corresponds (in large part) to the lowest resonant frequency, also known as the fundamental frequency (F0). Timbre enables us to differentiate two sounds with the same pitch. Timbre is a multidimensional property resulting from timing cues of attack and decay, and the interaction of spectral and temporal properties associated with the harmonics of the F0. Together these timbral features give rise to the characteristic sound quality associated with a given instrument or voice. Top: The full view of the time-domain stimulus /dɑ/ (gray) and its cABR (black). The spectrotemporal features of the stimulus, including the F0 and harmonics, are evident in the response. The gray box demarcates six cycles of the F0. This section is magnified in the middle panel. Middle: The smallest repeating unit of the stimulus has a duration of 10 ms (i.e., the periodicity of the 100 Hz F0). Bottom: The left panel shows a close-up of a single F0 cycle. The harmonics of the F0 (frequencies at multiples of 100 Hz) are represented as small fluctuations between the major F0 peaks in both the stimulus and response. In the right panel, the stimulus and cABR are plotted in the frequency domain. The frequencies important for the perception of pitch and timbre are maintained in the response.
Figure 4. Distortion products (DPs) in the cABR
Stimulus (top) and response (bottom) spectra for a consonant musical interval (major 6th). This musical stimulus was created from G2 and E3 notes produced by an electric piano. When two harmonically complex notes are played simultaneously, the F0s and harmonics interact via nonlinear auditory processes to create DPs that are measurable in the response but not present in the stimulus. In this figure, parentheses denote the DPs, f1 denotes the lower tone (G2, red) and f2 denotes the upper tone (E3, blue). Adapted from Lee, Skoe, Kraus and Ashley, 2009 [to listen to stimulus see Audio file, Supplemental Digital Content 5.wav] [to listen to response see Audio file, Supplemental Digital Content 6.wav].
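The way a nonlinear process can create spectral energy that is absent from the stimulus can be illustrated with a toy simulation. The sketch below is not a model of the auditory system: the tone frequencies (100 and 165 Hz), the quadratic nonlinearity, the sampling rate, and the helper name `amplitude_at` are all assumptions chosen for illustration, and only the difference tone f2-f1 is examined.

```python
import cmath
import math

FS = 2000               # sampling rate in Hz (assumed)
N = 400                 # 200 ms of signal, so each tone completes whole cycles
F1, F2 = 100.0, 165.0   # illustrative tone frequencies, not the recorded notes

def amplitude_at(signal, fs, freq):
    """Single-sided spectral amplitude at one frequency (direct DFT sum)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / fs)
              for n, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)

# Two-tone "stimulus": energy only at F1 and F2.
stimulus = [math.sin(2 * math.pi * F1 * n / FS) +
            math.sin(2 * math.pi * F2 * n / FS) for n in range(N)]

# Toy nonlinearity (assumption): a quadratic term added to the input.
response = [x + 0.5 * x * x for x in stimulus]

dp = F2 - F1  # difference tone, one commonly described distortion product
stim_dp = amplitude_at(stimulus, FS, dp)
resp_dp = amplitude_at(response, FS, dp)
print(f"{dp:.0f} Hz amplitude: stimulus {stim_dp:.3f}, response {resp_dp:.3f}")
```

In this sketch the simulated response carries an amplitude of about 0.5 at 65 Hz while the two-tone input carries essentially none, which mirrors the qualitative point of the figure: DPs appear in the response spectrum but not in the stimulus spectrum.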
Figure 5. cABRs to harmonically complex signals
The sustained aspects of brainstem responses (right) and their evoking stimuli (left) can be visualized using spectrograms (Section V and Figure 13). These graphs represent a 200 ms steady-state (unchanging) segment of the vowel /ɑ/ (top) and the cello note (bottom, see also Figure 2) used in Musacchia et al. 2007. In this example, the speech (top) and musical stimulus (bottom) have the same pitch (F0 = 100 Hz; arrows), yet have very different harmonic structures and consequently different timbres. These acoustic differences account for the different response patterns. For the cello (bottom), the dominant frequency bands occur at 200 and 400 Hz in both the stimulus and response. For the speech signal (top), the harmonics around the first formant (700 Hz) have much more energy than the F0. Yet, lower frequencies dominate the response. This reflects the low-pass nature of brainstem phaselocking and the nonlinear processes that amplify the energy of the F0 and the lower-harmonics. Adapted from Musacchia, Sams, Skoe and Kraus, 2007.
Figure 6. Stimulus polarities
cABRs to the two polarities of the /dɑ/ stimulus from Figure 1. For shorthand, they are referred to as polarity A (red) and B (blue). The cABRs to A and B are quite similar, especially for the prominent negative-going peaks corresponding to the F0 (top) (Akhoun et al., 2008b). By adding or subtracting A and B, the envelope and spectral components of the response, respectively, can be separated (see footnote 3). Adding (gray) accentuates the lower-frequency components of the response, including the temporal envelope, and minimizes stimulus artifact and the cochlear microphonic (see Figure 8, and Section V for a discussion of artifacts). Subtracting (black) emphasizes the higher-frequency components by maximizing the spectral response; however, this process can also maximize artifact contamination. In the bottom panel, the ADD and SUB responses are plotted in the frequency domain. In contrast to the ADD response, which has peaks occurring at F0 (~100 Hz) and the harmonics of the F0, the SUB response has well-defined peaks in the 200-700 Hz range. This range corresponds to the first formant trajectory of this stimulus. In this figure, ADD = (A+B)/2; SUB = (A−B)/2.
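The ADD and SUB definitions given in this caption can be sketched in a few lines. The example below is a minimal illustration on synthetic data, not the authors' processing code: the sampling rate, the 100 Hz polarity-invariant component, the 400 Hz polarity-inverting component, and the function name `add_sub` are assumptions made for the demonstration.

```python
import math

def add_sub(resp_a, resp_b):
    """Return (ADD, SUB) with ADD = (A+B)/2 and SUB = (A-B)/2, per sample."""
    add = [(a + b) / 2.0 for a, b in zip(resp_a, resp_b)]
    sub = [(a - b) / 2.0 for a, b in zip(resp_a, resp_b)]
    return add, sub

FS = 2000        # sampling rate in Hz (assumed)
N_SAMPLES = 80   # 40 ms of signal at the assumed rate

# Synthetic ingredients (assumptions): a 100 Hz component that keeps its sign
# when the stimulus is inverted, and a 400 Hz component that flips sign.
envelope_part = [math.sin(2 * math.pi * 100 * n / FS) for n in range(N_SAMPLES)]
inverting_part = [0.3 * math.sin(2 * math.pi * 400 * n / FS)
                  for n in range(N_SAMPLES)]

resp_a = [e + s for e, s in zip(envelope_part, inverting_part)]  # polarity A
resp_b = [e - s for e, s in zip(envelope_part, inverting_part)]  # polarity B

add, sub = add_sub(resp_a, resp_b)
max_add_err = max(abs(x - e) for x, e in zip(add, envelope_part))
max_sub_err = max(abs(x - s) for x, s in zip(sub, inverting_part))
print(max_add_err < 1e-12, max_sub_err < 1e-12)
```

In this toy case, ADD recovers the polarity-invariant component and SUB recovers the component that inverts with the stimulus. Anything else that inverts with stimulus polarity, such as stimulus artifact, would land in SUB as well, which is consistent with the caution about artifact contamination in the subtracted response.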
Figure 7. Detecting stimulus artifact
Stimulus artifacts are easily discernible in cABR recordings. Unlike the response (bottom), the artifact (middle) contains frequencies that are higher than the phaselocking capabilities of the brainstem (Moushegian et al., 1973). In contrast to the cABR, which occurs within 6-10 ms after the stimulus (top) is played, the stimulus artifact exhibits no delay. In addition, the artifact is often larger than a typical cABR. In this example, the artifact to a 40 ms /dɑ/ (Figure 1) is 10 times larger than the response. Stimulus artifact can be minimized by using electromagnetically shielded insert earphones and adding the responses to alternating polarities (Figure 8).
Figure 8. Adding polarities (A and B) minimizes stimulus artifact and cochlear microphonic
cABRs to the A and B polarities of the 40-ms /dɑ/ from Figure 1. Top: The response to polarity A (inset) is magnified (−5 to 10 ms) to illustrate the stimulus artifact. When the A response (red) is compared to the stimulus (light gray), the two waveforms align in phase for ~4 ms. This is because the stimulus artifact and the cochlear microphonic (CM) follow the temporal pattern of the stimulus waveform. Middle: The response to polarity B (blue) is inverted with respect to the A response in this region. Bottom: By adding together A and B responses (gray), the artifact is canceled. In contrast, the artifact is accentuated when the two polarities are subtracted (black). Thus, while the analysis of the subtracted waveform or the single polarity response can be complicated by unwanted artifacts, the added response ensures a response of neurogenic origins (Aiken et al., 2008). In this figure, ADD = (A+B)/2; SUB = (A−B)/2.
Figure 9. Tracking latency differences over time
The frequency trajectories that differentiate the CV stop syllables /ba/, /da/, /ga/ are represented in the cABR by latency differences, with /ga/ responses occurring first, followed by /da/ and then /ba/. In other words, higher frequencies yield earlier peak latencies than lower frequencies. In the stimulus, the frequency differences diminish over the course of the 50-ms formant transition. Top: This pattern is reflected in the timing of the cABR (/ga/ < /da/ < /ba/) and is most apparent at four discrete response peaks. Peaks near ~55 ms are magnified in the inset. Bottom: The normalized latency difference between responses is plotted as a function of time (see Johnson et al., 2008b; Hornickel et al., 2009b, for details).
Figure 10. Stimulus-to-response cross-correlation
Cross-correlation is used to compare the timing and morphology of two signals. A: The cABR (black, bottom) to a 170 ms /dɑ/ (gray, top) is compared to a low-pass filtered version (gray, middle) of the evoking stimulus. The stimulus consists of an onset stop-burst, CV formant transition and a steady-state (i.e., unchanging) vowel. B: This plot represents the degree to which the low-pass stimulus and response are correlated as a function of the time shift. The maximal correlation is reached at an 8.5 ms time displacement, an indication of the neural transmission delay (rmax = 0.60; rmax = 0.32 for the unfiltered stimulus (correlogram not shown)). An alternative approach is to cross-correlate the response with the stimulus envelope (Akhoun et al., 2008a), which can often lead to higher correlation values. C: Running-window analyses can be used to visualize and quantify the similarity of two signals across time. In this example, when the same low-pass stimulus and response are compared in this manner (40 ms windows), the two signals are more similar during the steady-state portion, although the time displacement is consistent across time. [to listen to stimulus see Audio file, Supplemental Digital Content 7.wav] [to listen to response see Audio file, Supplemental Digital Content 8.wav]
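The lag search described for panel B can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' analysis code: the "stimulus" is seeded random noise, the "response" is a noisy copy delayed by 8.5 ms, and the sampling rate, lag range, and function names `pearson_r` and `best_lag` are all hypothetical choices.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_lag(stimulus, response, fs, min_lag_ms, max_lag_ms):
    """Shift the response against the stimulus; return (lag in ms, r) at max r."""
    lo = int(round(min_lag_ms * fs / 1000))
    hi = int(round(max_lag_ms * fs / 1000))
    best_shift, best_r = None, -2.0
    for shift in range(lo, hi + 1):
        n = min(len(stimulus), len(response) - shift)
        r = pearson_r(stimulus[:n], response[shift:shift + n])
        if r > best_r:
            best_shift, best_r = shift, r
    return 1000.0 * best_shift / fs, best_r

FS = 2000                     # sampling rate in Hz (assumed)
DELAY = 17                    # 17 samples at 2000 Hz = 8.5 ms
rng = random.Random(0)        # seeded so the example is repeatable
stimulus = [rng.gauss(0.0, 1.0) for _ in range(400)]
# Synthetic "response": delayed stimulus plus independent noise (assumption).
response = [0.0] * DELAY + [s + 0.5 * rng.gauss(0.0, 1.0) for s in stimulus]

lag_ms, r_max = best_lag(stimulus, response, FS, 0.0, 15.0)
print(f"lag of maximal correlation: {lag_ms} ms")
```

Restricting the search to a plausible lag range (here 0 to 15 ms) is a design choice that keeps the maximum from landing on a spurious peak far from the expected transmission delay. The same `best_lag` call could be applied to successive short windows to approximate the running-window analysis of panel C.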
Figure 11. An illustration of frequency tracking by autocorrelation
By cross-correlating a cABR waveform with itself (autocorrelation), the time interval between peaks can be determined. The frequency of the F0 and other periodic aspects of the response, including the temporal envelope (Krishnan et al., 2004; Lee et al., 2009), can be derived from an autocorrelogram. This technique can also be used to calculate the strength of phaselocking to these features. A: In this example, a cABR to a syllable /mi/ with a dipping F0 contour (Mandarin Tone 3; black line) is plotted. B: By applying the autocorrelation technique to 40 ms sliding windows, a frequency contour can be tracked over time. Colors represent strength of correlation; white is highest. C and D: An illustration of autocorrelation performed on a single time window (100-140 ms; demarcated in A). When a copy of this window is shifted 10.4 ms, the first peak of the copy lines up with the second peak of the original (C). A correlogram (D) represents the degree of correlation as a function of the time shift. The highest correlation occurs at 10.4 ms; thus, the fundamental periodicity of this window is 1/10.4 ms or 96 Hz. The strength of correlation at 10.4 ms is r = 0.98, indicating strong phaselocking to 96 Hz in this time window.
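The single-window calculation in panels C and D (find the lag of maximal autocorrelation, then convert it to a frequency) can be sketched as below. The signal is synthetic and constructed to have a 10.4 ms period so that the arithmetic matches the caption; the sampling rate, the search limits of 80 to 400 Hz, and the function names are assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def autocorr_f0(window, fs, f_min, f_max):
    """Return (lag in ms, F0 in Hz, r) at the lag of maximal autocorrelation."""
    min_lag = int(fs / f_max)   # shortest period considered
    max_lag = int(fs / f_min)   # longest period considered
    best_lag, best_r = None, -2.0
    for lag in range(min_lag, max_lag + 1):
        r = pearson_r(window[:-lag], window[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return 1000.0 * best_lag / fs, fs / best_lag, best_r

FS = 10000       # sampling rate in Hz (assumed)
PERIOD = 104     # samples per cycle: 104 / 10000 Hz = 10.4 ms
# Synthetic 40 ms window: a fundamental plus a weaker second harmonic.
window = [math.sin(2 * math.pi * n / PERIOD) +
          0.5 * math.sin(4 * math.pi * n / PERIOD) for n in range(400)]

lag_ms, f0_hz, r = autocorr_f0(window, FS, 80.0, 400.0)
print(f"lag {lag_ms:.1f} ms -> F0 {f0_hz:.1f} Hz")
```

Because the synthetic window is perfectly periodic, the correlation at the winning lag is essentially 1; in a recorded response the value would be lower, and that value is what the caption uses as an index of phaselocking strength. Repeating the call on successive 40 ms windows would give a contour of the kind shown in panel B.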
Figure 12. Fast Fourier analysis of a complex signal with time-varying features
This cABR was evoked by a 40-ms /dɑ/ sound comprising an onset stop-burst followed by a CV formant transition period. A frequency-domain representation of the FFR was generated using the fast Fourier transform (FFT). As a measure of phaselocking, spectral amplitudes are calculated over ranges of frequencies corresponding to the F0 (103-125 Hz) and the first formant (F1; 220-720 Hz). The noise floor is plotted in gray. The time-domain representation of this response is plotted in Figure 1.
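Averaging spectral amplitude over a frequency range can be sketched as follows. To stay dependency-free, the example evaluates the discrete Fourier sum directly at 1 Hz steps instead of calling an FFT routine; the amplitudes at those frequencies are what a zero-padded FFT would sample. The synthetic signal (110 Hz and 400 Hz components), the sampling rate, and the function names are assumptions; the band edges are the ones given in the caption.

```python
import cmath
import math

def amplitude_at(signal, fs, freq):
    """Single-sided spectral amplitude at one frequency (direct DFT sum)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / fs)
              for n, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)

def mean_band_amplitude(signal, fs, f_lo, f_hi, step_hz=1.0):
    """Mean spectral amplitude over [f_lo, f_hi], sampled every step_hz."""
    n_steps = int(round((f_hi - f_lo) / step_hz))
    freqs = [f_lo + k * step_hz for k in range(n_steps + 1)]
    return sum(amplitude_at(signal, fs, f) for f in freqs) / len(freqs)

FS = 2000   # sampling rate in Hz (assumed)
N = 400     # 200 ms synthetic signal (assumed, longer than the 40 ms stimulus)
# Synthetic "response": strong 110 Hz component, weaker 400 Hz component.
signal = [1.0 * math.sin(2 * math.pi * 110 * n / FS) +
          0.3 * math.sin(2 * math.pi * 400 * n / FS) for n in range(N)]

f0_amp = mean_band_amplitude(signal, FS, 103.0, 125.0)   # F0 range
f1_amp = mean_band_amplitude(signal, FS, 220.0, 720.0)   # F1 range
print(f"F0 band: {f0_amp:.3f}  F1 band: {f1_amp:.3f}")
```

The same function could be applied to a pre-stimulus interval to obtain a noise-floor estimate for comparison, although how the noise floor in this figure was derived is not specified in the caption.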
Figure 13. An illustration of frequency-tracking by STFT method
The short-time Fourier transform (STFT) is a method for examining how the response tracks the stimulus F0 and harmonics over time. Top: A cABR to a /mi/ syllable with a rising pitch contour (Mandarin Tone 2). The rising pitch is evident in the increasingly smaller inter-peak intervals in the stimulus and response over time. Middle: The estimated response F0 contour (yellow) is plotted against the known F0 of the stimulus (black). Each point represents the spectral maximum within a single 40-ms window of a sliding-window STFT analysis. The precision of phaselocking can be measured by calculating the frequency error between the stimulus and response trajectories (Wong et al., 2007; Russo et al., 2008). Bottom: Plotting the spectrogram that results from the STFT procedure enables visualization of the response's tracking of the F0 and its harmonics. [to listen to stimulus see Audio file, Supplemental Digital Content 9.wav] [to listen to response see Audio file, Supplemental Digital Content 10.wav]
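The sliding-window procedure in the middle panel (take the spectral maximum in each 40-ms window, then compare the resulting contour with the known stimulus F0) can be sketched as below. The input is a synthetic tone rising from 100 to 130 Hz, not a recorded response, and the sampling rate, 10 ms hop, Hann taper, 80 to 160 Hz search range, and function names are assumptions. The frequency error is computed here as a mean absolute difference, which is one plausible reading of the measure named in the caption.

```python
import cmath
import math

def hann(n_points):
    """Hann taper of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def track_f0(signal, fs, win_ms=40.0, hop_ms=10.0,
             f_lo=80.0, f_hi=160.0, step_hz=1.0):
    """Return [(window centre in s, frequency of spectral maximum in Hz), ...]."""
    win_len = int(round(win_ms * fs / 1000))
    hop = int(round(hop_ms * fs / 1000))
    taper = hann(win_len)
    n_steps = int(round((f_hi - f_lo) / step_hz))
    freqs = [f_lo + k * step_hz for k in range(n_steps + 1)]
    track = []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = [x * w for x, w in zip(signal[start:start + win_len], taper)]
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * f * n / fs)
                        for n, x in enumerate(seg))) for f in freqs]
        track.append(((start + win_len / 2) / fs, freqs[mags.index(max(mags))]))
    return track

FS = 2000                       # sampling rate in Hz (assumed)
DUR = 0.25                      # seconds
F_START, F_END = 100.0, 130.0   # rising contour (illustrative values)
SLOPE = (F_END - F_START) / DUR

def true_f0(t):
    """Known instantaneous F0 of the synthetic signal at time t."""
    return F_START + SLOPE * t

signal = [math.sin(2 * math.pi * (F_START * t + 0.5 * SLOPE * t * t))
          for t in (n / FS for n in range(int(DUR * FS)))]

track = track_f0(signal, FS)
freq_error = sum(abs(f - true_f0(t)) for t, f in track) / len(track)
print(f"{len(track)} windows, mean frequency error {freq_error:.2f} Hz")
```

Stacking the per-window magnitude spectra instead of keeping only their maxima would yield the spectrogram view of the bottom panel. With a 40-ms window the frequency resolution is coarse (25 Hz between independent bins), so the 1 Hz search grid refines where the peak sits but does not separate components that lie closer together than that.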

Source: PubMed
