Perceptual learning evidence for tuning to spectrotemporal modulation in the human auditory system

Andrew T Sabin, David A Eddins, Beverly A Wright

Abstract

Natural sounds are characterized by complex patterns of sound intensity distributed across both frequency (spectral modulation) and time (temporal modulation). Perception of these patterns has been proposed to depend on a bank of modulation filters, each tuned to a unique combination of a spectral and a temporal modulation frequency. There is considerable physiological evidence for such combined spectrotemporal tuning. However, direct behavioral evidence is lacking. Here we examined the processing of spectrotemporal modulation behaviorally using a perceptual-learning paradigm. We trained human listeners for ∼1 h/d for 7 d to discriminate the depth of spectral (0.5 cyc/oct; 0 Hz), temporal (0 cyc/oct; 32 Hz), or upward spectrotemporal (0.5 cyc/oct; 32 Hz) modulation. Each trained group learned more on their respective trained condition than did controls who received no training. Critically, this depth-discrimination learning did not generalize to the trained stimuli of the other groups or to downward spectrotemporal (0.5 cyc/oct; −32 Hz) modulation. Learning on discrimination also led to worsening on modulation detection, but only when the same spectrotemporal modulation was used for both tasks. Thus, these influences of training were specific to the trained combination of spectral and temporal modulation frequencies, even when the trained and untrained stimuli had one modulation frequency in common. This specificity indicates that training modified circuitry that had combined spectrotemporal tuning, and therefore that circuits with such tuning can influence perception. These results are consistent with the possibility that the auditory system analyzes sounds through filters tuned to combined spectrotemporal modulation.
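The abstract specifies each stimulus only by its spectral (cyc/oct) and temporal (Hz) modulation frequencies. As a rough illustration of what such a stimulus is, the Python sketch below builds a moving ripple as a set of log-spaced tones whose amplitudes follow a sinusoidal envelope across log-frequency and time. Everything beyond the modulation frequencies is an assumption of this sketch, not a detail from the study: the carrier (random-phase tones), the number and span of components, linear (rather than dB) amplitude modulation, and the sign convention that a positive temporal rate makes the ripple drift upward in frequency. All function names are hypothetical.

```python
import math
import random


def ripple_envelope(x_oct: float, t_s: float, spectral_cyc_per_oct: float,
                    temporal_hz: float, depth: float) -> float:
    """Linear amplitude envelope of a spectrotemporal ripple.

    x_oct: position on a log-frequency axis, in octaves above the lowest component.
    t_s: time in seconds.
    Assumed sign convention: temporal_hz > 0 makes the ripple peaks drift upward
    in frequency over time; temporal_hz < 0 makes them drift downward.
    """
    phase = 2.0 * math.pi * (spectral_cyc_per_oct * x_oct - temporal_hz * t_s)
    return 1.0 + depth * math.sin(phase)


def peak_position(t_s: float, spectral: float, temporal: float,
                  span_oct: float = 2.0, step_oct: float = 0.01) -> float:
    """Log-frequency (octaves) of the envelope maximum at time t_s, found on a grid."""
    grid = [i * step_oct for i in range(int(round(span_oct / step_oct)))]
    return max(grid, key=lambda x: ripple_envelope(x, t_s, spectral, temporal, 1.0))


def ripple_stimulus(duration_s: float, fs: float, f_low: float, n_octaves: float,
                    n_components: int, spectral: float, temporal: float,
                    depth: float, seed: int = 0) -> list[float]:
    """Sum of log-spaced, random-phase tones whose amplitudes follow the ripple
    envelope (a hypothetical carrier chosen for illustration)."""
    rng = random.Random(seed)
    comps = []
    for k in range(n_components):
        x = n_octaves * k / (n_components - 1)   # octaves above f_low
        comps.append((x, f_low * 2.0 ** x, rng.uniform(0.0, 2.0 * math.pi)))
    samples = []
    for n in range(int(duration_s * fs)):
        t = n / fs
        samples.append(sum(
            ripple_envelope(x, t, spectral, temporal, depth)
            * math.sin(2.0 * math.pi * f * t + ph)
            for x, f, ph in comps))
    return samples


# The four depth-discrimination conditions named in the abstract, as
# (spectral cyc/oct, temporal Hz) pairs.
CONDITIONS = {
    "spectral": (0.5, 0.0),
    "temporal": (0.0, 32.0),
    "upward": (0.5, 32.0),
    "downward": (0.5, -32.0),
}
```

With this convention, `peak_position(0.0, 0.5, 32.0)` is about 0.5 octave and `peak_position(1/256, 0.5, 32.0)` is about 0.75 octave, so the envelope peak moves up in frequency over time. Flipping the temporal rate to −32 Hz moves the peak down instead (to about 0.25 octave at the same moment), which is the only difference between the upward and downward conditions.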

Figures

Figure 1.
Spectrotemporal modulation filter bank. A schematic of hypothetical filters in a spectrotemporal modulation filter bank. Each spectrogram depicts the spectrotemporal modulation to which a particular filter is tuned. Each filter has a preferred spectral modulation frequency (vertical spacing of bars), temporal modulation frequency (horizontal spacing of bars), and direction (up or down: direction of bar tilt). These filters are tuned to spectral-alone (middle column), temporal-alone (bottom row), or spectrotemporal (downward: left two columns; upward: right two columns) modulation. Note that the filters that are selective for isolated spectral or temporal modulation frequencies simply constitute one slice of this spectrotemporal modulation filter bank where the preferred modulation frequency in the other dimension is zero.
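One way to see why upward and downward ripples can be handled by different filters in such a bank is to look at a ripple's two-dimensional modulation spectrum, that is, the Fourier transform of its spectrogram-like envelope over log-frequency and time. The sketch below samples a ripple on a small grid and finds its dominant component with a naive 2-D DFT. The grid sizes, step sizes, and the "positive rate = upward" sign convention are illustrative assumptions, the function names are hypothetical, and the filters in Figure 1 are themselves hypothetical; this is not presented as the study's model.

```python
import cmath
import math


def ripple(x_oct: float, t_s: float, spectral: float, temporal: float) -> float:
    # Assumed convention: temporal > 0 means the ripple drifts upward in frequency.
    return math.sin(2.0 * math.pi * (spectral * x_oct - temporal * t_s))


def sample_grid(spectral: float, temporal: float, nx: int = 16, nt: int = 16,
                dx_oct: float = 0.25, dt_s: float = 1.0 / 128.0) -> list[list[float]]:
    """Sample the ripple on a (log-frequency x time) grid, like a spectrogram."""
    return [[ripple(n * dx_oct, m * dt_s, spectral, temporal) for m in range(nt)]
            for n in range(nx)]


def dft2_magnitude(grid: list[list[float]]) -> list[list[float]]:
    """Naive 2-D DFT magnitude (fine for tiny grids; use an FFT for real data)."""
    nx, nt = len(grid), len(grid[0])
    out = [[0.0] * nt for _ in range(nx)]
    for kx in range(nx):
        for kt in range(nt):
            acc = 0j
            for n in range(nx):
                for m in range(nt):
                    acc += grid[n][m] * cmath.exp(
                        -2j * math.pi * (kx * n / nx + kt * m / nt))
            out[kx][kt] = abs(acc)
    return out


def dominant_modulation(spectral: float, temporal: float, nx: int = 16,
                        nt: int = 16, dx_oct: float = 0.25,
                        dt_s: float = 1.0 / 128.0) -> tuple[float, float]:
    """Return (cyc/oct, signed Hz) of the largest DFT component, searching only the
    half-plane with non-negative spectral modulation frequency."""
    mag = dft2_magnitude(sample_grid(spectral, temporal, nx, nt, dx_oct, dt_s))
    best, best_val = (0, 0), -1.0
    for kx in range(0, nx // 2):
        for kt in range(nt):
            signed_kt = kt if kt < nt // 2 else kt - nt
            if kx == 0 and signed_kt <= 0:
                continue  # the kx = 0 row is conjugate-symmetric; keep one half
            if mag[kx][kt] > best_val:
                best, best_val = (kx, signed_kt), mag[kx][kt]
    kx, kt = best
    return kx / (nx * dx_oct), kt / (nt * dt_s)
```

In this sketch's DFT sign convention, the upward ripple (0.5 cyc/oct, 32 Hz) lands at (0.5 cyc/oct, −32 Hz) and the downward ripple at (0.5 cyc/oct, +32 Hz), so the two directions fall in opposite quadrants of the modulation plane, while purely spectral modulation sits on the zero-temporal-frequency axis. Which quadrant gets called "upward" depends only on the transform's sign convention; the separation between directions does not.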
Figure 2.
Tasks. Three sounds were presented on each trial. The listener's task was to choose the sound (signal) that was different from the other two (standards). A, In the modulation-depth-discrimination tasks (spectrotemporal shown), the signal had a shallower modulation depth (left) than the standards (middle and right). The more similar the modulation depths that could be discriminated, the better the threshold. B, In the modulation-detection task, the signal was modulated (left) and the standards were not (middle and right). The shallower the signal modulation depth that could be detected, the better the threshold.
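The captions define thresholds by a 79%-correct criterion but do not describe the tracking procedure. A common way to target that point is a transformed 3-down/1-up adaptive staircase, which converges where three consecutive correct responses are as likely as not: p³ = 0.5, so p ≈ 0.794. The sketch below assumes that rule, a fixed additive step, and a threshold taken as the mean of the last few reversal levels; the class name, step size, and reversal count are hypothetical. In a three-interval oddity task like the one described above, chance performance is 1/3.

```python
from dataclasses import dataclass, field
from statistics import mean

# A 3-down/1-up rule converges where P(three correct in a row) = 0.5.
TARGET_P_CORRECT = 0.5 ** (1.0 / 3.0)   # about 0.794
CHANCE_P_CORRECT = 1.0 / 3.0           # three-interval oddity task


@dataclass
class ThreeDownOneUp:
    """Hypothetical 3-down/1-up staircase on an additive scale.

    level: the tracked stimulus value (e.g., the depth difference between the
    standards and the signal); smaller values are harder.
    """
    level: float
    step: float
    correct_in_row: int = 0
    last_direction: int = 0          # -1 after a step down, +1 after a step up
    reversals: list[float] = field(default_factory=list)

    def update(self, correct: bool) -> None:
        if correct:
            self.correct_in_row += 1
            if self.correct_in_row < 3:
                return                # no step until three correct in a row
            self.correct_in_row = 0
            direction = -1            # make the task harder
        else:
            self.correct_in_row = 0
            direction = +1            # make the task easier
        if self.last_direction and direction != self.last_direction:
            self.reversals.append(self.level)   # level at which the track turned
        self.last_direction = direction
        self.level += direction * self.step

    def threshold(self, n_last: int = 6) -> float:
        """Mean of the last n_last reversal levels."""
        if len(self.reversals) < n_last:
            raise ValueError("not enough reversals yet")
        return mean(self.reversals[-n_last:])
```

For example, starting at `level=10.0` with `step=2.0`, three correct responses lower the level to 8.0, and a single error raises it back to 10.0 while recording a reversal at 8.0.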
Figure 3.
Performance on the trained depth-discrimination conditions. A–C, Group-average modulation-depth-discrimination thresholds (79% correct performance) as a function of testing session for listeners trained on spectral (A, triangles; n = 8), temporal (B, diamonds; n = 8), or spectrotemporal (C, squares; n = 8) modulation. Results are also shown for controls who received no training (circles; n = 8). Spectrograms of each trained stimulus are depicted at the top of each column. Error bars indicate ±1 SEM. Asterisks indicate a significant group (trained vs control) × time (pre vs post) interaction (p < 0.05) in an ANOVA using time as a repeated measure. D–F, Slopes of individual regression lines fitted to all threshold estimates versus the log10 of the session number for each listener in the spectral- (D), temporal- (E), and spectrotemporal- (F) modulation trained groups. Filled symbols indicate that the slope for that listener was significant and negative. The box plots to the left of the individual points show the distribution of these slopes: each box is composed of lines at the upper quartile, median, and lower quartile values, and the whiskers extend to the maximum and minimum values (excluding outliers). Slopes were computed either across all sessions (left in each panel) or across all sessions excluding the pretest (right in each panel). Asterisks indicate that the population of slopes was significantly less than zero (p < 0.05) according to a one-sample t test. Training led to improvement for all three trained modulations, but with different time courses.
Figure 4.
Performance on all of the modulation-depth-discrimination conditions. The difference in threshold between the pretest and posttest (pre minus post) for each group (bars) and listener (symbols) on the depth-discrimination conditions. A–D, Results are shown for spectral modulation (SM; A) and temporal modulation (TM; B), as well as for upward (C) and downward (D) spectrotemporal modulation (STM+ and STM−, respectively). For each modulation, the magnitude of improvement is shown for the listeners, if any, who were trained on that modulation (TRN), for those who were trained on other modulations, and for those who received no training. Spectrograms of the tested stimuli are displayed near the top right corner of each panel. Error bars indicate ±1 SEM. Asterisks indicate a significant interaction (p < 0.05) of a group (trained vs control) × time (pre vs post) repeated-measures ANOVA. The learning on the trained conditions did not generalize to any untrained depth-discrimination conditions.
Figure 5.
Effect sizes for all trained versus control comparisons on the modulation-depth-discrimination conditions. Partial η² effect sizes of the group (trained vs control) × time (pre vs post) interaction in an ANOVA using time as a repeated measure. A–D, Effect sizes are shown for spectral modulation (SM; A) and temporal modulation (TM; B), as well as for upward (C) and downward (D) spectrotemporal modulation (STM+ and STM−, respectively). For each modulation, the effect size is shown for the listeners who were trained on that modulation (TRN) as well as for those who were trained on other modulations. Asterisks indicate that the interaction was significant (p < 0.05). For each depth-discrimination condition, the effect size was at least twice as large when that condition was trained than when it was not.
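Partial η² can be recovered from an ANOVA F ratio and its degrees of freedom: η²p = SS_effect / (SS_effect + SS_error) = F·df_effect / (F·df_effect + df_error), because F = (SS_effect/df_effect) / (SS_error/df_error). For the 1-df interaction used here, this reduces to F / (F + df_error). A minimal sketch of both forms follows; the function names are hypothetical and any values plugged in are arbitrary, not data from the study.

```python
def partial_eta_squared_from_ss(ss_effect, ss_error):
    """Partial eta squared from sums of squares."""
    return ss_effect / (ss_effect + ss_error)


def partial_eta_squared_from_F(F, df_effect, df_error):
    """Same quantity from an F ratio and its degrees of freedom."""
    return (F * df_effect) / (F * df_effect + df_error)
```

For example, SS_effect = 6, SS_error = 12 with df = (1, 8) gives F = 6 / (12/8) = 4, and both functions return 1/3.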
Figure 6.
Performance on the untrained spectrotemporal modulation detection condition. A, As in Figure 3, but for the detection (rather than discrimination) of upward spectrotemporal modulation. B–E, The difference in threshold between the pretest and posttest (pre minus post) for discrimination (abscissa) and detection (ordinate) of upward spectrotemporal modulation. Results are plotted separately for listeners trained to discriminate the depth of spectral (B), temporal (C), or spectrotemporal (D) modulation as well as for controls (E). Discrimination training led to worsening on detection but only when both tasks used the same modulation.

Source: PubMed
