Musicians and non-musicians are equally adept at perceiving masked speech

Dana Boebinger, Samuel Evans, Stuart Rosen, César F Lima, Tom Manly, Sophie K Scott

Abstract

There is much interest in the idea that musicians perform better than non-musicians in understanding speech in background noise. Research in this area has often used energetic maskers, which have their effects primarily at the auditory periphery. However, masking interference can also occur at more central auditory levels, a phenomenon known as informational masking. This experiment extends existing research by using multiple maskers that vary in their informational content and similarity to speech, in order to examine differences in perception of masked speech between trained musicians (n = 25) and non-musicians (n = 25). Although musicians outperformed non-musicians on a measure of frequency discrimination, they showed no advantage in perceiving masked speech. Further analysis revealed that non-verbal IQ, rather than musicianship, significantly predicted speech reception thresholds in noise. The results strongly suggest that the contribution of general cognitive abilities needs to be taken into account in any investigation of individual variability in perceiving speech in noise.

Figures

Figure 1
Waveforms and spectrograms of the four maskers, arranged from least informational masking to most. For the spectrograms, time is represented on the x-axis (0-1.7 s) and frequency on the y-axis (0-4 kHz). (A) Speech-spectrum steady-state noise has no temporal modulations or spectro-temporal dynamics. (B) Speech-amplitude modulated noise has temporal modulations but no spectro-temporal dynamics. (C) Spectrally-rotated speech contains temporal modulations and spectro-temporal dynamics similar to those of clear speech, but is unintelligible. (D) Clear speech contains both temporal modulations and spectro-temporal dynamics, and is intelligible.
Figure 2
Boxplots illustrating performance on the duration and frequency discrimination tasks, separated by group. * = p < .05, ** = p < .001. Note the different units for the duration (ms) and frequency (Hz) measures. Lower thresholds indicate better performance.
Figure 3
Boxplots of SRTs for each masking condition. Note that lower thresholds indicate better performance. SSN = Speech-Spectrum steady-state Noise, SMN = Speech-amplitude Modulated Noise, Rot = Rotated speech, Spe = clear speech.
Figure 4
Participants’ average SRTs as predicted by their WASI Matrix Reasoning score. Note that lower thresholds indicate better performance.

Source: PubMed
