Why would Musical Training Benefit the Neural Encoding of Speech? The OPERA Hypothesis

Aniruddh D Patel

Abstract

Mounting evidence suggests that musical training benefits the neural encoding of speech. This paper offers a hypothesis specifying why such benefits occur. The "OPERA" hypothesis proposes that such benefits are driven by adaptive plasticity in speech-processing networks, and that this plasticity occurs when five conditions are met. These are: (1) Overlap: there is anatomical overlap in the brain networks that process an acoustic feature used in both music and speech (e.g., waveform periodicity, amplitude envelope), (2) Precision: music places higher demands on these shared networks than does speech, in terms of the precision of processing, (3) Emotion: the musical activities that engage this network elicit strong positive emotion, (4) Repetition: the musical activities that engage this network are frequently repeated, and (5) Attention: the musical activities that engage this network are associated with focused attention. According to the OPERA hypothesis, when these conditions are met, neural plasticity drives the networks in question to function with higher precision than is needed for ordinary speech communication. Yet since speech shares these networks with music, speech processing benefits. The OPERA hypothesis is used to account for the observed superior subcortical encoding of speech in musically trained individuals, and to suggest mechanisms by which musical training might improve linguistic reading abilities.

Keywords: hypothesis; music; neural encoding; neural plasticity; speech.

Figures

Figure 1
A simplified schematic of the ascending auditory pathway between the cochlea and primary auditory cortex, showing a few of the subcortical structures involved in auditory processing, such as the cochlear nuclei in the brainstem and the inferior colliculus in the midbrain. Solid red lines show ascending auditory pathways; dashed lines show descending (“corticofugal”) auditory pathways. (For simplicity, corticofugal pathways are shown on only one side of the brain; see Figure 2 for a more detailed network diagram.) From Patel and Iversen (2007), reproduced with permission.
Figure 2
Schematic diagram of the auditory system, illustrating the many subcortical processing stations between the cochlea (bottom) and cortex (top). Blue arrows represent ascending (bottom-up) pathways; red arrows represent descending projections. From Chandrasekaran and Kraus (2010), reproduced with permission.
Figure 3
The auditory brainstem response to the spoken syllable /da/ (red) in comparison to the acoustic waveform of the syllable (black). The neural response can be studied in the time domain as changes in amplitude across time (top, middle and bottom-left panels) and in the spectral domain as spectral amplitudes across frequency (bottom-right panel). The auditory brainstem response reflects acoustic landmarks in the speech signal with submillisecond timing precision, and its phase locking corresponds to (and physically resembles) the pitch and timbre information in the stimulus. From Kraus and Chandrasekaran (2010), reproduced with permission.
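The caption notes that the brainstem response can be studied in the spectral domain as spectral amplitudes across frequency. As a minimal sketch of that idea, assuming a uniformly sampled response and a single-bin discrete Fourier transform (not necessarily the analysis used in the cited work), the amplitude at F0 and its harmonics can be read off as below. The function name `spectral_amplitude`, the 100 Hz F0, the 10 kHz sampling rate, and the synthetic signal are all hypothetical.

```python
import math

def spectral_amplitude(signal, fs, freq):
    """Peak amplitude of the component at `freq` Hz, via a single-bin DFT.

    Exact when every component completes a whole number of cycles within
    the analysis window and 0 < freq < fs/2; otherwise spectral leakage
    makes the value approximate.
    """
    n = len(signal)
    w = 2 * math.pi * freq / fs
    re = sum(x * math.cos(w * i) for i, x in enumerate(signal))
    im = -sum(x * math.sin(w * i) for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n

# Hypothetical FFR-like signal: 100 Hz F0 (amplitude 1.0) plus a second
# harmonic (amplitude 0.5), sampled at 10 kHz for 100 ms = 10 F0 cycles.
fs = 10_000
f0 = 100.0
resp = [math.sin(2 * math.pi * f0 * i / fs)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / fs) for i in range(1000)]

for h in (1, 2, 3):
    amp = spectral_amplitude(resp, fs, h * f0)
    print(f"harmonic {h} ({h * f0:.0f} Hz): amplitude {amp:.3f}")
```

Choosing a window that spans a whole number of F0 cycles keeps each harmonic on an exact DFT bin, so the returned values match the component amplitudes; with other window lengths, leakage would spread energy across neighbouring frequencies.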
Figure 4
The time–amplitude waveform of a 40-ms synthesized speech stimulus /da/ is shown in blue (time-shifted by 6 ms to be comparable with the neural response). The first 10 ms of the syllable are characterized by the onset burst of the consonant /d/; the following 30 ms are the formant transition to the vowel /a/. The time–amplitude waveform of the time-locked brainstem response to the 40-ms /da/ is shown below the stimulus, in black. The onset response (V) begins 6–10 ms after stimulus onset, reflecting the transmission delay to the auditory brainstem. Wave C marks the start of the formant transition period, that is, the change from the burst to the periodic portion of the syllable (the vowel). Waves D, E, and F represent the periodic portion of the syllable (frequency-following response), from which the fundamental frequency (F0) of the stimulus can be extracted. Finally, wave O marks stimulus offset. From Chandrasekaran and Kraus (2010), reproduced with permission.
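The caption states that F0 can be extracted from the periodic (frequency-following) portion of the response. One common approach, assumed here rather than taken from the cited studies, is to pick the lag at which the normalized autocorrelation peaks within a plausible pitch range. In this sketch the function `estimate_f0`, its `fmin`/`fmax` search limits, and the synthetic 100 Hz test signal are hypothetical.

```python
import math

def estimate_f0(signal, fs, fmin=80.0, fmax=400.0):
    """Estimate F0 (Hz) as the lag with the largest normalized autocorrelation.

    Only lags corresponding to fmin..fmax Hz are searched. Returns
    (f0_hz, peak_r); peak_r is the autocorrelation at the chosen lag,
    normalized by the total signal energy (so it cannot exceed 1).
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]            # remove any DC offset
    energy = sum(v * v for v in x)
    if energy == 0:
        raise ValueError("signal has no variance")
    min_lag = max(1, int(fs / fmax))          # shortest period considered
    max_lag = min(int(fs / fmin), n - 1)      # longest period considered
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag, best_r

# Hypothetical periodic "response": 100 Hz fundamental plus a weaker second
# harmonic, sampled at 10 kHz for 100 ms.
fs = 10_000
resp = [math.sin(2 * math.pi * 100 * i / fs)
        + 0.5 * math.sin(2 * math.pi * 200 * i / fs) for i in range(1000)]
f0, peak_r = estimate_f0(resp, fs)
print(f"estimated F0: {f0:.1f} Hz, peak autocorrelation: {peak_r:.2f}")
```

The height of the autocorrelation peak (`peak_r`) is a rough index of how periodic the response is; it is offered here only as an illustration, not as the measure used in the studies cited in this paper.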
Figure 5
Frequency-following responses to the spoken syllable /mi/ with a “dipping” (tone 3) pitch contour from Mandarin. The top row shows FFR waveforms of a musician and non-musician subject; the bottom row shows the fundamental frequency of the voice (thin black line) and the trajectories (yellow lines) of the FFR's primary periodicity, from the same two individuals. For the musician, the FFR waveform is more periodic and its periodicity tracks the time-varying F0 contour of the spoken syllable with greater accuracy. From Wong et al. (2007), reproduced with permission.
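The caption describes the musician's FFR periodicity as tracking the time-varying F0 contour “with greater accuracy.” A minimal sketch of how such tracking accuracy might be quantified, assuming frame-by-frame F0 estimates are already available for stimulus and response: the mean absolute difference in Hz and the Pearson correlation between the two contours. These metric choices, the function names, and the synthetic dipping contour with added jitter are illustrative assumptions, not the procedure or data of Wong et al. (2007).

```python
import math

def tracking_error(stim_f0, resp_f0):
    """Mean absolute frame-by-frame F0 difference (Hz) between two contours."""
    if not stim_f0 or len(stim_f0) != len(resp_f0):
        raise ValueError("contours must be non-empty and of equal length")
    return sum(abs(s - r) for s, r in zip(stim_f0, resp_f0)) / len(stim_f0)

def pearson_r(a, b):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def jittered(contour, size_hz):
    """Hypothetical response contour: stimulus F0 plus alternating +/- jitter."""
    return [f + (size_hz if k % 2 == 0 else -size_hz) for k, f in enumerate(contour)]

# Hypothetical dipping (tone 3-like) stimulus contour over 20 analysis frames:
# starts at 110 Hz, dips to about 85 Hz mid-syllable, and returns to 110 Hz.
frames = 20
stim = [110 - 25 * math.sin(math.pi * k / (frames - 1)) for k in range(frames)]
close = jittered(stim, 1.0)   # tracks the contour closely
poor = jittered(stim, 8.0)    # tracks it less faithfully

for label, resp in (("close", close), ("poor", poor)):
    print(f"{label}: error = {tracking_error(stim, resp):.2f} Hz, "
          f"r = {pearson_r(stim, resp):.3f}")
```

The two measures are complementary: correlation rewards a response contour with the right shape even if it is offset in frequency, whereas the absolute error also penalizes such offsets.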
Figure 6
F0 contours of the three Mandarin tones used in the study of Song et al. (2008): high-level (tone 1), rising (tone 2), and dipping/falling–rising (tone 3). Reproduced with permission.
Figure 7
Brainstem encoding of the fundamental frequency (F0) of the Mandarin syllable /mi/ with a “dipping” (tone 3) pitch contour. The F0 contour of the syllable is shown by the thin black line, and the trajectory of FFR periodicity by the yellow line. Relative to pretraining (left panel), the post-training FFR in the same participant (right panel) tracks the time-varying F0 contour of the syllable more faithfully. Data from a representative participant in Song et al. (2008). Reproduced with permission.
Figure 8
Examples of non-linguistic tonal stimulus waveforms for rise times of 15 (A) and 300 ms (B). From Goswami et al. (2002), reproduced with permission.
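Figure 8 contrasts tonal stimuli whose amplitude envelopes rise over 15 ms versus 300 ms. The sketch below shows one way such stimuli could be synthesized, assuming a linear onset ramp followed by a steady plateau; the ramp shape, the 500 Hz carrier, the 8 kHz sampling rate, the 800 ms duration, and all function names are hypothetical and are not the parameters reported by Goswami et al. (2002).

```python
import math

def onset_envelope(rise_ms, dur_ms=800.0, fs=8000):
    """Envelope rising linearly from 0 to 1 over `rise_ms`, then flat.

    Duration and sampling-rate defaults are hypothetical illustration values.
    """
    n = int(fs * dur_ms / 1000)
    rise_n = max(1, int(fs * rise_ms / 1000))
    return [min(1.0, i / rise_n) for i in range(n)]

def ramped_tone(rise_ms, freq_hz=500.0, dur_ms=800.0, fs=8000):
    """Sinusoidal carrier multiplied by the linear onset envelope."""
    env = onset_envelope(rise_ms, dur_ms, fs)
    return [a * math.sin(2 * math.pi * freq_hz * i / fs) for i, a in enumerate(env)]

def time_to_peak_ms(env, fs=8000):
    """Time (ms) at which the envelope first reaches its maximum."""
    peak = max(env)
    first = next(i for i, v in enumerate(env) if v >= peak)
    return 1000.0 * first / fs

for rise in (15, 300):
    env = onset_envelope(rise)
    tone = ramped_tone(rise)
    print(f"rise {rise} ms: envelope peaks at {time_to_peak_ms(env):.1f} ms, "
          f"{len(tone)} samples")
```

Holding the carrier fixed means the two example stimuli differ only in their onset envelopes.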
Figure 9
Grand-average cortical responses from temporal electrodes T3 and T4 (red: right hemisphere; blue: left hemisphere) and the broadband speech envelope (black) for the sentence “the young boy left home.” Ninety-five milliseconds of the prestimulus period are plotted. The speech envelope was shifted forward in time by 85 ms to enable comparison with the cortical responses. From Abrams et al. (2008), reproduced with permission.
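The caption reports shifting the speech envelope 85 ms forward in time to compare it with cortical responses. A minimal sketch, under stated assumptions, of how such a lag might be estimated: extract a crude envelope (full-wave rectification plus moving-average smoothing, a swapped-in method that is not necessarily the one used by Abrams et al.), then search for the lag that maximizes the Pearson correlation between envelope and response. The synthetic signals, the 1 kHz rate, and the built-in 85-sample delay are constructed for illustration and are not data from the study.

```python
import math

def envelope(signal, win):
    """Crude amplitude envelope: full-wave rectification + moving average.

    `win` is the smoothing window in samples (a hypothetical choice).
    """
    rect = [abs(x) for x in signal]
    out, acc = [], 0.0
    for i, v in enumerate(rect):
        acc += v
        if i >= win:
            acc -= rect[i - win]           # drop the sample leaving the window
        out.append(acc / min(i + 1, win))  # shorter window during warm-up
    return out

def pearson_r(a, b):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def best_lag(env, resp, max_lag):
    """Lag (samples) by which `resp` trails `env` that maximizes correlation."""
    n = min(len(env), len(resp))
    scores = {lag: pearson_r(env[:n - lag], resp[lag:n])
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get)

# Hypothetical 1 s signal at 1 kHz (1 sample = 1 ms): a 200 Hz carrier whose
# amplitude fluctuates slowly, loosely mimicking syllable-rate modulation.
fs = 1000
mod = [1.0 + 0.5 * math.sin(2 * math.pi * 4 * i / fs)
       + 0.3 * math.sin(2 * math.pi * 1.3 * i / fs) for i in range(fs)]
sound = [m * math.sin(2 * math.pi * 200 * i / fs) for i, m in enumerate(mod)]
env = envelope(sound, win=25)

# Synthetic "cortical response": the envelope delayed by 85 samples (85 ms).
delay = 85
resp = [0.0] * delay + env[:-delay]
lag = best_lag(env, resp, max_lag=150)
print(f"estimated lag: {1000 * lag / fs:.0f} ms")
```

Real cortical responses are noisy and only partly envelope-driven, so the correlation peak would be far lower than the perfect match built into this synthetic example.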

References

Abrams D. A., Nicol T., Zecker S., Kraus N. (2008). Right-hemisphere auditory cortex is dominant for coding syllable patterns in speech. J. Neurosci. 28, 3958–3965. doi: 10.1523/JNEUROSCI.0187-08.2008
Abrams D. A., Nicol T., Zecker S., Kraus N. (2009). Abnormal cortical processing of the syllable rate of speech in poor readers. J. Neurosci. 29, 7686–7693. doi: 10.1523/JNEUROSCI.5242-08.2009
Ahissar M., Nahum M., Nelken I., Hochstein S. (2009). Reverse hierarchies and sensory learning. Philos. Trans. R. Soc. Lond. B Biol. Sci. 364, 285–299. doi: 10.1098/rstb.2008.0253
Anvari S., Trainor L. J., Woodside J., Levy B. A. (2002). Relations among musical skills, phonological processing, and early reading ability in preschool children. J. Exp. Child Psychol. 83, 111–130. doi: 10.1016/S0022-0965(02)00124-8
Bajo V. M., Nodal F. R., Moore D. R., King A. J. (2010). The descending corticocollicular pathway mediates learning-induced auditory plasticity. Nat. Neurosci. 13, 252–260. doi: 10.1038/nn0810-913
Banai K., Hornickel J., Skoe E., Nicol T., Zecker S., Kraus N. (2009). Reading and subcortical auditory function. Cereb. Cortex 19, 2699–2707. doi: 10.1093/cercor/bhp024
Bendor D., Wang X. (2005). The neuronal representation of pitch in primate auditory cortex. Nature 436, 1161–1165. doi: 10.1038/nature03867
Bhatara A., Tirovolas A. K., Duan L. M., Levy B., Levitin D. J. (2011). Perception of emotional expression in musical performance. J. Exp. Psychol. Hum. Percept. Perform. 37, 921–934. doi: 10.1037/a0021922
Bidelman G. M., Gandour J. T., Krishnan A. (2011). Cross-domain effects of music and language experience on the representation of pitch in the human auditory brainstem. J. Cogn. Neurosci. 23, 425–434. doi: 10.1162/jocn.2009.21362
Caclin A., McAdams S., Smith B. K., Winsberg S. (2005). Acoustic correlates of timbre space dimensions: a confirmatory study using synthetic tones. J. Acoust. Soc. Am. 118, 471–482. doi: 10.1121/1.1929229
Cariani P. A., Delgutte B. (1996). Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. J. Neurophysiol. 76, 1698–1716.
Chandrasekaran B., Hornickel J., Skoe E., Nicol T., Kraus N. (2009). Context-dependent encoding in the human auditory brainstem relates to hearing speech in noise: implications for developmental dyslexia. Neuron 64, 311–319. doi: 10.1016/j.neuron.2009.10.006
Chandrasekaran B., Kraus N. (2010). The scalp-recorded brainstem response to speech: neural origins and plasticity. Psychophysiology 47, 236–246. doi: 10.1111/j.1469-8986.2009.00928.x
Chapin H., Jantzen K., Kelso S. J. A., Steinberg F., Large E. (2010). Dynamic emotional and neural responses to music depend on performance expression and listener experience. PLoS ONE 5, e13812. doi: 10.1371/journal.pone.0013812
Clynes M. (1995). Microstructural musical linguistics: composers’ pulses are liked most by the best musicians. Cognition 55, 269–310. doi: 10.1016/0010-0277(94)00650-A
Corrigall K., Trainor L. (2010). “The predictive relationship between length of musical training and cognitive skills in children,” paper presented at the 11th Intl. Conf. on Music Perception & Cognition (ICMPC11), August 2010, Seattle, WA.
Cutler A. (1994). Segmentation problems, rhythmic solutions. Lingua 92, 81–104. doi: 10.1016/0024-3841(94)90338-7
Drullman R., Festen J. M., Plomp R. (1994). Effect of temporal envelope smearing on speech reception. J. Acoust. Soc. Am. 95, 1053–1064. doi: 10.1121/1.408825
Edelman G. M., Gally J. (2001). Degeneracy and complexity in biological systems. Proc. Natl. Acad. Sci. U.S.A. 98, 13763–13768. doi: 10.1073/pnas.231499798
Eerola T., Himberg T., Toiviainen P., Louhivuori J. (2006). Perceived complexity of western and African folk melodies by western and African listeners. Psychol. Music 34, 337–371. doi: 10.1177/0305735606064842
Ferreira F., Patson N. D. (2007). The ‘good enough’ approach to language comprehension. Lang. Linguist. Compass 1, 71–83. doi: 10.1111/j.1749-818X.2007.00007.x
Fritz J., Elhilali M., Shamma S. (2005). Active listening: task-dependent plasticity of spectrotemporal receptive fields in primary auditory cortex. Hear. Res. 206, 159–176. doi: 10.1016/j.heares.2005.01.015
Ghitza O., Greenberg S. (2009). On the possible role of brain rhythms in speech perception: intelligibility of time compressed speech with periodic and aperiodic insertions of silence. Phonetica 66, 113–126. doi: 10.1159/000208934
Gordon J. W. (1987). The perceptual attack time of musical tones. J. Acoust. Soc. Am. 82, 88–105. doi: 10.1121/1.2025038
Goswami U., Thompson J., Richardson U., Stainthorp R., Hughes D., Rosen S., Scott S. K. (2002). Amplitude envelope onsets and developmental dyslexia: a new hypothesis. Proc. Natl. Acad. Sci. U.S.A. 99, 10911–10916. doi: 10.1073/pnas.122368599
Goswami U. (2010). A temporal sampling framework for developmental dyslexia. Trends Cogn. Sci. 15, 3–10. doi: 10.1016/j.tics.2010.10.001
Greenberg S. (2006). “A multi-tier framework for understanding spoken language,” in Listening to Speech: An Auditory Perspective, eds Greenberg S., Ainsworth W. A. (Mahwah, NJ: Erlbaum), 411–433.
Hämäläinen J. A., Salminen H. K., Leppänen P. H. T. (in press). Basic auditory processing deficits in dyslexia: review of the behavioural, event-related potential and magnetoencephalographic evidence. J. Learn. Disabil.
Hickok G., Poeppel D. (2007). The cortical organization of speech processing. Nat. Rev. Neurosci. 8, 393–402. doi: 10.1038/nrn2113
Holt L. L., Idemaru K. (2011). “Generalization of dimension-based statistical learning of speech,” in Proc. 17th Intl. Cong. Phonetic Sci. (ICPhS XVII), Hong Kong, China.
Huss M., Verney J. P., Fosker T., Mead N., Goswami U. (2011). Music, rhythm, rise time perception and developmental dyslexia: perception of musical meter predicts reading and phonology. Cortex 47, 674–689. doi: 10.1016/j.cortex.2010.07.010
Hyde K. L., Lerch J., Norton A., Forgeard M., Winner E., Evans A. E., Schlaug G. (2009). Musical training shapes structural brain development. J. Neurosci. 29, 3019–3025. doi: 10.1523/JNEUROSCI.5118-08.2009
Jagadeesh B. (2006). “Attentional modulation of cortical plasticity,” in Textbook of Neural Repair and Rehabilitation: Neural Repair and Plasticity, Vol. 1, eds Selzer M., Clarke S. E., Cohen L. G., Duncan P. W., Gage F. H. (Cambridge: Cambridge University Press), 194–205.
Jentschke S., Koelsch S., Sallat S., Friederici A. (2008). Children with specific language impairment also show impairment of music-syntactic processing. J. Cogn. Neurosci. 20, 1940–1951. doi: 10.1162/jocn.2008.20135
Kaas J., Hackett T. (2000). Subdivisions of auditory cortex and processing streams in primates. Proc. Natl. Acad. Sci. U.S.A. 97, 11793–11799. doi: 10.1073/pnas.97.22.11793
Koelsch S. (2010). Towards a neural basis of music-evoked emotions. Trends Cogn. Sci. 14, 131–137. doi: 10.1016/j.tics.2010.01.002
Kilgard M., Merzenich M. (1998). Cortical reorganization enabled by nucleus basalis activity. Science 279, 1714–1718. doi: 10.1126/science.279.5357.1714
Kral A., Eggermont J. (2007). What's to lose and what's to learn: development under auditory deprivation, cochlear implants and limits of cortical plasticity. Brain Res. Rev. 56, 259–269. doi: 10.1016/j.brainresrev.2007.07.021
Kraus N., Chandrasekaran B. (2010). Music training for the development of auditory skills. Nat. Rev. Neurosci. 11, 599–605. doi: 10.1038/nrm2968
Krishnan A., Xu Y., Gandour J., Cariani P. (2005). Encoding of pitch in the human brainstem is sensitive to language experience. Brain Res. Cogn. Brain Res. 25, 161–168. doi: 10.1016/j.cogbrainres.2005.05.004
Lakshminarayanan K., Tallal P. (2007). Generalization of non-linguistic auditory perceptual training to syllable discrimination. Restor. Neurol. Neurosci. 25, 263–272.
Lappe C., Herholz S. C., Trainor L. J., Pantev C. (2008). Cortical plasticity induced by short-term unimodal and multimodal musical training. J. Neurosci. 28, 9632–9639. doi: 10.1523/JNEUROSCI.2254-08.2008
Lee K. M., Skoe E., Kraus N., Ashley R. (2009). Selective subcortical enhancement of musical intervals in musicians. J. Neurosci. 29, 5832–5840. doi: 10.1523/JNEUROSCI.6185-08.2009
Liu F., Patel A. D., Fourcin A., Stewart L. (2010). Intonation processing in congenital amusia: discrimination, identification, and imitation. Brain 133, 1682–1693. doi: 10.1093/brain/awq089
Mattys S. L., White L., Melhorn J. F. (2005). Integration of multiple speech segmentation cues: a hierarchical framework. J. Exp. Psychol. General 134, 477–500. doi: 10.1037/0096-3445.134.4.477
McMurray B., Aslin R., Tanenhaus M., Spivey M., Subik D. (2008). Gradient sensitivity to within-category variation in speech: implications for categorical perception. J. Exp. Psychol. Hum. Percept. Perform. 34, 1609–1631. doi: 10.1037/a0011747
Moreno S., Marques C., Santos A., Santos M., Castro S. L., Besson M. (2009). Musical training influences linguistic abilities in 8-year-old children: more evidence for brain plasticity. Cereb. Cortex 19, 712–723. doi: 10.1093/cercor/bhn120
Musacchia G., Sams M., Skoe E., Kraus N. (2007). Musicians have enhanced subcortical auditory and audiovisual processing of speech and music. Proc. Natl. Acad. Sci. U.S.A. 104, 15894–15898. doi: 10.1073/pnas.0701498104
Musacchia G., Strait D. L., Kraus N. (2008). Relationships between behavior, brainstem and cortical encoding of seen and heard speech in musicians and nonmusicians. Hear. Res. 241, 34–42. doi: 10.1016/j.heares.2008.04.013
Nagarajan S. S., Cheung S. W., Bedenbaugh P., Beitel R. E., Schreiner C. E., Merzenich M. M. (2002). Representation of spectral and temporal envelope of twitter vocalizations in common marmoset primary auditory cortex. J. Neurophysiol. 87, 1723–1737.
Overy K. (2003). Dyslexia and music: from timing deficits to musical intervention. Ann. N. Y. Acad. Sci. 999, 497–505. doi: 10.1196/annals.1284.060
Palmer C. (1997). Music performance. Ann. Rev. Psychol. 48, 115–138. doi: 10.1146/annurev.psych.48.1.115
Panizzon M. W., Fennema-Notestine C., Eyler L. T., Jernigan T. L., Prom-Wormley E., Neale M., Jacobson K., Lyons M. J., Grant M. D., Franz C. E., Xian H., Tsuang M., Tsuang M., Fischl B., Seidman L., Dale A., Kremen W. S. (2009). Distinct genetic influences on cortical surface area and cortical thickness. Cereb. Cortex 19, 2728–2735. doi: 10.1093/cercor/bhp026
Parbery-Clark A., Skoe E., Kraus N. (2009). Musical experience limits the degradative effects of background noise on the neural processing of sound. J. Neurosci. 29, 14100–14107. doi: 10.1523/JNEUROSCI.3256-09.2009
Parbery-Clark A., Strait D. L., Anderson S., Hittner E., Kraus N. (2011). Musical training and the aging auditory system: implications for cognitive abilities and hearing speech in noise. PLoS ONE 6, e18082. doi: 10.1371/journal.pone.0018082
Patel A. D. (2008). Music, Language, and the Brain. New York: Oxford University Press.
Patel A. D., Iversen J. R. (2007). The linguistic benefits of musical abilities. Trends Cogn. Sci. 11, 369–372. doi: 10.1016/j.tics.2007.08.003
Patel A. D., Iversen J. R., Wassenaar M., Hagoort P. (2008). Musical syntactic processing in agrammatic Broca's aphasia. Aphasiology 22, 776–789. doi: 10.1080/02687030701803804
Patel A. D., Xu Y., Wang B. (2010). “The role of F0 variation in the intelligibility of Mandarin sentences,” in Proc. Speech Prosody 2010, May 11–14, 2010, Chicago, IL, USA.
Peretz I., Coltheart M. (2003). Modularity of music processing. Nat. Neurosci. 6, 688–691. doi: 10.1038/nn1083
Peretz I., Cummings S., Dubé M-P. (2007). The genetics of congenital amusia (or tone-deafness): a family aggregation study. Am. J. Hum. Genet. 81, 582–588. doi: 10.1086/521337
Poeppel D. (2003). The analysis of speech in different temporal integration windows: cerebral lateralization as ‘asymmetric sampling in time’. Speech Commun. 41, 245–255. doi: 10.1016/S0167-6393(02)00107-3
Polley D. B., Steinberg E. E., Merzenich M. M. (2006). Perceptual learning directs auditory cortical map reorganization through top-down influences. J. Neurosci. 26, 4970–4982. doi: 10.1523/JNEUROSCI.3771-05.2006
Recanzone G. H., Schreiner C. E., Merzenich M. M. (1993). Plasticity in the frequency representation of primary auditory cortex following discrimination training in adult owl monkeys. J. Neurosci. 13, 87–103.
Repp B. H. (1992). Diversity and commonality in music performance: an analysis of timing microstructure in Schumann's “Träumerei.” J. Acoust. Soc. Am. 92, 2546–2568. doi: 10.1121/1.404425
Russo N., Nicol T., Musacchia G., Kraus N. (2004). Brainstem response to speech syllables. Clin. Neurophysiol. 115, 2021–2030. doi: 10.1016/j.clinph.2004.04.003
Salimpoor V., Benovoy M., Larcher K., Dagher A., Zatorre R. (2011). Anatomically distinct dopamine release during anticipation and experience of peak emotion to music. Nat. Neurosci. 14, 257–262. doi: 10.1038/nn.2726
Schnupp J., Nelken I., King A. (2011). Auditory Neuroscience: Making Sense of Sound. Cambridge, MA: MIT Press.
Schofield B. R. (2010). Projections from auditory cortex to midbrain cholinergic neurons that project to the inferior colliculus. Neuroscience 166, 231–240. doi: 10.1016/j.neuroscience.2009.12.008
Skoe E., Kraus N. (2010). Auditory brainstem response to complex sounds: a tutorial. Ear Hear. 31, 302–324. doi: 10.1097/AUD.0b013e3181cdb272
Song J., Skoe E., Banai K., Kraus N. (2010). Perception of speech in noise: neural correlates. J. Cogn. Neurosci. 23, 2268–2279. doi: 10.1162/jocn.2010.21556
Song J. H., Skoe E., Wong P. C., Kraus N. (2008). Plasticity in the adult human auditory brainstem following short-term linguistic training. J. Cogn. Neurosci. 20, 1892–1902. doi: 10.1162/jocn.2008.20131
Stevens K. N. (1998). Acoustic Phonetics. Cambridge, MA: MIT Press.
Strait D. L., Kraus N., Parbery-Clark A., Ashley R. (2010). Musical experience shapes top-down auditory mechanisms: evidence from masking and auditory attention performance. Hear. Res. 261, 22–29. doi: 10.1016/j.heares.2009.12.021
Strait D. L., Skoe E., Kraus N., Ashley R. (2009). Musical experience and neural efficiency: effects of training on subcortical processing of vocal expressions of emotion. Eur. J. Neurosci. 29, 661–668. doi: 10.1111/j.1460-9568.2009.06617.x
Strong W., Clark M. C. (1967). Perturbations of synthetic orchestral wind-instrument tones. J. Acoust. Soc. Am. 41, 277–285. doi: 10.1121/1.1910503
Suga N. (2008). Role of corticofugal feedback in hearing. J. Comp. Physiol. A Neuroethol. Sens. Neural Behav. Physiol. 194, 169–183. doi: 10.1007/s00359-007-0274-2
Tallal P., Gaab N. (2006). Dynamic auditory processing, musical experience and language development. Trends Neurosci. 29, 382–390. doi: 10.1016/j.tins.2006.06.003
Tervaniemi M., Kruck S., De Baene W., Schröger E., Alter K., Friederici A. D. (2009). Top-down modulation of auditory processing: effects of sound context, musical expertise and attentional focus. Eur. J. Neurosci. 30, 1636–1642. doi: 10.1111/j.1460-9568.2009.06955.x
Thiel C. M. (2007). Pharmacological modulation of learning-induced plasticity in human auditory cortex. Restor. Neurol. Neurosci. 25, 435–443.
Thompson J. M., Goswami U. (2008). Rhythmic processing in children with developmental dyslexia: auditory and motor rhythms link to reading and spelling. J. Physiol. Paris 102, 120–129. doi: 10.1016/j.jphysparis.2008.03.007
Toscano J. C., McMurray B. (2010). Cue integration with categories: weighting acoustic cues in speech using unsupervised learning and distributional statistics. Cogn. Sci. 34, 434–464. doi: 10.1111/j.1551-6709.2009.01077.x
Toscano J. C., McMurray B., Dennhardt J., Luck S. J. (2010). Continuous perception and graded categorization: electrophysiological evidence for a linear relationship between the acoustic signal and perceptual encoding of speech. Psychol. Sci. 21, 1532–1540. doi: 10.1177/0956797610384142
Tzounopoulos T., Kraus N. (2009). Learning to encode timing: mechanisms of plasticity in the auditory brainstem. Neuron 62, 463–469. doi: 10.1016/j.neuron.2009.05.002
Vandermosten M., Boets B., Luts H., Poelmans H., Golestani N., Wouters J., Ghesquière P. (2010). Adults with dyslexia are impaired in categorizing speech and nonspeech sounds on the basis of temporal cues. Proc. Natl. Acad. Sci. U.S.A. 107, 10389–10394. doi: 10.1073/pnas.0912858107
Vos P. G., Troost J. M. (1989). Ascending and descending melodic intervals: statistical findings and their perceptual relevance. Music Percept. 6, 383–396.
Weinberger N. (2007). Auditory associative memory and representational plasticity in the primary auditory cortex. Hear. Res. 229, 54–68. doi: 10.1016/j.heares.2007.01.004
Winer J. (2006). Decoding the auditory corticofugal systems. Hear. Res. 212, 1–8. doi: 10.1016/j.heares.2005.06.014
Wong P. C., Skoe E., Russo N. M., Dees T., Kraus N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nat. Neurosci. 10, 420–422.
Zatorre R. J., Belin P., Penhune V. B. (2002). Structure and function of auditory cortex: music and speech. Trends Cogn. Sci. 6, 37–46. doi: 10.1016/S1364-6613(00)01816-7
Zatorre R. J., Gandour J. T. (2008). Neural specializations for speech and pitch: moving beyond the dichotomies. Phil. Trans. R. Soc. B 363, 1087–1104. doi: 10.1098/rstb.2007.2161

Source: PubMed
