Voice emotion recognition by cochlear-implanted children and their normally-hearing peers

Monita Chatterjee, Danielle J Zion, Mickael L Deroche, Brooke A Burianek, Charles J Limb, Alison P Goren, Aditya M Kulkarni, Julie A Christensen

Abstract

Despite their remarkable success in bringing spoken language to hearing-impaired listeners, the signal transmitted through cochlear implants (CIs) remains impoverished in spectro-temporal fine structure. As a consequence, pitch-dominant information, such as voice emotion, is diminished. For young children, the ability to correctly identify the mood/intent of the speaker (which may not always be visible in their facial expression) is an important aspect of social and linguistic development. Previous work in the field has shown that children with cochlear implants (cCI) have significant deficits in voice emotion recognition relative to their normally hearing peers (cNH). Here, we report on voice emotion recognition by a cohort of 36 school-aged cCI. Additionally, we provide, for the first time, a comparison of their performance to that of cNH and NH adults (aNH) listening to CI simulations of the same stimuli. We also provide comparisons to the performance of adult listeners with CIs (aCI), most of whom learned language primarily through normal acoustic hearing. Results indicate that, despite strong variability, cCI on average perform similarly to their adult counterparts; that both groups' mean performance is similar to that of aNH listening to 8-channel noise-vocoded speech; and that cNH achieve excellent scores in voice emotion recognition with full-spectrum speech but, on average, show significantly poorer scores than aNH with 8-channel noise-vocoded speech. A strong developmental effect was observed in the cNH with noise-vocoded speech in this task. These results point to the considerable benefit that cochlear-implanted children obtain from their devices, but also underscore the need for further research and development in this important and neglected area. This article is part of a Special Issue entitled "Lasker Award".
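For readers unfamiliar with the CI simulations referred to above, the sketch below shows one common way to build an 8-channel noise vocoder in Python. It is a minimal illustration only: the band edges, filter orders, and envelope cutoff are assumptions for the example, not the processing parameters used in the study.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(signal, fs, n_channels=8, lo=100.0, hi=7000.0, env_cutoff=160.0):
    # Logarithmically spaced analysis bands between lo and hi (illustrative choice).
    edges = np.geomspace(lo, hi, n_channels + 1)
    out = np.zeros_like(signal, dtype=float)
    for k in range(n_channels):
        band = butter(4, [edges[k], edges[k + 1]], btype="bandpass", fs=fs, output="sos")
        band_sig = sosfiltfilt(band, signal)
        # Extract the temporal envelope (Hilbert magnitude, low-pass filtered).
        env = np.abs(hilbert(band_sig))
        lp = butter(4, env_cutoff, btype="lowpass", fs=fs, output="sos")
        env = sosfiltfilt(lp, env)
        # Modulate broadband noise with the envelope and re-filter into the band.
        carrier = np.random.randn(len(signal))
        out += sosfiltfilt(band, env * carrier)
    # Match the output RMS to the input RMS.
    out *= np.sqrt(np.mean(signal ** 2) / np.mean(out ** 2))
    return out

The vocoded output preserves the per-band temporal envelopes but discards spectro-temporal fine structure, which is why such stimuli are used to approximate CI hearing in normally-hearing listeners.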

Copyright © 2014 Elsevier B.V. All rights reserved.

Figures

Fig. 1
Results of acoustic analyses of the male (circles) and female (squares) talkers' utterances, plotted for each of the five emotions (abscissa). Each panel corresponds to a different acoustic cue. Error bars show +/− 1 s.d. from the mean.
Fig. 2
Summed discriminability indices for the different cues (abscissa) plotted for the male (orange) and female (blue) talkers, and for full-spectrum stimuli. Results are shown for Mean Intensity (Int.), Duration (Dur.), F0 height (F0 ht.), F0 range (F0 rng.), and Intensity Range (Int. Rng.).
Fig. 3
Results of acoustic analyses for the cues of Mean Intensity (dB SPL) (top) and Intensity Range (bottom), compared for NBV (circles) and full spectrum (diamonds) stimuli and for the male (orange) and female (blue) talker, respectively.
Fig. 4
Summed discriminability indices for Mean Intensity (Int.), Duration (Dur.), and Intensity Range (Int. Rng.), for full spectrum (solid bars) and NBV (hatched bars) stimuli, and for the male and female talkers (orange and blue, respectively).
Fig. 5
Mean emotion recognition scores with full-spectrum stimuli for the four subject groups, for the male (orange) and female (blue) talkers, respectively. Error bars show +/− 1 s.d. The solid horizontal line shows chance performance.
Fig. 6
Mean emotion recognition scores plotted against the spectral resolution condition, for the four subject groups. Note that aNH were tested under all conditions; cCI and aCI were tested only in the full-spectrum condition, and cNH were tested in full-spectrum and 8-channel NBV conditions. Error bars show +/− 1 s.d. from the mean. Left and right hand panels show results obtained with the female and male talker, respectively.
Fig. 7
RAU-transformed scores (filled symbols) and percent correct scores (open symbols) plotted against age, for cNH (circles) and aNH (squares) listening to full-spectrum stimuli. The solid line shows the regression line through the RAU-transformed data for the cNH only (r and p values are also indicated).
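The rationalized arcsine unit (RAU) transform referred to in Fig. 7 is Studebaker's (1985) transform, which linearizes proportion-correct scores near floor and ceiling before regression. A minimal Python version is sketched below; it is an illustration of the standard formula, not code from the study.

import math

def rau(n_correct, n_trials):
    # Studebaker (1985) rationalized arcsine transform: arcsine-transform the
    # proportion correct, then rescale so values roughly track percent correct
    # over the middle of the range.
    t = math.asin(math.sqrt(n_correct / (n_trials + 1))) \
        + math.asin(math.sqrt((n_correct + 1) / (n_trials + 1)))
    return (146.0 / math.pi) * t - 23.0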
Fig. 8
Percent correct scores plotted against age, for cNH (filled symbols) and aNH (open symbols) listening to 8-channel NBV stimuli. The regression line was plotted through the data obtained from cNH only (r and p values indicated).
Fig. 9
Percent correct scores plotted against age, for cCI (blue circles) and aCI (red squares) listening to full-spectrum stimuli. The individual data are not plotted in any particular order along the abscissa. Solid horizontal lines indicate aNHs’ mean scores under different conditions of spectral resolution (no. of channels, shown on the right hand ordinate), for comparison.
Fig. 10
Mean confusion matrices obtained with stimuli recorded by the male (left panel) and female (right panel) talkers, and for the different listener groups and different conditions of spectral resolution (top to bottom). Confusion matrices are presented with the stimuli organized vertically and the response categories organized horizontally. Each cell shows the number of responses for that particular stimulus and response combination: the range is from 0 (white) to 12 (darkest green).
Fig. 11
Values of d′ calculated for each of the confusion matrices shown in Fig. 10, plotted against the corresponding emotion. Left and right panels show results obtained with sentences recorded by the male and female talker, respectively. Within each panel, the different symbols show different levels of spectral resolution (e.g., squares represent the full spectrum condition). The different colors show results obtained with different subject groups.
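The d′ values in Fig. 11 are derived from the confusion matrices in Fig. 10. One plausible way to compute a per-emotion d′ from such a matrix, treating diagonal responses as hits and uses of that response category for other emotions as false alarms, is sketched below; the exact procedure used in the study may differ.

import numpy as np
from scipy.stats import norm

def dprime_per_category(conf):
    # conf[i, j] = number of times stimulus emotion i received response j.
    conf = np.asarray(conf, dtype=float)
    n_stim = conf.sum(axis=1)  # presentations of each emotion
    dprimes = []
    for i in range(conf.shape[0]):
        hit = conf[i, i] / n_stim[i]
        fa = (conf[:, i].sum() - conf[i, i]) / (n_stim.sum() - n_stim[i])
        # Clamp 0 and 1 rates (1/2N convention) to avoid infinite z-scores.
        hit = min(max(hit, 1.0 / (2 * n_stim[i])), 1 - 1.0 / (2 * n_stim[i]))
        fa = min(max(fa, 1.0 / (2 * n_stim.sum())), 1 - 1.0 / (2 * n_stim.sum()))
        dprimes.append(norm.ppf(hit) - norm.ppf(fa))
    return dprimes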

Source: PubMed
