Perception of Child-Directed Versus Adult-Directed Emotional Speech in Pediatric Cochlear Implant Users

Karen Chan Barrett, Monita Chatterjee, Meredith T Caldwell, Mickael L D Deroche, Patpong Jiradejvong, Aditya M Kulkarni, Charles J Limb

Abstract

Objectives: Cochlear implants (CIs) are remarkable in allowing individuals with severe to profound hearing loss to perceive speech. Despite these gains in speech understanding, however, CI users often struggle to perceive elements such as vocal emotion and prosody, as CIs are unable to transmit the spectro-temporal detail needed to decode affective cues. This issue is particularly important for children with CIs, yet little is known about their emotional development. In a previous study, pediatric CI users showed deficits in voice emotion recognition with child-directed stimuli featuring exaggerated prosody. However, the large intersubject variability and differential developmental trajectory known in this population prompted us to question the extent to which exaggerated prosody would facilitate performance in this task. Thus, the authors revisited the question with both adult-directed and child-directed stimuli.

Design: Vocal emotion recognition was measured using both child-directed (CDS) and adult-directed (ADS) speech conditions. Pediatric CI users, aged 7 to 19 years, with no cognitive or visual impairments and who used oral communication with English as their primary language participated in the experiment (n = 27). Stimuli comprised 12 sentences selected from the HINT database. The sentences were spoken by male and female talkers in a CDS or ADS manner, in each of the five target emotions (happy, sad, neutral, scared, and angry). The chosen sentences were semantically emotion-neutral. Percent correct emotion recognition scores were analyzed for each participant in each condition (CDS vs. ADS). Children also completed cognitive tests of nonverbal IQ and receptive vocabulary, while parents completed questionnaires about CI and hearing history. It was predicted that the reduced prosodic variations found in the ADS condition would result in lower vocal emotion recognition scores compared with the CDS condition. Moreover, it was hypothesized that cognitive factors, perceptual sensitivity to complex pitch changes, and elements of each child's hearing history may serve as predictors of performance on vocal emotion recognition.

Results: Consistent with our hypothesis, pediatric CI users scored higher on CDS compared with ADS speech stimuli, suggesting that speaking with exaggerated prosody, akin to "motherese," may be a viable way to convey emotional content. Significant talker effects were also observed: scores were higher for the female talker in both conditions. Multiple regression analysis showed that nonverbal IQ was a significant predictor of CDS emotion recognition scores, while years of CI use was a significant predictor of ADS scores. Confusion matrix analyses revealed that results depended on the specific emotion: for the CDS condition's female talker, participants had high sensitivity (d' scores) to happy and low sensitivity to neutral sentences, while for the ADS condition, low sensitivity was found for scared sentences.

Conclusions: In general, participants showed higher vocal emotion recognition in the CDS condition, which had greater variability in pitch and intensity, and thus more exaggerated prosody, than the ADS condition. Results suggest that pediatric CI users struggle with vocal emotion perception in general, and particularly with adult-directed speech. The authors believe these results have broad implications for understanding how CI users perceive emotions, both from an auditory communication standpoint and from a socio-developmental perspective.

Figures

Figure 1. Acoustic characteristics of the adult- and child-directed stimuli for each emotion, relative to the neutral emotion.
In each row, ADSF and ADSM indicate adult-directed speech by the female and male talkers, respectively, and CDSF and CDSM indicate child-directed speech by the female and male talkers, respectively. Within each panel, the abscissa shows the specific emotion. The ordinates show boxplots of three measures, each computed relative to the same sentences spoken in the neutral emotion: the top row shows the ratio of the mean pitch (fundamental frequency) of the sentences spoken in each emotion to that of the neutral rendition; the middle row shows the corresponding duration ratio; and the bottom row shows the intensity difference between each emotion and neutral.
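The three relative measures in this figure amount to simple per-sentence computations. The sketch below illustrates them in Python; the measurement values and data layout are hypothetical assumptions for illustration, not taken from the study's recordings.

```python
# Hypothetical acoustic measurements for one sentence spoken by one talker.
# Keys are emotions; values are (mean_f0_hz, duration_s, intensity_db).
measurements = {
    "neutral": (210.0, 1.80, 65.0),
    "happy":   (290.0, 1.95, 68.5),
    "sad":     (195.0, 2.30, 62.0),
}

def relative_to_neutral(measurements):
    """Express each emotion's acoustics relative to the neutral rendition:
    pitch and duration as ratios, intensity as a dB difference."""
    f0_n, dur_n, int_n = measurements["neutral"]
    out = {}
    for emotion, (f0, dur, inten) in measurements.items():
        if emotion == "neutral":
            continue  # neutral is the reference, not a data point
        out[emotion] = {
            "pitch_ratio": f0 / f0_n,
            "duration_ratio": dur / dur_n,
            "intensity_diff_db": inten - int_n,
        }
    return out

print(relative_to_neutral(measurements))
```

Values above 1.0 for the ratios (or above 0 dB for intensity) would indicate higher pitch, longer duration, or greater loudness than the neutral rendition, as in the exaggerated-prosody CDS stimuli.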
Figure 2. Percent Correct scores across the Conditions.
Comparison of percent correct scores for the CDS vs. ADS conditions reveals that performance is significantly higher for the CDS condition [t(26) = 11.53, p < 0.001]. Mean CDS score: 71.7% ± 20.3 SD; mean ADS score: 50.0% ± 14.4 SD.
Figure 3. Percent Correct Scores separated by Talker.
Panels A and B show group average percent correct scores, while Panels C and D depict individual scores. Participants showed higher percent correct scores for the female talker in both the CDS condition (female mean score: 77.5% ± 20.8 SD; male mean score: 65.8% ± 20.7 SD; see Panel 3A) and the ADS condition (female mean score: 55.3% ± 15.5 SD; male mean score: 44.7% ± 14.8 SD; see Panel 3B). Individual data separated by talker (female vs. male) are depicted for the CDS condition (Panel 3C, bottom left) and for the ADS condition (Panel 3D, bottom right).
Figure 4. CDS condition confusion matrix data analysis evaluating responses to specific target emotions.
Top: Boxplots represent the average d' score for each emotion, separated by talker. Bottom left: Boxplot represents average hit rates (a subcomponent of the d' score) for each emotion. Bottom right: Boxplot represents average false alarm rates (a subcomponent of the d' score) for each emotion.
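The d' sensitivity score described here can be sketched as a one-vs-rest signal detection analysis over a confusion matrix: for each target emotion, d' = z(hit rate) - z(false alarm rate). The matrix values below and the clamping correction for extreme rates are illustrative assumptions, not the study's data or its exact analysis.

```python
from statistics import NormalDist

# Hypothetical confusion matrix: rows are target emotions, columns are
# responses; entry [i][j] counts how often response j was given to target i.
emotions = ["happy", "sad", "neutral", "scared", "angry"]
confusions = [
    [20,  2,  1,  0,  1],   # target: happy
    [ 1, 18,  3,  1,  1],   # target: sad
    [ 3,  4, 14,  1,  2],   # target: neutral
    [ 2,  1,  2, 15,  4],   # target: scared
    [ 1,  1,  1,  3, 18],   # target: angry
]

def d_prime_per_emotion(confusions, emotions):
    """Compute d' = z(hit rate) - z(false alarm rate) for each emotion,
    treating each label as a one-vs-rest detection task."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    scores = {}
    for j, emotion in enumerate(emotions):
        hits = confusions[j][j]
        targets = sum(confusions[j])
        false_alarms = sum(confusions[i][j] for i in range(len(emotions)) if i != j)
        lures = sum(sum(row) for i, row in enumerate(confusions) if i != j)
        # Clamp rates away from 0 and 1 so z() stays finite (a common correction).
        hit_rate = min(max(hits / targets, 1 / (2 * targets)), 1 - 1 / (2 * targets))
        fa_rate = min(max(false_alarms / lures, 1 / (2 * lures)), 1 - 1 / (2 * lures))
        scores[emotion] = z(hit_rate) - z(fa_rate)
    return scores

print(d_prime_per_emotion(confusions, emotions))
```

Higher d' means the emotion was both recognized often (high hit rate) and rarely chosen incorrectly for other targets (low false alarm rate), which is why the figure breaks out both subcomponents.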
Figure 5. ADS condition confusion matrix data analysis evaluating responses to specific target emotions.
Top: Boxplots represent the average d' score for each emotion, separated by talker. Bottom left: Boxplot represents average hit rates (a subcomponent of the d' score) for each emotion. Bottom right: Boxplot represents average false alarm rates (a subcomponent of the d' score) for each emotion.

Source: PubMed
