Acoustic differences between healthy and depressed people: a cross-situation study

Jingying Wang, Lei Zhang, Tianli Liu, Wei Pan, Bin Hu, Tingshao Zhu

Abstract

Background: Abnormalities in vocal expression during a depressive episode have frequently been reported in people with depression, but less is known about whether these abnormalities exist only in specific situations. In addition, previous studies did not control for the influence of irrelevant demographic variables on voice. Therefore, this study compares vocal differences between depressed and healthy people across various situations, with irrelevant variables treated as covariates.

Methods: To examine whether the vocal abnormalities in people with depression exist only in specific situations, this study compared vocal differences between healthy people and patients with unipolar depression across 12 situations (speech scenarios). Positive, negative and neutral voice expressions of depressed and healthy people were compared across four tasks. Multivariate analysis of covariance (MANCOVA) was used to evaluate the main effect of group (depressed vs. healthy) on acoustic features. The importance of each acoustic feature was evaluated by both its statistical significance and the magnitude of its effect size.
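A minimal sketch of this type of analysis is shown below, assuming a pandas DataFrame with one row per recording, columns for acoustic features (e.g. loudness and MFCC coefficients), a group label, and demographic covariates such as age and gender. All column names are illustrative assumptions; the study's actual analysis code and variable set are not published in the abstract.

```python
# Hedged sketch: MANCOVA-style test of a group effect on acoustic features,
# with demographic covariates entered alongside the group factor.
# Column names (group, age, gender, loudness, mfcc5, mfcc7) are illustrative.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA


def group_effect_on_voice(df: pd.DataFrame) -> None:
    """Test whether group (depressed vs. healthy) predicts the set of
    acoustic features after adjusting for demographic covariates."""
    model = MANOVA.from_formula(
        "loudness + mfcc5 + mfcc7 ~ C(group) + age + C(gender)",
        data=df,
    )
    # mv_test() reports Wilks' lambda, Pillai's trace, etc. for each term;
    # the C(group) row corresponds to the main effect of interest.
    print(model.mv_test())
```

In practice such a test would be repeated once per speech scenario, with the full set of extracted acoustic features on the left-hand side of the formula.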

Results: Multivariate analysis of covariance showed significant differences between the two groups in all 12 speech scenarios. Although the set of significant acoustic features varied across scenarios, three acoustic features (loudness, MFCC5 and MFCC7) differed consistently, and with large effect sizes, between people with and without depression.
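The abstract does not specify which effect-size measure was used. One common choice for judging the magnitude of a MANCOVA group effect, given here purely as an illustrative assumption, is the multivariate partial eta squared derived from Wilks' lambda:

```latex
% Illustrative only: multivariate partial eta squared from Wilks' lambda,
% where s = min(p, df_effect), with p the number of dependent variables.
\eta_p^2 = 1 - \Lambda^{1/s}
```

With two groups the effect has one degree of freedom, so s = 1 and the expression reduces to 1 minus Wilks' lambda.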

Conclusions: Vocal differences between depressed and healthy people exist in all 12 speech scenarios. Acoustic features including loudness, MFCC5 and MFCC7 have the potential to serve as indicators for identifying depression via voice analysis. These findings suggest that the voices of depressed people carry both situation-specific and cross-situational patterns of acoustic features.

Keywords: Acoustic feature; Cross-situation; Major depressive disorder; Voice analysis.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The number of significant acoustic features in each scenario (Task: VW, video watching; QA, question answering; TR, text reading; PD, picture describing. Emotion: pos, positive; neu, neutral; neg, negative)


Source: PubMed
