Re-examining the robustness of voice features in predicting depression: Compared with baseline of confounders

Wei Pan, Jonathan Flint, Liat Shenhav, Tianli Liu, Mingming Liu, Bin Hu, Tingshao Zhu

Abstract

A large proportion of patients with depressive disorder do not receive an effective diagnosis, which makes it necessary to find a more objective assessment that supports faster and more accurate diagnosis of depression. Speech data are easy to acquire clinically, and their association with depression has been studied, but the actual predictive effect of voice features has not been examined. Thus, we lack a general understanding of the extent to which voice features contribute to the identification of depression. In this study, we investigated the significance of the association between voice features and depression using binary logistic regression, and re-examined the actual classification effect of voice features on depression through classification modeling. Nearly 1,000 Chinese women participated in this study. Several different datasets were included as test sets. We found that four voice features (PC1, PC6, PC17, PC24; P < 0.05, corrected) made a significant contribution to depression, and that the contribution of the voice features alone reached 35.65% (Nagelkerke's R²). In classification modeling, the voice-based model consistently achieved higher predictive accuracy (F-measure) than the baseline model of demographic data when tested on different datasets, even across different emotional contexts. The F-measure of voice features alone reached 81%, consistent with existing data. These results demonstrate that voice features are effective in predicting depression and indicate that more sophisticated models based on voice features can be built to help in clinical diagnosis.
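The two summary statistics named in the abstract, Nagelkerke's R² for the logistic regression and the F-measure for the classifiers, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`nagelkerke_r2`, `f_measure`, `bernoulli_log_likelihood`), the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made up for illustration.

```python
import math


def bernoulli_log_likelihood(labels, probs):
    """Log-likelihood of binary labels under predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probs)
    )


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke's pseudo-R^2: Cox-Snell R^2 rescaled by its maximum."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = depressed, 0 = control (not from the study).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
model_probs = [0.9, 0.8, 0.6, 0.2, 0.3, 0.6, 0.7, 0.1]

# Null model: every case gets the overall base rate.
base_rate = sum(labels) / len(labels)
null_probs = [base_rate] * len(labels)

ll_null = bernoulli_log_likelihood(labels, null_probs)
ll_model = bernoulli_log_likelihood(labels, model_probs)
r2 = nagelkerke_r2(ll_null, ll_model, len(labels))

# Assumed decision threshold of 0.5 for turning probabilities into classes.
predicted = [1 if p >= 0.5 else 0 for p in model_probs]
tp = sum(1 for y, yhat in zip(labels, predicted) if y == 1 and yhat == 1)
fp = sum(1 for y, yhat in zip(labels, predicted) if y == 0 and yhat == 1)
fn = sum(1 for y, yhat in zip(labels, predicted) if y == 1 and yhat == 0)
f1 = f_measure(tp, fp, fn)

print(f"Nagelkerke R^2 = {r2:.3f}, F-measure = {f1:.3f}")
```

In this sketch, Nagelkerke's R² is 0 when the fitted model is no better than the base-rate model and 1 when it fits the labels perfectly, which is why it can be read as a bounded measure of contribution.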

Conflict of interest statement

The authors have declared that no competing interests exist.

Source: PubMed
