Detection of clinical depression in adolescents' speech during family interactions

Lu-Shih Alex Low, Namunu C Maddage, Margaret Lech, Lisa B Sheeber, Nicholas B Allen

Abstract

The acoustic properties of speech have previously been investigated as possible cues for depression in adults. However, these studies were restricted to small populations of patients, and the speech recordings were made during patients' clinical interviews or fixed-text reading sessions. Symptoms of depression often first appear during adolescence, at a time when the voice is changing in both males and females, suggesting that specific studies of these phenomena in adolescent populations are warranted. This study investigated acoustic correlates of depression in a large sample of 139 adolescents (68 clinically depressed and 71 controls). Speech recordings were made during naturalistic interactions between adolescents and their parents. Prosodic, cepstral, spectral, and glottal features, as well as features derived from the Teager energy operator (TEO), were tested within a binary classification framework. Strong gender differences in classification accuracy were observed. The TEO-based features clearly outperformed all other features and feature combinations, providing classification accuracy of 81% to 87% for males and 72% to 79% for females. Close, but less accurate, results were obtained by combining glottal features with prosodic and spectral features (67% to 69% for males and 70% to 75% for females). These findings indicate the importance of nonlinear mechanisms associated with glottal flow formation as cues for clinical depression.
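As a rough illustration of the operator behind the TEO-based features, the sketch below computes the standard discrete Teager energy operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), on a synthetic tone. It is not the study's feature pipeline: the abstract does not describe the critical-band filtering, framing, or autocorrelation steps in detail, so none of them are reproduced here. The function name `teager_energy` and the test signal are assumptions made only for this example.

```python
import math


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns one value per interior sample, so the output is two
    samples shorter than the input.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Hypothetical test signal: a pure tone A * cos(omega * n).
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(200)]

psi = teager_energy(tone)

# For a pure sinusoid the operator is constant and equals (A * sin(omega))^2,
# which is why it is often read as tracking amplitude and frequency jointly.
expected = (amplitude * math.sin(omega)) ** 2
max_error = max(abs(p - expected) for p in psi)
```

For a single stationary tone the TEO profile is flat; deviations from a flat profile in real speech are what TEO-derived features are generally meant to capture.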

Figures

Fig. 1
Block diagram for modeling the speech of depressed and control adolescents.
Fig. 2
TEO-CB-Auto-Env feature: (a) feature extraction implementation; (b) example of the TEO profile and the autocorrelation envelope for an utterance within CB-9.
Fig. 3
Glottal inverse filtering: (a) speech frame of 25 ms; (b) glottal flow estimate; (c) glottal flow derivative; (d) glottal flow spectrum.
Fig. 4
SBCCA for the TEO and cepstral features using GIM and GDM.
Fig. 5
Classification accuracies for different lengths of concatenated test utterances, for the TEO feature category.
Fig. 6
Normalized area under the autocorrelation envelope, averaged over 25 ms frames, for the TEO feature category in each of the 15 CBs, across all adolescents in the depressed and control classes.

Source: PubMed
