Voice for Health: The Use of Vocal Biomarkers from Research to Clinical Practice

Guy Fagherazzi, Aurélie Fischer, Muhannad Ismael, Vladimir Despotovic

Abstract

Diseases can affect organs such as the heart, lungs, brain, muscles, or vocal folds, which can in turn alter an individual's voice. Voice analysis using artificial intelligence therefore opens new opportunities for healthcare. In this review, we offer an overview of the various health-related applications of voice, from the use of vocal biomarkers for diagnosis and risk prediction to the remote monitoring of various clinical outcomes and symptoms. We discuss the potential of this rapidly evolving field from research, patient, and clinical perspectives. We also discuss the key challenges that must be overcome in the near future for voice to be used substantially and efficiently in healthcare.

Keywords: Artificial intelligence; COVID-19; Signal decomposition; Smart home; Vocal biomarker; Voice.

Conflict of interest statement

The authors have no conflicts of interest to declare.

Copyright © 2021 by S. Karger AG, Basel.

Figures

Fig. 1
Pipeline for vocal biomarker identification, from research to practice.
Fig. 2
Representation of typical voice signal pre-processing and of linguistic and acoustic feature extraction. The voice signal represents the spoken sentence "Luxembourg is a resolutely multilingual environment." ASR refers to automatic speech recognition. Linguistic annotation includes part-of-speech tagging, dependency and constituency parsing, and sense tagging; in this diagram, it is performed with tools such as CoreNLP. The number of pauses, speech rate, and noun rate are linguistic features extracted with the BlaBla package, a clinical linguistic feature extraction tool. Acoustic features are extracted as mel-frequency cepstral coefficients (MFCCs). The framing step segments the signal into frames of N samples. Windowing multiplies each frame by a window function, such as the Hamming window, to reduce the discontinuities at the frame edges that would otherwise introduce spurious frequency components (spectral leakage) in the subsequent fast Fourier transform (FFT) step. In this diagram, dimension reduction is represented by principal component analysis (PCA), which reduces the feature space to a one-dimensional vector.
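The acoustic branch of Fig. 2 (framing, Hamming windowing, FFT, then MFCCs) can be sketched as below. This is a minimal, standard-library-only illustration, not the authors' pipeline: the sampling rate (16 kHz), frame length (400 samples, 25 ms), hop (160 samples), FFT size (512), 26 mel filters, 13 coefficients, and the mel formula 2595·log10(1 + f/700) are common conventions assumed here, and the PCA and linguistic (BlaBla/CoreNLP) steps are not shown. All function names are hypothetical.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Framing: split a 1-D signal into overlapping frames of frame_len samples."""
    if len(signal) < frame_len:
        return []
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window: tapers frame edges to reduce spectral leakage in the FFT."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, over n_fft//2 + 1 bins."""
    low, high = hz_to_mel(0.0), hz_to_mel(sr / 2)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sr) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):        # rising edge of the triangle
            row[k] = (k - left) / max(center - left, 1)
        for k in range(center, right):       # falling edge of the triangle
            row[k] = (right - k) / max(right - center, 1)
        bank.append(row)
    return bank


def dct_ii(x, n_out):
    """Type-II DCT, keeping only the first n_out coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
            for k in range(n_out)]


def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=26, n_ceps=13):
    """Frame -> Hamming window -> FFT power spectrum -> mel filterbank -> log -> DCT."""
    window = hamming(frame_len)
    bank = mel_filterbank(n_filters, n_fft, sr)
    features = []
    for frame in frame_signal(signal, frame_len, hop):
        windowed = [s * w for s, w in zip(frame, window)]
        padded = windowed + [0.0] * (n_fft - frame_len)   # zero-pad to a power of two
        spectrum = fft(padded)[: n_fft // 2 + 1]           # keep non-negative frequencies
        power = [abs(c) ** 2 / n_fft for c in spectrum]
        energies = [sum(p * f for p, f in zip(power, row)) for row in bank]
        log_e = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
        features.append(dct_ii(log_e, n_ceps))
    return features


# Usage: 100 ms of a synthetic 440 Hz tone in place of a real voice recording.
sr = 16000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
coeffs = mfcc(tone, sr)
print(len(coeffs), len(coeffs[0]))  # prints: 8 13
```

Each row of `coeffs` is one frame's MFCC vector; in a pipeline like the one in Fig. 2, these per-frame vectors would then be summarized and passed to a dimension-reduction step such as PCA.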
Fig. 3
Overview of present and future use of vocal biomarkers for health.

Source: PubMed
