Convolutional Neural Network Classifies Pathological Voice Change in Laryngeal Cancer with High Accuracy

HyunBum Kim, Juhyeong Jeon, Yeon Jae Han, YoungHoon Joo, Jonghwan Lee, Seungchul Lee, Sun Im

Abstract

Voice changes may be the earliest sign of laryngeal cancer. We investigated whether automated voice signal analysis can distinguish patients with laryngeal cancer from healthy subjects. We extracted features using the speech-analysis software package PRAAT and calculated Mel-frequency cepstral coefficients (MFCCs) from voice samples of the sustained vowel /a:/. The proposed method was tested with six algorithms: support vector machine (SVM), extreme gradient boosting (XGBoost), light gradient boosting machine (LGBM), artificial neural network (ANN), one-dimensional convolutional neural network (1D-CNN), and two-dimensional convolutional neural network (2D-CNN). Their performance was evaluated in terms of accuracy, sensitivity, and specificity, and the results were compared with human performance: four volunteers, two of whom were trained laryngologists, rated the same files. The 1D-CNN achieved the highest accuracy (85%), with sensitivity of 78% and specificity of 93%. The two laryngologists achieved accuracy of 69.9% but sensitivity of only 44%. Automated analysis of voice signals differentiated subjects with laryngeal cancer from healthy subjects with better diagnostic properties than the four volunteers.
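The abstract reports its results as accuracy, sensitivity, and specificity. As a minimal sketch of how these three metrics are derived from binary predictions (1 = cancer, 0 = healthy), with illustrative labels that are not the study's data:

```python
# Hedged sketch: accuracy, sensitivity, and specificity from binary
# predictions (1 = cancer, 0 = healthy). Labels below are illustrative.
def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)   # true-positive rate among cancer cases
    specificity = tn / (tn + fp)   # true-negative rate among healthy subjects
    return accuracy, sensitivity, specificity

# Illustrative labels and predictions, not the study's data:
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
acc, sens, spec = binary_metrics(y_true, y_pred)
```

The gap the study found between the laryngologists' accuracy and their sensitivity is visible in exactly this decomposition: overall accuracy can stay moderate while the true-positive rate is low.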

Keywords: deep learning; larynx cancer; machine learning; voice change; voice pathology classification.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
The graphic presentation of the transformation from raw signal into a Mel-frequency cepstral coefficient (MFCC) image, a necessary process to comply with the two-dimensional convolutional neural network input shape. (a) Plot of signals downsampled to 16,000 Hz; (b) plot of signals normalized between −1 and 1; (c) image of signals after MFCC transformation.
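The first two preprocessing steps in Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic sine signal and the 44,100 Hz original sampling rate are assumptions for demonstration.

```python
import numpy as np
from scipy.signal import resample

# Hedged sketch of Figure 1 preprocessing: downsample a raw voice signal
# to 16,000 Hz, then peak-normalize amplitudes to [-1, 1]. The synthetic
# tone and 44,100 Hz source rate are illustrative assumptions.
orig_sr, target_sr = 44100, 16000
duration = 1.0
t = np.linspace(0, duration, int(orig_sr * duration), endpoint=False)
raw = 0.4 * np.sin(2 * np.pi * 220 * t)  # stand-in for a sustained /a:/

# (a): resample to 16 kHz
down = resample(raw, int(target_sr * duration))
# (b): normalize between -1 and 1
normalized = down / np.max(np.abs(down))
```

The MFCC image in panel (c) would then be computed from `normalized` via framing, FFT, Mel filtering, and a discrete cosine transform, as outlined in Figure 2.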
Figure 2
The flowchart of the Mel-frequency cepstral coefficient (MFCC) transformation (a) and presentation of the Mel filter banks (b). The triangular filter banks are densely located in the low-frequency range, reflecting the distinctive nature of the human voice in that range. Abbreviations: FFT, fast Fourier transform; DFT, discrete Fourier transform.
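The triangular Mel filter banks in Figure 2(b) can be constructed as below. This is a hedged sketch under assumed parameters (26 filters, a 512-point FFT, 16 kHz sampling rate); the paper does not state these values here.

```python
import numpy as np

# Hedged sketch of the triangular Mel filter banks in Figure 2(b).
# Filter count (26) and FFT size (512) are illustrative assumptions.
def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters=26, n_fft=512, sr=16000):
    # Center frequencies are spaced evenly on the Mel scale, which packs
    # filters densely at low frequencies (cf. Figure 2).
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bins = np.floor((n_fft + 1) * hz_points / sr).astype(int)
    banks = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising edge of the triangle
            banks[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling edge of the triangle
            banks[i - 1, k] = (right - k) / max(right - center, 1)
    return banks

banks = mel_filter_bank()
```

Applying these banks to an FFT power spectrum, taking logs, and then a discrete cosine transform yields the MFCCs used as model input.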
Figure 3
Illustration of five-fold cross-validation. A given data set is split into five subsets, with each fold used once as the testing set, a useful method for using all data when data are limited.
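The splitting scheme in Figure 3 can be sketched as follows; the 50-sample data set size is an illustrative assumption, not the study's cohort size.

```python
import numpy as np

# Hedged sketch of five-fold cross-validation (Figure 3): every sample
# appears in exactly one test fold. The 50-sample size is illustrative.
def five_fold_indices(n_samples, n_folds=5, seed=0):
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, n_folds)
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != i])
        yield train_idx, test_idx

splits = list(five_fold_indices(50))
```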
Figure 4
Illustration of one-dimensional convolutional neural network model structure.
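The core operation inside the 1D-CNN of Figure 4 is a kernel sliding along the time axis of the feature sequence, followed by a nonlinearity. A minimal NumPy sketch, with an illustrative signal and kernel that are assumptions, not the model's learned weights:

```python
import numpy as np

# Hedged sketch of one 1D convolution step, as inside a 1D-CNN layer.
# Signal and kernel values are illustrative assumptions.
def conv1d_valid(signal, kernel):
    k = len(kernel)
    return np.array([np.dot(signal[i:i + k], kernel)
                     for i in range(len(signal) - k + 1)])

def relu(x):
    # ReLU activation, commonly applied after each convolution
    return np.maximum(x, 0.0)

signal = np.array([0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0])
kernel = np.array([1.0, 0.0, -1.0])  # simple edge-like filter
feature_map = relu(conv1d_valid(signal, kernel))
```

A full model stacks many such filters with pooling and dense layers; the 2D-CNN in Figure 5 applies the same idea over both axes of the MFCC image.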
Figure 5
Illustration of two-dimensional convolutional neural network model structure.
Figure 6
Feature importance analysis of XGBoost. The plot shows the relative information gain of each feature in the classification task on male voice samples.
Figure 7
ROC (receiver operating characteristic) curve analysis of the different models for the classification of laryngeal cancer, using only male voice samples. Abbreviations: LGBM, LightGBM; XGB, XGBoost; SVM, support vector machine; ANN, artificial neural network; 1D-CNN, one-dimensional convolutional neural network; 2D-CNN, two-dimensional convolutional neural network; MFCCs, Mel-frequency cepstral coefficients; STFT, short-time Fourier transform.
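An ROC curve summarizes a classifier across all decision thresholds, and its area under the curve (AUC) equals the probability that a random positive case receives a higher score than a random negative case. A minimal sketch of that rank-based computation, with illustrative scores and labels that are not the study's results:

```python
import numpy as np

# Hedged sketch of AUC computation for an ROC analysis (cf. Figure 7).
# Scores and labels below are illustrative, not the study's results.
def roc_auc(y_true, scores):
    # Rank-based AUC: probability a random positive outranks a random
    # negative, counting score ties as half.
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y_true = np.array([1, 1, 1, 0, 0, 0])
scores = np.array([0.9, 0.8, 0.4, 0.6, 0.3, 0.1])
auc = roc_auc(y_true, scores)
```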


Source: PubMed
