Machine-Learning Approaches in COVID-19 Survival Analysis and Discharge-Time Likelihood Prediction Using Clinical Data

Mohammadreza Nemati, Jamal Ansary, Nazafarin Nemati

Abstract

As a highly contagious respiratory disease, COVID-19 has caused high mortality since its emergence in December 2019. As the number of COVID-19 cases soars in epicenters, health officials are warning that designated treatment centers may be overwhelmed by coronavirus patients. In this study, several computational techniques are implemented to analyze the survival characteristics of 1,182 patients. The computational results agree with early clinical reports on a group of patients from China, which confirmed a higher mortality rate in men than in women and in older age groups. The discharge-time prediction of COVID-19 patients was also evaluated using different machine-learning and statistical analysis methods. The results indicate that the Gradient Boosting survival model outperforms the other models for patient survival prediction in this study. This study aims to help health officials make better-informed decisions during the outbreak.

Keywords: COVID-19; artificial intelligence; biostatistics; coronavirus; machine learning; pandemic; statistical analysis; survival analysis.

Conflict of interest statement

The authors declare no competing interests.

© 2020 The Authors.

Figures

Graphical abstract
Figure 1
Probability Estimation of Discharge Time in Different Age and Sex Groups. (A) Discharge-time probability estimation by sex after symptom onset. (B) Discharge-time probability estimation by sex after hospitalization. (C) Discharge-time probability estimation for two age groups. (D) Discharge-time probability estimation for four age groups.
Figure 2
Age Variation of 1,182 Patients. Patients are categorized into four different age groups. The first, second, and third quartiles are 34, 46, and 60, respectively.
Figure 3
Data-Processing Steps. (A) Data collection and filtering. (B) Data-processing steps required for analysis.
Figure 4
Demonstration of Data Censorship Status. Patients A, B, and D had not experienced the event by the end of the study, so they are treated as censored samples. Patient C is not censored because the event occurred and was fully observed.
Figure 5
Survival Analysis Algorithms. Survival analysis techniques applied to COVID-19 data to predict survival time and hospital discharge-time probabilities.
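
The censoring idea in the Figure 4 caption can be illustrated with a small Kaplan-Meier estimate. The sketch below is not the authors' code: the function name kaplan_meier and the four patient durations are hypothetical, chosen only to mirror the A, B, C, D example, and the "event" stands for whatever outcome is being timed (for example, hospital discharge). Censored patients contribute to the at-risk count until their last observed time but never count as events.

```python
from typing import List, Tuple


def kaplan_meier(durations: List[float],
                 observed: List[bool]) -> List[Tuple[float, float]]:
    """Return (time, S(time)) at each distinct time where an event occurred.

    durations[i] is the follow-up time of patient i; observed[i] is True
    when the event was seen and False when the patient was censored.
    """
    data = sorted(zip(durations, observed))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            leaving += 1
            i += 1
        if events > 0:
            # Product-limit step: multiply by the fraction without the event.
            survival *= 1.0 - events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set after t.
        n_at_risk -= leaving
    return curve


# Hypothetical follow-up times (days) for patients A, B, C, D.
# Only patient C has an observed event; A, B, and D are censored.
durations = [10.0, 14.0, 7.0, 12.0]
observed = [False, False, True, False]
curve = kaplan_meier(durations, observed)
print(curve)
```

With these made-up values the curve has a single step at day 7, where one of four at-risk patients has the event. The censored patients never produce a step of their own, which is why dropping them, or treating them as events, would bias the estimate.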


Source: PubMed
