Machine Learning Assisted Prediction of Prognostic Biomarkers Associated With COVID-19, Using Clinical and Proteomics Data

Rahila Sardar, Arun Sharma, Dinesh Gupta

Abstract

With the availability of COVID-19-related clinical data, healthcare researchers can now explore the potential of computational technologies such as artificial intelligence (AI) and machine learning (ML) to discover biomarkers for accurate detection, early diagnosis, and prognosis in the management of COVID-19. However, identifying biomarkers associated with survival and death remains a major challenge for early prognosis. In the present study, we developed and evaluated AI-based algorithms for predicting a COVID-19 patient's survival or death, using a publicly available dataset consisting of clinical parameters and protein profile data of hospital-admitted COVID-19 patients. The best classification model based on clinical parameters achieved a maximum accuracy of 89.47% for predicting survival or death of COVID-19 patients, with a sensitivity and specificity of 85.71% and 92.45%, respectively. The classification model based on normalized protein expression values of 45 proteins achieved a maximum accuracy of 89.01% for predicting survival or death, with a sensitivity and specificity of 92.68% and 86%, respectively. We identified 9 clinical and 45 protein-based putative biomarkers associated with the survival/death of COVID-19 patients. A few of these clinical features and proteins agree with earlier reports in the literature, which reaffirms their role in COVID-19 disease progression at the molecular level. The machine learning-based models developed in the present study have the potential to predict the survival chances of COVID-19-positive patients in the early stages of the disease or at the time of hospitalization. However, the models must be validated on a larger cohort of patients before they can be used in clinical practice. We have also developed a webserver, CovidPrognosis, where clinical information can be uploaded to predict the survival chances of a COVID-19 patient.
The webserver is available at http://14.139.62.220/covidprognosis/.
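As a reading aid for the figures quoted above, the following minimal sketch shows how accuracy, sensitivity, and specificity are derived from a binary confusion matrix. The counts are hypothetical: they are not taken from the study, and were chosen only because they happen to round to the percentages reported for the clinical-parameter model. Which outcome (death or survival) the authors treated as the positive class is not stated in the abstract, so the labelling here is also an assumption.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as percentages.

    tp/fn: positive-class cases predicted correctly / incorrectly
    tn/fp: negative-class cases predicted correctly / incorrectly
    """
    total = tp + fn + tn + fp
    accuracy = 100.0 * (tp + tn) / total      # all correct predictions
    sensitivity = 100.0 * tp / (tp + fn)      # true positive rate
    specificity = 100.0 * tn / (tn + fp)      # true negative rate
    return accuracy, sensitivity, specificity


# Hypothetical counts (assumption, not data from the paper).
acc, sens, spec = classification_metrics(tp=36, fn=6, tn=49, fp=4)
print(f"accuracy={acc:.2f}%  sensitivity={sens:.2f}%  specificity={spec:.2f}%")
```

Because sensitivity and specificity are computed on different subsets of patients, a model can have high accuracy while one of the two remains noticeably lower, which is why the abstract reports all three.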

Keywords: COVID-19; biomarkers discovery; feature selection; machine learning; proteomics and bioinformatics.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2021 Sardar, Sharma and Gupta.

Figures

FIGURE 1
ML-based pipeline to identify key features associated with survival based on clinical and proteomics data. (The figure images were generated using biorender.com).
FIGURE 2
Selected features from clinical data to classify COVID-19 patients who survived vs. those who died.
FIGURE 3
Pathway analysis of the selected 45 proteins.
FIGURE 4
A screenshot showing the functionality of the CovidPrognosis webserver with three clinical parameters for Day 0.

Source: PubMed
