Bioinformatics and machine learning approach identifies potential drug targets and pathways in COVID-19

Md Rabiul Auwul, Md Rezanur Rahman, Esra Gov, Md Shahjaman, Mohammad Ali Moni

Abstract

The current coronavirus disease 2019 (COVID-19) pandemic has caused a massive loss of lives. Clinical trials of vaccines and drugs are being conducted around the world; however, no effective drug for COVID-19 is available to date. Identifying key genes and perturbed pathways in COVID-19 may uncover potential drug targets and biomarkers. We aimed to identify the key gene modules and hub targets involved in COVID-19. We analyzed transcriptomic data from SARS-CoV-2-infected peripheral blood mononuclear cells (PBMCs) through gene coexpression analysis. We identified 1520 and 1733 differentially expressed genes (DEGs) in the GSE152418 and CRA002390 PBMC datasets, respectively (FDR < 0.05). We found four key gene modules and a hub gene signature based on module membership (MMhub) statistics and protein-protein interaction (PPI) networks (PPIhub). Functional enrichment analysis of the genes in these modules showed that the DEGs were enriched in immune and inflammatory response biological processes. Pathway analysis revealed that the hub genes were enriched in the IL-17 signaling and cytokine-cytokine receptor interaction pathways. We then showed that the hub genes (PLK1, AURKB, AURKA, CDK1, CDC20, KIF11, CCNB1, KIF2C, DTL and CDC6) achieved a classification accuracy > 0.90, suggesting their potential as biomarkers. Regulatory network analysis identified transcription factors and microRNAs that target these hub genes. Finally, drug-gene interaction analysis suggested amsacrine, BRD-K68548958, naproxol, palbociclib and teniposide as the top-scored candidates for drug repurposing. The identified biomarkers and pathways may be potential therapeutic targets for COVID-19.
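The abstract reports DEGs at FDR < 0.05 but does not name the multiple-testing procedure. As a minimal sketch, assuming the Benjamini-Hochberg procedure (the default adjustment in tools such as DESeq2 and limma), the cutoff can be applied to per-gene p-values as follows. The helper names and the gene identifiers in the usage below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(pvalues)
    # Indices sorted by ascending raw p-value
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = pvalues[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


def call_degs(gene_pvalues, fdr=0.05):
    """Hypothetical helper: genes whose BH-adjusted p-value is below the FDR cutoff."""
    genes = list(gene_pvalues)
    qvalues = benjamini_hochberg([gene_pvalues[g] for g in genes])
    return sorted(g for g, q in zip(genes, qvalues) if q < fdr)


# Toy p-values for placeholder genes (not data from the study)
print(call_degs({"GENE_A": 0.001, "GENE_B": 0.02, "GENE_C": 0.04, "GENE_D": 0.3}))
```

With these toy values, GENE_C has a raw p-value below 0.05 but an adjusted value of about 0.053, so only GENE_A and GENE_B pass the FDR cutoff.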

Keywords: COVID-19; differentially expressed genes; gene coexpression network; machine learning; protein–protein interaction; systems biology.

© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

Figures

Figure 1
Flowchart of this study.
Figure 2
Construction of WGCNA coexpression modules and selection of hub modules. (A) Cluster dendrogram of the COVID-19-infected samples. (B) Analysis of the scale-free fit index (left) and the mean connectivity (right) for various soft-thresholding powers. (C) Heatmap of all genes. (D) Dendrogram of all differentially expressed genes, clustered based on a dissimilarity measure (1-TOM, where TOM is the topological overlap matrix). (E) Module eigengene dendrogram and eigengene network heatmap summarizing the modules yielded by the clustering analysis. (F) Median rank of the modules; a median rank close to zero indicates a high degree of module preservation. (G) Z summary statistics for each module; the blue and green dashed lines mark the thresholds Z = 2 and Z = 10, which indicate moderate and strong preservation, respectively.
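Panels (B) and (D) refer to soft-thresholding powers and the 1-TOM dissimilarity. The sketch below illustrates how an unsigned WGCNA-style network turns gene-gene correlations into that dissimilarity: adjacency a_ij = |cor(x_i, x_j)|^β, connectivity k_i = Σ_u a_iu, and TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij) with l_ij = Σ_u a_iu a_uj. The unsigned network type and β = 6 are assumptions for illustration (in practice β is read off a scale-free fit plot such as panel B), and this pure-Python code is a sketch, not the WGCNA package's implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tom_dissimilarity(expr, beta=6):
    """Unsigned adjacency, TOM and 1-TOM for genes given as rows of expr."""
    g = len(expr)
    # Soft-thresholded adjacency |cor|^beta; diagonal kept at 0
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]  # TOM_ii = 1 by definition
    for i in range(g):
        for j in range(g):
            if i == j:
                continue
            # Shared-neighbour term; u = i and u = j contribute 0 (zero diagonal)
            l_ij = sum(adj[i][u] * adj[u][j] for u in range(g))
            tom[i][j] = (l_ij + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    diss = [[1.0 - t for t in row] for row in tom]
    return adj, tom, diss


# Two perfectly correlated toy genes have TOM = 1, i.e. dissimilarity 0
adj, tom, diss = tom_dissimilarity([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10]])
print(tom[0][1], diss[0][1])
```

The resulting 1-TOM matrix is what a hierarchical clustering step, as in panel (D), would take as its distance input.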
Figure 3
GO and KEGG enrichment analysis of the four key modules. (A) Biological process, (B) molecular function, (C) cellular component and (D) KEGG pathway enrichment analysis.
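The caption does not state which enrichment tool or gene background was used. As a hedged sketch, over-representation tools such as clusterProfiler score a GO or KEGG term with a one-sided hypergeometric test: given a background of N genes, K of which carry the term, the p-value is the probability that a module of n genes contains at least k annotated genes by chance. The function names and the toy gene sets below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N background genes, K annotated, n drawn)."""
    total = comb(N, n)
    # Sum the upper tail of the hypergeometric distribution
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


def enrichment_p(module_genes, term_genes, background):
    """Hypothetical wrapper: restrict both gene sets to the background, then test."""
    bg = set(background)
    module = set(module_genes) & bg
    term = set(term_genes) & bg
    return hypergeom_enrichment_p(len(bg), len(term), len(module), len(module & term))


# Toy example: 10 background genes, a 3-gene term, and a 2-gene module inside it
background = {f"G{i}" for i in range(10)}
print(enrichment_p({"G0", "G1"}, {"G0", "G1", "G2"}, background))  # 3/45
```

In a real analysis the raw p-values across all tested terms would then be adjusted for multiple testing before reporting enriched terms.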
Figure 4
Receiver operating characteristic (ROC) plots of the performance of the five classifiers, based on (A) accuracy and (B) area under the curve (AUC).
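Both metrics in this figure can be computed from a classifier's predicted scores. The sketch below, written for illustration and not taken from the study's pipeline, computes AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one (ties count one half), and accuracy at a decision threshold. The threshold of 0.5 and the toy labels and scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive (1) and negative (0) samples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Toy data: 1 = COVID-19 sample, 0 = control (placeholder values)
labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.1, 0.85]
print(roc_auc(labels, scores), accuracy(labels, scores))  # 0.75 0.75
```

Accuracy depends on the chosen threshold, whereas AUC summarizes the ranking over all thresholds, which is why the two panels can differ for the same classifier.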
Figure 5
Hub gene expression profiles. (A) Venn diagram of the common hub genes shared among the GSE152418 hub genes identified via MM scores and PPI, and the DEGs of the CRA002390 dataset. (B) Heatmap of the hub genes in the GSE152418 dataset. (C) Bar chart of the log expression values of the 52 common hub genes in the GSE152418 dataset.
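Module membership (MM) is commonly defined in WGCNA as the correlation between a gene's expression and its module eigengene, the first principal component of the module's standardized expression. The sketch below computes the eigengene by power iteration and flips its sign to agree with the average standardized expression, in the spirit of WGCNA's default alignment. The |MM| > 0.8 hub cutoff, the helper names and the toy genes are hypothetical; the study's exact MMhub criteria are not given here.

```python
import math


def standardize(v):
    """Center a gene's profile and scale it to unit sample standard deviation."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in v) / (n - 1))
    return [(x - m) / sd for x in v]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def module_eigengene(genes, iters=200):
    """First principal component (per-sample scores) of standardized module genes."""
    z = [standardize(g) for g in genes]  # genes x samples
    n = len(z[0])
    v = list(z[0])  # start inside the row space, so the norm never vanishes
    for _ in range(iters):
        # One power-iteration step on the samples x samples matrix Z^T Z
        proj = [sum(zg[s] * v[s] for s in range(n)) for zg in z]
        w = [sum(proj[g] * z[g][s] for g in range(len(z))) for s in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Align the sign with the average standardized expression of the module
    avg = [sum(zg[s] for zg in z) / len(z) for s in range(n)]
    if sum(a * b for a, b in zip(avg, v)) < 0:
        v = [-x for x in v]
    return v


def module_membership(genes):
    me = module_eigengene(genes)
    return [pearson(g, me) for g in genes]


def mm_hubs(genes, cutoff=0.8):
    """Hypothetical rule: indices of genes with |MM| above the cutoff."""
    return [i for i, m in enumerate(module_membership(genes)) if abs(m) > cutoff]


genes = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [5, 4, 3, 2, 1], [2, 1, 0, 1, 2]]
print([round(m, 3) for m in module_membership(genes)], mm_hubs(genes))
```

In this toy module the first three genes follow the eigengene (MM close to 1 or -1), while the fourth is uncorrelated with it and is not selected as a hub.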
Figure 6
Network construction. (A) PPI network of the 52 common hub genes in the COVID-19 data, (B) TF-gene interaction network of the 52 common hub genes, (C) gene-miRNA interaction network of the common hub genes of COVID-19.
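The abstract names PPI-based hub genes (PPIhub) but not the centrality measure used to rank them. Degree centrality is one common choice, so the sketch below ranks the nodes of an undirected PPI edge list by their number of distinct interaction partners, assuming that criterion. Duplicate edges in both directions and self-loops are ignored; the gene names are placeholders, not interactions from the study.

```python
from collections import defaultdict


def top_hubs_by_degree(edges, top_k=10):
    """Rank nodes of an undirected edge list by degree (ties broken by name)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # self-loops do not count toward degree
        neighbours[a].add(b)  # sets collapse duplicate A-B / B-A edges
        neighbours[b].add(a)
    ranked = sorted(neighbours, key=lambda node: (-len(neighbours[node]), node))
    return [(node, len(neighbours[node])) for node in ranked[:top_k]]


# Placeholder interactions between hypothetical genes A-E
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A"), ("E", "E")]
print(top_hubs_by_degree(edges, top_k=2))  # [('A', 3), ('B', 2)]
```

Other centralities (for example betweenness or closeness) can rank the same network differently, so the chosen measure should be reported alongside the hub list.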

Source: PubMed