Genome Composition and Divergence of the Novel Coronavirus (2019-nCoV) Originating in China

Aiping Wu, Yousong Peng, Baoying Huang, Xiao Ding, Xianyue Wang, Peihua Niu, Jing Meng, Zhaozhong Zhu, Zheng Zhang, Jiangyuan Wang, Jie Sheng, Lijun Quan, Zanxian Xia, Wenjie Tan, Genhong Cheng, Taijiao Jiang

Abstract

An in-depth annotation of the newly discovered coronavirus (2019-nCoV) genome has revealed differences between 2019-nCoV and severe acute respiratory syndrome (SARS) or SARS-like coronaviruses. A systematic comparison identified 380 amino acid substitutions between these coronaviruses, which may have caused functional and pathogenic divergence of 2019-nCoV.

Copyright © 2020 Elsevier Inc. All rights reserved.

Figures

Figure 1
Genome composition and phylogenetic tree for 2019-nCoV.
(A) Schematic diagram of the genome organization and the encoded proteins of pp1ab and pp1a for the IVDC-HB-01/2019 (HB01) strain. The largest gene, orf1ab, encodes the pp1ab protein, which contains 15 nsps (nsp1-nsp10 and nsp12-nsp16). The pp1a protein, encoded by the orf1a gene, contains 10 nsps (nsp1-nsp10). Structural proteins are encoded by the four structural genes: the spike (S), envelope (E), membrane (M), and nucleocapsid (N) genes. The accessory genes are distributed among the structural genes. The protein-encoding genes of the 2019-nCoV genome were predicted with the online servers GeneMarkS (http://exon.gatech.edu/GeneMark/genemarks.cgi) and ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/), followed by manual checking.
(B) Phylogenetic relationship based on the whole genome for the HB01 strain and other coronaviruses. All viral strains were classified by genus and by type, which are presented on the left and right schematic phylogenetic trees, respectively. The four coronavirus genera, Alphacoronavirus (red), Betacoronavirus (blue), Gammacoronavirus (green), and Deltacoronavirus (violet), are marked as blocks in the left phylogenetic tree. The MERS coronavirus (brown), the SARS-like bat coronavirus (violet), the human SARS coronavirus (light blue), and the HB01 strain (red) are highlighted by lines of different colors in the right phylogenetic tree.
(C) Schematic phylogenetic trees of individual genes for the HB01 strain. The coronavirus species are colored in the same way as in (B). The number of strains in each phylogenetic clade is denoted by the area of the circles.
Figure 2
Amino acid substitutions of 2019-nCoV against SARS and SARS-like viruses. All 27 proteins encoded by 2019-nCoV were aligned against those of SARS-CoVs and SARS-like bat CoVs using the FFT-NS-2 algorithm in MAFFT (version v7.407); the numbers of aligned proteins are listed in Table S1E. An amino acid substitution was defined as a site that is absolutely conserved within the group of SARS and SARS-like CoVs but differs in 2019-nCoV. In total, 380 amino acid substitutions were identified between the amino acid sequences of 2019-nCoV (HB01) and the corresponding consensus sequences of SARS and SARS-like CoVs.
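
The substitution definition in the Figure 2 legend can be illustrated with a minimal Python sketch. It is not the authors' code: the function name find_substitutions, the toy sequences, and the handling of gaps (columns with a gap in either the query or the conserved reference residue are skipped) are assumptions made for illustration, since the legend does not say how gaps were treated. The input is assumed to be rows of an existing multiple sequence alignment, for example one produced by MAFFT.

```python
from typing import List, Tuple

GAP = "-"


def find_substitutions(query: str, references: List[str]) -> List[Tuple[int, str, str]]:
    """Return (1-based alignment column, reference residue, query residue).

    A column counts as a substitution when every reference sequence carries
    the same residue (absolutely conserved) and the query residue differs.
    """
    if not references:
        raise ValueError("at least one reference sequence is required")
    if any(len(ref) != len(query) for ref in references):
        raise ValueError("all sequences must be rows of the same alignment")

    substitutions = []
    for col, query_res in enumerate(query):
        ref_residues = {ref[col] for ref in references}
        # Skip columns that are not absolutely conserved in the reference group.
        if len(ref_residues) != 1:
            continue
        ref_res = next(iter(ref_residues))
        # Assumption (not stated in the source): gap columns are ignored.
        if ref_res == GAP or query_res == GAP:
            continue
        if ref_res != query_res:
            substitutions.append((col + 1, ref_res, query_res))
    return substitutions


# Toy aligned sequences, invented for illustration only.
toy_query = "MKTA-LV"
toy_refs = ["MRTAALV", "MRTAALI", "MRTAALV"]
toy_result = find_substitutions(toy_query, toy_refs)
print(toy_result)
```

In the toy input, only column 2 qualifies: column 5 is skipped because the query has a gap there, and column 7 is skipped because the references disagree among themselves.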

Source: PubMed
