Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan

Jasper Fuk-Woo Chan, Kin-Hang Kok, Zheng Zhu, Hin Chu, Kelvin Kai-Wang To, Shuofeng Yuan, Kwok-Yung Yuen

Abstract

A mysterious outbreak of atypical pneumonia in late 2019 was traced to a seafood wholesale market in Wuhan, China. Within a few weeks, a novel coronavirus, tentatively named 2019 novel coronavirus (2019-nCoV), was announced by the World Health Organization. We performed bioinformatics analysis on a virus genome from a patient with 2019-nCoV infection and compared it with other related coronavirus genomes. Overall, the genome of 2019-nCoV has 89% nucleotide identity with bat SARS-like coronavirus ZXC21 (bat-SL-CoVZXC21) and 82% with that of human SARS-CoV. In phylogenetic trees of orf1a/b, Spike, Envelope, Membrane and Nucleoprotein, 2019-nCoV also clustered closely with the bat, civet and human SARS coronaviruses. However, the external subdomain of the Spike receptor-binding domain of 2019-nCoV shares only 40% amino acid identity with that of other SARS-related coronaviruses. Remarkably, its orf3b encodes a completely novel short protein. Furthermore, its new orf8 likely encodes a secreted protein with an alpha-helix, followed by a beta-sheet(s) containing six strands. Learning from the roles of the civet in SARS and the camel in MERS, hunting for the animal source of 2019-nCoV and its more ancestral virus will be important for understanding the origin and evolution of this novel lineage B betacoronavirus. These findings provide a basis for further studies on the pathogenesis of this emerging infection and for optimizing the design of diagnostic, antiviral and vaccination strategies against it.

Keywords: Coronavirus; SARS; Wuhan; bioinformatics; emerging; genome; respiratory; virus.
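The genome-wide identity figures in the abstract (for example, 89% with bat-SL-CoVZXC21) are ratios of matching positions to compared positions in an alignment. The sketch below computes percent identity for two sequences that are already aligned. It is an illustration, not the authors' pipeline, and it assumes that any column containing a gap is excluded from the denominator, which is only one of several conventions in use.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Assumption: columns where either sequence has a gap are skipped, so the
    denominator counts only ungapped columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # gapped column: excluded under this convention
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real coronavirus sequence)
    print(round(percent_identity("ACGT-ACGTA", "ACGTTACCTA"), 1))  # prints 88.9
```

Counting gapped columns as mismatches instead would lower the value, so identity figures are only comparable when the same convention and alignment are used.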

Figures

Figure 1.
Betacoronavirus genome organization. The betacoronavirus genome comprises the 5'-untranslated region (5'-UTR); open reading frame (orf) 1a/b (yellow box), which encodes the non-structural proteins (nsps) for replication; the structural proteins spike (blue box), envelope (orange box), membrane (red box) and nucleocapsid (cyan box); accessory proteins (purple boxes), such as orf3, 6, 7a, 7b, 8 and 9b in the 2019-nCoV (HKU-SZ-005b) genome; and the 3'-untranslated region (3'-UTR). Examples of lineage A to D betacoronaviruses include human coronavirus (HCoV) HKU1 (lineage A), 2019-nCoV (HKU-SZ-005b) and SARS-CoV (lineage B), MERS-CoV and Tylonycteris bat CoV HKU4 (lineage C), and Rousettus bat CoV HKU9 (lineage D). The lengths of the nsps and orfs are not drawn to scale.
Figure 2.
Comparison of protein sequences of the Spike stalk S2 subunit. Multiple alignment of the Spike S2 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), the bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZC45 (accession numbers MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number NC_004718) was performed with CLUSTAL 2.1 and displayed with BOXSHADE 3.21. Black boxes indicate identical residues and grey boxes indicate similar residues among the four amino acid sequences.
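The black and grey shading used in these alignment figures classifies each column by whether its residues are identical or merely similar. The following is a simplified sketch of that idea, not BOXSHADE itself: the similarity groups in `SIMILAR_GROUPS` are a hypothetical choice, any gapped column is left unshaded, and every sequence in a column must agree, whereas BOXSHADE applies its own configurable similarity table and a consensus threshold.

```python
# Hypothetical similarity groups (an assumption for illustration; BOXSHADE
# uses its own configurable similarity table and consensus threshold).
SIMILAR_GROUPS = ["ILMV", "FWY", "KRH", "DE", "ST", "NQ", "AG"]


def classify_column(column: str) -> str:
    """Return 'identical', 'similar' or 'none' for one alignment column."""
    residues = column.upper()
    if "-" in residues:
        return "none"  # any gap leaves the column unshaded in this sketch
    if len(set(residues)) == 1:
        return "identical"  # would be shaded black
    for group in SIMILAR_GROUPS:
        if all(r in group for r in residues):
            return "similar"  # would be shaded grey
    return "none"


def shade_alignment(seqs: list[str]) -> str:
    """Encode each column as '*' (identical), ':' (similar) or ' ' (none)."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have equal length")
    symbols = {"identical": "*", "similar": ":", "none": " "}
    return "".join(symbols[classify_column("".join(col))] for col in zip(*seqs))


if __name__ == "__main__":
    # Four toy aligned peptides (hypothetical, not real S2 sequence)
    aln = ["MKLVE", "MKIVD", "MRLVE", "MK-VE"]
    print(repr(shade_alignment(aln)))  # prints '*: *:'
```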
Figure 3.
Comparison of protein sequences of (A) the Spike globular head S1 and (B) the S1 receptor-binding domain (RBD). Multiple alignment of the Spike S1 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262); the bat SARS-like coronavirus isolates bat-SL-CoVZXC21, bat-SL-CoVZC45, bat-SL-CoV-YNLF_31C, bat-SL-CoV-YNLF_34C and bat SL-CoV HKU3-1 (accession numbers MG772934.1, MG772933.1, KP886808, KP886809 and DQ022305, respectively); human SARS coronavirus GZ02 and Tor2 (accession numbers AY390556 and AY274119, respectively); and Paguma SARS-CoV (accession number AY515512) was performed with CLUSTAL 2.1 and displayed with BOXSHADE 3.21. Black backgrounds indicate identical residues and grey backgrounds indicate similar residues. The orange box marks the signal peptide, and the green and blue boxes mark the core domain and the receptor-binding domain, respectively. The RBD sequences highlighted in (A) were used for the comparison in (B). The variable region of the external subdomain of the 2019-nCoV HKU-SZ-005b RBD was predicted from amino acid similarity and published structural analysis [17]; the purple box marks this external subdomain region.
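Comparing only the RBD, as in panel (B), requires translating residue numbers of one sequence into alignment columns, because gaps shift positions. A minimal sketch, assuming a hypothetical dictionary of pre-aligned sequences keyed by name and hypothetical 1-based residue boundaries on the chosen reference sequence (the caption does not state the RBD coordinates):

```python
def residue_to_column(aligned_seq: str, gap: str = "-") -> dict[int, int]:
    """Map 1-based residue numbers of one aligned sequence to 0-based columns."""
    mapping = {}
    residue = 0
    for col, ch in enumerate(aligned_seq):
        if ch != gap:
            residue += 1
            mapping[residue] = col
    return mapping


def extract_region(alignment: dict[str, str], ref_name: str,
                   start: int, end: int) -> dict[str, str]:
    """Slice all sequences to the columns spanning reference residues start..end."""
    col_of = residue_to_column(alignment[ref_name])
    first, last = col_of[start], col_of[end]  # KeyError if outside the reference
    return {name: seq[first:last + 1] for name, seq in alignment.items()}


if __name__ == "__main__":
    # Hypothetical toy alignment and boundaries (not real RBD coordinates)
    toy = {"ref": "MA-KLVE", "other": "MAQKIVD"}
    print(extract_region(toy, "ref", 2, 4))  # prints {'ref': 'A-KL', 'other': 'AQKI'}
```

The sliced strings stay column-aligned, so they can be compared position by position to obtain a region-specific identity.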
Figure 4.
Analysis of orf3b. (A) Multiple alignment of the orf3b protein sequences of 2019-nCoV (HKU-SZ-005b), SARS-CoV and SARS-related CoVs. (B) A novel putative short protein encoded by orf3b.
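Putative proteins such as the short orf3b product are commonly located by scanning a sequence for open reading frames. The sketch below scans the three forward frames for ATG-initiated ORFs that end in a stop codon, using the standard genetic code (NCBI translation table 1). The minimum length `min_aa` and the decision to ignore alternative start codons are assumptions for illustration; this is not necessarily how the authors identified the orf3b product.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def find_orfs(seq: str, min_aa: int = 20) -> list[tuple[int, int, str]]:
    """Return (start, end, protein) for ATG-initiated ORFs in the three forward frames.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    ORFs without an in-frame stop codon are not reported, and an ATG lying
    inside an ORF already scanned in the same frame is not reported separately.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            protein = []
            j = i
            while j + 3 <= len(seq):
                aa = CODON_TABLE.get(seq[j:j + 3], "X")  # 'X' for ambiguous codons
                if aa == "*":
                    if len(protein) >= min_aa:
                        orfs.append((i, j + 3, "".join(protein)))
                    break
                protein.append(aa)
                j += 3
            i = j + 3  # resume after the stop codon (or past the sequence end)
    return sorted(orfs)


if __name__ == "__main__":
    # Toy sequence (hypothetical), with a low threshold so the short ORF is kept
    print(find_orfs("CCATGAAATTTGGGTAACC", min_aa=3))  # prints [(2, 17, 'MKFG')]
```

Scanning only the forward frames suits a positive-sense RNA genome; a double-stranded target would also need the reverse complement.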
Figure 5.
Analysis of orf8 showing a novel putative protein. (A) Phylogenetic analysis of the orf8 amino acid sequences of 2019-nCoV HKU-SZ-005b (accession number MN975262), the bat SARS-like coronavirus isolates bat-SL-CoVZXC21 and bat-SL-CoVZC45 (accession numbers MG772934.1 and MG772933.1, respectively) and human SARS coronavirus (accession number AY274119) was performed using the neighbour-joining method with 1000 bootstrap replicates. Evolutionary distances were calculated using the JTT matrix-based method. (B) Multiple alignment was performed with CLUSTAL 2.1 and displayed with BOXSHADE 3.21. Black backgrounds indicate identical residues and grey backgrounds indicate similar residues. (C) The secondary structure of orf8 was predicted using PSIPRED (PSI-BLAST-based secondary structure PREDiction). Predicted helices (h) and strands (s) are boxed in red and yellow, respectively.
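Panel (A) relies on the neighbour-joining method of Saitou and Nei. A compact sketch of that algorithm follows, operating on a precomputed distance matrix. It does not estimate JTT distances, so the input matrix is assumed to come from elsewhere; negative branch lengths, which can arise with non-additive data, are left unclamped; and the last three nodes are joined at a single unrooted trifurcation.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Neighbour joining on a symmetric distance matrix; returns unrooted Newick."""
    if len(names) < 3 or len(dist) != len(names):
        raise ValueError("need at least three taxa and a matching square matrix")
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Pick the pair (i, j) that minimises the Q criterion
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the new internal node to i and j
        li = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new node to every remaining node
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the final three nodes into one trifurcation
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({nodes[0]}:{la:.4f},{nodes[1]}:{lb:.4f},{nodes[2]}:{lc:.4f});"


if __name__ == "__main__":
    # Hypothetical additive distances for the tree ((A:1,B:2):3,(C:1,D:1))
    taxa = ["A", "B", "C", "D"]
    matrix = [[0, 3, 5, 5], [3, 0, 6, 6], [5, 6, 0, 2], [5, 6, 2, 0]]
    print(neighbor_joining(taxa, matrix))
    # prints (C:1.0000,D:1.0000,(A:1.0000,B:2.0000):3.0000);
```

For additive distances, as in this toy matrix, neighbour joining recovers the generating tree and its branch lengths exactly.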
Figure 6.
Phylogenetic trees constructed by the neighbour-joining method in MEGA X software, with bootstrap values calculated from 1000 replicate trees, using the amino acid sequences of (A) the orf1ab polypeptide; (B) the Spike glycoprotein; (C) the Envelope protein; (D) the Membrane protein; and (E) the Nucleoprotein.
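Bootstrap values of this kind come from Felsenstein's procedure: resample alignment columns with replacement, rebuild a tree from each pseudo-replicate, and report how often each clade recurs. The sketch below covers only the resampling step; tree building (for example with neighbour joining) and clade counting are left out, and the `seed` parameter is an assumption added for reproducibility.

```python
import random
from typing import Optional


def bootstrap_replicates(alignment: list[str], n_replicates: int = 1000,
                         seed: Optional[int] = None) -> list[list[str]]:
    """Build pseudo-replicate alignments by resampling columns with replacement.

    Each replicate has the same number of sequences and columns as the input.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have equal length")
    rng = random.Random(seed)  # seeded generator for reproducible replicates
    replicates = []
    for _ in range(n_replicates):
        # The same sampled column indices are applied to every sequence
        cols = [rng.randrange(length) for _ in range(length)]
        replicates.append(["".join(seq[c] for c in cols) for seq in alignment])
    return replicates


if __name__ == "__main__":
    toy = ["MKLVE", "MKIVD", "MRLVE"]  # hypothetical aligned peptides
    for rep in bootstrap_replicates(toy, n_replicates=3, seed=42):
        print(rep)
```

Sampling whole columns, rather than residues independently per sequence, keeps homologous positions together, which is what makes each replicate a valid alignment.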
Figure 7.
Secondary structure prediction and comparison of the 5′-untranslated region (UTR) and 3′-UTR using the RNAfold WebServer (minimum free energy and partition function fold algorithms, basic options). The SARS-CoV 5′- and 3′-UTRs were used as references to adjust the prediction results. (A) SARS-CoV 5′-UTR; (B) 2019-nCoV (HKU-SZ-005b) 5′-UTR; (C) ZC45 5′-UTR; (D) SARS-CoV 3′-UTR; (E) 2019-nCoV (HKU-SZ-005b) 3′-UTR; (F) ZC45 3′-UTR.
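RNAfold predicts the minimum-free-energy structure using nearest-neighbour thermodynamic parameters. As a much simpler illustration of how dynamic programming yields a dot-bracket structure, the sketch below swaps in Nussinov base-pair maximisation, which counts nested pairs and uses no energies, so its output will generally differ from RNAfold's. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of three bases are assumptions.

```python
def nussinov_fold(seq: str, min_loop: int = 3) -> str:
    """Dot-bracket structure with the maximum number of nested base pairs."""
    seq = seq.upper().replace("T", "U")
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j]: max pairs within seq[i..j]

    def pair_score(i: int, k: int, j: int) -> int:
        # Score when base k pairs with base j inside the interval [i, j]
        left = dp[i][k - 1] if k > i else 0
        return left + dp[k + 1][j - 1] + 1

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: base j unpaired
            for k in range(i, j - min_loop):  # leave >= min_loop bases in the loop
                if (seq[k], seq[j]) in allowed:
                    best = max(best, pair_score(i, k, j))
            dp[i][j] = best

    # Traceback to recover one optimal structure
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in allowed and dp[i][j] == pair_score(i, k, j):
                structure[k], structure[j] = "(", ")"
                stack.append((i, k - 1))
                stack.append((k + 1, j - 1))
                break
    return "".join(structure)


if __name__ == "__main__":
    print(nussinov_fold("GGGAAAUCCC"))  # prints (((...))).
```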

References

1. Chan JF, To KK, Tse H, et al. Interspecies transmission and emergence of novel viruses: lessons from bats and birds. Trends Microbiol. 2013 Oct;21(10):544–555. doi: 10.1016/j.tim.2013.05.005
2. Cheng VC, Lau SK, Woo PC, et al. Severe acute respiratory syndrome coronavirus as an agent of emerging and reemerging infection. Clin Microbiol Rev. 2007 Oct;20(4):660–694. doi: 10.1128/CMR.00023-07
3. Chan JF, Lau SK, To KK, et al. Middle East respiratory syndrome coronavirus: another zoonotic betacoronavirus causing SARS-like disease. Clin Microbiol Rev. 2015 Apr;28(2):465–522. doi: 10.1128/CMR.00102-14
4. Woo PC, Lau SK, Chu CM, et al. Characterization and complete genome sequence of a novel coronavirus, coronavirus HKU1, from patients with pneumonia. J Virol. 2005 Jan;79(2):884–895. doi: 10.1128/JVI.79.2.884-895.2005
5. Peiris JS, Lai ST, Poon LL, et al.; SARS study group. Coronavirus as a possible cause of severe acute respiratory syndrome. Lancet. 2003 Apr 19;361(9366):1319–1325. doi: 10.1016/S0140-6736(03)13077-2
6. Yeung ML, Yao Y, Jia L, et al. MERS coronavirus induces apoptosis in kidney and lung by upregulating Smad7 and FGF2. Nat Microbiol. 2016 Feb 22;1:16004. doi: 10.1038/nmicrobiol.2016.4
7. World Health Organization. Novel coronavirus. [cited 2020 Jan 16]. Available from: .
8. World Health Organization. Novel Coronavirus – Thailand (ex-China). [cited 2020 Jan 16]. Available from: .
9. South China Morning Post. Wuhan pneumonia: Japan confirms Chinese man had new coronavirus. [cited 2020 Jan 16]. Available from: .
10. Huang C, Wang Y, Li X, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet. 2020. doi: 10.1016/S0140-6736(20)30183-5. [Epub ahead of print]
11. Chan JF, Yuan S, Kok KH, et al. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster. Lancet. 2020. doi: 10.1016/S0140-6736(20)30154-9. [Epub ahead of print]
12. Saitou N, Nei M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol. 1987 Jul;4(4):406–425.
13. Felsenstein J. Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985 Jul;39(4):783–791. doi: 10.1111/j.1558-5646.1985.tb00420.x
14. Zuckerkandl E, Pauling L. Evolutionary divergence and convergence in proteins. In: Bryson V, Vogel HJ, editors. Evolving genes and proteins. New York: Academic Press; 1965. p. 97–166.
15. Kumar S, Stecher G, Li M, et al. MEGA X: Molecular Evolutionary Genetics Analysis across computing platforms. Mol Biol Evol. 2018 Jun 1;35(6):1547–1549. doi: 10.1093/molbev/msy096
16. Buchan DWA, Jones DT. The PSIPRED protein analysis workbench: 20 years on. Nucleic Acids Res. 2019;47(W1):W402–W407. doi: 10.1093/nar/gkz297
17. Wang Q, Qi J, Yuan Y, et al. Bat origins of MERS-CoV supported by bat coronavirus HKU4 usage of human receptor CD26. Cell Host Microbe. 2014 Sep 10;16(3):328–337. doi: 10.1016/j.chom.2014.08.009
18. Xia S, Yan L, Xu W, et al. A pan-coronavirus fusion inhibitor targeting the HR1 domain of human coronavirus spike. Sci Adv. 2019 Apr 10;5(4):eaav4580. doi: 10.1126/sciadv.aav4580
19. Yount B, Roberts RS, Sims AC, et al. Severe acute respiratory syndrome coronavirus group-specific open reading frames encode nonessential functions for replication in cell cultures and mice. J Virol. 2005 Dec;79(23):14909–14922. doi: 10.1128/JVI.79.23.14909-14922.2005
20. Khan S, Fielding BC, Tan TH, et al. Over-expression of severe acute respiratory syndrome coronavirus 3b protein induces both apoptosis and necrosis in Vero E6 cells. Virus Res. 2006 Dec;122(1-2):20–27. doi: 10.1016/j.virusres.2006.06.005
21. Kopecky-Bromberg SA, Martinez-Sobrido L, Frieman M, et al. Severe acute respiratory syndrome coronavirus open reading frame (orf) 3b, orf 6, and nucleocapsid proteins function as interferon antagonists. J Virol. 2007 Jan;81(2):548–557. doi: 10.1128/JVI.01782-06
22. Zhou P, Li H, Wang H, et al. Bat severe acute respiratory syndrome-like coronavirus ORF3b homologues display different interferon antagonist activities. J Gen Virol. 2012 Feb;93(Pt 2):275–281. doi: 10.1099/vir.0.033589-0
23. Song HD, Tu CC, Zhang GW, et al. Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human. Proc Natl Acad Sci U S A. 2005 Feb 15;102(7):2430–2435. doi: 10.1073/pnas.0409608102
24. Oostra M, de Haan CA, Rottier PJ. The 29-nucleotide deletion present in human but not in animal severe acute respiratory syndrome coronaviruses disrupts the functional expression of open reading frame 8. J Virol. 2007;81:13876–13888. doi: 10.1128/JVI.01631-07
25. Lau SK, Feng Y, Chen H, et al. Severe acute respiratory syndrome (SARS) coronavirus ORF8 protein is acquired from SARS-related coronavirus from greater horseshoe bats through recombination. J Virol. 2015 Oct;89(20):10532–10547. doi: 10.1128/JVI.01048-15
26. Shi CS, Nabar NR, Huang NN, et al. SARS-Coronavirus open reading frame-8b triggers intracellular stress pathways and activates NLRP3 inflammasomes. Cell Death Discov. 2019;5:101. doi: 10.1038/s41420-019-0181-7
27. Yang D, Leibowitz JL. The structure and functions of coronavirus genomic 3’ and 5’ ends. Virus Res. 2015 Aug 3;206:120–133. doi: 10.1016/j.virusres.2015.02.025

Source: PubMed
