Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding

Roujian Lu, Xiang Zhao, Juan Li, Peihua Niu, Bo Yang, Honglong Wu, Wenling Wang, Hao Song, Baoying Huang, Na Zhu, Yuhai Bi, Xuejun Ma, Faxian Zhan, Liang Wang, Tao Hu, Hong Zhou, Zhenhong Hu, Weimin Zhou, Li Zhao, Jing Chen, Yao Meng, Ji Wang, Yang Lin, Jianying Yuan, Zhihao Xie, Jinmin Ma, William J Liu, Dayan Wang, Wenbo Xu, Edward C Holmes, George F Gao, Guizhen Wu, Weijun Chen, Weifeng Shi, Wenjie Tan

Abstract

Background: In late December, 2019, patients presenting with viral pneumonia due to an unidentified microbial agent were reported in Wuhan, China. A novel coronavirus was subsequently identified as the causative pathogen, provisionally named 2019 novel coronavirus (2019-nCoV). As of Jan 26, 2020, more than 2000 cases of 2019-nCoV infection have been confirmed, most of which involved people living in or visiting Wuhan, and human-to-human transmission has been confirmed.

Methods: We did next-generation sequencing of samples from bronchoalveolar lavage fluid and cultured isolates from nine inpatients, eight of whom had visited the Huanan seafood market in Wuhan. Complete and partial 2019-nCoV genome sequences were obtained from these individuals. Viral contigs were connected using Sanger sequencing to obtain the full-length genomes, with the terminal regions determined by rapid amplification of cDNA ends. Phylogenetic analysis of these 2019-nCoV genomes and those of other coronaviruses was used to determine the evolutionary history of the virus and help infer its likely origin. Homology modelling was done to explore the likely receptor-binding properties of the virus.

Findings: The ten genome sequences of 2019-nCoV obtained from the nine patients were extremely similar, exhibiting more than 99·98% sequence identity. Notably, 2019-nCoV was closely related (with 88% identity) to two bat-derived severe acute respiratory syndrome (SARS)-like coronaviruses, bat-SL-CoVZC45 and bat-SL-CoVZXC21, collected in 2018 in Zhoushan, eastern China, but was more distant from SARS-CoV (about 79%) and MERS-CoV (about 50%). Phylogenetic analysis revealed that 2019-nCoV fell within the subgenus Sarbecovirus of the genus Betacoronavirus, with a relatively long branch length to its closest relatives bat-SL-CoVZC45 and bat-SL-CoVZXC21, and was genetically distinct from SARS-CoV. Homology modelling further revealed that 2019-nCoV had a similar receptor-binding domain structure to that of SARS-CoV, despite amino acid variation at some key residues.
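As an illustration of what a pairwise identity figure such as those above measures, the following is a minimal Python sketch and not the authors' pipeline. It assumes the two sequences have already been aligned to equal length, and it skips any column containing a gap, which is one common convention among several. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: seq_a and seq_b are rows of the same alignment, so they have
    equal length and use '-' as the gap character.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns where the residues agree
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded from the denominator
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments, made up for illustration (not real viral sequence).
ref = "ATGTTTGTTTTTCTTGTTTT"
qry = "ATGTTTATTTTCCTTG-TTT"
print(round(percent_identity(ref, qry), 2))
```

Under this definition, and taking the 29 829-base-pair alignment length given in the figure 1 caption as the denominator, an identity of 99·98% would correspond to roughly six differing positions between two genomes. Other conventions, for example counting gaps as mismatches, would give slightly different values.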

Interpretation: 2019-nCoV is sufficiently divergent from SARS-CoV to be considered a new human-infecting betacoronavirus. Although our phylogenetic analysis suggests that bats might be the original host of this virus, an animal sold at the seafood market in Wuhan might represent an intermediate host facilitating the emergence of the virus in humans. Importantly, structural analysis suggests that 2019-nCoV might be able to bind to the angiotensin-converting enzyme 2 receptor in humans. The future evolution, adaptation, and spread of this virus warrant urgent investigation.

Funding: National Key Research and Development Program of China, National Major Project for Control and Prevention of Infectious Disease in China, Chinese Academy of Sciences, Shandong First Medical University.

Copyright © 2020 Elsevier Ltd. All rights reserved.

Figures

Figure 1
Sequence comparison and genomic organisation of 2019-nCoV. (A) Sequence alignment of eight full-length genomes of 2019-nCoV, 29 829 base pairs in length, with a few nucleotides truncated at both ends of the genome. (B) Coding regions of 2019-nCoV, bat-SL-CoVZC45, bat-SL-CoVZXC21, SARS-CoV, and MERS-CoV. Only open reading frames of more than 100 nucleotides are shown. 2019-nCoV=2019 novel coronavirus. SARS-CoV=severe acute respiratory syndrome coronavirus. MERS-CoV=Middle East respiratory syndrome coronavirus.
Figure 2
Sequence identity between the consensus of 2019-nCoV and representative betacoronavirus genomes. (A) Sequence identities for 2019-nCoV compared with SARS-CoV GZ02 (accession number AY390556) and the bat SARS-like coronaviruses bat-SL-CoVZC45 (MG772933) and bat-SL-CoVZXC21 (MG772934). (B) Similarity between 2019-nCoV and related viruses. 2019-nCoV=2019 novel coronavirus. SARS-CoV=severe acute respiratory syndrome coronavirus.
Figure 3
Phylogenetic analysis of full-length genomes of 2019-nCoV and representative viruses of the genus Betacoronavirus. 2019-nCoV=2019 novel coronavirus. MERS-CoV=Middle East respiratory syndrome coronavirus. SARS-CoV=severe acute respiratory syndrome coronavirus.
Figure 4
Specific amino acid variations among the spike proteins of the subgenus Sarbecovirus. Viruses are ordered by the tree topology (as shown in figure 3) from top to bottom. One-letter codes are used for amino acids. CoV=coronavirus. 2019-nCoV=2019 novel coronavirus. SARS=severe acute respiratory syndrome. *Bat-derived SARS-like viruses that can grow in human cell lines or in mice. †Bat-derived SARS-like viruses without experimental data available.
Figure 5
Phylogenetic analysis and homology modelling of the receptor-binding domain of 2019-nCoV, SARS-CoV, and MERS-CoV. (A) Phylogenetic analysis of the receptor-binding domain from various betacoronaviruses. The star highlights 2019-nCoV and the question marks mean that the receptor used by the viruses remains unknown. Structural comparison of the receptor-binding domain of SARS-CoV (B), 2019-nCoV (C), and MERS-CoV (D) binding to their own receptors. Core subdomains are magenta, and the external subdomains of SARS-CoV, 2019-nCoV, and MERS-CoV are orange, dark blue, and green, respectively. Variable residues between SARS-CoV and 2019-nCoV in the receptor-binding site are highlighted as sticks. CoV=coronavirus. 2019-nCoV=2019 novel coronavirus. SARS-CoV=severe acute respiratory syndrome coronavirus. MERS-CoV=Middle East respiratory syndrome coronavirus.

Source: PubMed
