On the origin and continuing evolution of SARS-CoV-2

Xiaolu Tang, Changcheng Wu, Xiang Li, Yuhe Song, Xinmin Yao, Xinkai Wu, Yuange Duan, Hong Zhang, Yirong Wang, Zhaohui Qian, Jie Cui, Jian Lu

Abstract

The SARS-CoV-2 epidemic started in late December 2019 in Wuhan, China, and has since impacted a large portion of China and raised major global concern. Herein, we investigated the extent of molecular divergence between SARS-CoV-2 and other related coronaviruses. Although we found only 4% variability in genomic nucleotides between SARS-CoV-2 and a bat SARS-related coronavirus (SARSr-CoV; RaTG13), the difference at neutral sites was 17%, suggesting that the divergence between the two viruses is much larger than previously estimated. Our results suggest that the development of new variations in functional sites in the receptor-binding domain (RBD) of the spike, seen in SARS-CoV-2 and in pangolin SARSr-CoVs, is likely caused by natural selection in addition to recombination. Population genetic analyses of 103 SARS-CoV-2 genomes indicated that these viruses had two major lineages (designated L and S), which are well defined by two different SNPs that show nearly complete linkage across the viral strains sequenced to date. We found that the L lineage was more prevalent than the S lineage within the limited patient samples we examined. The implication of these evolutionary changes on disease etiology remains unclear. These findings strongly underscore the urgent need for further comprehensive studies that combine viral genomic data with epidemiological studies of coronavirus disease 2019 (COVID-19).

Keywords: SARS-CoV-2; molecular evolution; population genetics; virus.

© The Author(s) 2020. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd.

Figures

Figure 1.
Molecular divergence and selective pressures during the evolution of SARS-CoV-2 and related viruses. (A) The phylogenetic tree of SARS-CoV-2 and related coronaviruses. The branch length (dS) is presented, and the dN/dS (ω) value is given in parentheses. The phylogenetic tree was reconstructed with the synonymous sites in the concatenated CDSs of nine conserved ORFs (orf1ab, E, M, N, S, ORF3a, ORF6, ORF7a and ORF7b). (B) Conservation of six critical amino acid residues in the spike (S) protein. The critical active sites are Y442, L472, N479, D480, T487, and Y491 in SARS-CoV, and they correspond to L455, F486, Q493, S494, N501, and Y505 in SARS-CoV-2 (marked with inverted triangles), respectively. (C) Three candidate positively selected sites (marked with inverted triangles) in the receptor-binding domain (RBD) of the spike protein (S:439N, S:483V and S:493Q) and the surrounding 10 amino acids.
Figure 2.
The frequency spectra of derived mutations in 103 SARS-CoV-2 viruses. Note that the derived alleles of synonymous mutations are skewed towards higher frequencies than those of nonsynonymous mutations.
Figure 3.
Linkage disequilibrium between SNPs in the SARS-CoV-2 viruses. (A) LD plot of each pair of SNPs among the 29 sites that have minor alleles in at least two strains. The numbers near the slashes at the top of the image show the coordinates of the sites in the genome. The color of each square follows the standard (D'/LOD) scheme, and the number in each square is the r² value. (B) The r² of each pair of SNPs (y-axis) against the genomic distance between that pair (x-axis). (C) The LOD of each pair of SNPs (y-axis) against the genomic distance between that pair (x-axis). Note that in both (B) and (C), the red point represents the LD between the SNPs at 8,782 and 28,144.
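The r² statistic reported in this figure can be illustrated with a minimal Python sketch. It assumes haploid genomes, so each sequenced strain contributes one directly observed two-site haplotype, and it assumes both sites are biallelic. The function name `ld_r2` and the toy input are hypothetical and are not the study's data or pipeline (the reference list cites Haploview for the LD analysis).

```python
from collections import Counter


def ld_r2(haplotypes):
    """Return r^2 between two biallelic sites.

    haplotypes: list of (allele_at_site_1, allele_at_site_2) tuples,
    one per haploid genome (hypothetical input format).
    """
    n = len(haplotypes)
    site1 = Counter(a for a, _ in haplotypes)
    site2 = Counter(b for _, b in haplotypes)
    if len(site1) != 2 or len(site2) != 2:
        raise ValueError("both sites must be biallelic")

    # Pick one reference allele per site; r^2 does not depend on the choice.
    a = site1.most_common(1)[0][0]
    b = site2.most_common(1)[0][0]

    p_a = site1[a] / n
    p_b = site2[b] / n
    p_ab = sum(1 for x, y in haplotypes if x == a and y == b) / n

    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Toy example (made-up counts): two sites in complete linkage.
toy = [("C", "T")] * 7 + [("T", "C")] * 3
r2_linked = ld_r2(toy)

# Toy example: all four haplotypes equally frequent, so no association.
r2_unlinked = ld_r2([("C", "T"), ("C", "C"), ("T", "T"), ("T", "C")])
```

With completely linked sites the statistic is 1, and with independent sites it is 0; real pairs of SNPs fall between these extremes.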
Figure 4.
Haplotype analysis of SARS-CoV-2 viruses. (A) The haplotype networks of SARS-CoV-2 viruses. Blue represents the L lineage, and red represents the S lineage. Note that in this study, we marked each sample with a unique ID that starts with the geographical location, followed by the date the virus was isolated (see Table S1 for details). No ID contains information about the patient's race or ethnicity. ZJ, Zhejiang; YN, Yunnan; WH, Wuhan; USA, United States of America; TW, Taiwan; SZ, Shenzhen; SD, Shandong; SC, Sichuan; JX, Jiangxi; JS, Jiangsu; HZ, Hangzhou; GZ, Guangzhou; GD, Guangdong; FS, Foshan; CQ, Chongqing. (B) Evolution of the L and S lineages of SARS-CoV-2 viruses. ‘.’, the nucleotide sequence is identical; ‘-’, gap.
Figure 5.
The unrooted phylogenetic tree of the 103 SARS-CoV-2 genomes. The ID of each sample is the same as in Fig. 4A. Note that WH_2019/12/31.a represents the reference genome (NC_045512). Note also that SZ_2020/01/13.a had C at both positions 8,782 and 28,144 in the genome and therefore belongs to neither the L nor the S lineage.
Figure 6.
The heteroplasmy of SARS-CoV-2 viruses in human patients. The viruses isolated from a patient who lived in the United States (USA_2020/01/21.a, GISAID ID: EPI_ISL_404253) had the genotype Y (C or T) at both 8,782 and 28,144. The most likely explanation is that this patient was infected by both the L and S lineages. Note that the reference genome belongs to the L lineage.
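The lineage assignments described in the captions of Figs 5 and 6 can be sketched in Python as follows. The captions here do not spell out which bases define each lineage; the sketch assumes the definition given in the full paper, where the L lineage carries C at 8,782 and T at 28,144 (matching the reference) and the S lineage carries T at 8,782 and C at 28,144. The function names, the returned labels and the requirement that the input sequence use reference (NC_045512) coordinates are all illustrative assumptions.

```python
# 1-based genome coordinates of the two lineage-defining SNPs.
SITE_A = 8782
SITE_B = 28144


def classify_lineage(base_8782, base_28144):
    """Classify a genome from its bases at the two defining sites.

    Assumed definition: L = (C, T), S = (T, C). The labels "ambiguous"
    and "unassigned" are hypothetical names chosen for this sketch.
    """
    pair = (base_8782.upper(), base_28144.upper())
    if pair == ("C", "T"):
        return "L"
    if pair == ("T", "C"):
        return "S"
    if "Y" in pair:
        # IUPAC code Y means C or T, as for the USA_2020/01/21.a sample.
        return "ambiguous"
    # For example C at both sites, as for the SZ_2020/01/13.a sample.
    return "unassigned"


def lineage_from_genome(sequence):
    """Classify a full genome string.

    Assumes the sequence is already in reference coordinates, with no
    insertions or deletions upstream of the two sites.
    """
    return classify_lineage(sequence[SITE_A - 1], sequence[SITE_B - 1])


# Toy genome: a filler sequence with the S-lineage bases set at both sites.
toy = ["A"] * 29903
toy[SITE_A - 1] = "T"
toy[SITE_B - 1] = "C"
toy_lineage = lineage_from_genome("".join(toy))
```

Treating a Y call as a separate category, instead of forcing it into L or S, keeps possible co-infections visible for follow-up at the read level.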


Source: PubMed
