Coronavirus genomics and bioinformatics analysis

Patrick C Y Woo, Yi Huang, Susanna K P Lau, Kwok-Yung Yuen

Abstract

The drastic increase in the number of coronaviruses discovered and coronavirus genomes sequenced has given us an unprecedented opportunity to perform genomics and bioinformatics analysis on this family of viruses. Coronaviruses possess the largest genomes (26.4 to 31.7 kb) among all known RNA viruses, with G + C contents varying from 32% to 43%. Variable numbers of small ORFs are present between the various conserved genes (ORF1ab, spike, envelope, membrane and nucleocapsid) and downstream of the nucleocapsid gene in different coronavirus lineages. Phylogenetically, three genera exist: Alphacoronavirus, Betacoronavirus and Gammacoronavirus, with Betacoronavirus consisting of subgroups A, B, C and D. A fourth genus, Deltacoronavirus, which includes bulbul coronavirus HKU11, thrush coronavirus HKU12 and munia coronavirus HKU13, is emerging. Molecular clock analysis using various gene loci estimated the time of the most recent common ancestor of human/civet SARS-related coronavirus to be 1999-2002, with an estimated substitution rate of 4×10⁻⁴ to 2×10⁻² substitutions per site per year. Recombination in coronaviruses was most notable between different strains of murine hepatitis virus (MHV), between different strains of infectious bronchitis virus, between MHV and bovine coronavirus, between feline coronavirus (FCoV) type I and canine coronavirus (generating FCoV type II), and between the three genotypes of human coronavirus HKU1 (HCoV-HKU1). Codon usage bias was observed in coronaviruses, with HCoV-HKU1 showing the most extreme bias; cytosine deamination and selection of CpG-suppressed clones are the two major independent biological forces that shape this bias.

Keywords: bioinformatics; coronavirus; genome.
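
The abstract's G + C content figures and its reference to CpG suppression can be illustrated with a short calculation. The sketch below is not the authors' pipeline. It assumes a plain nucleotide string as input, and the function names `gc_content` and `cpg_obs_exp` are hypothetical. The observed/expected CpG ratio used here, f(CG) / (f(C) · f(G)), is one common definition, and the source does not say which measure was used.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T/U)."""
    s = seq.upper().replace("U", "T")  # treat RNA and cDNA input alike
    counts = Counter(base for base in s if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (counts["G"] + counts["C"]) / total


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: f(CG) / (f(C) * f(G)).

    Values well below 1 indicate that CpG dinucleotides are
    under-represented relative to the base composition.
    """
    s = seq.upper().replace("U", "T")
    n = len(s)
    c, g = s.count("C"), s.count("G")
    if n < 2 or c == 0 or g == 0:
        raise ValueError("sequence too short or lacks C or G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return (cpg / (n - 1)) / ((c / n) * (g / n))


if __name__ == "__main__":
    # Toy sequence for illustration only; not taken from any real genome.
    toy = "AUGUUUGUUACGCUUAAUGCUGGU"
    print(f"G+C content: {gc_content(toy):.2%}")
    print(f"CpG obs/exp: {cpg_obs_exp(toy):.2f}")
```

Applied to a complete genome sequence (for example one of the accessions listed under Figure 1), the first function would give the kind of G + C percentage quoted in the abstract, and the second a rough indicator of CpG depletion.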

Figures

Figure 1.
Genome organizations of members in different genera of the Coronaviridae family. PL1, papain-like protease 1; PL2, papain-like protease 2; PL, papain-like protease; 3CL, chymotrypsin-like protease; Pol, RNA-dependent RNA polymerase; Hel, helicase; HE, haemagglutinin esterase; S, spike; E, envelope; M, membrane; N, nucleocapsid. TGEV, porcine transmissible gastroenteritis virus (NC_002306); PRCV, porcine respiratory coronavirus (DQ811787); FCoV, feline coronavirus (NC_012937); HCoV-229E, human coronavirus 229E (NC_002645); HCoV-NL63, human coronavirus NL63 (NC_005831); PEDV, porcine epidemic diarrhea virus (NC_003436); Sc-BatCoV 512, Scotophilus bat coronavirus 512 (NC_009657); Rh-BatCoV-HKU2, Rhinolophus bat coronavirus HKU2 (NC_009988); Mi-BatCoV-HKU8, Miniopterus bat coronavirus HKU8 (NC_010438); Mi-BatCoV 1A, Miniopterus bat coronavirus 1A (NC_010437); Mi-BatCoV 1B, Miniopterus bat coronavirus 1B (NC_010436); HCoV-OC43, human coronavirus OC43 (NC_005147); BCoV, bovine coronavirus (NC_003045); PHEV, porcine hemagglutinating encephalomyelitis virus (NC_007732); HCoV-HKU1, human coronavirus HKU1 (NC_006577); MHV, mouse hepatitis virus (NC_006852); ECoV, equine coronavirus (NC_010327); SARSr-CoV, human SARS related coronavirus (NC_004718); SARSr-Rh-BatCoV HKU3, SARS-related Rhinolophus bat coronavirus HKU3 (NC_009694); Ty-BatCoV-HKU4, Tylonycteris bat coronavirus HKU4 (NC_009019); Pi-BatCoV-HKU5, Pipistrellus bat coronavirus HKU5 (NC_009020); Ro-BatCoV-HKU9, Rousettus bat coronavirus HKU9 (NC_009021); IBV, infectious bronchitis virus (NC_001451); TCoV, turkey coronavirus (NC_010800); SW1, beluga whale coronavirus (NC_010646); BuCoV HKU11, bulbul coronavirus HKU11 (FJ376620); ThCoV HKU12, thrush coronavirus HKU12 (NC_011549); MunCoV HKU13, munia coronavirus HKU13 (NC_011550).
Figure 2.
Phylogenetic analysis of RNA-dependent RNA polymerases (Pol) of coronaviruses with complete genome sequences available. The tree was constructed by the neighbor-joining method and rooted using Breda virus polyprotein (YP_337905). Bootstrap values were calculated from 1000 trees. A total of 1118 amino acid positions in Pol were included. The scale bar indicates the estimated number of substitutions per 20 amino acids. All abbreviations for the coronaviruses are the same as those in Figure 1.
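
Neighbor-joining builds a tree from a matrix of pairwise distances between aligned sequences. As a hedged illustration of that input, the sketch below computes simple p-distances (the proportion of differing sites) between pre-aligned amino acid sequences. The caption does not state which distance model was used, so the p-distance is an assumption, and the helper names `p_distance` and `distance_matrix` are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences.

    Columns where either sequence has a gap ("-") are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances keyed by (name_a, name_b), names in sorted order."""
    return {
        (m, n): p_distance(aligned[m], aligned[n])
        for m, n in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    # Toy alignment for illustration only; not real Pol sequences.
    toy = {"virusA": "MKV-LSG", "virusB": "MRVALSG", "virusC": "MKVALTG"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

A matrix like this would then be passed to a neighbor-joining implementation in a phylogenetics package; bootstrap support, as reported in the caption, comes from repeating the procedure on resampled alignment columns.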


Source: PubMed
