Geographic Population Structure in Epstein-Barr Virus Revealed by Comparative Genomics

Matteo Chiara, Caterina Manzari, Claudia Lionetti, Rosella Mechelli, Eleni Anastasiadou, Maria Chiara Buscarinu, Giovanni Ristori, Marco Salvetti, Ernesto Picardi, Anna Maria D'Erchia, Graziano Pesole, David S Horner

Abstract

Epstein-Barr virus (EBV) latently infects the majority of the human population and is implicated as a causal or contributory factor in numerous diseases. We sequenced 27 complete EBV genomes from a cohort of multiple sclerosis (MS) patients and healthy controls from Italy; no variants showed a statistically significant association with MS. Taking advantage of the availability of ∼130 EBV genomes with known geographical origins, we reveal a striking geographic distribution of EBV sub-populations with distinct allele frequency distributions. We discuss mechanisms that potentially explain these observations, and their implications for understanding the association of EBV with human disease.

Keywords: Epstein-Barr virus; comparative genomics; genome sequence; population structure.

© The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Figures

Fig. 1.—
Population structure, PCA and phenetic clustering of EBV genome sequences. (A) Barplot displaying the probability of provenance as inferred by Structure for all 127 EBV genomes considered in this study; geographic origins are shown for each isolate. (B) Scatterplot of the PCA of 75 “pure” genomes. (C) Phenetic tree of the “pure” genomes. Different groups are indicated by colors and the root position is arbitrary. “Pure” genomes are defined as those to which Structure assigned a ≥ 90% probability of provenance from a single population. Colors are consistent between panels A, B and C.
Fig. 2.—
NeighborNet and population allele frequency relationships. (A) NeighborNet analysis of 69 non-admixed representatives of inferred EBV sub-populations. NeighborNet resolves the same clusters of non-admixed genomes as Structure and phenetic clustering, highlighting conflicts that correspond to allele types shared between sub-populations. (B) Allele frequency bootstrap tree of possible relationships between inferred EBV sub-populations. The tree was estimated with Treemix (Pickrell and Pritchard 2012) using default parameters (no migration, no linkage disequilibrium).
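
The "pure" genome criterion stated in the Fig. 1 caption (a ≥ 90% probability of provenance from a single population) can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the isolate identifiers are assumptions made for illustration only; the membership values are invented and are not data from the study, and this is not the authors' code.

```python
from typing import Dict, List

# Threshold taken from the Fig. 1 caption: >= 90% probability of
# provenance from a single population.
PURITY_THRESHOLD = 0.90


def select_pure_genomes(
    membership: Dict[str, List[float]],
    threshold: float = PURITY_THRESHOLD,
) -> Dict[str, int]:
    """Map each "pure" genome to the index of its dominant population.

    `membership` is a hypothetical layout: genome ID -> list of
    per-population membership probabilities (one value per population).
    """
    pure: Dict[str, int] = {}
    for genome_id, probs in membership.items():
        # Index of the population with the largest membership probability.
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            pure[genome_id] = best
    return pure


# Invented example values, for illustration only.
example = {
    "isolate_A": [0.97, 0.02, 0.01],  # pure, population 0
    "isolate_B": [0.55, 0.40, 0.05],  # admixed, excluded
    "isolate_C": [0.03, 0.92, 0.05],  # pure, population 1
}
pure = select_pure_genomes(example)
```

Genomes that fail the threshold are simply left out of the returned mapping, which mirrors how the caption restricts panels B and C to the "pure" subset.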

References

    1. Abdel-Hamid M, Chen JJ, Constantine N, Massoud M, Raab-Traub NJ. 1992. EBV strain variation: geographical distribution and relation to disease state. Virology 190(1):168–175.
    1. Abi-Rached L, et al. 2011. The shaping of modern human immune systems by multiregional admixture with archaic humans. Science 334(6052):89–94.
    1. Allan GJ, Rowe DT. 1989. Size and stability of the Epstein-Barr virus major internal repeat (IR-1) in Burkitt's lymphoma and lymphoblastoid cell lines. Virology 173(2):489–498.
    1. Baer R, et al. 1984. DNA sequence and expression of the B95-8 Epstein-Barr virus genome. Nature 310(5974):207–211.
    1. Bankevich A, et al. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 19(5):455–477.
    1. Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114–2120.
    1. Dunn JC. 1973. A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. J Cybernet. 3(3):32–57.
    1. Chen R, Holmes EC. 2009. Frequent inter-species transmission and geographic subdivision in avian influenza viruses from wild birds. Virology 383(1):156–161.
    1. de Campos-Lima PO, et al. 1993. HLA-A11 epitope loss isolates of Epstein-Barr virus from a highly A11+ population. Science 260(5104):98–100.
    1. Delcher AL, Phillippy A, Carlton J, Salzberg SL. 2002. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 30(11):2478–2483.
    1. Edwards RH, Seillier-Moiseiwitsch F, Raab-Traub N. 1999. Signature amino acid changes in latent membrane protein 1 distinguish Epstein-Barr virus strains. Virology 261(1):79–95.
    1. Gouy M, Guindon S, Gascuel O. 2010. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol. 27(2):221–224.
    1. Hubisz MJ, Falush D, Stephens M, Pritchard JK. 2009. Inferring weak population structure with the assistance of sample group information. Mol Ecol Res. 9(5):1322–1332.
    1. Huson DH, Bryant D. 2006. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. 23(2):254–267.
    1. Koboldt DC, et al. 2009. VarScan: Variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25(17):2283–2285.
    1. Kwok H, et al. 2014. Genomic diversity of Epstein-Barr virus genomes isolated from primary nasopharyngeal carcinoma biopsy samples. J Virol. 88(18):10662–10672.
    1. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods. 9(4):357–359.
    1. Lei H, et al. 2015. Epstein-Barr virus from Burkitt Lymphoma biopsies from Africa and South America share novel LMP-1 promoter and gene variations. Sci Rep. 23(5):16706.
    1. Liao D. 1999. Concerted evolution: molecular mechanism and biological implications. Am J Hum Genet. 64(1):24–30.
    1. McGeoch DJ, Gatherer D. 2007. Lineage structures in the genome sequences of three Epstein-Barr virus strains. Virology 359(1):1–5.
    1. Mechelli R, et al. 2015. Epstein-Barr virus genetic variants are associated with multiple sclerosis. Neurology 84:1362–1368.
    1. Nieweglowski L. 2013. Clv: Cluster Validation Techniques. CRAN.
    1. Palser AL, et al. 2015. Genome diversity of Epstein-Barr virus from multiple tumor types and normal infection. J Virol. 89(10):5222–5237.
    1. Pickrell J, Pritchard JK. 2012. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 8(11):e1002967.
    1. Polman CH, et al. 2011. Diagnostic criteria for multiple sclerosis: 2010 revisions to the McDonald criteria. Ann Neurol. 69(2):292–302.
    1. Price AL, Zaitlen NA, Reich D, Patterson N. 2010. New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 11(7):459–463.
    1. Salvetti M, Giovannoni G, Aloisi F. 2009. Epstein-Barr virus and multiple sclerosis. Curr Opin Neurol. 22(3):201–206.
    1. Santón A, et al. 2011. High frequency of co-infection by Epstein-Barr virus types 1 and 2 in patients with multiple sclerosis. Mult Scler. 17(11):1295–1300.
    1. Shen ZC, et al. 2015. High prevalence of the EBER variant EB-8m in endemic nasopharyngeal carcinomas. PLoS One 25(10):e0121420.
    1. Tierney RJ, et al. 2006. Multiple Epstein-Barr virus strains in patients with infectious mononucleosis: comparison of ex vivo samples with in vitro isolates by use of heteroduplex tracking assays. J Infect Dis. 193(2):287–297.
    1. Tzellos S, Farrell PJ. 2012. Epstein-Barr virus sequence variation—biology and disease. Pathogens 1:156–175.
    1. Walling DM, Raab-Traub N. 1994. Epstein-Barr virus intrastrain recombination in oral hairy leukoplakia. J Virol. 68(12):7909–7917.
    1. Wang J, et al. 2009. Package "robust". CRAN.
    1. Zhang XS, et al. 2002. The 30-bp deletion variant: a polymorphism of latent membrane protein 1 prevalent in endemic and non-endemic areas of nasopharyngeal carcinomas in China. Cancer Lett. 176(1):65–73.

Source: PubMed
