Derivation of HLA types from shotgun sequence datasets

René L Warren, Gina Choe, Douglas J Freeman, Mauro Castellarin, Sarah Munro, Richard Moore, Robert A Holt

Abstract

The human leukocyte antigen (HLA) is key to many aspects of human physiology and medicine. All current sequence-based HLA typing methodologies are targeted approaches requiring the amplification of specific HLA gene segments. Whole genome, exome and transcriptome shotgun sequencing can generate prodigious data but, due to the complexity of HLA loci, these data have not been immediately informative regarding HLA genotype. We describe HLAminer, a computational method for identifying HLA alleles directly from shotgun sequence datasets (http://www.bcgsc.ca/platform/bioinfo/software/hlaminer). This approach circumvents the additional time and cost of generating HLA-specific data and capitalizes on the increasing accessibility and affordability of massively parallel sequencing.

Figures

Figure 1
Computational predictions of HLA-I from shotgun data by targeted assembly (left) or read alignment (right). For targeted assembly, NGS reads whose first fifteen 5' bases match an HLA CDS (RNA-Seq) or genomic (WGS/exon capture) sequence are recruited and assembled de novo with TASR. The resulting sequence contigs are aligned against a database of all predicted HLA CDS (RNA-Seq) or genomic sequences (WGS/exon capture), tracking the best HLA hit(s). Reciprocal best alignments are considered in the same manner. Putative allele assignments from shotgun datasets (HLAminer) are informed by contig length, depth of coverage and similarity to reference sequences, when applicable. The probability of each prediction being correct is estimated by determining the probability of that prediction being observed by chance.
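The read-recruitment step in this caption can be illustrated with a minimal Python sketch. It is not TASR or HLAminer code: the function names, the toy sequences, the exact-match rule and the forward-strand-only lookup are all assumptions made for illustration. Only the seed length of fifteen 5' bases comes from the caption.

```python
SEED_LEN = 15  # from the caption: the first fifteen 5' bases of each read


def build_seed_index(references, k=SEED_LEN):
    """Collect every k-mer present in the reference sequences (hypothetical helper)."""
    seeds = set()
    for seq in references.values():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            seeds.add(seq[i:i + k])
    return seeds


def recruit_reads(reads, seeds, k=SEED_LEN):
    """Keep reads whose first k bases exactly match a reference k-mer.

    Assumption: exact, forward-strand matching only. A real tool may also
    consider the reverse complement or tolerate mismatches.
    """
    return [r for r in reads if len(r) >= k and r[:k].upper() in seeds]


# Toy data for illustration only; these are not real HLA sequences.
references = {"toy_allele": "ACGTACGTTAGCCGATAGGCTTAACGGAT"}
reads = [
    "GTTAGCCGATAGGCTTAACG",  # first 15 bases occur in toy_allele
    "TTTTTTTTTTTTTTTTTTTT",  # no seed match
    "ACGTACG",               # shorter than the seed, skipped
]

recruited = recruit_reads(reads, build_seed_index(references))
print(recruited)
```

In this sketch the recruited reads would then be passed to an assembler. Filtering on a short 5' seed keeps the candidate set small before the more expensive assembly and alignment steps.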
Figure 2
HLAminer performance. HLA allele group and protein coding allele predictions derived from targeted read assembly (black symbols) or direct read alignment (grey symbols) of simulated 100-nucleotide RNA-Seq, WGS and exon capture (ExCap) datasets were compared to original, spiked-in, HLA sequences and performance metrics evaluated (ambiguity, sensitivity and specificity represented by circle, triangle and square symbols, respectively). HLAminer predictions were also obtained from targeted assembly of colorectal cancer (CRC; blue symbols), lymphoma (DLBCL; red, orange and yellow symbols), 1000 Genomes (1KG; green symbols) and ovarian cancer (OV; violet and magenta symbols) patient tumor (T) and/or matched normal (N) shotgun datasets and compared to PCR-based HLA types to calculate performance metrics.

Source: PubMed
