Targeted capture and massively parallel sequencing of 12 human exomes

Sarah B Ng, Emily H Turner, Peggy D Robertson, Steven D Flygare, Abigail W Bigham, Choli Lee, Tristan Shaffer, Michelle Wong, Arindam Bhattacharjee, Evan E Eichler, Michael Bamshad, Deborah A Nickerson, Jay Shendure

Abstract

Genome-wide association studies suggest that common genetic variants explain only a modest fraction of heritable risk for common diseases, raising the question of whether rare variants account for a significant fraction of unexplained heritability. Although DNA sequencing costs have fallen markedly, they remain far from what is necessary for rare and novel variants to be routinely identified at a genome-wide scale in large cohorts. We have therefore sought to develop second-generation methods for targeted sequencing of all protein-coding regions ('exomes'), to reduce costs while enriching for discovery of highly penetrant variants. Here we report on the targeted capture and massively parallel sequencing of the exomes of 12 humans. These include eight HapMap individuals representing three populations, and four unrelated individuals with a rare dominantly inherited disorder, Freeman-Sheldon syndrome (FSS). We demonstrate the sensitive and specific identification of rare and common variants in over 300 megabases of coding sequence. Using FSS as a proof-of-concept, we show that candidate genes for Mendelian disorders can be identified by exome sequencing of a small number of unrelated, affected individuals. This strategy may be extendable to diseases with more complex genetics through larger sample sizes and appropriate weighting of non-synonymous variants by predicted functional impact.

Figures

Figure 1. Minor allele frequency and coding indel length distributions
(a) The distribution of minor allele frequencies is shown for previously annotated versus novel cSNPs. (b) The distribution of minor allele frequencies is shown for synonymous versus nonsynonymous cSNPs. (c) The distribution of minor allele frequencies (by proportion, rather than count) is shown for synonymous cSNPs (n = 21,201) versus nonsynonymous cSNPs predicted to be benign (n = 13,295), possibly damaging (n = 3,368), or probably damaging (n = 2,227) by PolyPhen. (d) The distribution of lengths of coding insertion-deletion variants is shown (average numbers per exome). Error bars indicate s.d.
Figure 2. Direct identification of the causal gene for a monogenic disorder by exome sequencing
Boxes list the number of genes with 1+ nonsynonymous cSNP, splice-site SNP, or coding indel (“NS/SS/I”) meeting specified filters. Columns show the effect of requiring that 1+ NS/SS/I variants be observed in each of 1 to 4 affected individuals. Rows show the effect of excluding from consideration variants found in dbSNP, the 8 HapMap exomes, or both. Column 5 models limited genetic heterogeneity or data incompleteness by relaxing criteria such that variants need only be observed in any 3 of 4 exomes for a gene to qualify.
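
The gene-level filter described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `candidate_genes`, the input layout (a mapping from individual to `(gene, variant_id)` pairs), and the toy gene and variant identifiers are all assumptions made for the example.

```python
from collections import defaultdict


def candidate_genes(variants_by_individual, known_variants, min_individuals=None):
    """Return genes with >= 1 unfiltered NS/SS/I variant in enough individuals.

    variants_by_individual: dict mapping an individual's ID to an iterable of
        (gene, variant_id) pairs, already restricted to NS/SS/I variants.
    known_variants: set of variant IDs to exclude (for example, variants seen
        in dbSNP, in control exomes, or in both).
    min_individuals: how many individuals must carry a qualifying variant in
        the gene; defaults to all individuals supplied.
    """
    if min_individuals is None:
        min_individuals = len(variants_by_individual)

    # For each gene, record which individuals carry at least one variant
    # that survives the exclusion filter. The variants may differ between
    # individuals; only the gene has to be shared.
    carriers = defaultdict(set)
    for individual, variants in variants_by_individual.items():
        for gene, variant_id in variants:
            if variant_id not in known_variants:
                carriers[gene].add(individual)

    return {g for g, who in carriers.items() if len(who) >= min_individuals}


# Hypothetical toy data: four affected individuals, three genes.
affected = {
    "FSS1": [("GENE_A", "v1"), ("GENE_B", "v2")],
    "FSS2": [("GENE_A", "v3"), ("GENE_B", "v2")],
    "FSS3": [("GENE_A", "v1"), ("GENE_C", "v4")],
    "FSS4": [("GENE_A", "v5"), ("GENE_B", "v2")],
}
dbsnp = {"v2"}          # hypothetical: variant already catalogued
control_exomes = {"v4"}  # hypothetical: variant seen in unaffected controls

no_filter_all = candidate_genes(affected, set())
no_filter_any3 = candidate_genes(affected, set(), min_individuals=3)
filtered_any3 = candidate_genes(affected, dbsnp | control_exomes, min_individuals=3)
```

Passing a smaller `min_individuals` corresponds to the relaxed "any 3 of 4" column in the figure, and changing the contents of `known_variants` corresponds to moving between its rows.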

Source: PubMed
