Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples

Kristian Cibulskis, Michael S Lawrence, Scott L Carter, Andrey Sivachenko, David Jaffe, Carrie Sougnez, Stacey Gabriel, Matthew Meyerson, Eric S Lander, Gad Getz, Kristian Cibulskis, Michael S Lawrence, Scott L Carter, Andrey Sivachenko, David Jaffe, Carrie Sougnez, Stacey Gabriel, Matthew Meyerson, Eric S Lander, Gad Getz

Abstract

Detection of somatic point substitutions is a key step in characterizing the cancer genome. However, existing methods typically miss low-allelic-fraction mutations that occur in only a subset of the sequenced cells owing to either tumor heterogeneity or contamination by normal cells. Here we present MuTect, a method that applies a Bayesian classifier to detect somatic mutations with very low allele fractions, requiring only a few supporting reads, followed by carefully tuned filters that ensure high specificity. We also describe benchmarking approaches that use real, rather than simulated, sequencing data to evaluate the sensitivity and specificity as a function of sequencing depth, base quality and allelic fraction. Compared with other methods, MuTect has higher sensitivity with similar specificity, especially for mutations with allelic fractions as low as 0.1 and below, making MuTect particularly useful for studying cancer subclones and their evolution in standard exome and genome sequencing data.

Figures

Figure 1
Figure 1
Overview of somatic point mutation detection using MuTect. MuTect takes as input tumor (T) and normal (N) next generation sequencing data and, after removing low quality reads (Supplementary Methods), determines if there is evidence for a variant beyond the expected random sequencing errors. Candidate variant sites are then passed through six filters to remove artifacts (Table 1). Next, a Panel of Normals is used to screen out remaining false positives caused by rare error modes only detectable in additional samples. Finally, the somatic or germline status of passing variants is determined using the matched normal.
Figure 2
Figure 2
Sensitivity as a function of sequencing depth and allelic fraction. (a) Sensitivity and specificity of MuTect for mutations with an allele fraction of 0.2, tumor depth of 30x and normal depth of 30x using various values of the LOD threshold (θT) (0.1 ≤ θT ≤ 100). Results using a model of independent sequencing errors with uniform Q35 base quality scores and accurate read placement (solid grey) are shown as well as results from the virtual tumor approach for the standard (STD, dashed green) and high-confidence (HC, solid green) configurations. A typical setting of θT = 6.3 is marked with black circles. (b) Sensitivity as a function of tumor sequencing depth and allele fraction (indicated by color) using θT = 6.3. The calculated sensitivity using a model of independent sequencing errors and accurate read placement with uniform Q35 base quality scores (solid lines) are shown as well as results from the virtual tumor approach (circles) and the downsampling of validated colorectal mutations (diamonds). Error bars represent 95% CIs.
Figure 3
Figure 3
Specificity of variant detection and variant classification using virtual tumor approach. (a) Somatic miscall error rate for true reference sites as a function of tumor sequencing depth for the STD (red), HC (blue) and HC+PON (green) configurations of MuTect. Error bars represent 95% CIs. (b) Distribution of allele fraction for all miscalls as a function of tumor sequencing depth. (c) Fraction of events rejected by each filter; hashed regions indicate events rejected exclusively by each filter. (d) Somatic miscall error rate for true germline SNP sites by sequencing depth in the normal when the site is known to be variant in the population (blue) and novel (red). Error bars represent 95% CIs. (e,f) Mean power as a function of sequencing depth in the normal to have classified these events as germline or somatic at novel germline sites (e) and known germline variant sites (f).
Figure 4
Figure 4
Benchmarking mutation detection methods. (a) Comparison of sensitivity as a function of tumor sequencing depth and mutation allele fraction for different mutation detection methods and configurations. (b) Comparison of somatic miscall error rate for true germline sites as a function of sequencing depth in the normal. (c) Comparison of somatic miscall error rate for true reference sites as a function of tumor sequencing depth. (d) Sensitivity as a function of specificity for mutations with an allele fraction of 0.1, tumor depth of 30x and normal depth of 30x between different methods and configurations. Black dotted lines indicate change in sensitivity and specificity between STD and HC configurations for a method. Grey solid lines are the MuTect results of virtual tumor approach from Supplementary Figure 3. (a–c) Error bars represent 95% CIs.

References

    1. Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011;474:609–615.
    1. Cancer Genome Atlas Research Network Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 2008;455:1061–1068.
    1. Banerji S, et al. Sequence analysis of mutations and translocations across breast cancer subtypes. Nature. 2012;486:405–409.
    1. Stransky N, et al. The mutational landscape of head and neck squamous cell carcinoma. Science. 2011;333:1157–1160.
    1. Ding L, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature. 2008;455:1069–1075.
    1. Berger MF, et al. Melanoma genome sequencing reveals frequent PREX2 mutations. Nature. 2012;485:502–506.
    1. Network TCGA. Comprehensive molecular characterization of human colon and rectal cancer. Nature. 2012;487:330–337.
    1. Carter SL, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol. 2012;30:413–421.
    1. Walter MJ, et al. Clonal architecture of secondary acute myeloid leukemia. N Engl J Med. 2012;366:1090–1098.
    1. So Yeon Park MGHJKFMKP. Cellular and genetic diversity in the progression of in situ human breast carcinomas to an invasive phenotype. The Journal of Clinical Investigation. 2010;120:636.
    1. Nik-Zainal S, et al. The life history of 21 breast cancers. CELL. 2012;149:994–1007.
    1. Ding L, et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature. 2012;481:506–510.
    1. Landau DA, et al. Evolution and impact of subclonal mutations in chronic lymphocytic leukemia. CELL. 2012 Accepted.
    1. Campbell PJ, et al. The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature. 2010;467:1109–1113.
    1. Navin N, et al. Tumour evolution inferred by single-cell sequencing. Nature. 2011;472:90–94.
    1. Hou Y, et al. Single-cell exome sequencing and monoclonal evolution of a JAK2-negative myeloproliferative neoplasm. CELL. 2012;148:873–885.
    1. Xu X, et al. Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. CELL. 2012;148:886–895.
    1. International Cancer Genome Consortium et al. International network of cancer genome projects. Nature. 2010;464:993–998.
    1. Chapman MA, et al. Initial genome sequencing and analysis of multiple myeloma. Nature. 2011;471:467–472.
    1. Getz G, et al. Comment on “The consensus coding sequences of human breast and colorectal cancers”. Science. 2007;317:1500.
    1. Larson DE, et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics. 2012;28:311–317.
    1. Roth A, et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics. 2012;28:907–913.
    1. Saunders CT, et al. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 2012;28:1811–1817.
    1. Barbieri CE, et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nature genetics. 2012;44:685–689.
    1. Bass AJ, et al. Genomic sequencing of colorectal adenocarcinomas identifies a recurrent VTI1A-TCF7L2 fusion. Nature genetics. 2011;43:964–968.
    1. Wang L, et al. SF3B1 and other novel cancer genes in chronic lymphocytic leukemia. N Engl J Med. 2011;365:2497–2506.
    1. Pugh TJ, et al. Medulloblastoma exome sequencing uncovers subtype-specific somatic mutations. Nature. 2012;488:106–110.
    1. Berger MF, et al. The genomic complexity of primary human prostate cancer. Nature. 2011;470:214–220.
    1. Lohr JG, et al. Discovery and prioritization of somatic mutations in diffuse large B-cell lymphoma (DLBCL) by whole-exome sequencing. Proc Natl Acad Sci USA. 2012;109:3879–3884.
    1. Imielinski M, et al. Mapping the Hallmarks of Lung Adenocarcinoma with Massively Parallel Sequencing. CELL. 2012;150:1107–1120.
    1. Wang P. Mutations in isocitrate dehydrogenase 1 and 2 occur frequently in intrahepatic cholangiocarcinomas and share hypermethylation targets with glioblastomas. Oncogene. 2012 doi: 10.1038/onc.2012.315.
    1. Durinck S, et al. Temporal dissection of tumorigenesis in primary cancers. Cancer Discov. 2011;1:137–143.
    1. Lee RS, et al. A remarkably simple genome underlies highly malignant pediatric rhabdoid cancers. The Journal of Clinical Investigation. 2012;122:2983–2988.
    1. Cancer Genome Atlas Research Network et al. Comprehensive genomic characterization of squamous cell lung cancers. Nature. 2012;489:519–525.
    1. Hodis E, et al. A landscape of driver mutations in melanoma. CELL. 2012;150:251–263.
    1. Forbes SA, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39:D945–50.
    1. Pleasance ED, et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature. 2009;463:191–196.
    1. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760.
    1. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079.
    1. DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature genetics. 2011;43:491–498.
    1. Sherry ST, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311.
    1. 1000 Genomes Project Consortium A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073.
    1. Gnerre S, et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci USA. 2011;108:1513–1518.
    1. Shah SP, et al. Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature. 2009;461:809–813.
    1. Yachida S, et al. Distant metastasis occurs late during the genetic evolution of pancreatic cancer. Nature. 2010;467:1114–1117.
    1. Cibulskis K, et al. ContEst: estimating cross-contamination of human samples in next-generation sequencing data. Bioinformatics. 2011;27:2601–2602.

Source: PubMed

3
Suscribir