Discovery and saturation analysis of cancer genes across 21 tumour types

Michael S Lawrence, Petar Stojanov, Craig H Mermel, James T Robinson, Levi A Garraway, Todd R Golub, Matthew Meyerson, Stacey B Gabriel, Eric S Lander, Gad Getz

Abstract

Although a few cancer genes are mutated in a high proportion of tumours of a given type (>20%), most are mutated at intermediate frequencies (2-20%). To explore the feasibility of creating a comprehensive catalogue of cancer genes, we analysed somatic point mutations in exome sequences from 4,742 human cancers and their matched normal-tissue samples across 21 cancer types. We found that large-scale genomic analysis can identify nearly all known cancer genes in these tumour types. Our analysis also identified 33 genes that were not previously known to be significantly mutated in cancer, including genes related to proliferation, apoptosis, genome stability, chromatin regulation, immune evasion, RNA processing and protein homeostasis. Down-sampling analysis indicates that larger sample sizes will reveal many more genes mutated at clinically important frequencies. We estimate that near-saturation may be achieved with 600-5,000 samples per tumour type, depending on background mutation frequency. The results may help to guide the next stage of cancer genomics.

Figures

Figure 1
Mutation patterns for one known and two novel cancer genes. EGFR shows distinctive tumor-type-specific concentrations of mutations in different regions of the gene. RHEB, which encodes a small GTPase in the RAS superfamily, shows a mutational hotspot in the effector domain. RHOA, another member of the RAS superfamily, also shows a mutational hotspot in the effector domain. Colored bars after tumor type names are copy-ratio distributions for the gene, when available (red=amplified, blue=deleted). See also Supplementary Figure 4. Similar diagrams for all genes are available at http://www.tumorportal.org.
Figure 2
Cancer genes in selected tumor types. Genes are arranged on the horizontal line according to p-value (combined value for the three tests in MutSig). Yellow region contains genes that achieve FDR q≤0.1. Orange interval contains p-values for the next 20 genes. Gene name color indicates whether the gene is a known cancer gene (blue), a novel gene with clear connection to cancer (red; discussed in text), or an additional novel gene (black). Circle color indicates the frequency (percent of patients carrying non-silent somatic mutations) in that tumor type. See also Supplementary Figure 5.
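The FDR q≤0.1 cutoff in this caption can be illustrated with a short sketch. The caption does not say how q-values are derived from the combined MutSig p-values, so the Benjamini-Hochberg procedure is assumed here. The gene labels and p-values are invented for illustration and are not taken from the paper.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the same order as the input.

    Assumption: BH is used here only as a common FDR procedure; the source
    does not state which procedure produced its q-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that q-values
    # never increase as p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


if __name__ == "__main__":
    # Hypothetical genes and p-values, for illustration only.
    genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    pvals = [0.01, 0.04, 0.03, 0.5]
    for gene, q in zip(genes, bh_qvalues(pvals)):
        status = "significant" if q <= 0.1 else "not significant"
        print(f"{gene}: q={q:.4f} ({status})")
```

Genes whose q-value falls at or below 0.1 would land in the yellow region of the figure under this assumption.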
Figure 3
Cancer genes identified in 4742-tumor dataset. X-axis indicates the q-value (FDR) in the most significant of the 21 tumor types. Y-axis indicates the q-value when the 4742 tumors are analyzed as a combined cohort. Genes in the upper left quadrant reached significance only in the combined analysis. Genes in the lower right quadrant reached significance only in one or more single-type analyses. Genes in the upper right quadrant were significant in both the combined set and in individual tumor types. Color of gene names is as in Figure 2.
Figure 4
Down-sampling analysis shows that gene discovery is continuing as samples and tumor types are added. a. Analysis within tumor types. Each point represents a random subset of patients. Blue line is a smoothed fit. b. Analysis by adding tumor types. Each grey line represents a random ordering of the 21 tumor types. c. Analysis by adding samples. Each point is a random subset of the 4742 patients. d. Analysis in panel c broken down by mutation frequency. Genes mutated at frequencies ≥ 20% are nearing saturation, while intermediate frequencies show steep growth. See also Supplementary Figures 7, 8.
Figure 5
Number of samples needed to detect significantly mutated genes, as a function of a tumor type’s median background mutation frequency (x-axis) and a cancer gene’s mutation rate above background (the various curves). Y-axis shows the number of samples needed to achieve 90% power for 90% of genes. Grey vertical lines indicate tumor type median background mutation frequencies. Black dots indicate sample sizes in the current study. For most tumor types, the current sample size is inadequate to reliably detect genes mutated at 5% or less above background. See also Supplementary Figure 9.
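The relationship in this caption between background mutation frequency, mutation rate above background and required sample size can be sketched with a simplified binomial power calculation. This is not the paper's method: it treats a single gene of assumed length (`gene_length`, hypothetical), uses a one-sided exact binomial test, and uses a Bonferroni-style threshold (`alpha`, hypothetical) as a stand-in for genome-wide significance. The "90% of genes" criterion and MutSig's gene-specific covariates are not modelled.

```python
import math


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return max(0.0, 1.0 - sum(binom_pmf(j, n, p) for j in range(min(k, n + 1))))


def detection_power(n, p0, p1, alpha):
    """Power of a one-sided exact binomial test with n patients.

    p0: per-patient probability of a mutation in the gene under background only.
    p1: per-patient probability when the gene is a true cancer gene.
    """
    cdf0, k = 0.0, 0
    # Find the smallest mutated-patient count k with P(X >= k | p0) <= alpha.
    while k <= n and 1.0 - cdf0 > alpha:
        cdf0 += binom_pmf(k, n, p0)
        k += 1
    return binom_sf(k, n, p1)


def samples_needed(mu_per_base, rate_above_background, gene_length=1500,
                   alpha=0.1 / 18000, target_power=0.9, step=10, max_n=20000):
    """Smallest n (on a grid of `step`) reaching target_power, or None.

    gene_length and alpha are illustrative assumptions, not values from the paper.
    """
    p0 = 1.0 - (1.0 - mu_per_base) ** gene_length
    p1 = p0 + rate_above_background
    for n in range(step, max_n + 1, step):
        if detection_power(n, p0, p1, alpha) >= target_power:
            return n
    return None


if __name__ == "__main__":
    for mu in (1.5e-6, 1.5e-5):        # background mutations per base
        for rate in (0.02, 0.05):      # fraction of patients above background
            print(mu, rate, samples_needed(mu, rate))
```

Under these assumptions the sketch reproduces the qualitative trend of the figure: the required sample size grows as the background mutation frequency rises and as the mutation rate above background falls.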

Source: PubMed
