Universal Patterns of Selection in Cancer and Somatic Tissues

Iñigo Martincorena, Keiran M Raine, Moritz Gerstung, Kevin J Dawson, Kerstin Haase, Peter Van Loo, Helen Davies, Michael R Stratton, Peter J Campbell

Abstract

Cancer develops as a result of somatic mutation and clonal selection, but quantitative measures of selection in cancer evolution are lacking. We adapted methods from molecular evolution and applied them to 7,664 tumors across 29 cancer types. Unlike species evolution, positive selection outweighs negative selection during cancer development. On average, <1 coding base substitution/tumor is lost through negative selection, with purifying selection almost absent outside homozygous loss of essential genes. This allows exome-wide enumeration of all driver coding mutations, including outside known cancer genes. On average, tumors carry ∼4 coding substitutions under positive selection, ranging from <1/tumor in thyroid and testicular cancers to >10/tumor in endometrial and colorectal cancers. Half of driver substitutions occur in yet-to-be-discovered cancer genes. With increasing mutation burden, numbers of driver mutations increase, but not linearly. We systematically catalog cancer genes and show that genes vary extensively in what proportion of mutations are drivers versus passengers.

Keywords: cancer; evolution; genomics; mutations; selection.

Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

Figures

Graphical abstract
Figure S1
Impact of Different Confounding Factors on Analyses of Selection, Related to Figures 1–5. This includes simplistic substitution models, SNP contamination, SNP filtering and inadequate background models of the variation of the mutation rate. (A) Impact of simplistic mutation models on the accuracy of dN/dS in different scenarios. Each boxplot represents the dN/dS ratios estimated from 100 neutral simulations of 10,000 random coding substitutions. To exemplify the impact on dN/dS of different mutational spectra, we simulated neutral datasets using the trinucleotide spectra observed in the three different cohorts of samples (pancancer, melanoma and lung adenocarcinoma). Different panels depict dN/dS ratios for missense (ωmis) or nonsense (ωnon) mutations. (B) Simulations of the impact on dN/dS of germline SNP contamination and SNP over-filtering in catalogs of somatic mutations. Ten neutral datasets were generated by local randomization of 607 cancer whole-genomes (Alexandrov et al., 2013). Datasets with varying degrees of germline SNP contamination were simulated by adding 5% or 10% of germline common SNPs (minor allele frequency ≥5%) from 1000 genomes phase 3 (Auton et al., 2015) to the neutral simulations. Datasets with varying levels of SNP over-filtering were simulated by removing any mutation from the neutral datasets that overlapped a polymorphic site in dbSNP build 146 (either using common sites or all sites) (Sherry et al., 2001). (C) Percentage of mutations from the public TCGA catalogs of somatic calls that overlap a common dbSNP site. Based on simulations, an overlap of 1%–3% might be expected depending on the dominant mutational signatures present in a dataset, but several public TCGA catalogs show a much higher overlap suggesting extensive germline SNP contamination. As predicted from (B), this leads to an artifactual signal of negative selection in these datasets (STAR Methods).
(D) Consistency between genome-wide dN/dS estimates using the trinucleotide and pentanucleotide substitution models across cancer types. Green dots represent genome-wide dN/dS estimates for each cancer type separately, and the orange dot depicts the pancancer estimates (using the 24 cancer types with CaVEMan mutation calls). (E) Corresponding estimates of the average number of driver coding substitutions per tumor. For the purpose of estimating the excess of mutations from dN/dS ratios, dN/dS values below 1 are set to 1. Error bars depict 95% CIs. (F) Simulations demonstrating the validity of estimating dN/dS at a cohort level, in heterogeneous cohorts of samples without patient-specific substitution models. The three scenarios simulated include extreme examples of heterogeneous mixtures of samples with variable signatures, numbers of mutations and selection. In each scenario, the correct fraction of mutations removed by negative selection across samples is shown as a blue horizontal line (right y axis). Estimated dN/dS values from five simulations of each scenario are shown as dots with CIs (left y axis).
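Panel (E) converts dN/dS ratios into an excess of mutations, with ratios below 1 set to 1. A minimal sketch of one way to do this is given here. It assumes the usual excess calculation, in which a fraction (ω − 1)/ω of non-synonymous substitutions is attributed to positive selection. That formula is not stated in this caption, and the function names and input counts are hypothetical, not taken from the authors' code.

```python
def driver_fraction(omega: float) -> float:
    """Fraction of non-synonymous substitutions in excess of neutral expectation.

    Assumption: excess fraction = (omega - 1) / omega.
    """
    # Per the caption, dN/dS values below 1 are set to 1 (no excess).
    omega = max(omega, 1.0)
    return (omega - 1.0) / omega


def drivers_per_tumor(omega: float, n_nonsyn: int, n_tumors: int) -> float:
    """Average number of driver substitutions per tumor (hypothetical helper)."""
    return driver_fraction(omega) * n_nonsyn / n_tumors


if __name__ == "__main__":
    # Illustrative numbers only: 1,000 non-synonymous substitutions in 100 tumors.
    print(drivers_per_tumor(2.0, 1000, 100))
    # A ratio below 1 is clamped to 1, so the estimated excess is zero.
    print(driver_fraction(0.8))
```

Clamping at 1 keeps the estimate non-negative, so a cohort with dN/dS slightly below 1 contributes zero drivers, not a negative count.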
Figure S2
Evaluation of the Relative Performance of the Three Different dN/dS Models for the Detection of Positive Selection at Gene Level, Related to Figure 2 (A) QQ-plots for the different dN/dS models on a neutral dataset obtained by randomization of 107 melanoma whole-genomes from ICGC (STAR Methods). The dNdSunif model shows a great inflation of low P-values, leading to a large number of false positives after multiple testing correction (368 genes with q-value < 0.05), and should be generally avoided. In contrast, both dNdSloc and dNdScv behave as expected for a neutral dataset, yielding no significant hits after multiple testing correction. (B) Sensitivity of dNdScv and dNdSloc. The bar plot depicts the number of significant genes (q-value < 0.05) identified by both methods in the 29 TCGA datasets. Bars colored in a lighter shade show the number of significant genes that are present in the Cancer Gene Census version 73 (Forbes et al., 2015). dNdScv shows good specificity and sensitivity under all tested conditions (STAR Methods). (C) Comparison of the number of significant genes found by dNdScv (top) and the indel model (bottom) in their default configuration (unique-sites model for indels) when including and excluding MSI samples. (D–G) Gamma distributions and log-likelihood surfaces of dNdScv on a number of genes and datasets. (D,F) Density functions of the Gamma distributions for substitutions and indels inferred by the negative binomial regression in dNdScv for two datasets (Lung-SCC and Pancancer). The Gamma distributions shown have a mean = 1, showing the spread around the mean observed across genes in each dataset. This reflects the extent of the variation of the mutation rate across genes that remains unexplained by sequence composition, signatures and covariates. 
(E,G) Log-likelihood ratio values for the number of missense mutations in three genes (PTEN, CDKN2A and MUC16) in the Lung-SCC (n = 167 samples) and Pancancer datasets (n = 7,664) under dNdSloc and dNdScv. The real observed number of missense mutations in each gene and dataset is shown as a vertical green line. The figures show how in small genes and/or small datasets, dNdScv has much narrower curves and much more significant P-values for cancer genes thanks to the Gamma constraint, while dNdScv and dNdSloc converge when the local number of synonymous mutations is sufficiently high. This adaptive behavior of dNdScv results from the joint likelihood equation.
Figure 1
Genome-wide dN/dS Ratios Show a Distinct Pattern of Selection Universally Shared across Cancer Types (A) Species evolution: median dN/dS ratios across genes for missense mutations (data from Martincorena et al. [2012] and Ensembl). Data on germline human SNPs are from the 1,000 genomes phase 3 (Auton et al., 2015), restricted to SNPs with minor allele frequency ≥5%. (B) Cancer evolution: genome-wide dN/dS values for missense and nonsense mutations across 23 cancer types. (C) Somatic mutations in normal tissues (data from Blokzijl et al., 2016, Martincorena et al., 2015, Welch et al., 2012). Error bars depict 95% CIs. See also Figure S1 and Table S1.
Figure 2
Positively Selected Genes (Drivers) in Cancer Genomes (A) List of genes detected under significant positive selection (dN/dS >1) in each of the 29 cancer types. Y axes show the percentage of patients carrying a non-synonymous substitution or an indel in each gene. The color of the dot reflects the significance of each gene. RHT, restricted hypothesis testing on known cancer genes (Table S2). (B) Pancancer dN/dS values for missense and nonsense mutations for genes with significant positive selection on missense mutations (depicted in red) and/or truncating substitutions. See also Figures S1 and S2.
Figure 3
Negative Selection in Cancer (A) Distributions of dN/dS values per gene for missense mutations in non-LOH regions. The real distribution is shown in gray and the distribution observed in a neutral simulation is shown in purple. (B) Underlying distribution of dN/dS values across genes inferred from the observed distribution. (C) Estimated percentage of genes under different levels of positive and negative selection based on the inferred dN/dS distribution in (B). (D) Average number of selected mutations per tumor based on the inferred distributions of dN/dS across genes, combining missense and truncating mutations from all copy number regions. Error bars depict 95% CIs. (E) Power calculation for the statistical detection of negative selection (dN/dS 80%. Vertical lines indicate the range in which the middle 50% and 95% of genes are in the dataset of 7,664 tumors. (F) Average mutation burden in genes grouped according to gene expression quintile and chromatin state. (G) Average dN/dS values for genes grouped according to gene expression quintile, chromatin state, and essentiality. (H) Average dN/dS values for all mutations in genes found to be haploinsufficient in the human germline, including and excluding putative driver genes. Haploinsufficient genes are defined as those having a pLI score >0.9 in the ExAC database (Lek et al., 2016). See also Figures S1 and S3.
Figure S3
Supplementary Analyses on Negative Selection, Related to Figure 3 (A–D) dN/dS distributions inferred for different mutation types and copy number states. These distributions, obtained as described for Figure 3C, represent the percentage of genes estimated to be under a certain selection regime. The four distributions correspond to: missense (A) and truncating (B) substitutions in regions without loss of heterozygosity, and missense and truncating substitutions in haploid regions (C and D, respectively). Note that (A) is an extension of Figure 3C, with an added middle bar for genes with dN/dS very close to 1 (0.9-1.1), which can be considered to evolve largely neutrally. Only samples with CaVEMan mutation calls, excluding melanoma samples, were considered for this analysis for the reasons explained in the Methods. For each figure, all mutations with the appropriate ploidy were included in the analysis and only genes with at least one mutation (either synonymous or non-synonymous) participate in the fitting of dN/dS distributions. Hence, the percentages of genes shown in the y-axes are relative to the total number of genes with at least one mutation in regions with the ploidy considered in each figure. Error bars depict 95% CIs. (E) Gene ontology groups deviating significantly from neutrality after removing known cancer genes. 27 gene ontology classes are found to be under significant positive selection after comprehensively removing 987 known putative cancer genes. This suggests the presence of undiscovered cancer genes in these functional groups. No gene ontology class was found to be under significant negative selection. Error bars depict 95% CIs.
Figure 4
Average Number of Driver Mutations in Tumors with

Figure 5
Selection in Hypermutator Tumors (A) dN/dS and estimated number of driver mutations per tumor grouping samples in 20 equal-sized bins according to mutation burden. This analysis excludes melanoma samples and uses a pentanucleotide substitution model to minimize mutational biases. (B) Heatmap depicting the fraction of mutations in 288 hypermutator samples (>1,000 mutations/exome) attributed to different mutational signatures (Alexandrov et al., 2013). (C) Left: dN/dS ratios (trinucleotide model) for each class of hypermutators. Right: dN/dS ratios from a neutral simulated dataset of POLE mutations. This neutral dataset was generated by randomizing all non-coding substitutions from five POLE hypermutator whole-genomes to a different site with an identical 9-nucleotide context, within 1-megabase of its original position. (D) Stacked bar plot showing the frequency of each base around C > A and C > T substitutions in POLE hypermutator tumors. (E–G) Conservative estimation of the fraction (F) and absolute number (G) of driver coding substitutions in known cancer genes. To obtain these estimates, dN/dS ratios for known cancer genes were normalized by those from putative passenger genes, to conservatively remove mutational biases from dN/dS. Application of this approach to our tissue-specific estimates in Figure 4A yields analogous results (E).
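Panel (A) groups samples into 20 equal-sized bins by mutation burden. The sketch below shows one way such a grouping could be done. It assumes "equal-sized" means an equal number of samples per bin, as opposed to equal-width burden intervals. The function name and the input list are hypothetical.

```python
def equal_sized_bins(burdens, n_bins=20):
    """Split sample indices into n_bins groups of (near-)equal size, ordered by burden.

    Assumption: bins hold equal numbers of samples; when the sample count is not
    divisible by n_bins, bin sizes differ by at most one.
    """
    # Sample indices sorted from lowest to highest mutation burden.
    order = sorted(range(len(burdens)), key=lambda i: burdens[i])
    n = len(order)
    bins = []
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        bins.append(order[start:end])
    return bins


if __name__ == "__main__":
    # Illustrative burdens for 100 samples, listed from highest to lowest.
    burdens = list(range(100, 0, -1))
    bins = equal_sized_bins(burdens, n_bins=20)
    print(len(bins), [len(b) for b in bins][:3])
```

A dN/dS ratio would then be estimated separately from the pooled mutations of the samples in each bin.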

Figure S4
Supplementary Analyses on the Number of Coding Driver Substitutions per Tumor, Related to Figure 4 (A) Comparison of the number of coding driver substitutions estimated by dN/dS and the number estimated by manual annotation of driver mutations across 560 breast cancers. The figure depicts the total number of coding substitutions (gray bar) and the estimated number of driver substitutions in a list of 723 putative cancer genes across 560 breast cancer whole-genomes. A total of 2,786 coding substitutions are found in these genes across the 560 patients (data from Nik-Zainal et al., 2016). Of these, 579 were annotated as likely driver mutations by a careful and conservative manual curation in the original publication (Nik-Zainal et al., 2016) (blue bar). Using the trinucleotide dN/dS model on this dataset, restricted to these 723 genes, yielded a global dN/dS for all non-synonymous substitutions of 1.42 (CI95%: 1.29, 1.58). Reassuringly, this led to an estimated number of drivers consistent with the manual annotation: 668.9 (CI95%: 507.5, 815.3). Error bars depict 95% CIs. (B) Scatterplot of the estimated average number of coding driver substitutions per tumor in 369 known cancer genes and in all genes of the genome. This is a scatterplot representation of the bottom panels of Figures 4A and 4B, to emphasize the extent of coding driver substitutions occurring outside of the list of 369 cancer genes. Error bars depict 95% CIs. Note that the two cancer types whose estimates appear under the diagonal (mesothelioma –MESO- and thymoma –THYM-) have CIs extending above the diagonal, as expected. (C) Number of driver coding substitutions per tumor by clinical stage (see STAR Methods for details and interpretation). The panels compare stage I and stage IV tumors for the datasets with available clinical annotation, using either dN/dS-based estimates of the numbers of drivers per tumor (top panel) or raw counts of non-synonymous mutations in known cancer genes (bottom panel). 
Briefly, no consistent and statistically significant differences were observed.
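As a rough arithmetic reading of panel (A), the reported dN/dS values can be turned into a fraction of non-synonymous substitutions attributed to selection. This assumes the usual excess calculation, (ω − 1)/ω, which this caption does not state explicitly.

```latex
\[
f_{\text{driver}} = \frac{\omega - 1}{\omega},
\qquad
\omega = 1.42 \;\Rightarrow\; f_{\text{driver}} = \frac{0.42}{1.42} \approx 0.296
\]
\[
\omega = 1.29 \;\Rightarrow\; f_{\text{driver}} \approx 0.225,
\qquad
\omega = 1.58 \;\Rightarrow\; f_{\text{driver}} \approx 0.367
\]
```

Under this assumption, roughly 30% (about 22% to 37% across the confidence interval) of the non-synonymous substitutions in the 723 genes would be attributed to positive selection.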

References

    1. Alexandrov L.B., Nik-Zainal S., Wedge D.C., Aparicio S.A., Behjati S., Biankin A.V., Bignell G.R., Bolli N., Borg A., Børresen-Dale A.L., Australian Pancreatic Cancer Genome Initiative, ICGC Breast Cancer Consortium, ICGC MMML-Seq Consortium, ICGC PedBrain. Signatures of mutational processes in human cancer. Nature. 2013;500:415–421.
    1. Armitage P., Doll R. The age distribution of cancer and a multi-stage theory of carcinogenesis. Br. J. Cancer. 1954;8:1–12.
    1. Auton A., Brooks L.D., Durbin R.M., Garrison E.P., Kang H.M., Korbel J.O., Marchini J.L., McCarthy S., McVean G.A., Abecasis G.R., 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 2015;526:68–74.
    1. Beckman R.A., Loeb L.A. Negative clonal selection in tumor evolution. Genetics. 2005;171:2123–2131.
    1. Blokzijl F., de Ligt J., Jager M., Sasselli V., Roerink S., Sasaki N., Huch M., Boymans S., Kuijk E., Prins P. Tissue-specific mutation accumulation in human adult stem cells during life. Nature. 2016;538:260–264.
    1. Blomen V.A., Májek P., Jae L.T., Bigenzahn J.W., Nieuwenhuis J., Staring J., Sacco R., van Diemen F.R., Olk N., Stukalov A. Gene essentiality and synthetic lethality in haploid human cells. Science. 2015;350:1092–1096.
    1. Bolli N., Avet-Loiseau H., Wedge D.C., Van Loo P., Alexandrov L.B., Martincorena I., Dawson K.J., Iorio F., Nik-Zainal S., Bignell G.R. Heterogeneity of genomic evolution and mutational profiles in multiple myeloma. Nat. Commun. 2014;5:2997.
    1. Cairns J. Mutation selection and the natural history of cancer. Nature. 1975;255:197–200.
    1. Castro-Giner F., Ratcliffe P., Tomlinson I. The mini-driver model of polygenic cancer evolution. Nat. Rev. Cancer. 2015;15:680–685.
    1. Church D.N., Briggs S.E., Palles C., Domingo E., Kearsey S.J., Grimes J.M., Gorman M., Martin L., Howarth K.M., Hodgson S.V., NSECG Collaborators. DNA polymerase ε and δ exonuclease domain mutations in endometrial cancer. Hum. Mol. Genet. 2013;22:2820–2828.
    1. Dias J., Van Nguyen N., Georgiev P., Gaub A., Brettschneider J., Cusack S., Kadlec J., Akhtar A. Structural analysis of the KANSL1/WDR5/KANSL2 complex reveals that WDR5 is required for efficient assembly and chromatin targeting of the NSL complex. Genes Dev. 2014;28:929–942.
    1. Forbes S.A., Beare D., Gunasekaran P., Leung K., Bindal N., Boutselakis H., Ding M., Bamford S., Cole C., Ward S. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 2015;43:D805–D811.
    1. Fredriksson N.J., Ny L., Nilsson J.A., Larsson E. Systematic analysis of noncoding somatic mutations and gene expression alterations across 14 tumor types. Nat. Genet. 2014;46:1258–1263.
    1. Galloway A., Saveliev A., Łukasiak S., Hodson D.J., Bolland D., Balmanno K., Ahlfors H., Monzón-Casanova E., Mannurita S.C., Bell L.S. RNA-binding proteins ZFP36L1 and ZFP36L2 promote cell quiescence. Science. 2016;352:453–459.
    1. Gerstung M., Papaemmanuil E., Campbell P.J. Subclonal variant calling with multiple samples and prior knowledge. Bioinformatics. 2014;30:1198–1204.
    1. Gerstung M., Papaemmanuil E., Martincorena I., Bullinger L., Gaidzik V.I., Paschka P., Heuser M., Thol F., Bolli N., Ganly P. Precision oncology for acute myeloid leukemia using a knowledge bank approach. Nat. Genet. 2017;49:332–340.
    1. Goldman N., Yang Z. A codon-based model of nucleotide substitution for protein-coding DNA sequences. Mol. Biol. Evol. 1994;11:725–736.
    1. Greenman C., Wooster R., Futreal P.A., Stratton M.R., Easton D.F. Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics. 2006;173:2187–2198.
    1. Greenman C., Stephens P., Smith R., Dalgliesh G.L., Hunter C., Bignell G., Davies H., Teague J., Butler A., Stevens C. Patterns of somatic mutation in human cancer genomes. Nature. 2007;446:153–158.
    1. Haradhvala N.J., Polak P., Stojanov P., Covington K.R., Shinbrot E., Hess J.M., Rheinbay E., Kim J., Maruvka Y.E., Braunstein L.Z. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell. 2016;164:538–549.
    1. Hause R.J., Pritchard C.C., Shendure J., Salipante S.J. Classification and characterization of microsatellite instability across 18 cancer types. Nat. Med. 2016;22:1342–1350.
    1. Jones D., Raine K.M., Davies H., Tarpey P.S., Butler A.P., Teague J.W., Nik-Zainal S., Campbell P.J. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinformatics. 2016;56:15.10.11–15.10.18.
    1. Kandoth C., McLellan M.D., Vandin F., Ye K., Niu B., Lu C., Xie M., Zhang Q., McMichael J.F., Wyczalkowski M.A. Mutational landscape and significance across 12 major cancer types. Nature. 2013;502:333–339.
    1. Kryazhimskiy S., Plotkin J.B. The population genetics of dN/dS. PLoS Genet. 2008;4:e1000304.
    1. Kundaje A., Meuleman W., Ernst J., Bilenky M., Yen A., Heravi-Moussavi A., Kheradpour P., Zhang Z., Wang J., Ziller M.J., Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518:317–330.
    1. Lawrence M.S., Stojanov P., Polak P., Kryukov G.V., Cibulskis K., Sivachenko A., Carter S.L., Stewart C., Mermel C.H., Roberts S.A. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature. 2013;499:214–218.
    1. Lawrence M.S., Stojanov P., Mermel C.H., Robinson J.T., Garraway L.A., Golub T.R., Meyerson M., Gabriel S.B., Lander E.S., Getz G. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 2014;505:495–501.
    1. Lee W., Jiang Z., Liu J., Haverty P.M., Guan Y., Stinson J., Yue P., Zhang Y., Pant K.P., Bhatt D. The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature. 2010;465:473–477.
    1. Lee H., Palm J., Grimes S.M., Ji H.P. The Cancer Genome Atlas Clinical Explorer: a web and mobile interface for identifying clinical-genomic driver associations. Genome Med. 2015;7:112.
    1. Lek M., Karczewski K.J., Minikel E.V., Samocha K.E., Banks E., Fennell T., O’Donnell-Luria A.H., Ware J.S., Hill A.J., Cummings B.B., Exome Aggregation Consortium. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 2016;536:285–291.
    1. Martincorena I., Campbell P.J. Somatic mutation in cancer and normal cells. Science. 2015;349:1483–1489.
    1. Martincorena I., Seshasayee A.S., Luscombe N.M. Evidence of non-random mutation rates suggests an evolutionary risk management strategy. Nature. 2012;485:95–98.
    1. Martincorena I., Roshan A., Gerstung M., Ellis P., Van Loo P., McLaren S., Wedge D.C., Fullam A., Alexandrov L.B., Tubio J.M. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science. 2015;348:880–886.
    1. McFarland C.D., Korolev K.S., Kryukov G.V., Sunyaev S.R., Mirny L.A. Impact of deleterious passenger mutations on cancer progression. Proc. Natl. Acad. Sci. USA. 2013;110:2910–2915.
    1. McFarland C.D., Mirny L.A., Korolev K.S. Tug-of-war between driver and passenger mutations in cancer and other adaptive processes. Proc. Natl. Acad. Sci. USA. 2014;111:15138–15143.
    1. McGranahan N., Furness A.J., Rosenthal R., Ramskov S., Lyngaa R., Saini S.K., Jamal-Hanjani M., Wilson G.A., Birkbak N.J., Hiley C.T. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science. 2016;351:1463–1469.
    1. Miyata T., Yasunaga T. Molecular evolution of mRNA: a method for estimating evolutionary rates of synonymous and amino acid substitutions from homologous nucleotide sequences and its application. J. Mol. Evol. 1980;16:23–36.
    1. Morley A.A. The somatic mutation theory of ageing. Mutat. Res. 1995;338:19–23.
    1. Mularoni L., Sabarinathan R., Deu-Pons J., Gonzalez-Perez A., López-Bigas N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 2016;17:128.
    1. Nei M., Gojobori T. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol. Biol. Evol. 1986;3:418–426.
    1. Nielsen R., Yang Z. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics. 1998;148:929–936.
    1. Nik-Zainal S., Davies H., Staaf J., Ramakrishna M., Glodzik D., Zou X., Martincorena I., Alexandrov L.B., Martin S., Wedge D.C. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature. 2016;534:47–54.
    1. Nordling C.O. A new theory on cancer-inducing mechanism. Br. J. Cancer. 1953;7:68–72.
    1. Nowell P.C. The clonal evolution of tumor cell populations. Science. 1976;194:23–28.
    1. Ostrow S.L., Barshir R., DeGregori J., Yeger-Lotem E., Hershberg R. Cancer evolution is associated with pervasive positive selection on globally expressed genes. PLoS Genet. 2014;10:e1004239.
    1. Pleasance E.D., Cheetham R.K., Stephens P.J., McBride D.J., Humphray S.J., Greenman C.D., Varela I., Lin M.L., Ordóñez G.R., Bignell G.R. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature. 2010;463:191–196.
    1. Pleasance E.D., Stephens P.J., O’Meara S., McBride D.J., Meynert A., Jones D., Lin M.L., Beare D., Lau K.W., Greenman C. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature. 2010;463:184–190.
    1. Polak P., Karlić R., Koren A., Thurman R., Sandstrom R., Lawrence M., Reynolds A., Rynes E., Vlahoviček K., Stamatoyannopoulos J.A., Sunyaev S.R. Cell-of-origin chromatin organization shapes the mutational landscape of cancer. Nature. 2015;518:360–364.
    1. Raine K.M., Hinton J., Butler A.P., Teague J.W., Davies H., Tarpey P., Nik-Zainal S., Campbell P.J. cgpPindel: identifying somatically acquired insertion and deletion events from paired end sequencing. Curr. Protoc. Bioinformatics. 2015;52:15.17.11–15.17.12.
    1. Raine K.M., Van Loo P., Wedge D.C., Jones D., Menzies A., Butler A.P., Teague J.W., Tarpey P., Nik-Zainal S., Campbell P.J. ascatNgs: identifying somatically acquired copy-number alterations from whole-genome sequencing data. Curr. Protoc. Bioinformatics. 2016;56:15.9.1–15.9.17.
    1. Rajasagi M., Shukla S.A., Fritsch E.F., Keskin D.B., DeLuca D., Carmona E., Zhang W., Sougnez C., Cibulskis K., Sidney J. Systematic identification of personal tumor-specific neoantigens in chronic lymphocytic leukemia. Blood. 2014;124:453–462.
    1. Rayner E., van Gool I.C., Palles C., Kearsey S.E., Bosse T., Tomlinson I., Church D.N. A panoply of errors: polymerase proofreading domain mutations in cancer. Nat. Rev. Cancer. 2016;16:71–81.
    1. Rooney M.S., Shukla S.A., Wu C.J., Getz G., Hacohen N. Molecular and genetic properties of tumors associated with local immune cytolytic activity. Cell. 2015;160:48–61.
    1. Rubio-Perez C., Tamborero D., Schroeder M.P., Antolín A.A., Deu-Pons J., Perez-Llamas C., Mestres J., Gonzalez-Perez A., Lopez-Bigas N. In silico prescription of anticancer drugs to cohorts of 28 tumor types reveals targeting opportunities. Cancer Cell. 2015;27:382–396.
    1. Schuster-Böckler B., Lehner B. Chromatin organization is a major influence on regional mutation rates in human cancer cells. Nature. 2012;488:504–507.
    1. Sherry S.T., Ward M.H., Kholodov M., Baker J., Phan L., Smigielski E.M., Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–311.
    1. Shlien A., Campbell B.B., de Borja R., Alexandrov L.B., Merico D., Wedge D., Van Loo P., Tarpey P.S., Coupland P., Behjati S., Biallelic Mismatch Repair Deficiency Consortium. Combined hereditary and somatic mutations of replication error repair genes result in rapid onset of ultra-hypermutated cancers. Nat. Genet. 2015;47:257–262.
    1. Strønen E., Toebes M., Kelderman S., van Buuren M.M., Yang W., van Rooij N., Donia M., Böschen M.L., Lund-Johansen F., Olweus J., Schumacher T.N. Targeting of cancer neoantigens with donor-derived T cell receptor repertoires. Science. 2016;352:1337–1341.
    1. Supek F., Miñana B., Valcárcel J., Gabaldón T., Lehner B. Synonymous mutations frequently act as driver mutations in human cancers. Cell. 2014;156:1324–1335.
    1. Tomasetti C., Marchionni L., Nowak M.A., Parmigiani G., Vogelstein B. Only three driver gene mutations are required for the development of lung and colorectal cancers. Proc. Natl. Acad. Sci. USA. 2015;112:118–123.
    1. Van den Eynden J., Basu S., Larsson E. Somatic mutation patterns in hemizygous genomic regions unveil purifying selection during tumor evolution. PLoS Genet. 2016;12:e1006506.
    1. Van Loo P., Nordgard S.H., Lingjærde O.C., Russnes H.G., Rye I.H., Sun W., Weigman V.J., Marynen P., Zetterberg A., Naume B. Allele-specific copy number analysis of tumors. Proc. Natl. Acad. Sci. USA. 2010;107:16910–16915.
    1. Vogelstein B., Papadopoulos N., Velculescu V.E., Zhou S., Diaz L.A., Jr., Kinzler K.W. Cancer genome landscapes. Science. 2013;339:1546–1558.
    1. Wang K., Li M., Hadley D., Liu R., Glessner J., Grant S.F., Hakonarson H., Bucan M. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 2007;17:1665–1674.
    1. Welch J.S., Ley T.J., Link D.C., Miller C.A., Larson D.E., Koboldt D.C., Wartman L.D., Lamprecht T.L., Liu F., Xia J. The origin and evolution of mutations in acute myeloid leukemia. Cell. 2012;150:264–278.
    1. Wong C.C., Martincorena I., Rust A.G., Rashid M., Alifrangis C., Alexandrov L.B., Tiffen J.C., Kober C., Green A.R., Massie C.E., Chronic Myeloid Disorders Working Group of the International Cancer Genome Consortium. Inactivating CUX1 mutations promote tumorigenesis. Nat. Genet. 2014;46:33–38.
    1. Xie M., Lu C., Wang J., McLellan M.D., Johnson K.J., Wendl M.C., McMichael J.F., Schmidt H.K., Yellapantula V., Miller C.A. Age-related mutations associated with clonal hematopoietic expansion and malignancies. Nat. Med. 2014;20:1472–1478.
    1. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 2007;24:1586–1591.
    1. Yang Z., Bielawski J.P. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 2000;15:496–503.
    1. Yang Z., Nielsen R. Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol. Biol. Evol. 2000;17:32–43.
    1. Yang Z., Ro S., Rannala B. Likelihood models of somatic mutation and codon substitution in cancer genes. Genetics. 2003;165:695–705.
    1. Yates L.R., Gerstung M., Knappskog S., Desmedt C., Gundem G., Van Loo P., Aas T., Alexandrov L.B., Larsimont D., Davies H. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 2015;21:751–759.

Source: PubMed
