A molecular portrait of microsatellite instability across multiple cancers

Isidro Cortes-Ciriano, Sejoon Lee, Woong-Yang Park, Tae-Min Kim, Peter J Park

Abstract

Microsatellite instability (MSI) refers to the hypermutability of short repetitive sequences in the genome caused by impaired DNA mismatch repair. Although MSI has been studied for decades, the large amounts of sequencing data now available allow us to examine the molecular fingerprints of MSI in greater detail. Here, we analyse ∼8,000 exomes and ∼1,000 whole genomes of cancer patients across 23 cancer types. Our analysis reveals that the frequency of MSI events is highly variable within and across tumour types. We also identify genes in DNA repair and oncogenic pathways that are recurrently subject to MSI and uncover non-coding loci that frequently display MSI. Finally, we propose a highly accurate exome-based predictive model for the MSI phenotype. These results advance our understanding of the genomic drivers and consequences of MSI, and our comprehensive catalogue of tumour-type-specific MSI loci will enable panel-based MSI testing to identify patients who are likely to benefit from immunotherapy.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Schematic overview of the MSI calling pipeline.
(a) A reference set of exonic and genome-wide MS repeats was assembled from the human reference genome hg19. The sequencing reads spanning each MS repeat and at least 2 base pairs at each flanking side were extracted from the tumour and normal BAM files. This process was repeated for all MS repeats in the reference sets across all pairs of matched normal-tumour samples. The Kolmogorov–Smirnov test was used to evaluate whether the read length distributions from the normal and tumour samples differed significantly (FDR<0.05). The exonic and genome-wide MSI calls served to identify MS loci recurrently altered by MSI in MSI-H tumours, to discover frequent frameshift mutations and to predict MSI status. (b) Landscape of somatic MSI in MSI-H tumours. MSI events (frameshift and in-frame), deleterious SNV (missense, nonsense and splice site) and indel (frameshift) rates in 190 MSI-H exomes. Samples harbouring hypermethylation of the MLH1 promoter are denoted by blue squares. Deleterious germline and somatic mutations (that is, missense, nonsense, splice site and frameshift) are depicted in black and red, respectively, whereas frameshift MSI events are shown in green. Black arrows mark patients with germline and somatic mutations in MMR genes. (c) Germline and somatic mutations in MMR genes, POLE and POLD1 in MSS, MSI-L and MSI-H tumours. The heatmap and the cell labels report the number and percentage of samples in each category harbouring mutations, respectively.
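The per-locus test described in panel (a) can be illustrated with a minimal sketch. It assumes that repeat lengths have already been extracted from reads spanning each locus. The function names, the toy length lists, the asymptotic P value approximation and the choice of the Benjamini-Hochberg procedure for FDR control are all assumptions made for illustration; the caption states only that a Kolmogorov–Smirnov test and an FDR threshold of 0.05 were used.

```python
import math
from bisect import bisect_right


def ks_statistic(normal, tumour):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(normal), sorted(tumour)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided P value (Kolmogorov series; an approximation)."""
    ne = n * m / (n + m)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.3:  # the series is numerically 1 in this range
        return 1.0
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
            for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical repeat lengths observed in reads: (normal, tumour) per locus.
loci = {
    "unstable_locus": ([15] * 40 + [14] * 5, [15] * 20 + [12] * 25),
    "stable_locus": ([15] * 40 + [14] * 5, [15] * 39 + [14] * 6),
}
names = list(loci)
p_values = [ks_pvalue(ks_statistic(n, t), len(n), len(t))
            for n, t in loci.values()]
q_values = benjamini_hochberg(p_values)
msi_calls = {name: q < 0.05 for name, q in zip(names, q_values)}
print(msi_calls)
```

In this toy input the first locus shows a clear shift towards shorter repeats in the tumour and is called as an MSI event, whereas the second is not. In practice the correction would be applied across all tested loci of a sample rather than two.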
Figure 2. MS loci recurrently altered by MSI.
(a) Coding MSI loci recurrently targeted by frameshift MSI in CRC (COAD and READ), STAD and UCEC MSI-H tumours. The heatmap shows the fraction of CRC, STAD and UCEC MSI-H tumours containing frameshift MSI events in MS loci located within the coding sequence of the genes indicated on the x axis. The total count of frameshift MSI events at these loci is depicted in the above barplot. The full list of MS loci recurrently altered by frameshift MSI is given in Supplementary Data 4. Similarly shown for genes with frequent 3′ UTR (b) and 5′ UTR (c) MSI events in three MSI-prone tumour types.
Figure 3. Pan-cancer landscape of genome-wide MSI.
(a) The first panel shows the number of MSI events across 708 whole genomes, stratified by the length of the repeat unit. The second and third panels report the MSI status and the total count of SNVs, respectively. The fourth panel shows the distribution of MSI events across the genome. (b) Landscape of MSI in mitochondrial DNA across 308 COAD, STAD and UCEC low-pass whole genomes. MSI events, including frameshift and in-frame mutations, are shown in black.
Figure 4. MS repeats recurrently altered by MSI in MSI-H tumours.
(a) The barplots report the number of COAD, STAD and UCEC tumours harbouring MSI events at the loci indicated in the central panel. This analysis examined 190 MSI-H, 118 MSI-L and 522 MSS exomes. (b) The recurrence analysis was extended to 25 MSI-H, 19 MSI-L and 105 MSS whole genomes. Genomic coordinates in a,b indicate the location of the MSI repeats in the hg19 assembly of the human genome.
Figure 5. Distribution of the number of MSI and prediction of MSI status.
Distribution of the number of MSI events (a) and frameshift MSI events (b) in MSI-H and MSS (also including MSI-L) tumours. Correlation between the number of SNV and MSI events in exomes (c) and whole genomes (d). Prediction of MSI status from exome-sequencing data using conformal prediction and random forest models (e). Initially, we used 10-fold cross-validation to calculate predictions for all training examples. The fraction of trees in the forest voting for each class was recorded, and subsequently sorted in increasing order to define one Mondrian class list per category. (f) The model trained on all training data was applied to 7,089 exomes. For each of these samples, the algorithm recorded the fraction of trees voting for each class. The P value for each class was calculated as the number of elements in the corresponding Mondrian class list higher than the vote for that class (for example, 6 out of 7 in the toy example depicted in Fig. 5f) divided by the number of elements in that list. If the P value for a given class is above the significance level, ɛ, the sample is predicted to belong to that category. The confidence level (1−ɛ) indicates the minimum fraction of predictions that are correct. (g) Number of samples predicted as MSI-H, MSS and uncertain (both: cases in which the classifier does not have enough power to confidently assign a single category; none: cases in which the samples are outside the applicability domain of the model). Here, the confidence level was set to 0.75. (h) Landscape of MSI for the 91 exomes predicted as MSI-H at a confidence level of 0.75. Samples predicted to be MSI-H at a confidence level of 0.80 are marked with black arrows.
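The P value calculation in panels (e–g) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the class lists and votes are invented toy values, the function names are hypothetical, and the count is taken as the number of calibration votes that do not exceed the new sample's vote, which is the usual conformal convention and reproduces a P value of 6/7 for a toy list of seven elements.

```python
from bisect import bisect_right


def mondrian_p_value(class_list, vote):
    """Fraction of votes in a sorted class list that do not exceed `vote`."""
    return bisect_right(class_list, vote) / len(class_list)


def conformal_predict(class_lists, votes, epsilon):
    """Labels whose P value is above the significance level epsilon."""
    return {label for label, vote in votes.items()
            if mondrian_p_value(class_lists[label], vote) > epsilon}


# Hypothetical Mondrian class lists: cross-validated fractions of trees voting
# for the true class of each training sample, sorted in increasing order.
class_lists = {
    "MSI-H": [0.30, 0.45, 0.62, 0.70, 0.81, 0.88, 0.93],
    "MSS": [0.35, 0.48, 0.60, 0.72, 0.85, 0.90, 0.99],
}
epsilon = 0.25  # confidence level 1 - epsilon = 0.75

# Hypothetical forest votes for two new samples.
confident_sample = {"MSI-H": 0.90, "MSS": 0.10}
ambiguous_sample = {"MSI-H": 0.50, "MSS": 0.50}

p_msi_h = mondrian_p_value(class_lists["MSI-H"], confident_sample["MSI-H"])
single = conformal_predict(class_lists, confident_sample, epsilon)
both = conformal_predict(class_lists, ambiguous_sample, epsilon)
print(p_msi_h, single, both)
```

A prediction set with one label corresponds to a confident call, a set with two labels to the "both" category, and an empty set to the "none" category in panel (g).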

Source: PubMed
