Multiplatform analysis of 12 cancer types reveals molecular classification within and across tissues of origin

Katherine A Hoadley, Christina Yau, Denise M Wolf, Andrew D Cherniack, David Tamborero, Sam Ng, Max D M Leiserson, Beifang Niu, Michael D McLellan, Vladislav Uzunangelov, Jiashan Zhang, Cyriac Kandoth, Rehan Akbani, Hui Shen, Larsson Omberg, Andy Chu, Adam A Margolin, Laura J Van't Veer, Nuria Lopez-Bigas, Peter W Laird, Benjamin J Raphael, Li Ding, A Gordon Robertson, Lauren A Byers, Gordon B Mills, John N Weinstein, Carter Van Waes, Zhong Chen, Eric A Collisson, Cancer Genome Atlas Research Network, Christopher C Benz, Charles M Perou, Joshua M Stuart

Abstract

Recent genomic analyses of pathologically defined tumor types identify "within-a-tissue" disease subtypes. However, the extent to which genomic signatures are shared across tissues is still unclear. We performed an integrative analysis using five genome-wide platforms and one proteomic platform on 3,527 specimens from 12 cancer types, revealing a unified classification into 11 major subtypes. Five subtypes were nearly identical to their tissue-of-origin counterparts, but several distinct cancer types were found to converge into common subtypes. Lung squamous, head and neck, and a subset of bladder cancers coalesced into one subtype typified by TP53 alterations, TP63 amplifications, and high expression of immune and proliferation pathway genes. Of note, bladder cancers split into three pan-cancer subtypes. The multiplatform classification, while correlated with tissue-of-origin, provides independent information for predicting clinical outcomes. All data sets are available for data-mining from a unified resource to support further biological discoveries and insights into novel therapeutic strategies.

Copyright © 2014 Elsevier Inc. All rights reserved.

Figures

Figure 1. Integrated Cluster-Of-Cluster Assignments analysis reveals 11 major subtypes (see also Supplemental Figures S1-3 and Data Files S1-3)
A) Integration of subtype classifications from 5 “omic” platforms resulted in the identification of 11 major groups/subtypes from 12 pathologically defined cancer types. The groups are identified by number and color in the second bar, with the tissue of origin specified in the top bar. The matrix of individual “omic” platform type classification/subtype schemes was clustered, and each data type is represented by a different color: copy number=black, DNA methylation=purple, miRNA=blue, mRNA=red and RPPA=green. B) Mutation status for each of 10 Significantly Mutated Genes coded as: wild-type=white, mutant=red, missing data=gray. C) Copy number status for each of 9 important genes: amplified=red, deleted=blue, copy number neutral=white and missing data=gray. The color-coding schema is shown to the right. D) Overall survival (OS) of COCA subtypes by Kaplan-Meier plot. COCA subtypes are highly correlated with overall survival outcomes. E) The log-likelihood ratio (LR) statistic was estimated as we added clinical variables, COCA subtype, or tissue type information to a Cox proportional hazards model. Clinical variables included age at diagnosis, tumor size, node status and metastasis status. The change in LR statistic as features were added to the model was assessed for significance by chi-square analysis. The set of samples was limited to the set of tumor types that did not have a one-to-one relationship with a COCA subtype: BLCA, BRCA, COAD, HNSC, LUAD, LUSC, and READ in COCA clusters COCA1 – LUAD-enriched, COCA2-Squamous, COCA3-BRCA/Luminal, COCA4-BRCA/Basal-like, COCA7-COAD/READ and COCA8-BLCA. First bar “A” shows results of adding tissue-of-origin to clinical variables already part of the model, followed by a variable representing the COCA subtyping; bar “B” shows results when COCA is first added on to clinical variables, and then tissue-type is added. In each case, the increase in the ability to predict OS is expressed in terms of the LR statistic.
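The nested-model comparison described for panel E can be illustrated with a minimal sketch. It assumes the log-likelihoods of two nested Cox models have already been obtained from some fitting routine; the function names `chi2_sf` and `lr_test` and the numeric values are hypothetical and are not taken from the paper.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Closed form for even df: exp(-h) * sum_{j<df/2} h^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= h / j
            total += term
        return math.exp(-h) * total
    # Odd df: start from the df=1 case and apply the gamma recurrence.
    total = math.erfc(math.sqrt(h))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * h ** (j - 0.5) / math.gamma(j + 0.5)
    return total


def lr_test(loglik_reduced, loglik_full, added_params):
    """Likelihood-ratio statistic and chi-square p-value for nested models."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, added_params)


# Illustrative numbers only: a model with clinical variables versus the same
# model with two extra indicator variables for a hypothetical subtype factor.
stat, p_value = lr_test(loglik_reduced=-100.0, loglik_full=-97.0, added_params=2)
print(f"LR = {stat:.2f}, p = {p_value:.4f}")
```

The degrees of freedom equal the number of parameters added in each step, so adding a categorical variable with k levels contributes k - 1 degrees of freedom.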
Figure 2. Genomic determinants of the Integrative COCA Subtypes (see also Supplemental Figure S4 and Tables S2-3)
A) Genes from the high-confidence list of drivers (Tamborero et al., 2013) found to be mutated at a different rate within one COCA subtype compared with outside it, based on a two-tailed Fisher’s exact test. Mutation frequency enrichment, red to orange; genes with mutations equaling the background rate, yellow; genes with no observed mutations in a subtype, white. Displayed are top-ranked genes in terms of significant mutation enrichment (FDR<1%) in at least one COCA subtype. B) Somatic copy number alterations (SCNAs) in Integrative Clusters. SCNAs in tumors (horizontal axis) are plotted along chromosomal locations (vertical axis). The heatmap shows the presence of amplifications (red) and deletions (blue) throughout the genome. The color strip along the top indicates integrative COCA cluster membership; the number in parentheses indicates % of samples in a COCA subtype with TP53 mutation. COCA subtypes are ordered from highest TP53 mutant percentage to lowest. C) Range of copy number segments in tumors within each Integrative Cluster. The box and whisker plots show the middle quartiles and the minimum and maximum number of segments in each cluster group.
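The within-versus-outside comparison in panel A can be sketched as a two-tailed Fisher's exact test on a 2x2 table of mutated and wild-type sample counts. This is a minimal illustration using only the standard library; the function name and the counts are hypothetical, and the multiple-testing (FDR) step mentioned in the legend is not shown.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: mutated samples inside the subtype    b: wild-type inside
    c: mutated samples outside the subtype   d: wild-type outside
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x mutated samples inside the subtype.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more likely than the observed one; the small
    # relative tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Illustrative counts only: 30 of 100 in-subtype samples mutated,
# versus 50 of 900 samples outside the subtype.
print(fisher_exact_two_tailed(30, 70, 50, 850))
```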
Figure 3. Subtype-specific patterns of gene-program and selected pathway expression characterizing each Pan-Cancer-12 COCA subtype (see also Supplemental Figures S5-6, Table S4, and Data File S5)
The heat map shows integrative subtypes in numerical order. Gene programs (top) and pathway signatures from PARADIGM (bottom) were clustered separately from each other. Red-blue intensities reflect the means of the scores (red=high, white=average, blue=low).
Figure 4. Genomic determinants of the C2-Squamous-like COCA subtype (see also Supplemental Figure S7 and Table S5)
A) SCNAs for the C2-Squamous-like subtype are shown, highlighting the importance of 3q26 gains across the different tissue-of-origin samples. B) Selected genes from 291 high-confidence driver (HCD) genes (Tamborero et al., 2013) mutated in > 5% of C2-Squamous-like samples and comparable in frequency in other subtypes. Samples with protein-affecting mutations in those genes are shown in green. C) HCD genes (as in panel B) with mutation frequency significantly higher in C2-Squamous-like tumors relative to others (stated at p<0.01 according to Fisher’s exact test with FDR correction). The method corrected for imbalance in the number of samples from different tissues (see Supplemental Text Section 8). D) Two sub-networks of mutated pathways identified by an updated HotNet algorithm analysis using HINT interactions (see Supplemental Text) as mutated in at least 20% of the samples of the C2-Squamous-like subtype (cluster 2). Pie charts indicate interactions among the proteins in each subnetwork. Each gene (node) is colored by wedges whose size indicates the relative proportion of the gene’s mutations that are in samples from each integrated subtype. To the right of the pie charts is a gene-by-sample mutation matrix representing the mutation status of each gene across all Squamous-like samples. Full ticks represent SNVs, downticks represent deletions and upticks represent amplifications. The color of each tick indicates tissue-of-origin type, with gray indicating no mutation in the corresponding sample.
Figure 5. Comparison of molecular characteristics of C2-Squamous-like, C4-BRCA/Basal and C9-OV (ovarian) subtypes reveals differences in TP63 and TP53 signaling (see also Supplemental Figure S7, Table S5, and Data File S4)
A) Relative significance of TP63 network activation within the C2-Squamous-like and C4-BRCA/Basal subtypes. The network neighbors surrounding the TAp63γ and ΔNp63α tetramer complexes that show significant activation (or inactivation) within the C2-Squamous-like and/or C4-BRCA/Basal subtypes relative to all other cases were visualized using Cytoscape (Shannon et al., 2003). Node shape reflects relative significance in the one-versus-all comparison (square: more significant in C2-Squamous-like, triangle: more significant in C4-BRCA/Basal). Node color indicates relative activity (red: activated in C2 and C4, blue: inactivated in C2 and C4, purple: activated in C2 but inactivated in C4, white: activated or inactivated in only one subtype). B) Box plot of isoform-specific levels of TP63 and TP73 within three of the TP53-frequently mutated COCA subtypes (C2-Squamous-like, C4-BRCA/Basal, and C9-OV). C) CircleMap of PARADIGM-Shift differences associated with TP53 mutations within the C2, C4 and C9 COCA subtypes. Samples were ordered first by integrative subtype membership (innermost ring), then by TP53 mutation status (second ring), and finally by P-Shift (outer ring, indicating TP53 activity). The GISTIC score (indicating CNV), mRNA expression level, PARADIGM upstream and downstream activities are shown in the third, fourth, fifth and sixth rings, respectively. Red-blue color intensity reflects magnitude (red: positive, blue: negative). TP53-truncating mutants are highlighted (black outlined wedge), and the mean P-shift scores of the truncating mutants are shown. Negative P-Shift scores (outer ring blue) predict loss of function (LOF). D) Unsupervised clustering of C2-Squamous-like, C4-BRCA/basal, and C9-OV cancers based on the expression patterns of 33 published TP53-related gene signatures. 
Sample subtype assignment (pink: C2-Squamous-like, blue: C4-BRCA/basal, purple: C9-OV) and TP53 mutation status (wild type: white, truncating: black, missense: grey) are indicated in the column color bar. Heatmap red-blue color intensity reflects magnitude (red: positive, white: average, blue: negative). See Supplemental Data File S4 (syn2491513) for complete list.
Figure 6. Divergence of the bladder cancer samples across multiple COCA subtypes (see also Supplemental Figure S8 and Table S6)
A) Kaplan-Meier survival analysis of bladder cancers within the C1-LUAD-enriched, C2-Squamous-like, and C8-BLCA subtypes. B) Heatmap of 17 proteins expressed at significantly different levels within the C2-Squamous-like relative to the C8-BLCA bladder cancer samples. Samples are arranged along the column by subtype (pink: C2, light blue: C8); and protein data are ordered along the rows by clustering. Rainbow color scale reflects magnitude (red: high, green: average, blue: low). C) HCD genes with differential mutation frequencies among the bladder samples clustered in COCA subtypes C1, C2 and C8. Differential frequencies reflect frequencies within, relative to frequencies outside of, the COCA subtype. D) Heatmap of 11 gene programs showing significant differential expression between the C2 and C8 bladder cancers. Samples are arranged along the column by subtype (pink: C2, light blue: C8), and gene programs are ordered along the rows by clustering. Red-blue color scale reflects magnitude (red: high, blue: low). E) PARADIGM sub-network of immune-related pathway biomarkers activated in C2 bladder cancers relative to the C8 subtype. Red-blue color scale represents relative activation (red: higher in C2, blue: higher in C8). Node size reflects relative significance, and node shape denotes feature type (diamond: multi-protein complex, inverted v: cellular process, circle: genes, square: gene family). Color of an edge reflects type of interaction within the PARADIGM SuperPathway (purple arrows: activation, green T: inhibition).
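The Kaplan-Meier analyses referred to in the legends (panel A here and panel D of Figure 1) rest on the product-limit estimator. The sketch below is a minimal standard-library illustration of that estimator, not the analysis code used in the study; the function name and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each sample
    events: 1 if death was observed at that time, 0 if the sample was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all samples sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored samples leave the risk set after time t.
        n_at_risk -= leaving
    return curve


# Illustrative follow-up data for one hypothetical group (time in months).
for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```

Running the estimator separately on the samples of each subtype gives one curve per subtype; comparing the curves formally would need an additional test such as the log-rank test, which is not shown here.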

Source: PubMed
