Pan-cancer multi-omics analysis and orthogonal experimental assessment of epigenetic driver genes

Andrea Halaburkova, Vincent Cahais, Alexei Novoloaca, Mariana Gomes da Silva Araujo, Rita Khoueiry, Akram Ghantous, Zdenko Herceg

Abstract

The recent identification of recurrently mutated epigenetic regulator genes (ERGs) supports their critical role in tumorigenesis. We conducted a pan-cancer analysis integrating (epi)genome, transcriptome, and DNA methylome alterations in a curated list of 426 ERGs across 33 cancer types, comprising 10,845 tumor and 730 normal tissue samples. We found that, in addition to mutations, copy number alterations in ERGs were more frequent than previously anticipated and tightly linked to expression aberrations. Novel bioinformatics approaches, integrating the strengths of various driver prediction and multi-omics algorithms, together with an orthogonal in vitro CRISPR-Cas9 screen targeting all ERGs, revealed genes with driver roles within and across malignancies, as well as shared driver mechanisms operating across multiple cancer types and hallmarks. This is the largest and most comprehensive analysis of ERGs thus far; it is also the first experimental effort to specifically identify ERG drivers (epidrivers) and characterize their deregulation and functional impact in oncogenic processes.

© 2020 Halaburkova et al.; Published by Cold Spring Harbor Laboratory Press.

Figures

Figure 1.
Study design. (A) A four-stage approach to identify and characterize ERGs with cancer driver potential. (B) The compendium of ERGs curated and analyzed, comprising 426 genes classified into histone modifiers, DNA methylation regulators, chromatin remodeling factors (ChRC), helicases, and other chromatin modifiers (some of which were further divided into subgroups based on function or their presence in molecular complexes). Histone acetylation, histone methylation, and DNA methylation modifiers are each further stratified into “writers” (w), “editors” (e), and “readers” (r). (*) Histone-modifying genes whose functions are not well characterized and which were therefore assigned based on ENCODE ChIP-sequencing data; (**) histone-modifying genes without assignment of target residues in the histone tails.
Figure 2.
Pan-cancer analysis of genetic alterations across ERG categories and classes. (A,B) Percentage of samples with genetic deregulation in ERGs (A) and percentage of ERGs showing different types of genetic deregulation (B), by cancer type. An ERG is considered altered if at least 1% of samples harbor the genetic aberration. (C) Proportion of samples with SNAs versus those with deletions (−1, −2) or amplifications (+1, +2) in ERGs for each cancer type. Each gene is represented by two dots (red and green) depicting amplified and deleted CNAs, respectively. (D) Circos plots showing the relative extent of CNA deregulation by chromosomal location in two representative cancer types (LUAD and THYM, characterized by high and low CNA burden, respectively). For each cancer type, the CNA level of each ERG was calculated as the proportion of samples carrying any type of CNA (amplification = +1, +2; deletion = −1, −2) in that ERG. (E,F) Box plots showing the percentage of samples with SNAs (E) and deep CNAs (F) by gene and by cancer type; the most deregulated ERGs are highlighted for each cancer type. (G,H) Heatmaps of the top genetically deregulated genes, showing SNAs (G) and CNAs (H) present in at least 10% and 15% of samples, respectively, in any cancer type. Only samples with deep CNAs were included. ERGs are grouped into functional categories as indicated. (I) Percentage of ERGs showing genetic alterations across all cancer types, by functional group. Genetic alterations: (SNA) single-nucleotide alteration, (amp) deep copy number amplification, (amp_SNA) deep amplification co-occurring with SNA, (del) deep copy number deletion, (del_SNA) deep deletion co-occurring with SNA, and (ma) multiple alterations. In cases in which both types of CNA (amplification and deletion) of one gene were present in the samples, panels B and H report the alteration that was at least twice as prevalent as the other; otherwise, the alteration was reported under the multiple-alteration category.
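The labelling rule in the Figure 2 legend (report the CNA type that is at least twice as prevalent as the other, otherwise "ma") and the 1% threshold for calling an ERG altered can be made concrete. The following is a minimal illustrative sketch, not the authors' pipeline: it assumes sample counts are available per gene and per cancer type, the helper names dominant_cna and is_altered are hypothetical, and the SNA co-occurrence labels (amp_SNA, del_SNA) are not modeled.

```python
from typing import Optional


def dominant_cna(n_amp: int, n_del: int) -> Optional[str]:
    """Hypothetical helper: CNA label reported for one ERG in one cancer type.

    When both amplifications and deletions occur, the type that is at least
    twice as prevalent as the other is reported; otherwise 'ma' (multiple
    alterations). Returns None when the gene carries no CNA at all.
    """
    if n_amp == 0 and n_del == 0:
        return None
    if n_amp >= 2 * n_del:  # also covers genes with no deletions
        return "amp"
    if n_del >= 2 * n_amp:  # also covers genes with no amplifications
        return "del"
    return "ma"


def is_altered(n_altered: int, n_samples: int, min_fraction: float = 0.01) -> bool:
    """Hypothetical helper: an ERG counts as altered if at least
    min_fraction (1% in the legend) of samples carry the aberration."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return n_altered >= min_fraction * n_samples
```

For example, a gene amplified in 10 samples and deleted in 4 samples of one cancer type would be reported as amp, whereas 5 amplified versus 4 deleted samples would fall under ma.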
Figure 3.
RNA expression alterations of ERGs across cancer types, in relation to genetic and DNA methylome variations. (A) Multi-omics plot of SNA, CNA, and RNA expression alterations across ERGs and cancer types. Amplifications, deletions, and SNAs were annotated as described in Methods. The most deregulated ERGs in RNA expression (y-axis value above 10) are highlighted for each cancer type. (B,C) Circos plots showing Pearson's correlation between CNAs (B) or SNAs (C) and expression Z-scores in different cancer types across chromosomal regions. Positive and negative correlations are shown in orange and blue, respectively. Only ERGs with correlation (R²) > 30% and FDR < 0.05 in at least one cancer type were included in B; the R² threshold was set to 10% in C. (D) Expression quantitative trait methylation (eQTM) analysis showing Pearson correlation values (x-axis) between RNA (RSEM counts) and methylation (beta) levels of promoter CpGs for each ERG in different cancer types. The line bar indicates highly significant CpGs [−log(P-value) > 50]. Red, blue, and black dots represent CpGs with FDR < 0.05, P < 0.05, and P > 0.05, respectively. (E) Number of ERGs or all genes with differential RNA expression in tumor relative to adjacent normal tissues for each cancer type (|log FC| > 2 and FDR < 0.05). The star denotes P < 0.05 by a two-sample test of proportions comparing up- versus down-regulation. (F) Heatmaps showing the most differentially expressed ERGs in tumor samples compared with adjacent normal tissues across cancer types. Only the top differentially expressed ERGs with |log FC| > 3 and FDR < 0.05 are annotated. (G) Volcano plots showing differentially expressed ERGs in tumors relative to adjacent normal tissues. ERGs with |log FC| > 1 are shown in blue, and the most deregulated ERGs with |log FC| > 3 are highlighted for each cancer type (FDR < 0.05). Sample sizes for each cancer type are indicated in A and E.
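The gene filter used for the CNA-expression Circos plot (R² > 30% and FDR < 0.05 in at least one cancer type; 10% for the SNA plot) can be sketched as follows. The legend does not name the multiple-testing procedure, so Benjamini-Hochberg adjustment within each cancer type is assumed here; per-gene Pearson r and raw P-values are taken as given inputs, and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_correlated_genes(stats: Dict[str, Tuple[float, float]],
                            min_r2: float = 0.30,
                            max_fdr: float = 0.05) -> Set[str]:
    """stats maps gene -> (Pearson r, raw P) for ONE cancer type (assumed input)."""
    genes = list(stats)
    fdr = benjamini_hochberg([stats[g][1] for g in genes])
    # R² keeps both positive and negative correlations.
    return {g for g, q in zip(genes, fdr) if stats[g][0] ** 2 > min_r2 and q < max_fdr}


def genes_passing_in_any(per_cancer: Dict[str, Dict[str, Tuple[float, float]]],
                         min_r2: float = 0.30) -> Set[str]:
    """Union over cancer types: a gene is kept if it passes in at least one."""
    selected: Set[str] = set()
    for stats in per_cancer.values():
        selected |= select_correlated_genes(stats, min_r2=min_r2)
    return selected
```

Passing min_r2=0.10 reproduces the lower threshold described for the SNA panel.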
Figure 4.
Characterization of ERG driver potentials. (A) Heatmap showing the ConsensusDriver scores (values ranging from 1.5 to 7.5) as obtained by Bailey et al. (2018). ERGs with a score ≥ 1.5 in at least one cancer type are shown. The top 10 deep amplifications or deletions (green circles), SNAs (blue empty diamonds), or significant Z-scores (purple crosses) of each cancer are overlaid on the heatmap. (B) Significant (FDR < 0.05) co-occurrence and mutual exclusivity for ConsensusDriver ERGs in a pan-cancer analysis. Node size is proportional to both the number and thickness of its connections with other nodes. Blue and red edges represent co-occurrence (odds ratio [OR] > 1) and mutual exclusivity (OR < 1), respectively. Edge transparency indicates the average OR across cancer types, and edge thickness is proportional to the number of cancer types in which the OR is significant. The co-occurrence filter was set to at least 5% of the samples per cancer type (Methods). (C) Heatmap of the Multi-Omics Driver scores of ERGs per cancer type. The ERGs shown represent a pooled set of the top three ERGs in each cancer type, as ranked by the multi-omics driver score. (D) Top 100 ERGs by Pan-Cancer Driver score using SNA (5% of samples), CNA (5% of samples), and expression data (15% of samples with significant Z-score or FDR < 0.05 with log10FC > 1). Results are represented as bar plots counting the number of cancers in which a given gene has a particular genomic or expression alteration. From outer to inner track: (1, pink) SNAs; (2, green) CNAs; (3, purple) Z-score; (4, orange) log10FC. Inside the last track, co-occurrence or mutual exclusivity was calculated as in B, except that the co-occurrence filter was set to at least 10% of the samples per cancer type. Genes are aggregated by their functional features. (E) Significant co-occurrence for the top 100 ERGs by Pan-Cancer Driver score. Co-occurrence or mutual exclusivity was calculated as in D but ordered by chromosome number instead. (F) Spider pie chart showing enrichment of the 426 ERGs in pathways affecting the 10 hallmarks of cancer; the corresponding P-values and ORs are illustrated by green gradients and black spots, respectively. The names of ERGs overlapping with the four significantly enriched hallmarks are indicated.
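The co-occurrence and mutual-exclusivity calls in Figure 4B rest on a per-pair 2 × 2 table of altered and unaltered samples and its odds ratio. A minimal sketch follows, assuming a two-sided Fisher's exact test (the legend reports FDR-adjusted significance and defers the test itself to Methods) and applying the 5% per-gene frequency filter before testing; all function names are hypothetical. In the pan-cancer setting the resulting P-values would still need FDR adjustment across gene pairs.

```python
from math import comb, inf, nan
from typing import List, Tuple


def contingency(x: List[bool], y: List[bool]) -> Tuple[int, int, int, int]:
    """2 x 2 counts (both altered, x only, y only, neither) over the same samples."""
    if len(x) != len(y):
        raise ValueError("alteration vectors must cover the same samples")
    a = sum(1 for u, v in zip(x, y) if u and v)
    b = sum(1 for u, v in zip(x, y) if u and not v)
    c = sum(1 for u, v in zip(x, y) if v and not u)
    return a, b, c, len(x) - a - b - c


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """OR > 1 suggests co-occurrence, OR < 1 mutual exclusivity."""
    if b * c == 0:
        return inf if a * d > 0 else nan
    return (a * d) / (b * c)


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test: total hypergeometric probability of all
    tables with the observed margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_obs = prob(a)
    ks = range(max(0, col1 - row2), min(row1, col1) + 1)
    return min(1.0, sum(prob(k) for k in ks if prob(k) <= p_obs * (1 + 1e-7)))


def classify_pair(x: List[bool], y: List[bool],
                  min_freq: float = 0.05, alpha: float = 0.05) -> str:
    """Frequency filter (5% of samples per gene), then test and orient by OR."""
    n = len(x)
    if sum(x) < min_freq * n or sum(y) < min_freq * n:
        return "filtered"
    a, b, c, d = contingency(x, y)
    if fisher_two_sided(a, b, c, d) >= alpha:
        return "not significant"
    return "co-occurrence" if odds_ratio(a, b, c, d) > 1 else "mutual exclusivity"
```

Setting min_freq=0.10 gives the stricter filter described for panel D.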
Figure 5.
CRISPR-Cas9 screen to perform orthogonal assessment of the driver potential of ERGs in EMT. (A) The screening strategy used to identify positive and negative regulators of EMT among ERGs. (B) Western blot analysis of Cas9 expression in A549 lung cancer cells. “Pool” represents a heterogeneous population of transduced, stably Cas9-expressing cells derived from the parental cells. Individual cell clones isolated with cloning rings are numbered 1, 2, 5, 6, 7, 8, and 9. Actin beta was used as a loading control. (C) Validation of the transduction efficiency of the lentiviral CRISPR ERG library by FACS 10 d after puromycin selection, compared with uninfected A549 cells. (D) Enrichment of the vimentin-positive (VIM+) population analyzed by FACS after CRISPR ERG library transduction, at day 14 after puromycin selection. (E) Validation of cell sorting for enrichment of the VIM+ population by FACS using a fluorescent EPCAM antibody (EPCAM loss is associated with the mesenchymal cell state) in VIM+, vimentin-negative (VIM−), and uninfected cells, with IgG as the negative control antibody. (F) Confirmation of cell enrichment for VIM+ and VIM− fractions after sorting. FACS-sorted VIM+ and VIM− populations were grown in culture for 2 wk and analyzed by FACS after staining with cadherin 2 (also known as N-cadherin) antibodies. (G) Overlap of the top EMT-associated ERG gRNAs after Illumina MiSeq deep sequencing; numbers are derived from two statistical methods (DESeq2 and edgeR) at days 14, 21, and 28 after transduction. (H) Heatmap showing the top ERGs based on enriched and depleted gRNAs at days 14, 21, and 28 after transduction compared with day 0. (I) Volcano plot of ERG gRNAs at day 28 after transduction. (J) qRT-PCR expression analysis of EMT markers (cadherin 1 [also known as E-cadherin], vimentin, and cadherin 2) in single-targeted A549-VimCas9 clones following EP400 loss of function, relative to expression in the parental A549 Vim Cas9 cell line. (*) P < 0.05, one-way ANOVA. Error bars are SEM of n = 2. (K) Representative images of a scratch assay performed on the parental cell line and three generated EP400 KO clones at day 0 and after 24 h (left). The graph on the right shows the percentage of area closure 24 h after the scratch, averaged over at least six areas analyzed for each clone and for the parental cell line. Experiments were performed in duplicate. (L) Transwell migration assay showing increased migration at 13 h for A549-Vim Cas9 EP400 KO Cl4 compared with the parental cell line. (KO) Knockout; (Cl) clone; (vim) vimentin. (M) An example of network analysis of selected top ERGs (EP400) associated with the EMT population, obtained with the GeneMANIA package. (N,O) Bar plots showing the mutation frequency of EMT-specific ERGs (identified in the CRISPR-Cas9 screen) in clinical samples from nonmetastatic (M0) and metastatic (M1) subsets (based on the annotation of TCGA samples).
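Panel K of Figure 5 reports percentage area closure 24 h after the scratch, averaged over at least six areas per clone, and panel J reports SEM for n = 2. The legend does not give the closure formula, so the sketch below assumes the common definition (A0 − A24) / A0 × 100, where A0 and A24 are the open wound areas at 0 h and 24 h. The function names and the min_areas check are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Iterable, List, Tuple


def percent_closure(area_t0: float, area_t24: float) -> float:
    """Assumed definition: percentage of the initial wound area closed after 24 h."""
    if area_t0 <= 0:
        raise ValueError("initial wound area must be positive")
    return 100.0 * (area_t0 - area_t24) / area_t0


def mean_closure(areas: Iterable[Tuple[float, float]], min_areas: int = 6) -> float:
    """Average closure over the imaged areas of one clone (legend: at least six)."""
    values: List[float] = [percent_closure(a0, a24) for a0, a24 in areas]
    if len(values) < min_areas:
        raise ValueError(f"expected at least {min_areas} areas, got {len(values)}")
    return mean(values)


def sem(values: List[float]) -> float:
    """Standard error of the mean (sample SD / sqrt(n)); needs n >= 2."""
    return stdev(values) / sqrt(len(values))
```

Normalizing each area to its own initial width, rather than averaging raw areas, keeps fields with slightly different scratch widths comparable.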
Figure 6.
CRISPR-Cas9 screen to perform orthogonal assessment of the driver potential of ERGs in cancer cell proliferation. (A) The screening strategy used to identify regulators of cell proliferation among ERGs in both A549 and MCF10A cell lines/clones. (B,C, left) Venn diagrams showing the genes associated with significantly enriched (B) or depleted (C) gRNAs in the screens performed on A549 and MCF10A cells using edgeR analysis in CRISPRAnalyzeR. (B,C, right) Heatmaps showing the adjusted P-values of the commonly enriched (B) or depleted (C) gRNAs in both cell lines. Data are presented as –log10 (adjusted P-value). (D) KEGG pathway analysis performed on genes associated with commonly depleted gRNAs (left) and with commonly depleted and enriched gRNAs (right) in both cell lines. All pathways in red show P < 0.05.
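Figure 6B and C intersect the genes with significantly enriched or depleted gRNAs in the two cell lines and display the shared genes as −log10(adjusted P-value). A small sketch of that step follows. It assumes per-gene adjusted P-values have already been computed for each screen (in the study, with edgeR in CRISPRAnalyzeR) and uses an illustrative cutoff of 0.05; the function name, cutoff, and floor value are hypothetical.

```python
from math import log10
from typing import Dict, Tuple


def shared_hits(padj_a549: Dict[str, float],
                padj_mcf10a: Dict[str, float],
                alpha: float = 0.05,
                floor: float = 1e-300) -> Dict[str, Tuple[float, float]]:
    """Genes with adjusted P < alpha in BOTH screens, mapped to their
    -log10(adjusted P) in A549 and MCF10A, as plotted in the heatmaps."""
    hits_a549 = {g for g, q in padj_a549.items() if q < alpha}
    hits_mcf10a = {g for g, q in padj_mcf10a.items() if q < alpha}
    # The intersection corresponds to the overlap region of the Venn diagram;
    # 'floor' avoids log10(0) for adjusted P-values reported as exactly zero.
    return {g: (-log10(max(padj_a549[g], floor)), -log10(max(padj_mcf10a[g], floor)))
            for g in sorted(hits_a549 & hits_mcf10a)}
```

Running it once on the enriched-gRNA results and once on the depleted-gRNA results yields the gene sets shown in panels B and C, respectively.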

References

Ahuja N, Sharma AR, Baylin SB. 2016. Epigenetic therapeutics: a new weapon in the war against cancer. Annu Rev Med 67: 73–89. 10.1146/annurev-med-111314-035900
Bailey MH, Tokheim C, Porta-Pardo E, Sengupta S, Bertrand D, Weerasinghe A, Colaprico A, Wendl MC, Kim J, Reardon B, et al. 2018. Comprehensive characterization of cancer driver genes and mutations. Cell 173: 371–385.e18. 10.1016/j.cell.2018.02.060
Bell O, Tiwari VK, Thomä NH, Schübeler D. 2011. Determinants and dynamics of genome accessibility. Nat Rev Genet 12: 554–564. 10.1038/nrg3017
Bertrand D, Drissler S, Chia BK, Koh JY, Li C, Suphavilai C, Tan IB, Nagarajan N. 2018. ConsensusDriver improves upon individual algorithms for predicting driver alterations in different cancer types and individual patients. Cancer Res 78: 290–301. 10.1158/0008-5472.CAN-17-1345
Brien GL, Valerio DG, Armstrong SA. 2016. Exploiting the epigenome to control cancer-promoting gene-expression programs. Cancer Cell 29: 464–476. 10.1016/j.ccell.2016.03.007
Chakravarty D, Gao J, Phillips SM, Kundra R, Zhang H, Wang J, Rudolph JE, Yaeger R, Soumerai T, Nissan MH, et al. 2017. OncoKB: a precision oncology knowledge base. JCO Precis Oncol 1: 1–16. 10.1200/PO.17.00011
Ding L, Bailey MH, Porta-Pardo E, Thorsson V, Colaprico A, Bertrand D, Gibbs DL, Weerasinghe A, Huang KL, Tokheim C, et al. 2018. Perspective on oncogenic processes at the end of the beginning of cancer genomics. Cell 173: 305–320.e10. 10.1016/j.cell.2018.03.033
Gonzalez-Perez A, Jene-Sanz A, Lopez-Bigas N. 2013. The mutational landscape of chromatin regulatory factors across 4,623 tumor samples. Genome Biol 14: r106. 10.1186/gb-2013-14-9-r106
Gu Z, Gu L, Eils R, Schlesner M, Brors B. 2014. circlize implements and enhances circular visualization in R. Bioinformatics 30: 2811–2812. 10.1093/bioinformatics/btu393
Hanahan D, Weinberg RA. 2011. Hallmarks of cancer: the next generation. Cell 144: 646–674. 10.1016/j.cell.2011.02.013
Hess JM, Bernards A, Kim J, Miller M, Taylor-Weiner A, Haradhvala NJ, Lawrence MS, Getz G. 2019. Passenger hotspot mutations in cancer. Cancer Cell 36: 288–301.e14. 10.1016/j.ccell.2019.08.002
Jones PA, Issa JP, Baylin S. 2016. Targeting the cancer epigenome for therapy. Nat Rev Genet 17: 630–641. 10.1038/nrg.2016.93
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. 2009. Circos: an information aesthetic for comparative genomics. Genome Res 19: 1639–1645. 10.1101/gr.092759.109
Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A, et al. 2016. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res 44: W90–W97. 10.1093/nar/gkw377
Lahouel K, Younes L, Danilova L, Giardiello FM, Hruban RH, Groopman J, Kinzler KW, Vogelstein B, Geman D, Tomasetti C. 2020. Revisiting the tumorigenesis timeline with a data-driven generative model. Proc Natl Acad Sci 117: 857–864. 10.1073/pnas.1914589117
Love MI, Huber W, Anders S. 2014. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol 15: 550. 10.1186/s13059-014-0550-8
McCarthy DJ, Chen Y, Smyth GK. 2012. Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Nucleic Acids Res 40: 4288–4297. 10.1093/nar/gks042
Meyerson M, Gabriel S, Getz G. 2010. Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet 11: 685–696. 10.1038/nrg2841
Murr R, Loizou JI, Yang YG, Cuenin C, Li H, Wang ZQ, Herceg Z. 2006. Histone acetylation by Trrap–Tip60 modulates loading of repair proteins and repair of DNA double-strand breaks. Nat Cell Biol 8: 91–99. 10.1038/ncb1343
Ng PK, Li J, Jeong KJ, Shao S, Chen H, Tsang YH, Sengupta S, Wang Z, Bhavana VH, Tran R, et al. 2018. Systematic functional annotation of somatic mutations in cancer. Cancer Cell 33: 450–462.e10. 10.1016/j.ccell.2018.01.021
Parmigiani G, Boca S, Lin J, Kinzler KW, Velculescu V, Vogelstein B. 2009. Design and analysis issues in genome-wide somatic mutation studies of cancer. Genomics 93: 17–21. 10.1016/j.ygeno.2008.07.005
Plass C, Pfister SM, Lindroth AM, Bogatyrova O, Claus R, Lichter P. 2013. Mutations in regulators of the epigenome and their connections to global chromatin patterns in cancer. Nat Rev Genet 14: 765–780. 10.1038/nrg3554
Robinson MD, McCarthy DJ, Smyth GK. 2010. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26: 139–140. 10.1093/bioinformatics/btp616
Sawan C, Vaissière T, Murr R, Herceg Z. 2008. Epigenetic drivers and genetic passengers on the road to cancer. Mutat Res 642: 1–13. 10.1016/j.mrfmmm.2008.03.002
Shen H, Laird PW. 2013. Interplay between the cancer genome and epigenome. Cell 153: 38–55. 10.1016/j.cell.2013.03.008
Tam WL, Weinberg RA. 2013. The epigenetics of epithelial-mesenchymal plasticity in cancer. Nat Med 19: 1438–1449. 10.1038/nm.3336
Timp W, Feinberg AP. 2013. Cancer as a dysregulated epigenome allowing cellular growth advantage at the expense of the host. Nat Rev Cancer 13: 497–510. 10.1038/nrc3486
Vogelstein B, Papadopoulos N, Velculescu VE, Zhou S, Diaz LA Jr, Kinzler KW. 2013. Cancer genome landscapes. Science 339: 1546–1558. 10.1126/science.1235122
Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, Chao P, Franz M, Grouios C, Kazi F, Lopes CT, et al. 2010. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res 38: W214–W220. 10.1093/nar/gkq537
Winter J, Breinig M, Heigwer F, Brügemann D, Leible S, Pelz O, Zhan T, Boutros M. 2016. caRpools: an R package for exploratory data analysis and documentation of pooled CRISPR/Cas9 screens. Bioinformatics 32: 632–634. 10.1093/bioinformatics/btv617
Xu Y, Zhang S, Lin S, Guo Y, Deng W, Zhang Y, Xue Y. 2017. WERAM: a database of writers, erasers and readers of histone acetylation and methylation in eukaryotes. Nucleic Acids Res 45: D264–D270.
Yang Z, Jones A, Widschwendter M, Teschendorff AE. 2015. An integrative pan-cancer-wide analysis of epigenetic enzymes reveals universal patterns of epigenomic deregulation in cancer. Genome Biol 16: 140. 10.1186/s13059-015-0699-9
Zack TI, Schumacher SE, Carter SL, Cherniack AD, Saksena G, Tabak B, Lawrence MS, Zhang CZ, Wala J, Mermel CH, et al. 2013. Pan-cancer patterns of somatic copy number alteration. Nat Genet 45: 1134–1140. 10.1038/ng.2760
Zhao M, Sun J, Zhao Z. 2013. TSGene: a web resource for tumor suppressor genes. Nucleic Acids Res 41: D970–D976. 10.1093/nar/gks937

