DNA methylation-based classification of central nervous system tumours

David Capper, David T W Jones, Martin Sill, Volker Hovestadt, Daniel Schrimpf, Dominik Sturm, Christian Koelsche, Felix Sahm, Lukas Chavez, David E Reuss, Annekathrin Kratz, Annika K Wefers, Kristin Huang, Kristian W Pajtler, Leonille Schweizer, Damian Stichel, Adriana Olar, Nils W Engel, Kerstin Lindenberg, Patrick N Harter, Anne K Braczynski, Karl H Plate, Hildegard Dohmen, Boyan K Garvalov, Roland Coras, Annett Hölsken, Ekkehard Hewer, Melanie Bewerunge-Hudler, Matthias Schick, Roger Fischer, Rudi Beschorner, Jens Schittenhelm, Ori Staszewski, Khalida Wani, Pascale Varlet, Melanie Pages, Petra Temming, Dietmar Lohmann, Florian Selt, Hendrik Witt, Till Milde, Olaf Witt, Eleonora Aronica, Felice Giangaspero, Elisabeth Rushing, Wolfram Scheurlen, Christoph Geisenberger, Fausto J Rodriguez, Albert Becker, Matthias Preusser, Christine Haberler, Rolf Bjerkvig, Jane Cryan, Michael Farrell, Martina Deckert, Jürgen Hench, Stephan Frank, Jonathan Serrano, Kasthuri Kannan, Aristotelis Tsirigos, Wolfgang Brück, Silvia Hofer, Stefanie Brehmer, Marcel Seiz-Rosenhagen, Daniel Hänggi, Volkmar Hans, Stephanie Rozsnoki, Jordan R Hansford, Patricia Kohlhof, Bjarne W Kristensen, Matt Lechner, Beatriz Lopes, Christian Mawrin, Ralf Ketter, Andreas Kulozik, Ziad Khatib, Frank Heppner, Arend Koch, Anne Jouvet, Catherine Keohane, Helmut Mühleisen, Wolf Mueller, Ute Pohl, Marco Prinz, Axel Benner, Marc Zapatka, Nicholas G Gottardo, Pablo Hernáiz Driever, Christof M Kramm, Hermann L Müller, Stefan Rutkowski, Katja von Hoff, Michael C Frühwald, Astrid Gnekow, Gudrun Fleischhack, Stephan Tippelt, Gabriele Calaminus, Camelia-Maria Monoranu, Arie Perry, Chris Jones, Thomas S Jacques, Bernhard Radlwimmer, Marco Gessi, Torsten Pietsch, Johannes Schramm, Gabriele Schackert, Manfred Westphal, Guido Reifenberger, Pieter Wesseling, Michael Weller, Vincent Peter Collins, Ingmar Blümcke, Martin Bendszus, Jürgen Debus, Annie Huang, Nada Jabado, Paul A Northcott, Werner Paulus, Amar Gajjar, Giles W 
Robinson, Michael D Taylor, Zane Jaunmuktane, Marina Ryzhova, Michael Platten, Andreas Unterberg, Wolfgang Wick, Matthias A Karajannis, Michel Mittelbronn, Till Acker, Christian Hartmann, Kenneth Aldape, Ulrich Schüller, Rolf Buslei, Peter Lichter, Marcel Kool, Christel Herold-Mende, David W Ellison, Martin Hasselblatt, Matija Snuderl, Sebastian Brandner, Andrey Korshunov, Andreas von Deimling, Stefan M Pfister

Abstract

Accurate pathological diagnosis is crucial for optimal management of patients with cancer. For the approximately 100 known tumour types of the central nervous system, standardization of the diagnostic process has been shown to be particularly challenging, with substantial inter-observer variability in the histopathological diagnosis of many tumour types. Here we present a comprehensive approach for the DNA methylation-based classification of central nervous system tumours across all entities and age groups, and demonstrate its application in a routine diagnostic setting. We show that the availability of this method may have a substantial impact on diagnostic precision compared to standard methods, resulting in a change of diagnosis in up to 12% of prospective cases. For broader accessibility, we have designed a free online classifier tool, the use of which does not require any additional onsite data processing. Our results provide a blueprint for the generation of machine-learning-based tumour classifiers across other cancer entities, with the potential to fundamentally transform tumour pathology.

Figures

Extended Data Figure 1 |. Unsupervised clustering of the DNA methylation-based reference cohort.
a, Heatmap showing the pairwise Pearson correlation (lower left) of the 32,000 most variably methylated CpG probes of all 2,801 biologically independent samples of the reference cohort. A detailed view of closely related ependymal classes (upper right) and the three subclasses identified in ATRT tumours (lower right) indicates higher correlation within classes. The colour code and abbreviations are identical to main Figure 1a. b, Barplot showing eigenvalue frequencies of a principal component analysis (PCA) using the same 32,000 most variably methylated CpG probes of all 2,801 biologically independent samples as in (a). The number of non-trivial components was determined by comparing eigenvalues to the maximum eigenvalue of a PCA using randomized beta-values (shuffling of sample labels per probe). c, X and Y coordinates of the first five of a total of 500 iterations of t-SNE dimensionality reduction generated by random downsampling to 90% of the 2,801 biologically independent samples to assess clustering stability. Axis positions of individual cases are connected by a line coloured according to the colour code of Figure 1a. The depiction illustrates the close proximity of cases of the same class across iterations, indicative of a high stability independent of the exact composition of the reference cohort. d, Pairwise correlation of X and Y coordinates between 2,801 biologically independent samples over all iterations of the downsampling analysis demonstrates a very high correlation within classes (average correlation 0.982), indicating a high stability of the t-SNE analysis.
Extended Data Figure 2 |. Unsupervised clustering is not biased by a range of possible confounding factors.
a, t-SNE representations of the 2,801 biologically independent samples constituting the reference cohort as shown in Figure 1b overlaid with potentially confounding factors (b-f). b, Distribution of patient sex among the classes illustrates an equal or near-equal distribution in many classes, but also an expected enrichment for one sex in some classes (e.g. female in meningioma or CNS high-grade neuroepithelial tumour with MN1 alteration). c, Patient age illustrates the expected age distribution of many tumour classes. d-f, The slightly uneven distributions of type of material (e.g. pilocytic astrocytoma or meningioma) (d), array preparation date (e), and tissue source (f) are related to the specifics of assembling the reference cohort and do not indicate an apparent confounding effect on the unsupervised clustering.
Extended Data Figure 3 |. Estimation of tumour purity and relation to TCGA pan-glioma methylation classes.
a, A Random Forest model was trained to predict ABSOLUTE tumour purity estimates using the TCGA pan-glioma dataset (795 biologically independent samples). The plot shows ABSOLUTE purity estimates and out-of-bag Random Forest tumour purity predictions (i.e. using only RF trees for which the respective sample was not involved in the training). The estimated mean squared error is 0.015, indicating that this model is able to yield reasonable predictions of tumour purity from methylation data. b, Bar plot showing the distribution of Random Forest predicted purity in the reference dataset (2,801 biologically independent samples). Purity estimates have been transformed into five categories indicated by different shades of blue. The exact case-by-case values are given in Supplementary Table 2. The median estimated purity in the reference cohort is 66% (range 42% to 87%) and 78% of samples have an estimated purity of at least 60%. c, t-SNE representation of the reference cohort (2,801 biologically independent samples) overlaid with Random Forest predicted purity categories. Methylation classes are generally composed of mixed tumour purity categories. Tumour purity shows some association with the WHO grade (WHO I median tumour purity 60%, range 39–77%; WHO II median 66%, range 43–80%; WHO III median 68%, range 54–84%; WHO IV median 69%, range 49–87%). A further association of tumour purity with the composition of classes in the unsupervised t-SNE analysis was not evident. d, t-SNE representation of the reference cohort (2,801 biologically independent samples) overlaid with predicted TCGA pan-glioma DNA methylation classes according to Ceccarelli et al. 2016. Pan-glioma methylation classes were predicted by training a Random Forest (RF) on the Ceccarelli et al. 2016 dataset including methylation data of 418 low grade glioma and 377 glioblastoma samples acquired using the Illumina 450k and 27k platforms. The RF was trained using the 1,300 CpG signature as described by the authors and using the default settings of the RF algorithm implemented in the R package randomForest. Pan-glioma class prediction was only performed for subsets of mostly adult astrocytomas, oligodendrogliomas and glioblastomas (magnified areas) included in the Ceccarelli et al. 2016 dataset. LGm1, LGm2 and LGm3 show a high overlap with the methylation classes A IDH HG, A IDH and O IDH, respectively. LGm4 shows highest overlap with methylation class GBM RTK II. LGm5 shows highest overlap with methylation classes GBM MES and GBM RTK I. LGm6 shows highest overlap with DMG K27, GBM MID and GBM MYCN.
Extended Data Figure 4 |. Development of the Random Forest classifier.
a, The RF training consists of four steps. First, a basic filtering for probes that are not included on the EPIC array, probes located on the X and Y chromosomes, probes affected by SNPs, and probes not mapping uniquely to the genome is performed. In a second step, the probe-wise batch effects between samples from FFPE and frozen material are estimated and adjusted by a linear model approach. In a third step, feature selection is performed by training a RF using all probes and selecting the 10,000 probes with highest variable importance measure. In a last step, the final RF is trained using only the 10,000 selected probes. The validation of the RF classifier involves a three-fold nested cross-validation (CV). In the outer loop of the CV the complete RF training procedure described before is applied to the training data and the resulting RF is used to predict the test data to generate RF scores. In the inner loop of the CV a three-fold CV is applied to training data of the outer loop in order to generate RF scores independent of the test data in the outer loop. These scores are then used to fit a calibration model, i.e. an L2-penalized, multinomial, logistic regression that takes the RF scores of the test data in the outer CV loop to estimate tumour class probabilities (P1, P2, P3). To fit a calibration model to estimate class probabilities of diagnostic samples using all data in the reference set, the RF scores generated in the outer CV loop are used. b, Schematic depiction of three exemplary binary decision trees of the Random Forest classifier (left), and magnification of five exemplary decision nodes relevant for glioblastoma classification (right). For prediction, a diagnostic sample enters the root node of each of the 10,000 trees. At every decision node, the decision path is determined by the methylation level of a single CpG, until reaching a terminal node that provides the class prediction. The joint class prediction of all trees represents the raw prediction score. The colour code and abbreviations are identical to Figure 1a.
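The nested cross-validation described in panel a can be illustrated with a minimal Python sketch of the index bookkeeping only. The function names (`k_folds`, `nested_cv_splits`), the random fold assignment and the toy sample count are assumptions made for illustration; they are not taken from the authors' implementation, and no classifier is trained here.

```python
import random

def k_folds(indices, k, seed=0):
    """Shuffle the indices and split them into k roughly equal folds."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def nested_cv_splits(n_samples, k_outer=3, k_inner=3, seed=0):
    """Yield (outer_test, outer_train, inner_splits) for each outer fold.

    inner_splits holds (inner_train, inner_test) pairs drawn only from
    outer_train, so scores predicted on inner_test never involve the
    samples held out in the outer loop.
    """
    outer_folds = k_folds(range(n_samples), k_outer, seed)
    for i, outer_test in enumerate(outer_folds):
        outer_train = [s for j, fold in enumerate(outer_folds)
                       if j != i for s in fold]
        inner_folds = k_folds(outer_train, k_inner, seed + 1 + i)
        inner_splits = []
        for a, inner_test in enumerate(inner_folds):
            inner_train = [s for b, fold in enumerate(inner_folds)
                           if b != a for s in fold]
            inner_splits.append((inner_train, inner_test))
        yield outer_test, outer_train, inner_splits

# Hypothetical toy cohort of 30 samples.
splits = list(nested_cv_splits(n_samples=30))
```

In this layout the scores collected on the inner test folds would be used to fit the calibration model, which is then applied to the scores of the outer test fold.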
Extended Data Figure 5 |. Comparison of raw and calibrated classifier scores and threshold definition.
a, Density plots illustrating the distribution of raw and calibrated classifier scores for samples correctly classified during cross-validation (n=2,701 independent biological samples for raw and n=2,769 independent biological samples for calibrated), depicted for each methylation class or methylation class family (MCF). Score calibration results in a harmonization of score distribution and allows the establishment of a shared classification threshold. Three thresholds for maximizing specificity (0.958), maximizing the Youden index (0.836), and the cutoff used in this study (0.9) are indicated by red lines (see also panels d and e). b, Multivariate score calibration exemplified in a ternary plot showing scores of the three ATRT subclasses (MYC, SHH, and TYR; together n=112 independent biological samples). Arrows indicate transformation of the scores for individual samples by the calibration model, which increases the discrimination between the three subclasses. c, The accuracy of prediction of the Random Forest classifier constructed of n=2,801 biologically independent samples (measured by misclassification error, area under receiver operating characteristic curve (AUC), Brier score, multiclass sensitivity and specificity) is improved by score calibration and by combining classes into methylation class families (MCF). d, To determine a common threshold for the calibrated MCF scores, we performed a Receiver Operating Characteristic (ROC) analysis of the maximum calibrated MCF scores of all n=2,801 biologically independent samples calculated via cross-validation. For this ROC analysis we defined a new binary class, i.e. samples correctly classified during the CV using the maximum calibrated MCF score for classification were considered as ‘classifiable’ (n=2,769) and samples that were misclassified using this score were considered ‘non classifiable’ (n=32). Three thresholds for different sensitivity and specificity are highlighted in the ROC curve: a threshold of 0.958 achieving a maximum specificity of 1 with a sensitivity of 0.827, a threshold of 0.836 obtaining a maximum Youden index with specificity 0.938 and sensitivity 0.934, and our recommended compromise threshold of 0.9 that results in a specificity of 0.938 and a sensitivity of 0.9. Bootstrapped 95% confidence intervals for estimated sensitivity and specificity are indicated in grey. e, Sensitivity and specificity for all possible thresholds applied to cross-validated maximum MCF classifier scores of all n=2,801 biologically independent samples. Three thresholds for maximizing specificity (0.958), maximizing the Youden index (0.836) and 0.9 are highlighted by red lines.
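The threshold analysis in panels d and e can be sketched in Python as follows. Sensitivity is taken as the fraction of ‘classifiable’ samples scoring at or above the threshold, specificity as the fraction of ‘non classifiable’ samples scoring below it, and the Youden index as sensitivity + specificity − 1. The function names and the ten toy scores are invented for illustration and are not data from the study.

```python
def sens_spec(scores, correct, threshold):
    """Sensitivity and specificity of the rule `score >= threshold`.

    scores  : maximum calibrated score per sample
    correct : True if the cross-validated prediction was right
              ('classifiable'), False otherwise ('non classifiable')
    """
    pairs = list(zip(scores, correct))
    tp = sum(1 for s, c in pairs if c and s >= threshold)
    fn = sum(1 for s, c in pairs if c and s < threshold)
    tn = sum(1 for s, c in pairs if not c and s < threshold)
    fp = sum(1 for s, c in pairs if not c and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def youden_threshold(scores, correct):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    best = None
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, correct, t)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (t, j)
    return best

# Hypothetical toy data: six correctly and four incorrectly classified samples.
toy_scores = [0.99, 0.98, 0.95, 0.91, 0.85, 0.75, 0.93, 0.70, 0.55, 0.40]
toy_correct = [True] * 6 + [False] * 4

best_t, best_j = youden_threshold(toy_scores, toy_correct)
sens_09, spec_09 = sens_spec(toy_scores, toy_correct, 0.9)
```

On these toy values the Youden-optimal threshold is lower than 0.9, so moving to 0.9 leaves specificity unchanged and costs sensitivity. That is the same qualitative pattern the caption reports for the thresholds 0.836 and 0.9.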
Extended Data Figure 6 |. Diagnostic utility of the DNA-methylation based classifier, assessed at different centres.
a, Implementation of the DNA methylation classifier by five external centres. In total, 401 independent biological samples were analysed. 78% matched to an established class with a cut-off score of ≥0.9 (class colours as in Figure 1a). A new diagnosis was established in 12% of cases. b, Depiction of individual centre results, illustrating the different composition of samples included in the analysis, variation in the rate of non-matching cases, and of cases where a new diagnosis was established. Case-by-case details are given in Supplementary Table 6.
Extended Data Figure 7 |. Inter-centre and inter-platform reproducibility of DNA methylation-based classification.
a, Calibrated scores of 53 independent biological samples representing diagnostic CNS tumour cases analysed at the University of Heidelberg and at the New York University pathology department. Both laboratories performed independent DNA extraction, array hybridization, and data analysis. Cases falling into green areas were classified identically in both centres (96%); cases in the red area were non-classifiable in one centre (4%). None of the 53 samples was assigned to a different methylation class by the two centres. b, Copy-number profiles calculated from the array data generated at both centres were highly comparable and allowed identification of chromosomal gains, losses, amplifications, and deletions. Calculations and interpretation were performed once at each centre. c, Plot of maximum raw classification scores of 16 different tumour samples generated using both 450k and EPIC arrays. All cases fall close to the bisecting line (red) indicating a high concordance of the scores. Further, the methylation class prediction was identical for all samples. d, The CNS tumour classifier also performs well with data generated by whole-genome bisulfite sequencing (WGBS). The plot shows classifier scores calculated from WGBS and 450k arrays of 50 cases comprising 11 different brain tumour entities (bisecting line in red). Methylation beta-values were calculated from high-coverage WGBS data (>10 fold average coverage) and run through the CNS tumour classifier and plotted against the same case analysed using 450k arrays. The highest class prediction score was identical in all cases.
Extended Data Figure 8 |. Sample website PDF report of an IDH wildtype glioblastoma sample.
Extended Data Figure 9 |. Exemplary workflow and timeline of diagnostic methylation profiling.
Figure 1 |. Establishing the DNA methylation-based CNS tumour reference cohort.
a, Overview of the 82 CNS tumour methylation classes and nine control tissue methylation classes of the reference cohort. The methylation classes are grouped by histology and colour-coded. Category 1 methylation classes are equivalent to a WHO entity, category 2 methylation classes are a subgroup of a WHO entity, category 3 methylation classes are not equivalent to a unique WHO entity and combine WHO grades, category 4 methylation classes are not equivalent to a unique WHO entity and combine WHO entities, and category 5 methylation classes are not recognized as a WHO entity. Full names and further details of the abbreviated 91 classes are given in Supplementary Table 1. Embryonal tumours: shades of blue; Glioblastomas: shades of green; Other gliomas: shades of violet; Ependymomas: shades of red; Glio-neuronal tumours: shades of orange; IDH-mutated gliomas: shades of yellow; Choroid plexus tumours: shades of brown; Pineal region tumours: shades of mint green; Melanocytic tumours: shades of dark blue; Sellar region tumours: shades of cyan; Mesenchymal tumours: shades of pink; Nerve tumours: shades of beige; Haematopoietic tumours: shades of dark purple; Control tissues: shades of grey. b, Unsupervised clustering of reference cohort samples (n=2,801) using t-SNE dimensionality reduction. Individual samples are colour-coded in the respective class colour (n=91) and labelled with the class abbreviation. The colour code and abbreviations are identical to Figure 1a.
Figure 2 |. Development and cross-validation of the DNA methylation-based CNS tumour classifier.
a, Schematic of principal classifier components (grey) and processing steps for individual test samples (white). The most informative probes are selected for training of the Random Forest classifier. The classifier produces raw scores representing the number of decision trees assigning a test sample to a specific methylation class. To enable inter-class-comparability a calibration model is used, which transforms raw into calibrated scores. Calibrated scores represent an estimated probability measure of methylation class assignment. b, Heatmap showing results of a three-fold cross-validation of the Random Forest classifier incorporating information of n=2801 biologically independent samples allotted to 91 methylation classes. Deviations from the bisecting line represent misclassification errors (using the maximum calibrated score for class prediction). Methylation class families (MCF) are indicated by black squares. The colour code and abbreviations are identical to Figure 1a.
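How a raw score arises from tree votes can be shown with a small Python sketch. The three toy trees, the probe names (`cg_A`, `cg_B`, `cg_C`) and the cutoffs are hypothetical, and the vote counts are divided by the number of trees so that they sum to one; the caption itself describes raw scores as tree counts. The class labels are borrowed from the abbreviations used elsewhere in this document.

```python
from collections import Counter

# A toy tree is either a class label (leaf) or a tuple
# (probe, cutoff, subtree_if_below, subtree_if_at_or_above).
# Probes and cutoffs are invented for illustration only.
TOY_FOREST = [
    ("cg_A", 0.5, "GBM RTK II", ("cg_B", 0.3, "GBM MES", "A IDH")),
    ("cg_B", 0.4, "GBM RTK II", "A IDH"),
    ("cg_A", 0.6, ("cg_C", 0.2, "GBM MES", "GBM RTK II"), "A IDH"),
]

def predict_tree(tree, beta):
    """Walk one tree: each node tests the beta-value of a single probe."""
    while not isinstance(tree, str):
        probe, cutoff, below, at_or_above = tree
        tree = below if beta[probe] < cutoff else at_or_above
    return tree

def raw_scores(forest, beta):
    """Fraction of trees voting for each class (the uncalibrated score)."""
    votes = Counter(predict_tree(tree, beta) for tree in forest)
    return {label: n / len(forest) for label, n in votes.items()}

# Hypothetical beta-values of one test sample.
sample = {"cg_A": 0.55, "cg_B": 0.1, "cg_C": 0.7}
scores = raw_scores(TOY_FOREST, sample)
top_class = max(scores, key=scores.get)
```

Because such vote fractions are distributed differently from class to class, the caption's calibration step maps them to probability estimates before a shared threshold is applied.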
Figure 3 |. Implementation of the classifier in diagnostic practice.
a, Classifier validation by an independent prospective cohort of diagnostic samples. Pathological diagnosis was established by current pathological standard according to the 2016 version of the WHO classification of CNS tumours and compared to classification by methylation profiling. Cases were categorized as “confirmation of diagnosis”, “establishing new diagnosis”, “misleading profile”, or “no match to defined class”. b, Overview of methylation profiling results from 1,155 diagnostic samples and integration with pathological diagnosis.
Figure 4 |. Reassessment of discrepant cases and establishment of new diagnosis.
Discrepancy between pathological diagnosis (left) and methylation profiling (middle) was observed for 139 cases. For 129 cases histological and molecular reassessment (Supplementary Table 5) resulted in change of the initial diagnosis with formulation of a new integrated diagnosis (right). For 92 cases this involved change of WHO grading, with both down- (blue) and upgrading (red). Integrated diagnoses in brackets are not recognized as a WHO entity. For methylation class abbreviations see Supplementary Table 1.
Figure 5 |. DNA methylation-based identification of potential new CNS tumour entities.
a, Unsupervised clustering of the combined reference (n=2,801, grey) and diagnostic cohort (n=1,104, coloured) using t-SNE dimensionality reduction. Abbreviated names indicate the reference cohort classes as in Figure 1. The diagnostic samples are colour-coded as “confirmation of diagnosis” (n=838, green), “establishing new diagnosis” (n=129, blue), “misleading profile” (n=10, red) and “no match to defined class” (n=127, dark grey). The matching (green) and reclassified (blue) cases show high overlap with the reference cases. The non-classifiable (black) and the misleading (red) cases frequently fall in the periphery of the reference classes or are completely separate from these. The magnification (right) highlights two non-classifiable cases (here in magenta for easier identification) that group together in the t-SNE representation. b, Both highlighted non-classifiable cases occurred in female children, and had primitive neuroectodermal histology (glioblastoma- or embryonal tumour-like). Histology was assessed by three independent pathologists with similar results. c, Both cases shared a high-level amplification of chromosome 6q24.2 (common amplified region chr6:144,149,293–144,649,987). The common region includes only 5 protein coding genes: LTV1 (LTV1 ribosome biogenesis factor), ZC2HC1B (zinc finger C2HC-type containing 1B), PLAGL1 (PLAG1 like zinc finger 1), SF3B5 (splicing factor 3b subunit 5) and STX11 (syntaxin 11). This amplification was not observed in any of the other tumours from the reference or diagnostic cohort. Copy number analysis was performed once using copy number information derived from the methylation array data.
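As a quick arithmetic check on the category counts given in panel a, the following Python sketch converts them to percentages of the diagnostic cohort. The variable names are chosen for illustration; the counts are those stated in the caption.

```python
# Category counts as stated in the caption of Figure 5a.
counts = {
    "confirmation of diagnosis": 838,
    "establishing new diagnosis": 129,
    "misleading profile": 10,
    "no match to defined class": 127,
}

total = sum(counts.values())
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}
```

The four categories sum to the stated 1,104 diagnostic samples, and the share with a newly established diagnosis comes to roughly 11.7%, which appears consistent with the "up to 12%" quoted in the abstract.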

References

    1. Louis DN, Ohgaki H, Wiestler OD & Cavenee WK. WHO Classification of Tumours of the Central Nervous System (revised 4th edition). (IARC, 2016).
    2. van den Bent MJ. Interobserver variation of the histopathological diagnosis in clinical trials on glioma: a clinician’s perspective. Acta Neuropathol. 120, 297–304, doi:10.1007/s00401-010-0725-7 (2010).
    3. Ellison DW et al. Histopathological grading of pediatric ependymoma: reproducibility and clinical relevance in European trial cohorts. J Negat Results Biomed 10, 7, doi:10.1186/1477-5751-10-7 (2011).
    4. Sturm D et al. New Brain Tumor Entities Emerge from Molecular Classification of CNS-PNETs. Cell 164, 1060–1072, doi:10.1016/j.cell.2016.01.015 (2016).
    5. Fernandez AF et al. A DNA methylation fingerprint of 1628 human samples. Genome Res. 22, 407–419, doi:10.1101/gr.119867.110 (2012).
    6. Hovestadt V et al. Decoding the regulatory landscape of medulloblastoma using DNA methylation sequencing. Nature 510, 537–541, doi:10.1038/nature13268 (2014).
    7. Moran S et al. Epigenetic profiling to classify cancer of unknown primary: a multicentre, retrospective analysis. Lancet Oncol. 17, 1386–1395, doi:10.1016/S1470-2045(16)30297-2 (2016).
    8. Hovestadt V et al. Robust molecular subgrouping and copy-number profiling of medulloblastoma from small amounts of archival tumour material using high-density DNA methylation arrays. Acta Neuropathol. 125, 913–916, doi:10.1007/s00401-013-1126-5 (2013).
    9. Sturm D et al. Hotspot mutations in H3F3A and IDH1 define distinct epigenetic and biological subgroups of glioblastoma. Cancer Cell 22, 425–437, doi:10.1016/j.ccr.2012.08.024 (2012).
    10. Reuss DE et al. Adult IDH wild type astrocytomas biologically and clinically resolve into other tumor entities. Acta Neuropathol. 130, 407–417, doi:10.1007/s00401-015-1454-8 (2015).
    11. Pajtler KW et al. Molecular Classification of Ependymal Tumors across All CNS Compartments, Histopathological Grades, and Age Groups. Cancer Cell 27, 728–743, doi:10.1016/j.ccell.2015.04.002 (2015).
    12. Lambert SR et al. Differential expression and methylation of brain developmental genes define location-specific subsets of pilocytic astrocytoma. Acta Neuropathol. 126, 291–301, doi:10.1007/s00401-013-1124-7 (2013).
    13. Thomas C et al. Methylation profiling of choroid plexus tumors reveals 3 clinically distinct subgroups. Neuro Oncol. 18, 790–796, doi:10.1093/neuonc/nov322 (2016).
    14. Mack SC et al. Epigenomic alterations define lethal CIMP-positive ependymomas of infancy. Nature 506, 445–450, doi:10.1038/nature13108 (2014).
    15. Johann PD et al. Atypical Teratoid/Rhabdoid Tumors Are Comprised of Three Epigenetic Subgroups with Distinct Enhancer Landscapes. Cancer Cell 29, 379–393, doi:10.1016/j.ccell.2016.02.001 (2016).
    16. Wiestler B et al. Integrated DNA methylation and copy-number profiling identify three clinically and biologically relevant groups of anaplastic glioma. Acta Neuropathol. 128, 561–571, doi:10.1007/s00401-014-1315-x (2014).
    17. van der Maaten L & Hinton G. Visualizing data using t-SNE. The Journal of Machine Learning Research 9, 85 (2008).
    18. Ceccarelli M et al. Molecular Profiling Reveals Biologically Discrete Subsets and Pathways of Progression in Diffuse Glioma. Cell 164, 550–563, doi:10.1016/j.cell.2015.12.028 (2016).
    19. Breiman L. Random forests. Machine learning 45, 5–32 (2001).
    20. Sokolova M & Lapalme G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manage. 45, 427–437, doi:10.1016/j.ipm.2009.03.002 (2009).
    21. Sahm F et al. Next-generation sequencing in routine brain tumor diagnostics enables an integrated diagnosis and identifies actionable targets. Acta Neuropathol. 131, 903–910, doi:10.1007/s00401-015-1519-8 (2016).
    22. Weller M et al. Molecular classification of diffuse cerebral WHO grade II/III gliomas using genome- and transcriptome-wide profiling improves stratification of prognostically distinct patient groups. Acta Neuropathol. 129, 679–693, doi:10.1007/s00401-015-1409-0 (2015).
    23. Cancer Genome Atlas Research Network et al. Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas. N. Engl. J. Med. 372, 2481–2498, doi:10.1056/NEJMoa1402121 (2015).
    24. conumee: Enhanced copy-number variation analysis using Illumina 450k methylation arrays. R package version 0.99.4 (2015).
    25. Bady P, Delorenzi M & Hegi ME. Sensitivity Analysis of the MGMT-STP27 Model and Impact of Genetic and Epigenetic Context to Predict the MGMT Methylation Status in Gliomas and Other Tumors. J. Mol. Diagn. 18, 350–361, doi:10.1016/j.jmoldx.2015.11.009 (2016).
Online Only References
    26. Korshunov A et al. Histologically distinct neuroepithelial tumors with histone 3 G34 mutation are molecularly similar and comprise a single nosologic entity. Acta Neuropathol. 131, 137–146, doi:10.1007/s00401-015-1493-1 (2016).
    27. Korshunov A et al. Embryonal tumor with abundant neuropil and true rosettes (ETANTR), ependymoblastoma, and medulloepithelioma share molecular similarity and comprise a single clinicopathological entity. Acta Neuropathol. 128, 279–289, doi:10.1007/s00401-013-1228-0 (2014).
    28. Hölsken A et al. Adamantinomatous and papillary craniopharyngiomas are characterized by distinct epigenomic as well as mutational and transcriptomic profiles. Acta Neuropathol Commun 4, 20, doi:10.1186/s40478-016-0287-6 (2016).
    29. Heim S et al. Papillary Tumor of the Pineal Region: A Distinct Molecular Entity. Brain Pathol. 26, 199–205, doi:10.1111/bpa.12282 (2016).
    30. Koelsche C et al. Melanotic tumors of the nervous system are characterized by distinct mutational, chromosomal and epigenomic profiles. Brain Pathol. 25, 202–208, doi:10.1111/bpa.12228 (2015).
    31. Jones DT et al. Recurrent somatic alterations of FGFR1 and NTRK2 in pilocytic astrocytoma. Nat. Genet. 45, 927–932, doi:10.1038/ng.2682 (2013).
    32. Jones DT et al. Dissecting the genomic complexity underlying medulloblastoma. Nature 488, 100–105, doi:10.1038/nature11284 (2012).
    33. Pietsch T et al. Prognostic significance of clinical, histopathological, and molecular characteristics of medulloblastomas in the prospective HIT2000 multicenter clinical trial cohort. Acta Neuropathol. 128, 137–149, doi:10.1007/s00401-014-1276-0 (2014).
    34. R: A language and environment for statistical computing. (R Foundation for Statistical Computing, Vienna, Austria, 2016).
    35. Huber W et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 12, 115–121, doi:10.1038/nmeth.3252 (2015).
    36. Aryee MJ et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369, doi:10.1093/bioinformatics/btu049 (2014).
    37. Leek JT & Storey JD. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS genetics 3, 1724–1735, doi:10.1371/journal.pgen.0030161 (2007).
    38. Leek JT & Storey JD. A general framework for multiple testing dependence. Proc. Natl. Acad. Sci. U. S. A. 105, 18718–18723, doi:10.1073/pnas.0808709105 (2008).
    39. Breiman L. Classification and regression trees. (Chapman & Hall/CRC, 1984).
    40. Liaw A & Wiener M. Classification and Regression by randomForest. R News 2, 18–22 (2002).
    41. Chen C, Liaw A & Breiman L. Using random forest to learn imbalanced data. University of California, Berkeley, 1–12 (2004).
    42. Kim KI & Simon R. Overfitting, generalization, and MSE in class probability estimation with high-dimensional data. Biom J 56, 256–269, doi:10.1002/bimj.201300083 (2014).
    43. Boström H in Machine Learning and Applications, 2008. ICMLA’08. Seventh International Conference on. 121–126 (IEEE).
    44. Smola AJ. Advances in large margin classifiers. (MIT press, 2000).
    45. Friedman J, Hastie T & Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of statistical software 33, 1 (2010).
    46. Appel IJ, Gronwald W & Spang R. Estimating classification probabilities in high-dimensional diagnostic studies. Bioinformatics 27, 2563–2570 (2011).
    47. Hand DJ & Till RJ. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine learning 45, 171–186 (2001).
    48. Simon R. Class probability estimation for medical studies. Biom J 56, 597–600, doi:10.1002/bimj.201300296 (2014).
    49. Brier GW. Verification of forecasts expressed in terms of probability. Monthly Weather Review 78, 1–3, doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2 (1950).
    50. Carter SL et al. Absolute quantification of somatic DNA alterations in human cancer. Nat. Biotechnol. 30, 413–421, doi:10.1038/nbt.2203 (2012).

Source: PubMed
