Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning

Nicolas Coudray, Paolo Santiago Ocampo, Theodore Sakellaropoulos, Navneet Narula, Matija Snuderl, David Fenyö, Andre L Moreira, Narges Razavian, Aristotelis Tsirigos

Abstract

Visual inspection of histopathology slides is one of the main methods used by pathologists to assess the stage, type and subtype of lung tumors. Adenocarcinoma (LUAD) and squamous cell carcinoma (LUSC) are the most prevalent subtypes of lung cancer, and their distinction requires visual inspection by an experienced pathologist. In this study, we trained a deep convolutional neural network (Inception v3) on whole-slide images obtained from The Cancer Genome Atlas to accurately and automatically classify them into LUAD, LUSC or normal lung tissue. The performance of our method is comparable to that of pathologists, with an average area under the curve (AUC) of 0.97. Our model was validated on independent datasets of frozen tissues, formalin-fixed paraffin-embedded tissues and biopsies. Furthermore, we trained the network to predict the ten most commonly mutated genes in LUAD. We found that six of them (STK11, EGFR, FAT1, SETBP1, KRAS and TP53) can be predicted from pathology images, with AUCs from 0.733 to 0.856 as measured on a held-out population. These findings suggest that deep-learning models can assist pathologists in the detection of cancer subtype or gene mutations. Our approach can be applied to any cancer type, and the code is available at https://github.com/ncoudray/DeepPATH.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Data and strategy:
(a) Number of whole-slide images per class. (b) Strategy: (b1) images of lung cancer tissues were first downloaded from the Genomic Data Commons database; (b2) slides were then separated into a training set (70%), a validation set (15%) and a test set (15%); (b3) slides were tiled into non-overlapping 512×512-pixel windows, omitting those with over 50% background; (b4) the Inception v3 architecture was used and partially or fully re-trained using the training and validation tiles; (b5) classifications were performed on tiles from an independent test set, and the results were finally aggregated per slide to extract the heatmaps and the AUC statistics. (c) Size distribution of the image widths (gray) and heights (black). (d) Distribution of the number of tiles per slide.
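Steps (b2) and (b3) of this caption can be sketched in a few lines of Python. This is a minimal illustration, not the DeepPATH code. The function names, the brightness cutoff used to call a pixel "background" (WHITE_LEVEL), the random seed and the choice to drop partial tiles at the slide border are all assumptions made for the example; only the 70/15/15 split, the 512×512 non-overlapping windows and the 50% background limit come from the caption.

```python
import numpy as np

TILE = 512             # tile edge in pixels (from the caption)
MAX_BACKGROUND = 0.5   # tiles with more than 50% background are omitted (from the caption)
WHITE_LEVEL = 220      # ASSUMPTION: pixels at least this bright count as background


def background_fraction(patch):
    """Fraction of pixels in an RGB patch that look like empty glass."""
    gray = patch.mean(axis=2)
    return float((gray >= WHITE_LEVEL).mean())


def tile_slide(image, tile=TILE):
    """Return (x, y) origins of non-overlapping tiles that pass the background filter.

    ASSUMPTION: partial tiles at the right and bottom borders are dropped.
    """
    height, width = image.shape[:2]
    kept = []
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            patch = image[y:y + tile, x:x + tile]
            if background_fraction(patch) <= MAX_BACKGROUND:
                kept.append((x, y))
    return kept


def split_slides(slide_ids, seed=0):
    """Shuffle slide identifiers and split them 70% / 15% / 15%.

    Splitting whole slides, not tiles, keeps tiles of one slide in one set.
    """
    ids = list(slide_ids)
    rng = np.random.default_rng(seed)
    shuffled = [ids[i] for i in rng.permutation(len(ids))]
    n_train = int(round(0.70 * len(ids)))
    n_valid = int(round(0.15 * len(ids)))
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```

For a real whole-slide image the array would be read region by region from the slide file; the in-memory array here only keeps the sketch self-contained.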
Figure 2. Classification of presence and type of tumor on alternative cohorts:
Receiver Operating Characteristic (ROC) curves (left) from tests on (a) frozen sections (n=98 biologically independent slides), (b) formalin-fixed paraffin-embedded (FFPE) sections (n=140 biologically independent slides) and (c) biopsies (n=102 biologically independent slides) from NYU Langone Medical Center. To the right of each plot, we show examples of raw images overlaid in light grey with the mask generated by a pathologist, and the corresponding heatmaps obtained with the three-way classifier. Scale bars are 1 mm.
Figure 3. Gene mutation prediction from histopathology slides gives promising results for at least 6 genes:
(a) Mutation probability distribution for slides where each mutation is present or absent (tile aggregation by averaging output probability). (b) ROC curves associated with the top four predictions in (a). (c) Allele frequency as a function of slides classified by the deep learning network as having a certain gene mutation (P≥0.5) or the wild type (P<0.5). p-values estimated with a two-tailed Mann-Whitney U-test are shown as ns (p>0.05), * (p≤0.05), ** (p≤0.01) or *** (p≤0.001). For a, b and c, n=62 slides from 59 patients. For the two box plots, whiskers represent the minima and maxima, and the middle line within the box represents the median.
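The per-slide aggregation named in this caption (averaging tile output probabilities), the P≥0.5 decision rule and the AUC can be illustrated with a short standard-library sketch. The function names and the (slide_id, probability) input format are hypothetical, and the AUC is computed here by direct pairwise comparison for clarity, which is not necessarily how the authors computed it.

```python
from collections import defaultdict


def aggregate_by_slide(tile_predictions):
    """Average tile-level probabilities into one probability per slide.

    tile_predictions: iterable of (slide_id, probability) pairs (hypothetical format).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for slide_id, probability in tile_predictions:
        sums[slide_id] += probability
        counts[slide_id] += 1
    return {slide_id: sums[slide_id] / counts[slide_id] for slide_id in sums}


def call_mutation(slide_probability, threshold=0.5):
    """Label a slide as mutated when P >= 0.5, wild type otherwise."""
    return slide_probability >= threshold


def roc_auc(scores, labels):
    """AUC as the probability that a positive slide outscores a negative one.

    Ties count as half a win. Quadratic in the number of slides, which is
    acceptable for a test set of a few dozen slides.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute an AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

The pairwise form of the AUC is the Mann-Whitney U statistic divided by the number of positive-negative pairs, which is why the same rank-based reasoning appears in both the ROC analysis and the significance test.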
Figure 4. Spatial heterogeneity of predicted mutations.
(a) Probability distribution on LUAD tiles for the 6 predictable mutations, with average values shown as dotted lines (n=327 non-overlapping tiles). The allele frequency is 0.33 for TP53, 0.25 for STK11 and 0 for the 4 other mutations. (b) Heatmap of TP53 and (c) of STK11 when only tiles classified as LUAD are selected, and (d) and (e) the same heatmaps when all the tiles are considered. Scale bars are 1 mm.
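The difference between panels (b, c) and (d, e) is only which tiles contribute to the heatmap. A minimal sketch of that selection is shown below, assuming each tile is described by a hypothetical dictionary holding its grid position, its predicted tumor class and its mutation probability; none of these field names come from the paper.

```python
import numpy as np


def mutation_heatmap(tiles, grid_shape, luad_only=False):
    """Place tile-level mutation probabilities on a 2D grid.

    tiles: iterable of dicts with the hypothetical keys
        "row", "col"   : tile position on the slide grid
        "tumor_class"  : "LUAD", "LUSC" or "normal"
        "p_mutation"   : predicted probability of the mutation
    Cells without a contributing tile stay NaN so they can be drawn blank.
    """
    heat = np.full(grid_shape, np.nan)
    for tile in tiles:
        if luad_only and tile["tumor_class"] != "LUAD":
            continue  # skip tiles the three-way classifier did not call LUAD
        heat[tile["row"], tile["col"]] = tile["p_mutation"]
    return heat
```

Calling the function once with luad_only=True and once with luad_only=False gives the two views compared in the figure; the resulting arrays can be passed to any image plotting routine.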

Source: PubMed
