Assessing the Accuracy of a Deep Learning Method to Risk Stratify Indeterminate Pulmonary Nodules

Pierre P Massion, Sanja Antic, Sarim Ather, Carlos Arteta, Jan Brabec, Heidi Chen, Jerome Declerck, David Dufek, William Hickes, Timor Kadir, Jonas Kunst, Bennett A Landman, Reginald F Munden, Petr Novotny, Heiko Peschl, Lyndsey C Pickup, Catarina Santos, Gary T Smith, Ambika Talwar, Fergus Gleeson

Abstract

Rationale: The management of indeterminate pulmonary nodules (IPNs) remains challenging, resulting in invasive procedures and delays in diagnosis and treatment. Strategies to decrease the rate of unnecessary invasive procedures and optimize surveillance regimens are needed.

Objectives: To develop and validate a deep learning method to improve the management of IPNs.

Methods: A Lung Cancer Prediction Convolutional Neural Network (LCP-CNN) model was trained using computed tomography images of IPNs from the National Lung Screening Trial, internally validated, and externally tested on cohorts from two academic institutions.

Measurements and Main Results: The areas under the receiver operating characteristic curve in the external validation cohorts were 83.5% (95% confidence interval [CI], 75.4-90.7%) and 91.9% (95% CI, 88.7-94.7%), compared with 78.1% (95% CI, 68.7-86.4%) and 81.9% (95% CI, 76.1-87.1%), respectively, for the Mayo model, a commonly used clinical risk model for incidental nodules. Using 5% and 65% malignancy thresholds to define low- and high-risk categories, the overall net reclassifications in the validation cohorts for cancers and benign nodules compared with the Mayo model were 0.34 (Vanderbilt) and 0.30 (Oxford) as a rule-in test, and 0.33 (Vanderbilt) and 0.58 (Oxford) as a rule-out test. Compared with traditional risk prediction models, the LCP-CNN was associated with improved accuracy in predicting the likelihood of disease at each management threshold in both external validation cohorts.

Conclusions: This study demonstrates that this deep learning algorithm can correctly reclassify IPNs into low- or high-risk categories in more than a third of cancers and benign nodules when compared with conventional risk models, potentially reducing the number of unnecessary invasive procedures and delays in diagnosis.

Keywords: computer-aided image analysis; early detection; lung cancer; neural networks; risk stratification.

Figures

Figure 1.
Schematics showing the (A) Lung Cancer Prediction Convolutional Neural Network (LCP-CNN) architecture, (B) the training procedure, and (C) application of the trained model to novel data. The input to the network is a three-dimensional anisotropically resampled box ∼56 mm in width.
Figure 2.
Receiver operating characteristic curves and area under the curve (AUC) analysis of the (A) internal National Lung Screening Trial (NLST) dataset using eight-way cross-validation, (B) external Vanderbilt dataset, and (C) external Oxford dataset. The Brock model was used as a comparator for the screening population, and the Mayo model was used for the incidental nodule populations for the two independent validation datasets. LCP-CNN = Lung Cancer Prediction Convolutional Neural Network.
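To make the AUC comparison concrete, the following is a minimal sketch of how an area under the ROC curve and a confidence interval can be estimated from per-nodule risk scores. It is an illustration only: the abstract does not state how the reported intervals were computed, so the percentile bootstrap used here is an assumption, and the function names `auc` and `bootstrap_ci` are hypothetical.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a cancer scores higher than a benign nodule.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(scores, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (assumed method, not the paper's)."""
    rng = random.Random(seed)
    idx = range(len(scores))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if all(ys) or not any(ys):
            continue  # a resample needs both classes for the AUC to be defined
        stats.append(auc([scores[i] for i in sample], ys))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy data (made up for illustration): predicted risks and true labels.
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.05, 0.90]
toy_labels = [0, 0, 1, 1, 0, 1]
print(auc(toy_scores, toy_labels), bootstrap_ci(toy_scores, toy_labels, n_boot=500))
```

Running the same two functions on the scores of a comparator model for the same nodules would give the side-by-side AUC comparison that the curves in this figure summarize.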
Figure 3.
Reclassification diagrams. (A) National Lung Screening Trial (NLST) dataset for 200 cancers and 200 benign nodules (randomly selected; numbers were limited for readability of the figure). (B) Vanderbilt University Medical Center dataset. (C) Oxford University Hospitals dataset. Reclassification diagrams are a useful way to visualize the impact of a new biomarker compared with a reference at predefined thresholds. Here we use rule-out and rule-in thresholds at 5% and 65%, respectively, as shown by the black lines. Red triangles indicate cancers, and blue circles indicate controls. If a new biomarker improves classification of cancers compared with the reference, then one would expect cases (red triangles) that were below 65% on the horizontal axis to lie above 65% on the vertical axis, that is, to move from the central rectangular region to the region immediately above it. For example, on the Vanderbilt and Oxford datasets, 45% and 32% of the cancers, respectively, are reclassified up compared with the Mayo model. Similarly, a new biomarker improves benign classification compared with the reference if it moves controls (blue circles) that were above the 5% threshold on the horizontal axis to below 5% on the vertical axis. For nodules that stay within the three square regions intersected by the green diagonal, the Lung Cancer Prediction Convolutional Neural Network (LCP-CNN) does not add value because these nodules remain in the same risk category as under the Brock or Mayo model. On the Vanderbilt and Oxford datasets, 33% and 61% of the benign nodules, respectively, are reclassified down compared with the Mayo model.
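The reclassification logic described in this caption can be sketched in a few lines. The code below uses one common two-category formulation of net reclassification at a single threshold: for cancers, the fraction moved above the threshold minus the fraction moved below it; for benign nodules, the reverse. This is an assumption for illustration, since the exact computation used in the study is not given in this excerpt. The function name `net_reclassification` and the toy risks are hypothetical.

```python
def net_reclassification(ref, new, is_cancer, threshold):
    """Net reclassification of `new` risks against `ref` risks at one threshold.

    Returns (net for cancers, net for benign nodules, overall sum).
    Positive values mean the new model moves nodules in the correct direction.
    """
    def net_up(pairs):
        # Fraction crossing the threshold upward minus fraction crossing downward.
        up = sum(1 for r, n in pairs if r < threshold <= n)
        down = sum(1 for r, n in pairs if n < threshold <= r)
        return (up - down) / len(pairs)

    cancers = [(r, n) for r, n, c in zip(ref, new, is_cancer) if c]
    benign = [(r, n) for r, n, c in zip(ref, new, is_cancer) if not c]
    nri_cancer = net_up(cancers)   # cancers should move up
    nri_benign = -net_up(benign)   # benign nodules should move down
    return nri_cancer, nri_benign, nri_cancer + nri_benign


# Toy risks (made up): reference model on the horizontal axis, new model on the vertical.
ref_risk = [0.50, 0.70, 0.30, 0.10, 0.04, 0.60]
new_risk = [0.80, 0.75, 0.40, 0.03, 0.02, 0.70]
cancer = [True, True, True, False, False, False]

rule_in = net_reclassification(ref_risk, new_risk, cancer, threshold=0.65)
rule_out = net_reclassification(ref_risk, new_risk, cancer, threshold=0.05)
print("rule-in (65%):", rule_in)
print("rule-out (5%):", rule_out)
```

In the toy data, one cancer crosses the 65% line upward (a correct rule-in move), one benign nodule also crosses it upward (an incorrect move), and one benign nodule drops below 5% (a correct rule-out move), which mirrors the kinds of movements the diagrams display.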


Source: PubMed
