External validation of a convolutional neural network artificial intelligence tool to predict malignancy in pulmonary nodules

David R Baldwin, Jennifer Gustafson, Lyndsey Pickup, Carlos Arteta, Petr Novotny, Jerome Declerck, Timor Kadir, Catarina Figueiras, Albert Sterba, Alan Exell, Vaclav Potesil, Paul Holland, Hazel Spence, Alison Clubley, Emma O'Dowd, Matthew Clark, Victoria Ashford-Turner, Matthew EJ Callister, Fergus V Gleeson

Abstract

Background: Estimating the risk of malignancy in pulmonary nodules detected by CT is central to clinical management. Artificial intelligence (AI) offers an opportunity to improve risk prediction. Here we compare the performance of an AI algorithm, the lung cancer prediction convolutional neural network (LCP-CNN), with that of the Brock University model, which is recommended in UK guidelines.

Methods: A dataset of incidentally detected pulmonary nodules measuring 5-15 mm was collected retrospectively from three UK hospitals for use in a validation study. The ground truth diagnosis for each nodule was based on histology (required for any cancer), resolution, stability or (for pulmonary lymph nodes only) expert opinion. There were 1397 nodules in 1187 patients, of which 234 nodules in 229 patients (19.3%) were cancers. Model discrimination and performance statistics at predefined score thresholds were compared between the Brock model and the LCP-CNN.

Results: The area under the curve for LCP-CNN was 89.6% (95% CI 87.6 to 91.5), compared with 86.8% (95% CI 84.3 to 89.1) for the Brock model (p≤0.005). Using the LCP-CNN, we found that 24.5% of nodules scored below the lowest cancer nodule score, compared with 10.9% using the Brock score. Using the predefined thresholds, we found that the LCP-CNN gave one false negative (0.4% of cancers), whereas the Brock model gave six (2.5%), while specificity statistics were similar between the two models.
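The three kinds of statistic reported above (area under the curve, the share of nodules scoring below the lowest-scoring cancer, and false negatives at a rule-out threshold) can be illustrated with a minimal Python sketch. The scores, labels and threshold below are made-up toy values, not study data, and the function names are hypothetical. The sketch also assumes that a cancer counts as a false negative when its score falls strictly below the threshold, which the abstract does not specify.

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # Count cancer/benign pairs where the cancer scores higher; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fraction_below_lowest_cancer(scores, labels):
    """Share of all nodules scoring below the lowest-scoring cancer."""
    lowest_cancer = min(s for s, y in zip(scores, labels) if y == 1)
    return sum(1 for s in scores if s < lowest_cancer) / len(scores)


def false_negatives(scores, labels, threshold):
    """Cancers that a rule-out threshold would misclassify as benign.

    Assumption: a nodule is ruled out when its score is strictly below
    the threshold.
    """
    return sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)


# Toy data for illustration only: 1 = cancer, 0 = benign.
scores = [0.5, 1.2, 2.0, 3.5, 8.0, 1.9, 40.0, 75.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

print(auc(scores, labels))
print(fraction_below_lowest_cancer(scores, labels))
print(false_negatives(scores, labels, threshold=2.0))
```

On this toy set, one low-scoring cancer limits how many benign nodules can be ruled out without a miss, which is why the lowest cancer score matters as much as the overall AUC.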

Conclusion: The LCP-CNN score has better discrimination than the Brock model and allows a larger proportion of benign nodules to be identified without missing cancers. This has the potential to substantially reduce the proportion of surveillance CT scans required and thus save significant resources.

Keywords: CT imaging; lung cancer; non-small cell lung cancer.

Conflict of interest statement

Competing interests: Several members of the authorship are employed by Optellum, the company that has developed the risk prediction artificial intelligence tool.

© Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

Figures

Figure 1
Collection of the ideal retrospective dataset. AI, artificial intelligence; EDC, electronic data capture.
Figure 2
Receiver operating characteristic curves for the three centres and the full dataset. For each curve, the distance it follows along the upper horizontal axis is directly related to its ability to rule out benign nodules, and in all plots, the magenta curve for the LCP-CNN dominates that upper part of the plot. The LCP-CNN also approaches the y-axis at a higher sensitivity value than the Brock or diameter curves, indicating that at the high-specificity end (ie, ruling in cancers rather than ruling out benign nodules), the LCP-CNN also offers better stratification than the two simpler methods. AUC, area under the curve; LCP-CNN, lung cancer prediction convolutional neural network.
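The link between a curve's run along the upper horizontal axis and its rule-out ability can be made concrete with a small Python sketch. It uses made-up toy scores rather than study data, hypothetical function names, and the assumption that a nodule is called positive when its score is at or above the threshold. The stretch of the curve at 100% sensitivity corresponds to the specificity that can be reached before the first cancer is missed.

```python
def roc_points(scores, labels):
    """Return (false positive rate, sensitivity) pairs, one per threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in sorted(set(scores)):
        # Assumption: a nodule is called positive when its score is >= t.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        points.append((fp / n_neg, tp / n_pos))
    return points


def specificity_at_full_sensitivity(scores, labels):
    """Highest specificity reached while every cancer is still called positive."""
    return max(1 - fpr for fpr, tpr in roc_points(scores, labels) if tpr == 1.0)


# Toy data for illustration only: 1 = cancer, 0 = benign.
scores = [0.5, 1.2, 2.0, 3.5, 8.0, 1.9, 40.0, 75.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1]

for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  sensitivity={tpr:.2f}")
print(specificity_at_full_sensitivity(scores, labels))
```

A curve that stays at the top of the plot over a longer horizontal distance yields a larger value from `specificity_at_full_sensitivity`, meaning more benign nodules can be ruled out with no cancers missed.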
Figure 3
Low-scoring cancer cases. (A) Woman aged 61 years (smoking status: ex-smoker) with a 7 mm cancer located in RUL, scoring 1.19 (Brock=3.50). The median HU value in the aortic arch is 37. (B) Man aged 61 years (smoking status: unknown) with a 10 mm cancer located in the lingula lobe, scoring 2.18 (Brock=5.83). The median HU value in the aortic arch is 135. (C) Man aged 67 years (smoking status: current smoker) with a 7 mm cancer located in RLL, scoring 2.55 (Brock=1.31). The median HU value in the aortic arch is 50. (D) Woman aged 71 years (smoking status: unknown) with a 7 mm cancer located in RLL, scoring 3.46 (Brock=2.26). The median HU value in the aortic arch is 217. The CT appears not to have used a breath-hold protocol. The only cancer actually stratified into the ‘rule-out’ set is (A), possibly because of its atypical shape and smooth appearance. The cancer in (B) was not reimaged for another 2 years after this scan, and the patient’s lungs had several similar lesions that did not grow into cancers. For cases such as (D), reimaging the nodule with a standard breath-hold protocol would be expected to give a cleaner image on which the lung cancer prediction convolutional neural network yields a higher score. HU, Hounsfield unit; RLL, right lower lobe; RUL, right upper lobe.
Figure 4
Benign and cancer nodules of 8, 10 and 12 mm illustrating typical scoring behaviour of the LCP-CNN. (A) Woman aged 72 years (smoking status: ex-smoker) with an 8 mm benign nodule located in the lingula lobe, scoring 2.07 (Brock score 9.92). The median HU value in the aortic arch is 246. (B) Woman aged 75 years (smoking status: current) with an 8 mm cancer located in LUL, scoring 69.27 (Brock score 8.20). The median HU value in the aortic arch is 84. (C) Woman aged 77 years (smoking status: unknown) with a 10 mm benign nodule located in the left lower lobe, scoring 1.51 (Brock score 8.47). The median HU value in the aortic arch is 155. (D) Woman aged 83 years (smoking status: ex-smoker) with a 10 mm cancer located in LUL, scoring 82.54 (Brock score 31.49). The median HU value in the aortic arch is 53. (E) Man aged 65 years (smoking status: current) with a 12 mm benign nodule located in RUL, scoring 5.47 (Brock score 16.93). The median HU value in the aortic arch is 39. (F) Man aged 69 years (smoking status: ex-smoker) with a 12 mm cancer located in RUL, scoring 78.29 (Brock score 21.23). The median HU value in the aortic arch is 90. HU, Hounsfield unit; LCP-CNN, lung cancer prediction convolutional neural network; LUL, left upper lobe; RUL, right upper lobe.

Source: PubMed
