Development and Validation of a Deep Learning-Based Automated Detection Algorithm for Major Thoracic Diseases on Chest Radiographs

Eui Jin Hwang, Sunggyun Park, Kwang-Nam Jin, Jung Im Kim, So Young Choi, Jong Hyuk Lee, Jin Mo Goo, Jaehong Aum, Jae-Joon Yim, Julien G Cohen, Gilbert R Ferretti, Chang Min Park; DLAD Development and Evaluation Group

Abstract

Importance: Interpretation of chest radiographs is a challenging task prone to errors, requiring expert readers. An automated system that can accurately classify chest radiographs may help streamline the clinical workflow.

Objectives: To develop a deep learning-based algorithm that can classify normal and abnormal results from chest radiographs with major thoracic diseases, including pulmonary malignant neoplasm, active tuberculosis, pneumonia, and pneumothorax, and to validate the algorithm's performance using independent data sets.

Design, setting, and participants: This diagnostic study developed a deep learning-based algorithm using single-center data collected between November 1, 2016, and January 31, 2017. The algorithm was externally validated with multicenter data collected between May 1 and July 31, 2018. A total of 54 221 chest radiographs with normal findings from 47 917 individuals (21 556 men and 26 361 women; mean [SD] age, 51 [16] years) and 35 613 chest radiographs with abnormal findings from 14 102 individuals (8373 men and 5729 women; mean [SD] age, 62 [15] years) were used to develop the algorithm. A total of 486 chest radiographs with normal results and 529 with abnormal results (1 from each participant; 628 men and 387 women; mean [SD] age, 53 [18] years) from 5 institutions were used for external validation. Fifteen physicians, including nonradiology physicians, board-certified radiologists, and thoracic radiologists, participated in observer performance testing. Data were analyzed in August 2018.

Exposures: Deep learning-based algorithm.

Main outcomes and measures: Image-wise classification performances measured by area under the receiver operating characteristic curve; lesion-wise localization performances measured by area under the alternative free-response receiver operating characteristic curve.
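As an illustration of the image-wise metric named above, the sketch below computes an AUROC from per-image abnormality probability scores. It is a minimal example using scikit-learn with hypothetical labels and scores, not the study's analysis code.

```python
# Minimal AUROC sketch for image-wise classification.
# The labels and scores below are hypothetical, not study data.
import numpy as np
from sklearn.metrics import roc_auc_score

# 1 = radiograph with abnormal findings, 0 = normal findings (hypothetical)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
# Per-image probability scores emitted by the algorithm (hypothetical)
y_score = np.array([0.91, 0.12, 0.78, 0.66, 0.30, 0.05, 0.84, 0.40])

print(f"Image-wise AUROC: {roc_auc_score(y_true, y_score):.3f}")
```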

Results: In external validation, the algorithm demonstrated a median (range) area under the curve of 0.979 (0.973-1.000) for image-wise classification and 0.972 (0.923-0.985) for lesion-wise localization. In the observer performance test, the algorithm showed significantly higher performance than all 3 physician groups in both image-wise classification (0.983 vs 0.814-0.932; all P < .005) and lesion-wise localization (0.985 vs 0.781-0.907; all P < .001). With the algorithm's assistance, all 3 physician groups improved significantly in both image-wise classification (from 0.814-0.932 to 0.904-0.958; all P < .005) and lesion-wise localization (from 0.781-0.907 to 0.873-0.938; all P < .001).
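To make the comparisons above concrete, the sketch below contrasts two AUROCs measured on the same test images using a paired bootstrap. This is one common approach, shown for illustration only on simulated data; it is not the study's multireader analysis method.

```python
# Paired-bootstrap sketch for comparing two AUROCs on the same test set.
# All data here are simulated for illustration; not the study's method.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)            # simulated normal/abnormal labels
algo = 0.6 * y + 0.4 * rng.random(n)      # simulated algorithm scores
reader = 0.3 * y + 0.7 * rng.random(n)    # simulated physician scores

diffs = []
for _ in range(2000):
    idx = rng.integers(0, n, size=n)      # resample images with replacement
    if y[idx].min() == y[idx].max():      # AUROC needs both classes present
        continue
    diffs.append(roc_auc_score(y[idx], algo[idx])
                 - roc_auc_score(y[idx], reader[idx]))

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"AUROC difference (algorithm - reader), 95% CI: [{lo:.3f}, {hi:.3f}]")
```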

Conclusions and relevance: The algorithm consistently outperformed physicians, including thoracic radiologists, in the discrimination of chest radiographs with major thoracic diseases, demonstrating its potential to improve the quality and efficiency of clinical practice.

Conflict of interest statement

Conflict of Interest Disclosures: Dr Goo reported grants from Lunit Inc during the conduct of the study. Dr Ferretti reported personal fees from Boehringer, Roche, Bristol-Myers Squibb, and GEMS and nonfinancial support from Guerbet outside the submitted work. No other disclosures were reported.

Figures

Figure 1. Results of External Validation Tests and Observer Performance Tests
The deep learning–based automatic detection algorithm (DLAD) showed consistently high image-wise classification (area under the receiver operating characteristic curve [AUROC], 0.973-1.000) (A) and lesion-wise localization (area under the alternative free-response receiver operating characteristic curve [AUAFROC], 0.923-0.985) (B) performance in the external validation tests. In the comparison with physicians, the DLAD showed significantly higher classification (AUROC, 0.983 vs 0.814-0.932) (C) and localization (AUAFROC, 0.985 vs 0.781-0.907) (D) performance than all observer groups.
Figure 2. Representative Case From the Observer Performance Test (Malignant Neoplasm)
A, The chest radiograph (CR) shows nodular opacity at the right lower lung field (arrowhead), which was initially detected by only 2 of 15 observers. B, The corresponding computed tomographic (CT) image reveals a nodule in the right middle lobe. C, The deep learning–based automatic detection algorithm (DLAD) correctly localized the lesion (probability score, 0.291). An additional 4 observers detected the lesion after reviewing the DLAD output.
Figure 3. Representative Case From the Observer Performance Test (Pneumonia)
A, The chest radiograph (CR) shows subtle patchy increased opacity at the left middle lung field, which was initially missed by all 15 observers. B, The corresponding computed tomographic (CT) image shows patchy ground-glass opacity in the left upper lobe. C, The deep learning–based automatic detection algorithm (DLAD) correctly localized the lesion (probability score, 0.371). Seven observers correctly detected the lesion after reviewing the DLAD result.

Source: PubMed
