Development and validation of a deep learning algorithm detecting 10 common abnormalities on chest radiographs

Ju Gang Nam, Minchul Kim, Jongchan Park, Eui Jin Hwang, Jong Hyuk Lee, Jung Hee Hong, Jin Mo Goo, Chang Min Park

Abstract

We aimed to develop a deep learning algorithm detecting 10 common abnormalities (DLAD-10) on chest radiographs, and to evaluate its impact on diagnostic accuracy, timeliness of reporting and workflow efficiency.

DLAD-10 was trained with 146 717 radiographs from 108 053 patients using a ResNet34-based neural network with lesion-specific channels for 10 common radiological abnormalities (pneumothorax, mediastinal widening, pneumoperitoneum, nodule/mass, consolidation, pleural effusion, linear atelectasis, fibrosis, calcification and cardiomegaly). For external validation, the performance of DLAD-10 on a same-day computed tomography (CT)-confirmed dataset (normal:abnormal 53:147) and an open-source dataset (PadChest; normal:abnormal 339:334) was compared with that of three radiologists. Separate simulated reading tests were conducted on another dataset adjusted to real-world disease prevalence in the emergency department, consisting of four critical, 52 urgent and 146 nonurgent cases. Six radiologists participated in the simulated reading sessions with and without DLAD-10.

DLAD-10 exhibited area under the receiver operating characteristic curve (AUROC) values of 0.895-1.00 in the CT-confirmed dataset and 0.913-0.997 in the PadChest dataset. DLAD-10 correctly classified significantly more critical abnormalities (95.0% (57/60)) than the pooled radiologists (84.4% (152/180); p=0.01). In the simulated reading tests for emergency department patients, pooled readers detected significantly more critical (70.8% (17/24) versus 29.2% (7/24); p=0.006) and urgent (82.7% (258/312) versus 78.2% (244/312); p=0.04) abnormalities when aided by DLAD-10. DLAD-10 assistance shortened the mean±sd time to report critical and urgent radiographs (640.5±466.3 versus 3371.0±1352.5 s and 1840.3±1141.1 versus 2127.1±1468.2 s, respectively; both p<0.01) and reduced the mean±sd interpretation time per radiograph (20.5±22.8 versus 23.5±23.7 s; p<0.001).

DLAD-10 showed excellent standalone performance, improved radiologists' detection of abnormalities, and shortened the time to report critical and urgent cases.
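To make the model description concrete, the sketch below shows a minimal ResNet34-based multi-label classifier with one sigmoid output per abnormality. It is an illustrative stand-in only, not the authors' architecture: DLAD-10 additionally uses lesion-specific channels for localisation, which are not reproduced here, and the input size and training details are assumptions.

```python
# Minimal sketch of a ResNet34-based multi-label classifier for the 10
# abnormalities named in the abstract. Illustrative only: DLAD-10 itself
# adds lesion-specific localisation channels, which this sketch omits.
import torch
import torch.nn as nn
from torchvision.models import resnet34

ABNORMALITIES = [
    "pneumothorax", "mediastinal_widening", "pneumoperitoneum",
    "nodule_mass", "consolidation", "pleural_effusion",
    "linear_atelectasis", "fibrosis", "calcification", "cardiomegaly",
]

class MultiLabelCXRNet(nn.Module):
    def __init__(self, n_labels: int = len(ABNORMALITIES)):
        super().__init__()
        backbone = resnet34(weights=None)  # train from scratch, or load weights
        # Replace the 1000-class ImageNet head with one logit per abnormality.
        backbone.fc = nn.Linear(backbone.fc.in_features, n_labels)
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Independent sigmoid probability per abnormality (multi-label).
        return torch.sigmoid(self.backbone(x))

model = MultiLabelCXRNet()
scores = model(torch.randn(1, 3, 512, 512))  # hypothetical input resolution
print(dict(zip(ABNORMALITIES, scores.squeeze(0).tolist())))
```

In a multi-label setting like this, training would typically minimise a binary cross-entropy loss over the 10 outputs rather than a softmax over mutually exclusive classes, since a single radiograph can carry several abnormalities at once.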

Conflict of interest statement

Conflict of interest: J.G. Nam reports grants from the National Research Foundation of Korea funded by the Ministry of Science and ICT (NRF-2018R1A5A1060031), and from Seoul National University Hospital Research Fund (03-2019-0190), during the conduct of the study. Conflict of interest: M. Kim is an employee of Lunit Inc., and was involved in the development of the algorithm and writing the corresponding part of the manuscript, but did not have control over any of the validation data submitted for publication. Conflict of interest: J. Park is an employee of Lunit Inc., and was involved in the development of the algorithm and writing the corresponding part of the manuscript, but did not have control over any of the validation data submitted for publication. Conflict of interest: E.J. Hwang has nothing to disclose. Conflict of interest: J.H. Lee has nothing to disclose. Conflict of interest: J.H. Hong has nothing to disclose. Conflict of interest: J.M. Goo has nothing to disclose. Conflict of interest: C.M. Park reports grants from the National Research Foundation of Korea funded by the Ministry of Science and ICT (NRF-2018R1A5A1060031), and from Seoul National University Hospital Research Fund (03-2019-0190), during the conduct of the study.

Copyright ©ERS 2021.

Figures

FIGURE 1
Development and validation of DLAD-10. SNUH: Seoul National University Hospital; ILD: interstitial lung disease; CT: computed tomography. See main text and supplementary figure E1 for details of the training stage.
FIGURE 2
Examples of DLAD-10 output. a) Each of 10 possible abnormalities was localised and displayed with its probability score. Urgency categorisation was performed based on the most urgent abnormality. This image was categorised as critical as it contained pneumothorax (Ptx) (in addition to nodule (Ndl) and pleural effusion (PEf)). b) A 47-year-old female patient visited the emergency department complaining of vague chest pain. A small pneumoperitoneum (Ppm) was detected by DLAD-10, while no readers detected the lesion in the conventional reading session. In the DLAD-10-aided reading session, all readers detected pneumoperitoneum. c) A 24-year-old male patient visited the emergency department due to left chest pain. A small left pneumothorax (Ptx) was detected by DLAD-10. Three readers reported pneumothorax in the conventional reading session and all six readers reported it in the DLAD-10-aided reading session. The arrows on the computed tomography scans in b) and c) indicate the corresponding abnormalities visualised on the chest radiographs.
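The legend above describes a "most urgent abnormality wins" rule for categorising a radiograph. The sketch below illustrates that rule; the critical/urgent/nonurgent assignment of individual abnormalities is an assumed example mapping, not taken from the paper (the legend only confirms that pneumothorax outranks nodule and pleural effusion as critical).

```python
# Hedged sketch of the urgency-categorisation rule from the Figure 2 legend.
# The mapping below is an ASSUMED example; only pneumothorax being critical
# (and outranking nodule and pleural effusion) is confirmed by the legend.
URGENCY = {
    "pneumothorax": "critical", "pneumoperitoneum": "critical",  # assumption
    "mediastinal_widening": "critical",                          # assumption
    "consolidation": "urgent", "pleural_effusion": "urgent",     # assumption
    "nodule_mass": "urgent",                                     # assumption
    "linear_atelectasis": "nonurgent", "fibrosis": "nonurgent",
    "calcification": "nonurgent", "cardiomegaly": "nonurgent",
}
RANK = {"critical": 0, "urgent": 1, "nonurgent": 2, "normal": 3}

def categorise(scores: dict, threshold: float = 0.5) -> str:
    """Return the urgency of the most urgent above-threshold finding."""
    positives = [URGENCY[a] for a, p in scores.items() if p >= threshold]
    return min(positives, key=RANK.get, default="normal")

# The radiograph in figure 2a (pneumothorax + nodule + pleural effusion):
print(categorise({"pneumothorax": 0.93, "nodule_mass": 0.71,
                  "pleural_effusion": 0.64}))  # -> "critical"
```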
FIGURE 3
Results of DLAD-10 and three thoracic radiologists for the Seoul National University Hospital external validation dataset: the area under the receiver operating characteristic curve (AUROC) of DLAD-10 and the performance of each radiologist are presented for each abnormality. a) Pneumothorax, b) pneumoperitoneum, c) mediastinal widening, d) nodule, e) consolidation, f) pleural effusion, g) atelectasis or fibrosis, h) calcification and i) cardiomegaly.
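For readers who want to reproduce this kind of per-abnormality analysis, the snippet below shows one way to compute AUROC for each finding with scikit-learn. The `y_true` and `y_score` arrays here are synthetic placeholders standing in for the reference-standard labels and DLAD-10 probability scores, which are not public in this record.

```python
# Sketch of a per-abnormality AUROC analysis in the spirit of figure 3.
# y_true and y_score are SYNTHETIC placeholders, not the study's data.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_cases, n_labels = 200, 9
y_true = rng.integers(0, 2, size=(n_cases, n_labels))          # 0/1 labels
y_score = np.clip(y_true * 0.6 + rng.random((n_cases, n_labels)) * 0.5, 0, 1)

names = ["pneumothorax", "pneumoperitoneum", "mediastinal_widening", "nodule",
         "consolidation", "pleural_effusion", "atelectasis_or_fibrosis",
         "calcification", "cardiomegaly"]                       # panels a)-i)
for i, name in enumerate(names):
    print(f"{name}: AUROC = {roc_auc_score(y_true[:, i], y_score[:, i]):.3f}")
```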


Source: PubMed
