Performance of a deep-learning algorithm for referable thoracic abnormalities on chest radiographs: A multicenter study of a health screening cohort

Eun Young Kim, Young Jae Kim, Won-Jun Choi, Gi Pyo Lee, Ye Ra Choi, Kwang Nam Jin, Young Jun Cho

Abstract

Purpose: This study evaluated the stand-alone performance of a commercially available deep-learning algorithm (DLA) (Insight CXR, Lunit, Seoul, South Korea) for detecting referable thoracic abnormalities on chest X-ray (CXR), using a consecutively collected multicenter health screening cohort.

Methods and materials: A consecutive health screening cohort of participants who underwent both CXR and chest computed tomography (CT) within 1 month was retrospectively collected from the health care clinics of three institutions (n = 5,887). Referable thoracic abnormalities were defined as any radiologic findings requiring further diagnostic evaluation or management, including the DLA-target lesions of nodule/mass, consolidation, and pneumothorax. We evaluated the diagnostic performance of the DLA for referable thoracic abnormalities using the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, and specificity, with ground truth based on chest CT (CT-GT). In addition, for CT-GT-positive cases, three radiologists independently read the CXRs; abnormalities called by at least two radiologists (clear visible) and by at least one radiologist (visible) defined two additional CXR-based ground truths (clear visible CXR-GT and visible CXR-GT, respectively), against which the DLA was also evaluated.
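As an illustration of the reader-consensus ground truths described above, the following minimal sketch derives the two CXR-GT labels from per-case radiologist calls. It assumes three binary reads per CT-GT-positive case and interprets "clear visible" as a call by at least two of the three readers; the function and variable names are hypothetical, not from the study.

    import numpy as np

    def derive_cxr_gts(reader_calls):
        # reader_calls: (n_cases, 3) binary array; 1 means that radiologist
        # marked the abnormality as visible on the chest radiograph.
        n_calls = np.asarray(reader_calls).sum(axis=1)
        visible_gt = n_calls >= 1        # visible CXR-GT: at least one reader
        clear_visible_gt = n_calls >= 2  # clear visible CXR-GT: at least two readers
        return visible_gt, clear_visible_gt

    # Example: three CT-GT-positive cases read by three radiologists.
    calls = [[1, 1, 0],   # clear visible (two of three readers)
             [1, 0, 0],   # visible only (one reader)
             [0, 0, 0]]   # occult on CXR despite a positive CT
    visible, clear_visible = derive_cxr_gts(calls)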

Results: Among the 5,887 subjects (4,329 males; mean age, 54±11 years), referable thoracic abnormalities were found in 618 (10.5%) based on CT-GT. DLA-target lesions were observed in 223 subjects (4.0%): nodule/mass in 202 (3.4%), consolidation in 31 (0.5%), and pneumothorax in one (<0.1%); DLA-non-target lesions were observed in 409 (6.9%). For referable thoracic abnormalities based on CT-GT, the DLA showed an AUC of 0.771 (95% confidence interval [CI], 0.751-0.791), a sensitivity of 69.6%, and a specificity of 74.0%. Based on the CXR-GTs, the prevalence of referable thoracic abnormalities was lower, with visible and clear visible abnormalities found in 405 (6.9%) and 227 (3.9%) cases, respectively. The performance of the DLA increased significantly with the CXR-GTs: an AUC of 0.839 (95% CI, 0.829-0.848), a sensitivity of 82.7%, and a specificity of 73.2% based on visible CXR-GT, and an AUC of 0.872 (95% CI, 0.863-0.880; P < 0.001 for the AUC comparison of CT-GT vs. clear visible CXR-GT), a sensitivity of 83.3%, and a specificity of 78.8% based on clear visible CXR-GT.
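The reported metrics can be reproduced from per-case DLA abnormality scores once a ground truth is fixed. The sketch below is a minimal illustration, assuming the DLA outputs a continuous abnormality score per radiograph and using a percentile bootstrap for the 95% CI; the study's exact CI method and the test behind the CT-GT vs. clear visible CXR-GT AUC comparison (e.g., a DeLong test) are not stated in the abstract, and the operating threshold here is hypothetical.

    import numpy as np
    from sklearn.metrics import roc_auc_score

    def evaluate_dla(scores, labels, threshold):
        # AUC from the continuous scores; sensitivity and specificity at one
        # fixed operating threshold.
        scores = np.asarray(scores)
        labels = np.asarray(labels).astype(bool)
        auc = roc_auc_score(labels, scores)
        preds = scores >= threshold
        sensitivity = (preds & labels).sum() / labels.sum()
        specificity = (~preds & ~labels).sum() / (~labels).sum()
        return auc, sensitivity, specificity

    def auc_bootstrap_ci(scores, labels, n_boot=2000, seed=0):
        # Percentile bootstrap 95% CI for the AUC.
        rng = np.random.default_rng(seed)
        scores, labels = np.asarray(scores), np.asarray(labels)
        aucs = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(labels), len(labels))
            if labels[idx].min() == labels[idx].max():
                continue  # resample must contain both classes for an AUC
            aucs.append(roc_auc_score(labels[idx], scores[idx]))
        return np.percentile(aucs, [2.5, 97.5])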

Conclusion: The DLA provided fair-to-good stand-alone performance for the detection of referable thoracic abnormalities in a multicenter consecutive health screening cohort. Its measured performance varied according to the method used to establish the ground truth.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Flow chart of the study population.
Fig 2. The prevalence of normal cases and of target and non-target lesions of the deep-learning algorithm (DLA), showing significant differences among the three institutions.
Institution G has fewer normal cases and more DLA-non-target lesions than the other two institutions.
Fig 3. Receiver operating characteristic (ROC) curves of the deep-learning algorithm (DLA) for referable thoracic abnormalities on chest radiography based on different standard reference methods.
The area under the ROC curve (AUC) is higher when the visible and clear visible CXR-GTs are used as the ground truth rather than CT, except at institution G (C).


Source: PubMed
