Clinical Validation of a Deep Learning Algorithm for Detection of Pneumonia on Chest Radiographs in Emergency Department Patients with Acute Febrile Respiratory Illness

Jae Hyun Kim, Jin Young Kim, Gun Ha Kim, Donghoon Kang, In Jung Kim, Jeongkuk Seo, Jason R Andrews, Chang Min Park

Abstract

Early identification of pneumonia is essential in patients with acute febrile respiratory illness (FRI). We evaluated the performance and added value of a commercial deep learning (DL) algorithm for detecting pneumonia on chest radiographs (CRs) of patients visiting the emergency department (ED) with acute FRI. This single-centre, retrospective study included 377 consecutive patients who visited the ED between August 2018 and January 2019, yielding 387 CRs. The performance of the DL algorithm in detecting pneumonia on CRs was evaluated using the area under the receiver operating characteristic curve (AUROC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). In an observer performance test, three ED physicians independently reviewed the CRs for pneumonia and re-evaluated them with the algorithm eight weeks later. AUROC, sensitivity, and specificity were compared between the DL algorithm and the physicians alone, and between the physicians alone and the physicians aided by the algorithm. Among the 377 patients, 83 (22.0%) had pneumonia. The AUROC, sensitivity, specificity, PPV, and NPV of the algorithm for detecting pneumonia on CRs were 0.861, 58.3%, 94.4%, 74.2%, and 89.1%, respectively. For the detection of pneumonia visible on CR ('visible pneumonia'; 60 CRs from 59 patients), these values were 0.940, 81.7%, 94.4%, 74.2%, and 96.3%, respectively. In the observer performance test, the algorithm outperformed the physicians for pneumonia (AUROC, 0.861 vs. 0.788, p = 0.017; specificity, 94.4% vs. 88.7%, p < 0.0001) and for visible pneumonia (AUROC, 0.940 vs. 0.871, p = 0.007; sensitivity, 81.7% vs. 73.9%, p = 0.034; specificity, 94.4% vs. 88.7%, p < 0.0001). The physicians' detection of pneumonia (sensitivity, 82.2% vs. 53.2%, p = 0.008; specificity, 98.1% vs. 88.7%, p < 0.0001) and of visible pneumonia (sensitivity, 82.2% vs. 73.9%, p = 0.014; specificity, 98.1% vs. 88.7%, p < 0.0001) improved significantly when they were aided by the algorithm. The physicians' mean reading time decreased from 165 to 101 min with the assistance of the algorithm. Thus, the DL algorithm detected pneumonia, particularly visible pneumonia on CR, better than the ED physicians did, and improved the physicians' diagnostic performance in patients with acute FRI.
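For readers who want to compute the same metrics on their own data, the following is a minimal Python sketch of how AUROC, sensitivity, specificity, PPV, and NPV can be derived from per-radiograph probability scores against a reference standard. The labels, scores, and the 0.5 operating threshold below are placeholders for illustration; they are not the study's data, and the algorithm's actual operating threshold is not stated in this record.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Placeholder inputs: y_true is the reference standard (1 = pneumonia),
# y_score is a per-radiograph probability score from the algorithm.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=387)                                    # hypothetical labels
y_score = np.clip(0.5 * y_true + rng.normal(0.3, 0.2, size=387), 0, 1)  # hypothetical scores

auroc = roc_auc_score(y_true, y_score)      # threshold-independent

threshold = 0.5                             # assumed operating point, not the paper's
y_pred = (y_score >= threshold).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)                # true positive rate
specificity = tn / (tn + fp)                # true negative rate
ppv = tp / (tp + fp)                        # positive predictive value
npv = tn / (tn + fn)                        # negative predictive value

print(f"AUROC={auroc:.3f}  Sens={sensitivity:.1%}  Spec={specificity:.1%}  "
      f"PPV={ppv:.1%}  NPV={npv:.1%}")
```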

Keywords: acute febrile respiratory illness; artificial intelligence; chest radiograph; deep learning algorithm; emergency department.

Conflict of interest statement

The authors declare no conflict of interest. Chang Min Park received research grants from Lunit Inc. outside the present study.

Figures

Figure 1
Flow chart for the determination of reference standard. FRI = febrile respiratory illness, CR = chest radiograph, CT = computed tomography.
Figure 2
AUROCs of the DL algorithm and ED physicians (pneumonia vs. non-pneumonia). (a) The DL algorithm showed significantly higher performance than the ED physicians (0.861 vs. 0.788; p = 0.019). (b) The ED physicians’ performance showed a trend toward improvement with the assistance of the DL algorithm (0.788 vs. 0.816; p = 0.068). AUROC = area under the receiver operating characteristic curve, DL = deep learning, ED = emergency department.
Figure 3
AUROCs of the DL algorithm and ED physicians (pneumonia visible on CR vs. non-pneumonia). (a) The DL algorithm showed significantly higher performance than the ED physicians (0.940 vs. 0.871; p = 0.007). (b) The ED physicians’ performance improved significantly with the assistance of the DL algorithm (0.871 vs. 0.916; p = 0.002).
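The comparisons in Figures 2 and 3 are paired: the same CRs are scored by the algorithm and read by the physicians, so the two ROC curves are correlated (such comparisons are commonly made with the DeLong test for correlated ROC curves). The study's statistical code is not shown here; below is a minimal paired-bootstrap sketch of an AUROC comparison under that pairing, with all names and inputs hypothetical.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def paired_bootstrap_auc_diff(y_true, score_a, score_b, n_boot=2000, seed=0):
    """Bootstrap mean and 95% CI for AUROC(score_a) - AUROC(score_b),
    resampling the same cases for both scores to respect the pairing."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    score_a, score_b = np.asarray(score_a), np.asarray(score_b)
    n = len(y_true)
    diffs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample cases with replacement
        if len(np.unique(y_true[idx])) < 2:     # AUROC needs both classes present
            continue
        diffs.append(roc_auc_score(y_true[idx], score_a[idx])
                     - roc_auc_score(y_true[idx], score_b[idx]))
    diffs = np.asarray(diffs)
    return diffs.mean(), np.percentile(diffs, [2.5, 97.5])
```

A difference whose 95% CI excludes zero corresponds to the kind of statistically significant gap reported in panel (a) of each figure.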
Figure 4
Representative case from the observer performance test. (a) The CR demonstrates a patchy opacity in the left middle lung field (arrow), initially detected by only one of the three observers. (b) CT taken on the same day shows branching opacities and centrilobular nodules in the left upper lobe. (c) The DL algorithm correctly detected the lesion (probability score, 0.577). With the assistance of the DL algorithm, all observers detected the lesion.
Figure 5
False-positive interpretations by the DL algorithm. (a,b) The CR shows the radio-opaque letters “ROK ARMY” (arrows) on the patient’s shirt, projected over the left middle lung field. (c) The DL algorithm incorrectly localised the radio-opaque letters (probability score, 0.348). (d) An abdominal shield was inadvertently included in the lower part of the CR. (e) The DL algorithm incorrectly detected the abdominal shield (probability score, 0.684). None of the three observers identified these foreign bodies as lesions.


Source: PubMed
