Can we reduce the workload of mammographic screening by automatic identification of normal exams with artificial intelligence? A feasibility study
Alejandro Rodriguez-Ruiz, Kristina Lång, Albert Gubern-Merida, Jonas Teuwen, Mireille Broeders, Gisella Gennaro, Paola Clauser, Thomas H Helbich, Margarita Chevalier, Thomas Mertelmeier, Matthew G Wallis, Ingvar Andersson, Sophia Zackrisson, Ioannis Sechopoulos, Ritse M Mann
Abstract
Purpose: To study the feasibility of automatically identifying normal digital mammography (DM) exams with artificial intelligence (AI) to reduce the breast cancer screening reading workload.
Methods and materials: A total of 2652 DM exams (653 cancer) and interpretations by 101 radiologists were gathered from nine previously performed multi-reader multi-case receiver operating characteristic (MRMC ROC) studies. An AI system was used to obtain a score between 1 and 10 for each exam, representing the likelihood that cancer is present. Using each AI score between 1 and 9 as a possible threshold, the exams were divided into a low-likelihood group and a high-likelihood group. It was assumed that, under the pre-selection scenario, only the high-likelihood group would be read by radiologists, while all low-likelihood exams would be reported as normal. The area under the reader-averaged ROC curve (AUC) was calculated for the original evaluations and for the pre-selection scenarios, and the two were compared using a non-inferiority hypothesis.
Results: Setting the low/high-likelihood threshold at an AI score of 5 (high likelihood > 5) results in a trade-off of approximately halving (-47%) the workload to be read by radiologists while excluding 7% of true-positive exams. Using an AI score of 2 as threshold yields a workload reduction of 17% while excluding only 1% of true-positive exams. Pre-selection did not change the average AUC of radiologists (lower bound of the 95% CI > -0.05) for any threshold except at the extreme AI score of 9.
Conclusion: It is possible to automatically pre-select exams using AI to significantly reduce the breast cancer screening reading workload.
Key points:
• There is potential to use artificial intelligence to automatically reduce the breast cancer screening reading workload by excluding exams with a low likelihood of cancer.
• The exclusion of exams with the lowest likelihood of cancer in screening might not change radiologists' breast cancer detection performance.
• When excluding exams with the lowest likelihood of cancer, the decrease in true-positive recalls would be balanced by a simultaneous reduction in false-positive recalls.
Keywords: Artificial intelligence; Breast cancer; Deep learning; Mammography; Screening.
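The pre-selection scenario described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function and class names (`preselection_tradeoff`, `TradeOff`, `is_non_inferior`) are hypothetical, and the sketch assumes only what the abstract states, namely that each exam has an integer AI score from 1 to 10, that exams scoring above the threshold are read, and that non-inferiority is judged by the lower 95% CI bound of the AUC difference against a margin of 0.05. Computing that confidence interval (an MRMC analysis) is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TradeOff:
    threshold: int             # exams with AI score > threshold are read
    workload_reduction: float  # fraction of all exams not read by radiologists
    cancers_excluded: float    # fraction of cancer exams not read by radiologists


def preselection_tradeoff(
    scores: Sequence[int],
    has_cancer: Sequence[bool],
    thresholds: Sequence[int] = range(1, 10),
) -> List[TradeOff]:
    """For each threshold, report how many exams and cancers fall in the unread group."""
    if len(scores) != len(has_cancer):
        raise ValueError("scores and has_cancer must have the same length")
    n_exams = len(scores)
    n_cancer = sum(has_cancer)
    results = []
    for t in thresholds:
        # Low-likelihood exams (score <= t) are assumed to be reported as normal.
        n_low = sum(1 for s in scores if s <= t)
        n_low_cancer = sum(1 for s, c in zip(scores, has_cancer) if s <= t and c)
        results.append(
            TradeOff(
                threshold=t,
                workload_reduction=n_low / n_exams if n_exams else 0.0,
                cancers_excluded=n_low_cancer / n_cancer if n_cancer else 0.0,
            )
        )
    return results


def is_non_inferior(ci_lower: float, margin: float = 0.05) -> bool:
    """Non-inferiority holds if the lower CI bound of the AUC difference exceeds -margin."""
    return ci_lower > -margin
```

With toy data such as `scores = [1, 2, 3, 5, 6, 8, 9, 10, 10, 4]` and one cancer among the five exams scoring 5 or less, a threshold of 5 would leave half of the exams unread at the cost of excluding that one cancer. The figures reported in the abstract come from the study data, not from this sketch.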
Conflict of interest statement
The authors of this manuscript declare relationships with the following companies:
The authors KL, PC, TH, TM, SZ, IS, and RM of this manuscript declare relationships with Siemens Healthineers (Erlangen, Germany): TM is an employee, KL, PC, TH, SZ, IS, and RM received research grants.
The authors AR, AG, and RM declare relationships with ScreenPoint Medical BV (Nijmegen, Netherlands): AR and AG are employees, RM is an advisor.
Source: PubMed