Can we reduce the workload of mammographic screening by automatic identification of normal exams with artificial intelligence? A feasibility study

Alejandro Rodriguez-Ruiz, Kristina Lång, Albert Gubern-Merida, Jonas Teuwen, Mireille Broeders, Gisella Gennaro, Paola Clauser, Thomas H Helbich, Margarita Chevalier, Thomas Mertelmeier, Matthew G Wallis, Ingvar Andersson, Sophia Zackrisson, Ioannis Sechopoulos, Ritse M Mann

Abstract

Purpose: To study the feasibility of automatically identifying normal digital mammography (DM) exams with artificial intelligence (AI) to reduce the breast cancer screening reading workload.

Methods and materials: A total of 2652 DM exams (653 cancer) and interpretations by 101 radiologists were gathered from nine previously performed multi-reader multi-case receiver operating characteristic (MRMC ROC) studies. An AI system assigned each exam a score between 1 and 10, representing the likelihood that cancer was present. Using each AI score from 1 to 9 as a possible threshold, the exams were divided into a low-likelihood group and a high-likelihood group. It was assumed that, under the pre-selection scenario, only the high-likelihood group would be read by radiologists, while all low-likelihood exams would be reported as normal. The area under the reader-averaged ROC curve (AUC) was calculated for the original evaluations and for the pre-selection scenarios, and the two were compared using a non-inferiority hypothesis.
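The pre-selection rule described above can be sketched in a few lines of Python. This is a minimal illustration on made-up toy data, not the study's implementation: the `Exam` class, the function names, and the example scores are all hypothetical, and only the rule itself (exams with an AI score above the threshold are read, the rest are reported as normal) comes from the text.

```python
from dataclasses import dataclass

@dataclass
class Exam:
    ai_score: int      # 1..10, higher means cancer is more likely (per the abstract)
    has_cancer: bool   # ground truth label

def preselect(exams, threshold):
    """Split exams: ai_score > threshold is read, the rest is reported as normal."""
    to_read = [e for e in exams if e.ai_score > threshold]
    auto_normal = [e for e in exams if e.ai_score <= threshold]
    return to_read, auto_normal

def triage_summary(exams, threshold):
    """Fraction of exams removed from reading and fraction of cancers excluded."""
    _, auto_normal = preselect(exams, threshold)
    n_cancer = sum(e.has_cancer for e in exams)
    excluded = sum(e.has_cancer for e in auto_normal)
    return {
        "workload_reduction": len(auto_normal) / len(exams),
        "cancers_excluded": excluded / n_cancer if n_cancer else 0.0,
    }

# Toy data (hypothetical, not from the study).
exams = [Exam(1, False), Exam(2, False), Exam(3, False), Exam(5, True),
         Exam(6, False), Exam(8, True), Exam(9, True), Exam(10, True)]

# Sweep every AI score from 1 to 9 as a candidate threshold.
for t in range(1, 10):
    print(t, triage_summary(exams, t))
```

Sweeping the threshold makes the trade-off explicit: each step up removes more exams from the reading list and can only exclude the same number of cancers or more.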

Results: Setting the low/high-likelihood threshold at an AI score of 5 (high likelihood > 5) approximately halves (−47%) the workload to be read by radiologists while excluding 7% of true-positive exams. Using an AI score of 2 as the threshold yields a workload reduction of 17% while excluding only 1% of true-positive exams. Pre-selection did not change the average AUC of radiologists (lower limit of the 95% CI > −0.05) for any threshold except at the extreme AI score of 9.
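The non-inferiority criterion in the results (lower confidence limit of the AUC change above −0.05) can be illustrated as follows. This is a sketch under stated assumptions: it uses a two-sided normal-approximation interval with a Bonferroni correction over nine thresholds, and the AUC differences and standard errors are hypothetical inputs. The study itself estimated variances with dedicated MRMC methods, which this sketch does not reproduce.

```python
from statistics import NormalDist

def lower_ci_bound(delta_auc, se, alpha=0.05, n_comparisons=9):
    """Lower limit of a two-sided, Bonferroni-corrected normal CI for an AUC change.

    Assumption: normal approximation with a known standard error `se`.
    """
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_comparisons))
    return delta_auc - z * se

def is_non_inferior(delta_auc, se, margin=0.05, alpha=0.05, n_comparisons=9):
    """Non-inferior if the lower CI limit stays above -margin."""
    return lower_ci_bound(delta_auc, se, alpha, n_comparisons) > -margin

# Hypothetical values: AUC change of -0.01 with two different standard errors.
print(is_non_inferior(-0.01, 0.01))
print(is_non_inferior(-0.01, 0.02))
```

The Bonferroni correction widens each interval, so a threshold passes only if its AUC change is small relative to its uncertainty.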

Conclusion: It is possible to automatically pre-select exams using AI to significantly reduce the breast cancer screening reading workload.

Key points:
• There is potential to use artificial intelligence to automatically reduce the breast cancer screening reading workload by excluding exams with a low likelihood of cancer.
• The exclusion of exams with the lowest likelihood of cancer in screening might not change radiologists' breast cancer detection performance.
• When excluding exams with the lowest likelihood of cancer, the decrease in true-positive recalls would be balanced by a simultaneous reduction in false-positive recalls.

Keywords: Artificial intelligence; Breast cancer; Deep learning; Mammography; Screening.

Conflict of interest statement

The authors of this manuscript declare relationships with the following companies:

The authors KL, PC, TH, TM, SZ, IS, and RM declare relationships with Siemens Healthineers (Erlangen, Germany): TM is an employee; KL, PC, TH, SZ, IS, and RM received research grants.

The authors AR, AG, and RM declare relationships with ScreenPoint Medical BV (Nijmegen, Netherlands): AR and AG are employees; RM is an advisor.

Figures

**Fig. 1**
Distribution of normal (a), cancer (b), and benign exams (c) as a function of AI score, representing the likelihood of cancer present (1–10, 10 means high likelihood of cancer present). The contribution of each dataset to the overall percentage of exams is shown

**Fig. 2**
An example of one of the nine exams in our study that contained cancer but were assigned an AI score of 1 or 2, the lowest cancer-present likelihood categories. None of the 6 radiologists recalled this exam during the original MRMC study (read without priors), suggesting that cancer visibility on mammography is poor in these exams (and in fact, the cancer may have been detected by other means)

**Fig. 3**
Proportion (%) of exams that would be excluded from the final sample to be evaluated by the radiologists, using all possible AI scores as threshold values for pre-selection for reading

**Fig. 4**
ROC curves (a) and change (b) in AUC values of the average of radiologists in the original population, as well as in all possible pre-selected populations (using all possible AI scores as threshold values for pre-selection for reading; if the case is not pre-selected, the radiologist score is converted to the lowest possible cancer suspicion score for the MRMC study). 95% confidence intervals are Bonferroni-corrected
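The score conversion described in the Fig. 4 caption can be sketched as below. This is a simplified single-reader illustration with hypothetical names and toy data: exams not pre-selected receive the lowest possible suspicion score, and the AUC is computed with the empirical (Mann-Whitney) estimator. The reader-averaged MRMC analysis used in the study is not reproduced here.

```python
def apply_preselection(reader_scores, ai_scores, threshold, lowest_score):
    """Keep the reader's score if the exam is pre-selected (AI score > threshold),
    otherwise replace it with the lowest possible suspicion score."""
    return [r if a > threshold else lowest_score
            for r, a in zip(reader_scores, ai_scores)]

def empirical_auc(scores, labels):
    """Empirical AUC: probability that a cancer exam outscores a non-cancer exam,
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): one reader, six exams, suspicion scores on a 0-100 scale.
reader_scores = [10, 40, 80, 20, 90, 5]
ai_scores = [1, 3, 9, 2, 10, 6]
labels = [False, False, True, True, True, False]

adjusted = apply_preselection(reader_scores, ai_scores, threshold=2, lowest_score=0)
auc_original = empirical_auc(reader_scores, labels)
auc_preselected = empirical_auc(adjusted, labels)
print(auc_original, auc_preselected, auc_preselected - auc_original)
```

In this toy case a cancer exam with an AI score of 2 is converted to the lowest score, which lowers the AUC; in the study, such conversions are what the change in AUC in panel (b) measures at each threshold.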

Source: PubMed
