Machine learning in GI endoscopy: practical guidance in how to interpret a novel field

Fons van der Sommen, Jeroen de Groof, Maarten Struyvenberg, Joost van der Putten, Tim Boers, Kiki Fockens, Erik J Schoon, Wouter Curvers, Peter de With, Yuichi Mori, Michael Byrne, Jacques J G H M Bergman

Abstract

There has been a vast increase in GI literature focused on the use of machine learning in endoscopy. The relative novelty of this field poses a challenge for reviewers and readers of GI journals. To appreciate scientific quality and novelty of machine learning studies, understanding of the technical basis and commonly used techniques is required. Clinicians often lack this technical background, while machine learning experts may be unfamiliar with clinical relevance and implications for daily practice. Therefore, there is an increasing need for a multidisciplinary, international evaluation on how to perform high-quality machine learning research in endoscopy. This review aims to provide guidance for readers and reviewers of peer-reviewed GI journals to allow critical appraisal of the most relevant quality requirements of machine learning studies. The paper provides an overview of common trends and their potential pitfalls and proposes comprehensive quality requirements in six overarching themes: terminology, data, algorithm description, experimental setup, interpretation of results and machine learning in clinical practice.

Keywords: computerised image analysis; endoscopy; gastrointestinal endoscopy.

Conflict of interest statement

Competing interests: None declared.

© Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY. Published by BMJ.

Figures

Figure 1
Graphical display of overfitting of training data. In this figure, the leftmost panel displays data points of two classes, in which the class is indicated by the colour. The centre panel shows the same data including the prediction of a model trained on that data as the background colour. Overfitting is clearly visible as the model isolates points of the red class, rather than capturing the class as a whole. The rightmost panel shows the prediction of a different model as background colour. Although this model makes mistakes (red points can be seen on a blue background and vice versa), this model demonstrates better generalisation, as it captures the class distributions rather than individual points.
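
The contrast between the centre and rightmost panels can be reproduced in a minimal sketch. It assumes a one-dimensional toy dataset and a nearest-neighbour classifier, neither of which comes from the paper. The data points, the isolated red point at 1.5 and the function names are all hypothetical. With k = 1 the model isolates the stray red point and fits the training data perfectly. With k = 3 it misclassifies that one training point but captures the class as a whole.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: a blue cluster, a red cluster, and one isolated
# red point (1.5) sitting inside the blue cluster.
train = [(0.0, "blue"), (1.0, "blue"), (2.0, "blue"), (3.0, "blue"),
         (1.5, "red"),
         (6.0, "red"), (7.0, "red"), (8.0, "red"), (9.0, "red")]

def training_accuracy(k):
    """Fraction of training points that the k-NN model labels correctly."""
    correct = sum(knn_predict(train, x, k) == label for x, label in train)
    return correct / len(train)

# k = 1 memorises every training point, including the isolated one.
overfit_acc = training_accuracy(1)
# k = 3 makes one mistake on the training data but ignores the isolated point.
smooth_acc = training_accuracy(3)

# A new point close to the isolated red point is labelled differently by the two models.
overfit_label = knn_predict(train, 1.6, 1)
smooth_label = knn_predict(train, 1.6, 3)
```

In this sketch the perfect training accuracy of the k = 1 model says nothing about how it behaves on new data, which is the point the figure makes.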
Figure 2
Visualisation of training, validation and test set and overfitting, and their appropriate use. The training dataset is used to train the model, followed by validation. In case of unsatisfactory performance, the model is changed, retrained and again validated. In case of satisfactory performance, the model is then tested on a separate test set to evaluate model performance.
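
As a rough illustration of the three sets described in this caption, the sketch below shuffles a list of hypothetical patient identifiers and splits it into disjoint training, validation and test subsets. The 60/20/20 proportions, the fixed seed and the choice to split by patient are assumptions made for the example, not recommendations taken from the paper.

```python
import random

def split_dataset(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle items and split them into disjoint training, validation and test sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

patient_ids = list(range(100))  # hypothetical patient identifiers
train, val, test = split_dataset(patient_ids)
```

The validation set is the one consulted repeatedly while the model is changed and retrained. The test set is held back and used once, after the model is considered satisfactory.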
Figure 3
Graphical display of fourfold cross-validation.
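
Fourfold cross-validation can be sketched as follows: the data are divided into four parts, and each part serves once as the validation fold while the other three are used for training. The helper name `k_fold_indices` and the sample count of 10 are hypothetical, and the sketch assumes the samples have already been shuffled.

```python
def k_fold_indices(n_samples, k=4):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

folds = list(k_fold_indices(10, k=4))
```

Each sample appears in exactly one validation fold, so the performance averaged over the four folds uses every sample for validation once.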
Figure 4
Exemplary case of subtle Barrett’s neoplasia, delineated by three experts (yellow, blue and green). Parts of the lesion (‘the sweet spot’) are recognised by all experts (black), yet other parts are only recognised by one or two experts. Reprinted from Bergman J, de Groof AJ, Pech O, et al. An interactive web-based educational tool improves detection and delineation of Barrett's esophagus-related neoplasia. Gastroenterology 2019;156:1299-1308, with permission from Elsevier.
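
One way to make the 'sweet spot' concrete is to treat each expert delineation as a set of pixels and take the intersection. The sketch below does this for three made-up delineations. The pixel coordinates, the function names and the use of the Dice coefficient as a pairwise agreement measure are illustrative assumptions, not the method used to produce the figure.

```python
def agreement_regions(delineations):
    """Return pixels marked by all experts and pixels marked by at least one expert."""
    sweet_spot = set.intersection(*delineations)
    any_expert = set.union(*delineations)
    return sweet_spot, any_expert

def dice(a, b):
    """Dice coefficient between two pixel sets: 2|A and B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty delineations agree trivially
    return 2 * len(a & b) / (len(a) + len(b))

# Toy delineations as sets of (row, column) pixels; the values are made up.
expert_yellow = {(0, 0), (0, 1), (1, 1), (1, 2)}
expert_blue = {(0, 1), (1, 1), (1, 2), (2, 2)}
expert_green = {(1, 1), (1, 2), (2, 2), (2, 3)}

sweet_spot, any_expert = agreement_regions([expert_yellow, expert_blue, expert_green])
pairwise_dice = dice(expert_yellow, expert_blue)
```

Here the sweet spot is much smaller than the area marked by at least one expert, which mirrors the partial agreement visible in the figure.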

References

    1. Ehteshami Bejnordi B, Veta M, Johannes van Diest P, et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 2017;318:2199–210. 10.1001/jama.2017.14585
    2. Ghafoorian M, Karssemeijer N, Heskes T, et al. Location sensitive deep convolutional neural networks for segmentation of white matter hyperintensities. Sci Rep 2017;7:5110. 10.1038/s41598-017-05300-5
    3. Ciompi F, Chung K, van Riel SJ, et al. Towards automatic pulmonary nodule management in lung cancer screening with deep learning. Sci Rep 2017;7:46479. 10.1038/srep46479
    4. Lakhani P, Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 2017;284:574–82. 10.1148/radiol.2017162326
    5. Kooi T, Litjens G, van Ginneken B, et al. Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal 2017;35:303–12. 10.1016/j.media.2016.07.007
    6. Kominami Y, Yoshida S, Tanaka S, et al. Computer-aided diagnosis of colorectal polyp histology by using a real-time image recognition system and narrow-band imaging magnifying colonoscopy. Gastrointest Endosc 2016;83:643–9. 10.1016/j.gie.2015.08.004
    7. Misawa M, Kudo S-E, Mori Y, et al. Characterization of colorectal lesions using a computer-aided diagnostic system for narrow-band imaging endocytoscopy. Gastroenterology 2016;150:1531–2. 10.1053/j.gastro.2016.04.004
    8. Urban G, Tripathi P, Alkayali T, et al. Deep learning localizes and identifies polyps in real time with 96% accuracy in screening colonoscopy. Gastroenterology 2018;155:1069–78. 10.1053/j.gastro.2018.06.037
    9. de Groof J, van der Sommen F, van der Putten J, et al. The ARGOS project: the development of a computer-aided detection system to improve detection of Barrett's neoplasia on white light endoscopy. United European Gastroenterol J 2019;7:538–47. 10.1177/2050640619837443
    10. Byrne MF, Chapados N, Soudan F, et al. Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. Gut 2019;68:94–100. 10.1136/gutjnl-2017-314547
    11. Mori Y, Kudo S-E, Misawa M, et al. Real-time use of artificial intelligence in identification of diminutive polyps during colonoscopy: a prospective study. Ann Intern Med 2018;169:357–66. 10.7326/M18-0249
    12. Maeda Y, Kudo S-E, Mori Y, et al. Fully automated diagnostic system with artificial intelligence using endocytoscopy to identify the presence of histologic inflammation associated with ulcerative colitis (with video). Gastrointest Endosc 2019;89:408–15. 10.1016/j.gie.2018.09.024
    13. van der Sommen F, Curvers WL, Nagengast WB. Novel developments in endoscopic mucosal imaging. Gastroenterology 2018;154:1876–86. 10.1053/j.gastro.2018.01.070
    14. Vinsard DG, Mori Y, Misawa M, et al. Quality assurance of computer-aided detection and diagnosis in colonoscopy. Gastrointest Endosc 2019;90:55–63. 10.1016/j.gie.2019.03.019
    15. Deng J, Dong W, Socher R, et al. ImageNet: a large-scale hierarchical image database. IEEE Conf Comput Vis Pattern Recognit 2009:2–9.
    16. Yosinski J, Clune J, Bengio Y, et al. How transferable are features in deep neural networks? Advances in Neural Information Processing Systems 27 (NIPS ’14), NIPS Foundation 2014;2014:3320–8.
    17. Ahmad OF, Soares AS, Mazomenos E, et al. Artificial intelligence and computer-aided diagnosis in colonoscopy: current evidence and future directions. Lancet Gastroenterol Hepatol 2019;4:71–80. 10.1016/S2468-1253(18)30282-6
    18. van der Sommen F, Zinger S, Curvers WL, et al. Computer-aided detection of early neoplastic lesions in Barrett's esophagus. Endoscopy 2016;48:617–24. 10.1055/s-0042-105284
    19. Maier-Hein L, Eisenmann M, Reinke A, et al. Why rankings of biomedical image analysis competitions should be interpreted with care. Nat Commun 2018;9:5217. 10.1038/s41467-018-07619-7
    20. Chen P-J, Lin M-C, Lai M-J, et al. Accurate classification of diminutive colorectal polyps using computer-aided analysis. Gastroenterology 2018;154:568–75. 10.1053/j.gastro.2017.10.010
    21. Gross S, Trautwein C, Behrens A, et al. Computer-based classification of small colorectal polyps by using narrow-band imaging with optical magnification. Gastrointest Endosc 2011;74:1354–9. 10.1016/j.gie.2011.08.001
    22. Ponugoti P, Rastogi A, Kaltenbach T, et al. Disagreement between high confidence endoscopic adenoma prediction and histopathological diagnosis in colonic lesions ≤ 3 mm in size. Endoscopy 2019;51:221–6. 10.1055/a-0831-2348
    23. Horie Y, Yoshio T, Aoyama K, et al. Diagnostic outcomes of esophageal cancer by artificial intelligence using convolutional neural networks. Gastrointest Endosc 2019;89:25–32. 10.1016/j.gie.2018.07.037
    24. Iakovidis DK, Koulaouzidis A. Automatic lesion detection in capsule endoscopy based on color saliency: closer to an essential adjunct for reviewing software. Gastrointest Endosc 2014;80:877–83. 10.1016/j.gie.2014.06.026
    25. Bergman JJGHM, de Groof AJ, Pech O, et al. An interactive web-based educational tool improves detection and delineation of Barrett's esophagus-related neoplasia. Gastroenterology 2019;156:1299–308. 10.1053/j.gastro.2018.12.021
    26. van der Putten J, van der Sommen F, de Groof J, et al. Modeling clinical assessor intervariability using deep hypersphere encoder–decoder networks. Neural Computing and Applications 2019;63 10.1007/s00521-019-04607-w
    27. Ahmad O. Barriers and pitfalls for artificial intelligence in gastroenterology: ethical and regulatory issues. Techniques in Gastrointestinal Endoscopy; in press.
    28. Lipton Z, Steinhardt J. Troubling trends in machine learning scholarship. 2018.
    29. Dice LR. Measures of the amount of ecologic association between species. Ecology 1945;26:297–302. 10.2307/1932409
    30. van der Sommen F, Zinger S, Schoon EJ, et al. Sweet-spot training for early esophageal cancer detection. SPIE 2016.
    31. Warfield SK, Zou KH, Wells WM. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE Trans Med Imaging 2004;23:903–21. 10.1109/TMI.2004.828354
    32. Chinzei K, Shimizu A, Mori K, et al. Regulatory science on AI-based medical devices and systems. Advanced Biomedical Engineering 2018;7:118–23. 10.14326/abe.7.118
    33. Byrne MF. Artificial intelligence and the future of endoscopy: should we be quietly excited? Endoscopy 2019;51:511–2. 10.1055/a-0831-2549
    34. Schulz KF, Altman DG, Moher D. CONSORT 2010 statement: updated guidelines for reporting parallel group randomised trials. BMJ 2010;340:c332.
    35. Collins GS, Moons KGM. Reporting of artificial intelligence prediction models. Lancet 2019;393:1577–9. 10.1016/S0140-6736(19)30037-6
    36. Zhou J, Wu L, Wan X, et al. A novel artificial intelligence system for the assessment of bowel preparation (with video). Gastrointest Endosc 2019.

Source: PubMed
