An Observational Study of Deep Learning and Automated Evaluation of Cervical Images for Cancer Screening

Liming Hu, David Bell, Sameer Antani, Zhiyun Xue, Kai Yu, Matthew P Horning, Noni Gachuhi, Benjamin Wilson, Mayoore S Jaiswal, Brian Befano, L Rodney Long, Rolando Herrero, Mark H Einstein, Robert D Burk, Maria Demarco, Julia C Gage, Ana Cecilia Rodriguez, Nicolas Wentzensen, Mark Schiffman

Abstract

Background: Human papillomavirus vaccination and cervical screening are lacking in most lower resource settings, where approximately 80% of more than 500 000 cancer cases occur annually. Visual inspection of the cervix following acetic acid application is practical but not reproducible or accurate. The objective of this study was to develop a "deep learning"-based visual evaluation algorithm that automatically recognizes cervical precancer/cancer.

Methods: A population-based longitudinal cohort of 9406 women ages 18-94 years in Guanacaste, Costa Rica was followed for 7 years (1993-2000), incorporating multiple cervical screening methods and histopathologic confirmation of precancers. Tumor registry linkage identified cancers up to 18 years. Archived, digitized cervical images from screening, taken with a fixed-focus camera ("cervicography"), were used for training/validation of the deep learning-based algorithm. The resultant image prediction score (0-1) could be categorized to balance sensitivity and specificity for detection of precancer/cancer. All statistical tests were two-sided.
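The abstract notes that the 0-1 image prediction score "could be categorized to balance sensitivity and specificity." As a minimal illustration of that step (hypothetical scores and labels, not study data), sensitivity and specificity can be computed at any chosen cutpoint on the score:

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when images scoring at or above
    `threshold` are referred as suspected precancer/cancer.
    labels: 1 = precancer/cancer case, 0 = control."""
    tp = sum(s >= threshold for s, y in zip(scores, labels) if y == 1)
    fn = sum(s < threshold for s, y in zip(scores, labels) if y == 1)
    tn = sum(s < threshold for s, y in zip(scores, labels) if y == 0)
    fp = sum(s >= threshold for s, y in zip(scores, labels) if y == 0)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical prediction scores for 3 cases and 4 controls
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
sens, spec = sens_spec(scores, labels, 0.45)
```

Raising the threshold trades sensitivity for specificity (fewer referrals); the study chose a cutpoint balancing the two for detection of precancer/cancer.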

Results: Automated visual evaluation of enrollment cervigrams identified cumulative precancer/cancer cases with greater accuracy (area under the curve [AUC] = 0.91, 95% confidence interval [CI] = 0.89 to 0.93) than original cervigram interpretation (AUC = 0.69, 95% CI = 0.63 to 0.74; P < .001) or conventional cytology (AUC = 0.71, 95% CI = 0.65 to 0.77; P < .001). A single visual screening round restricted to women at the prime screening ages of 25-49 years could identify 127 (55.7%) of 228 precancers (cervical intraepithelial neoplasia 2/cervical intraepithelial neoplasia 3/adenocarcinoma in situ [AIS]) diagnosed cumulatively in the entire adult population (ages 18-94 years) while referring 11.0% for management.

Conclusions: The results support consideration of automated visual evaluation of cervical images from contemporary digital cameras. If achieved, this might permit dissemination of effective point-of-care cervical screening.

Published by Oxford University Press 2019.

Figures

Figure 1.
Cervical images used for training and validation. The images were drawn from the Proyecto Epidemiologico Guanacaste, a longitudinal cohort study of human papillomavirus infection, other screening tests, and risk of cervical precancer/cancer (1993–2001). The training and initial validation made use of the last images taken prior to case diagnosis and approximately 3:1 controls frequency matched to cases on time of study (enrollment/follow-up). The main analysis focused on images (excluding all images from women in the training set) from cohort enrollment and examined how the automated image analysis of enrollment screening images performed in prediction of cases found over the course of the entire cohort study. In an analysis that counted the absolute numbers of cases detected and controls referred, it was necessary to reweight, that is, to multiply the findings in the randomly selected validation test by the inverse of the 30% sampling fraction to estimate numbers for all cases and matched controls (see Methods). CIN = cervical intraepithelial neoplasia.
Figure 2.
The system architecture of the automated visual evaluation algorithm. Two models are trained: a cervix locator (top), and the automated visual evaluation detection algorithm (bottom). The final validation algorithm incorporated both cervix locator and automated visual evaluation.
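The two-stage design described in this caption (a cervix locator feeding a detection model) can be sketched as follows. This is a hedged illustration only: `locate_cervix` and the classifier below are hypothetical placeholders for the trained models, which the abstract does not specify; the sketch shows only how the locator's crop feeds the detection model.

```python
def locate_cervix(image):
    # Hypothetical stand-in for the trained cervix-locator model:
    # returns a bounding box (x, y, w, h) around the cervix.
    # Placeholder logic: the central half of the frame.
    h, w = len(image), len(image[0])
    return (w // 4, h // 4, w // 2, h // 2)

def crop(image, box):
    # Extract the boxed region from a row-major image.
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def automated_visual_evaluation(image, classify):
    """Two-stage pipeline: locate the cervix, crop to it, then score
    the crop with the detection model (0-1 case probability)."""
    return classify(crop(image, locate_cervix(image)))

# Toy 8x8 "image" and a dummy classifier standing in for the trained model
frame = [[0] * 8 for _ in range(8)]
score = automated_visual_evaluation(frame, lambda region: 0.5)
```

Separating localization from classification lets the detection model score a consistently framed cervix region rather than the whole cervigram.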
Figure 3.
Comparison of the mean scores for enrollment images from women whose worst diagnosis within the cohort was cervical intraepithelial neoplasia (CIN) 2, CIN3, or cancer. These case subsets were not statistically significantly different with regard to severity scores generated by the algorithm; therefore, we combined all into a single case group. We show a box and whisker plot of 32 CIN2, 38 CIN3, and seven cancers, giving quartiles, means, and outliers of case probability for each diagnosis within the cohort.
Figure 4.
Receiver operating characteristic (ROC) curve of automated visual evaluation of cervical images and comparison of performance in identification of cervical intraepithelial neoplasia 2+. ROC-like curves are shown for the categorical variables for simple visual and statistical comparison with automated visual evaluation (two-sided chi-squared tests). The thresholds are listed on each curve, showing the sensitivity and 1 − specificity applicable to that cutpoint. Automated visual evaluation was as accurate as or more accurate than all of the screening tests used in the cohort study: A) automated visual evaluation; B) cervicography; C) conventional Pap smear; D) liquid-based cytology; E) first-generation neural network-based cytology; and F) MY09/MY11 PCR-based human papillomavirus (HPV) testing. ASC-US = atypical squamous cells of undetermined significance; AUC = area under the curve; HSIL = high-grade squamous intraepithelial lesion; LSIL = low-grade squamous intraepithelial lesion.
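The AUC statistic used throughout these comparisons can be computed directly from prediction scores and outcomes. A minimal sketch with hypothetical data (the Mann-Whitney formulation, not the authors' statistical code):

```python
def auc_score(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic:
    the probability that a randomly chosen case outscores a
    randomly chosen control (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for 3 cases and 4 controls
scores = [0.9, 0.8, 0.4, 0.3, 0.5, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
auc = auc_score(scores, labels)
```

An AUC of 0.5 corresponds to chance ranking and 1.0 to perfect separation of cases from controls, which is why the reported 0.91 for automated visual evaluation versus 0.69-0.71 for cervicography and cytology indicates substantially better discrimination.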

Source: PubMed
