Population-Based Screening for Endometrial Cancer: Human vs. Machine Intelligence

Gregory R Hart, Vanessa Yan, Gloria S Huang, Ying Liang, Bradley J Nartowt, Wazir Muhammad, Jun Deng

Abstract

Incidence and mortality rates of endometrial cancer are rising, which has increased interest in endometrial cancer risk prediction and stratification to support screening and prevention. Previous risk models have had moderate success, with the area under the curve (AUC) ranging from 0.68 to 0.77. Here we demonstrate a population-based machine learning model for endometrial cancer screening that achieves a testing AUC of 0.96. We train seven machine learning algorithms based solely on personal health data, without any genomic data, imaging, biomarkers, or invasive procedures. The data come from the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO). We further compare our machine learning models with 15 practicing gynecologic oncologists and primary care physicians in the stratification of endometrial cancer risk for 100 women. The random forest model achieves a testing AUC of 0.96 and the neural network model achieves a testing AUC of 0.91. Compared with the physicians, our random forest model is 2.5 times better at identifying above-average risk women, with a 2-fold reduction in the false positive rate. Our neural network model is 2 times better at identifying above-average risk women, with a 3-fold reduction in the false positive rate. Our machine learning models provide a non-invasive and cost-effective way to identify high-risk sub-populations who may benefit from early screening for endometrial cancer, prior to disease onset. Through statistical biopsy of personal health data, we have identified a new and effective approach for early cancer detection and prevention for individual patients.
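For readers less familiar with the metric quoted above, the AUC can be read as the probability that a randomly chosen woman who developed cancer receives a higher risk score than a randomly chosen woman who did not. The following is a minimal sketch of that definition, not the authors' evaluation code; the function name `auc` and the risk scores are hypothetical and made up for illustration (they are not PLCO data).

```python
def auc(scores_pos, scores_neg):
    """Fraction of (case, control) pairs in which the case has the higher score.

    Ties count as half a win. This pairwise definition is equivalent to the
    area under the ROC curve.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Illustrative, made-up risk scores (assumption: higher means higher risk).
cases = [0.9, 0.8, 0.35]          # women who developed cancer
controls = [0.7, 0.3, 0.2, 0.1]   # women who did not

print(auc(cases, controls))  # 11 of the 12 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of 0.68 to 0.77 and 0.96 should be read.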

Keywords: cancer screening; early detection; endometrial cancer; machine learning; statistical biopsy.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2020 Hart, Yan, Huang, Liang, Nartowt, Muhammad and Deng.

Figures

FIGURE 1
(A) The sensitivity and specificity of the random forest for both the training and testing data as a function of the threshold value. (B) The sensitivity and specificity of the neural network for both the training and testing data as a function of the threshold value.
FIGURE 2
(A) Area under the ROC curve for the random forest on both the training and testing data. Similar performance on both datasets indicates that the random forest is not overfit. (B) Area under the ROC curve for the neural network on both the training and testing data. Similar performance on both datasets indicates that the neural network is not overfit.
FIGURE 3
(A) Kaplan-Meier plot of the below- (green), at- (yellow), and above- (red) average risk groups created from the testing data by our random forest model. Also shown are the p-value and hazard ratio (HR) between each group. Those in the above-average risk group clearly have the highest chance of developing cancer. (B) Kaplan-Meier plot of the below- (green), at- (yellow), and above- (red) average risk groups created from the testing data by our neural network model with 95% confidence intervals (shaded). Also shown are the p-value and hazard ratio (HR) between each group. Those in the above-average risk group clearly have the highest chance of developing cancer.
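The threshold curves described for Figure 1 can be illustrated with a small sketch. It assumes that a woman is flagged as positive when her risk score is at or above the threshold; the helper name `sens_spec`, the scores, and the labels are hypothetical and are not taken from the study.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive.

    labels: 1 for a woman who developed cancer, 0 otherwise.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Illustrative, made-up data: three cases followed by four controls.
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]

# Sweep the threshold to trace out the two curves.
for t in (0.25, 0.5, 0.75):
    sens, spec = sens_spec(scores, labels, t)
    print(f"threshold={t:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Raising the threshold trades sensitivity for specificity, which is why the two curves in such a plot move in opposite directions.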

Source: PubMed
