Machine-learning prediction of cancer survival: a retrospective study using electronic administrative records and a cancer registry

Sunil Gupta, Truyen Tran, Wei Luo, Dinh Phung, Richard Lee Kennedy, Adam Broad, David Campbell, David Kipp, Madhu Singh, Mustafa Khasraw, Leigh Matheson, David M Ashley, Svetha Venkatesh

Abstract

Objectives: Using the prediction of cancer outcome as a model, we tested the hypothesis that analysing routinely collected digital data in an electronic administrative record (EAR) with machine-learning techniques could enhance conventional methods of predicting clinical outcomes.

Setting: A regional cancer centre in Australia.

Participants: Disease-specific data from a purpose-built cancer registry (Evaluation of Cancer Outcomes (ECO)) from 869 patients were used to predict survival at 6, 12 and 24 months. The model was validated with data from a further 94 patients, and its results were compared with the assessments of five specialist oncologists. Machine-learning prediction using ECO data was compared with prediction using EAR data and with a model combining ECO and EAR data.

Primary and secondary outcome measures: Survival prediction accuracy in terms of the area under the receiver operating characteristic curve (AUC).
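
For readers unfamiliar with the measure, the AUC can be read as the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it. The following is a minimal sketch of that computation using the rank (Mann-Whitney) formulation; it is an illustration only, not the study's code, and the function name `roc_auc` and the toy data are assumptions introduced here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank formulation.

    labels: sequence of 0/1 integers (1 = outcome of interest at the horizon)
    scores: sequence of predicted scores, higher = more likely to be label 1
    Both names are hypothetical; the abstract does not specify an implementation.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC is undefined when only one class is present")

    # Sum the 1-based ranks of the positive cases, giving tied scores
    # the average rank of their tie group.
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    # Mann-Whitney U statistic for the positive class, scaled to [0, 1].
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Invented toy data: 8 of the 9 positive/negative pairs are ordered correctly.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the results below should be read.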

Results: The ECO model yielded AUCs of 0.87 (95% CI 0.848 to 0.890) at 6 months, 0.796 (95% CI 0.774 to 0.823) at 12 months and 0.764 (95% CI 0.737 to 0.789) at 24 months. Each was slightly better than the performance of the clinician panel. The model performed consistently across a range of cancers, including rare cancers. Combining ECO and EAR data yielded better prediction than the ECO-based model alone, with AUCs ranging from 0.757 to 0.997 at 6 months, from 0.689 to 0.988 at 12 months and from 0.713 to 0.973 at 24 months. The best prediction was for genitourinary, head and neck, lung, skin, and upper gastrointestinal tumours.

Conclusions: Machine learning applied to information from a disease-specific (cancer) database and the EAR can be used to predict clinical outcomes. Importantly, the approach described made use of digital data that is already routinely collected but underexploited by clinical health systems.

Keywords: Cancer; Electronic Medical Record; Machine Learning; Prediction; Survival.

Source: PubMed
