Development of PancRISK, a urine biomarker-based risk score for stratified screening of pancreatic cancer patients

Oleg Blyuss, Alexey Zaikin, Valeriia Cherepanova, Daniel Munblit, Elena M Kiseleva, Olga M Prytomanova, Stephen W Duffy, Tatjana Crnogorac-Jurcevic

Abstract

Background: An accurate and simple risk prediction model that would facilitate earlier detection of pancreatic adenocarcinoma (PDAC) is not currently available. In this study, we compare different risk prediction algorithms in order to select the best one for constructing a biomarker-based risk score, PancRISK.

Methods: Three hundred and seventy-nine patients with available measurements of three urine biomarkers (LYVE1, REG1B and TFF1) in retrospectively collected samples, together with creatinine and age, were randomly split into training and validation sets following stratification into cases (PDAC) and controls (healthy individuals). Several machine learning algorithms were applied and their performance characteristics compared, namely the AUC (area under the ROC curve) and sensitivity at clinically relevant specificity.
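The stratified random split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `stratified_split`, the default 50/50 `train_fraction` and the fixed `seed` are hypothetical choices, since the exact split proportion is not stated here. Cases and controls are shuffled separately, so each set keeps approximately the same case/control ratio as the full cohort.

```python
import random

def stratified_split(labels, train_fraction=0.5, seed=0):
    """Split sample indices into training and validation sets,
    preserving the case/control proportion within each set.

    labels: sequence of 1 (PDAC case) or 0 (control).
    train_fraction and seed are hypothetical defaults.
    Returns (train_indices, validation_indices), each sorted.
    """
    rng = random.Random(seed)  # local generator for reproducibility
    train, validation = [], []
    for cls in sorted(set(labels)):
        # Collect the indices of one stratum (all cases or all controls)
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(len(idx) * train_fraction)
        train.extend(idx[:n_train])
        validation.extend(idx[n_train:])
    return sorted(train), sorted(validation)

# Toy usage: 40 cases and 60 controls (not the study data)
labels = [1] * 40 + [0] * 60
train_idx, val_idx = stratified_split(labels, train_fraction=0.5, seed=1)
```

Splitting within each stratum avoids a validation set that, by chance, contains too few cases to estimate sensitivity reliably.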

Results: None of the algorithms significantly outperformed all the others. A logistic regression model, the easiest to interpret, was incorporated into the PancRISK score and subsequently evaluated on the whole data set. PancRISK performance could be improved further by adding CA19-9, a commonly used PDAC biomarker, to the model.
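A logistic regression score maps a weighted sum of the predictors to a probability through the logistic function. The sketch below fits such a model by plain batch gradient descent; it is a generic illustration and does not reproduce the published PancRISK coefficients. It assumes the predictors (for example LYVE1, REG1B, TFF1, creatinine and age) are numeric and already on comparable scales; any transformation, as well as the learning rate, epoch count and the helper names `fit_logistic` and `risk_score`, are hypothetical. Adding CA19-9 would amount to appending one more column to each feature vector.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log-loss.

    X: list of feature vectors; y: list of 0/1 labels.
    lr and epochs are hypothetical tuning values.
    Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual between predicted probability and observed label
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        # Step against the mean gradient
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w

def risk_score(b, w, x):
    """Predicted probability of being a case for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Toy usage with a single hypothetical predictor
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
b, w = fit_logistic(X, y)
```

Gradient descent is used only to keep the sketch dependency-free; a standard statistics package fitting the same model by maximum likelihood (for example via iteratively reweighted least squares) would normally be preferred, and it also provides standard errors for the coefficients.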

Conclusion: The PancRISK score enables easy interpretation of the biomarker panel data and is currently being tested to confirm that it can be used to stratify patients at risk of developing pancreatic cancer completely non-invasively, using urine samples.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Performance characteristics of urine biomarkers interpreted using logistic regression, neural network, neuro-fuzzy technology, random forest and support vector machine for detection of pancreatic cancer (PDAC) cases.
Circles mark the individual sensitivity and specificity values provided by random forest and support vector machine. LR logistic regression, NN neural network, NFT neuro-fuzzy technology, RF random forest, SVM support vector machine, AUC area under ROC curve.
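The AUC and the sensitivity/specificity values summarised in Fig. 1 can be computed from any model's output scores. A minimal sketch, assuming higher scores indicate PDAC and labels are coded 1 for cases and 0 for controls; the function names and the specificity targets in the usage lines are hypothetical, since the clinically relevant specificity used in the study is not stated here. The AUC is computed in its Mann-Whitney form: the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half.

```python
def auc(scores, labels):
    """AUC as P(case score > control score); ties count one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def sensitivity_at_specificity(scores, labels, target_spec):
    """Highest sensitivity among thresholds whose specificity is at
    least target_spec. A sample is called positive when
    score >= threshold."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    # Candidate thresholds: every observed score, plus +inf (all negative)
    for t in sorted(set(scores)) + [float("inf")]:
        spec = sum(k < t for k in controls) / len(controls)
        if spec >= target_spec:
            sens = sum(c >= t for c in cases) / len(cases)
            best = max(best, sens)
    return best

# Toy usage (not study data)
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(auc(scores, labels), sensitivity_at_specificity(scores, labels, 0.8))
```

The pairwise AUC loop costs O(cases × controls), which is acceptable for a cohort of a few hundred samples. Classifiers that output only hard labels, as plotted for random forest and support vector machine, yield a single sensitivity/specificity point rather than a full ROC curve.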


Source: PubMed
