Design and Selection of Machine Learning Methods Using Radiomics and Dosiomics for Normal Tissue Complication Probability Modeling of Xerostomia

Hubert S Gabryś, Florian Buettner, Florian Sterzing, Henrik Hauswald, Mark Bangert

Abstract

Purpose: The purpose of this study is to investigate whether machine learning with dosiomic, radiomic, and demographic features enables more precise xerostomia risk assessment than normal tissue complication probability (NTCP) models based on the mean radiation dose to the parotid glands.

Material and methods: A cohort of 153 head-and-neck cancer patients was used to model xerostomia at 0-6 months (early), 6-15 months (late), 15-24 months (long-term), and at any time (a longitudinal model) after radiotherapy. The predictive power of individual features was evaluated by the area under the receiver operating characteristic curve (AUC) of univariate logistic regression models. The multivariate NTCP models were tuned and tested with single and nested cross-validation, respectively. We compared the predictive performance of seven classification algorithms, six feature selection methods, and ten data cleaning/class balancing techniques using the Friedman test and Nemenyi post hoc analysis.
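As a minimal sketch of this tuning and testing scheme, the snippet below nests a hyperparameter search (inner cross-validation, model tuning) inside an outer cross-validation that estimates the generalization AUC. It uses scikit-learn; the classifier, grid, fold counts, and synthetic data are illustrative assumptions, not the study's exact configuration.

```python
# Nested cross-validation sketch: the inner loop tunes hyperparameters,
# the outer loop estimates generalization AUC. Classifier and grid are
# illustrative, not the study's exact configuration.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the cohort: 153 patients, imbalanced end point.
X, y = make_classification(n_samples=153, n_features=20,
                           weights=[0.75], random_state=0)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", SVC(kernel="rbf"))])
grid = {"clf__C": np.logspace(-2, 2, 5),
        "clf__gamma": np.logspace(-3, 1, 5)}

inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Inner loop: hyperparameter tuning by maximizing AUC.
tuned = GridSearchCV(pipe, grid, scoring="roc_auc", cv=inner)

# Outer loop: unbiased estimate of the tuned model's AUC.
scores = cross_val_score(tuned, X, y, scoring="roc_auc", cv=outer)
print(f"generalization AUC: {scores.mean():.2f} +/- {scores.std():.2f}")
```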

Results: NTCP models based on the parotid mean dose failed to predict xerostomia (AUCs < 0.60). The most informative predictors were found for late and long-term xerostomia. Late xerostomia correlated with the contralateral dose gradient in the anterior-posterior (AUC = 0.72) and right-left (AUC = 0.68) directions, whereas long-term xerostomia was associated with parotid volumes (AUCs > 0.85) and dose gradients in the right-left (AUCs > 0.78) and anterior-posterior (AUCs > 0.72) directions. Multivariate models of long-term xerostomia were typically based on the parotid volume, the parotid eccentricity, and the dose-volume histogram (DVH) spread, with generalization AUCs ranging from 0.74 to 0.88. On average, support vector machines and extra-trees were the top-performing classifiers, whereas algorithms based on logistic regression were the best choice for feature selection. We found no advantage in using data cleaning or class balancing methods.

Conclusion: We demonstrated that incorporating organ- and dose-shape descriptors is beneficial for xerostomia prediction in highly conformal radiotherapy treatments. Given the models' strong reliance on patient-specific, dose-independent factors, our results underscore the need to develop personalized, data-driven risk profiles for NTCP models of xerostomia. The machine learning pipeline presented here is described in detail and can serve as a valuable reference for future work in radiomic and dosiomic NTCP modeling.

Keywords: IMRT; NTCP; dosiomics; head and neck; machine learning; radiomics; radiotherapy; xerostomia.

Figures

Figure 1. Collection frequency of the follow-up reports.
Figure 2. Workflow of the multivariate five-step model building, comprising, in this order: feature-group selection, feature scaling, sampling, feature selection, and classification.
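A minimal sketch of this five-step workflow, assuming the imbalanced-learn Pipeline (whose sampling steps are applied during fitting only, so validation folds stay untouched). The feature groups, sampler, selector, and classifier below are illustrative placeholders, not the study's exact components.

```python
# Five-step workflow from Figure 2: feature-group selection, scaling,
# sampling, feature selection, classification. Components illustrative.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical column indices for each feature group.
feature_groups = {"demographic": [0, 1], "dosiomic": [2, 3, 4], "radiomic": [5, 6]}

def select_groups(X, groups, names):
    """Step 1: restrict the feature matrix to the chosen feature groups."""
    cols = [c for name in names for c in groups[name]]
    return X[:, cols]

pipe = Pipeline([
    ("scale", StandardScaler()),              # step 2: feature scaling
    ("sample", SMOTE(random_state=0)),        # step 3: over-sampling (fit only)
    ("select", SelectKBest(f_classif, k=2)),  # step 4: feature selection
    ("clf", LogisticRegression()),            # step 5: classification
])

# Toy usage: 153 synthetic patients, 7 features matching the groups above.
rng = np.random.default_rng(0)
X = rng.normal(size=(153, 7))
y = (rng.random(153) < 0.25).astype(int)
Xg = select_groups(X, feature_groups, ["dosiomic", "radiomic"])
pipe.fit(Xg, y)
print(pipe.predict_proba(Xg[:3]))
```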
Figure 3. Predictive power of individual features in the time-specific models, measured with the area under the receiver operating characteristic curve (AUC). The left-hand vertical axis lists the features; the right-hand vertical axis lists the feature groups. The AUCs were calculated from the corresponding Mann–Whitney U statistic. Bars marked with * are significant at a false discovery rate (FDR) ≤ 0.05.
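The per-feature AUC follows from the rank-sum identity AUC = U / (n_pos · n_neg). Below is a small sketch of this computation with a Benjamini–Hochberg correction standing in for the study's FDR procedure; the data and feature count are synthetic.

```python
# AUC of each candidate feature from the Mann-Whitney U statistic
# (AUC = U / (n_pos * n_neg)), with Benjamini-Hochberg control of the
# false discovery rate across features. Illustrative sketch.
import numpy as np
from scipy.stats import mannwhitneyu
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)
X = rng.normal(size=(153, 10))        # 10 candidate features
y = rng.integers(0, 2, size=153)      # binary xerostomia end point
X[y == 1, 0] += 1.0                   # make feature 0 genuinely predictive

aucs, pvals = [], []
for j in range(X.shape[1]):
    pos, neg = X[y == 1, j], X[y == 0, j]
    u, p = mannwhitneyu(pos, neg, alternative="two-sided")
    aucs.append(u / (len(pos) * len(neg)))  # rank-sum identity with the AUC
    pvals.append(p)

reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
for j, (a, sig) in enumerate(zip(aucs, reject)):
    print(f"feature {j}: AUC = {a:.2f}{' *' if sig else ''}")
```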
Figure 4. Distributions of the mean dose and the absolute right–left dose gradient in our patient cohort.
Figure 5. Comparison of classification, feature selection, and sampling algorithms in terms of their predictive performance in model tuning. All heat maps in a given column belong to a single end point; all heat maps in a given row correspond to a single classifier. In each heat map, rows represent feature selection algorithms and columns correspond to sampling methods. The color maps are normalized per end point; the color bar ticks correspond to the worst, average, and best model performance.
Figure 6. Heat maps showing the proportion of times a given algorithm on the vertical axis outperformed another algorithm on the horizontal axis in terms of the best AUC in model tuning. For example, support vector machines (SVM) performed better than extra-trees (ET) in 73% of the time-specific models.
Figure 7. Comparison of classification, feature selection, and sampling methods against one another with the Nemenyi test. Lower ranks correspond to better performance, that is, rank 1 is the best. Algorithms whose ranks differ by less than the critical difference (CD) are not significantly different at the 0.05 significance level and are connected by black bars.
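For reference, the Nemenyi critical difference for k algorithms compared over N data sets is CD = q_alpha · sqrt(k(k+1)/(6N)). The sketch below pairs the omnibus Friedman test with this CD computation; the alpha = 0.05 constants are the published Nemenyi values (Demšar, 2006), and the toy scores are synthetic.

```python
# Friedman omnibus test plus the Nemenyi critical difference
# CD = q_alpha * sqrt(k * (k + 1) / (6 * N)), where k is the number of
# algorithms and N the number of data sets. Q_05 holds the alpha = 0.05
# Nemenyi constants from Demsar (2006).
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

Q_05 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850, 7: 2.949}

def friedman_nemenyi(scores):
    """scores: N x k array of AUCs (rows = data sets, cols = algorithms)."""
    n, k = scores.shape
    stat, p = friedmanchisquare(*scores.T)                 # omnibus test
    ranks = np.vstack([rankdata(-row) for row in scores])  # rank 1 = best AUC
    cd = Q_05[k] * np.sqrt(k * (k + 1) / (6.0 * n))
    return ranks.mean(axis=0), cd, p

# Toy usage: 10 tuning results for 4 algorithms with small offsets.
rng = np.random.default_rng(0)
aucs = (0.7 + 0.05 * rng.standard_normal((10, 4))
        + np.array([0.03, 0.0, -0.02, 0.01]))
mean_ranks, cd, p = friedman_nemenyi(aucs)
print(f"mean ranks: {np.round(mean_ranks, 2)}, CD = {cd:.2f}, p = {p:.3f}")
# Algorithms whose mean ranks differ by less than CD are not
# significantly different at the 0.05 level.
```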
Figure 8. Features underlying the multivariate models of long-term xerostomia. i, ipsilateral gland; c, contralateral gland.


Source: PubMed
