Performance comparison of modified ComBat for harmonization of radiomic features for multicenter studies

R Da-Ano, I Masson, F Lucia, M Doré, P Robin, J Alfieri, C Rousseau, A Mervoyer, C Reinhold, J Castelli, R De Crevoisier, J F Rameé, O Pradier, U Schick, D Visvikis, M Hatt

Abstract

Multicenter studies are needed to demonstrate the potential clinical value of radiomics as a prognostic tool. However, variability in scanner models, acquisition protocols and reconstruction settings is unavoidable, and radiomic features are notoriously sensitive to these factors, which hinders pooling them in a statistical analysis. A statistical harmonization method called ComBat was developed to deal with the "batch effect" in gene expression microarray data and has been used in radiomics studies to deal with the "center effect". Our goal was to evaluate modifications of ComBat that allow more flexibility in choosing a reference and improve the robustness of the estimation. Two modified ComBat versions were evaluated. M-ComBat transforms all feature distributions to a chosen reference instead of the overall mean, providing more flexibility. B-ComBat adds bootstrap and Monte Carlo for improved robustness in the estimation. A third version, BM-ComBat, combines both modifications. The four versions (the original ComBat and the three modified ones) were compared regarding their ability to harmonize features in a multicenter context in two different clinical datasets. The first contains 119 locally advanced cervical cancer patients from 3 centers, with magnetic resonance imaging and positron emission tomography imaging. In that case ComBat was applied with 3 labels, one per center. The second contains 98 locally advanced laryngeal cancer patients from 5 centers, with contrast-enhanced computed tomography. In that case, because imaging settings were highly heterogeneous even within each of the five centers, unsupervised clustering was used to determine two labels for applying ComBat. The impact of each harmonization was evaluated through three different machine learning pipelines for the modelling step in predicting the clinical outcomes, across two performance metrics (balanced accuracy and Matthews correlation coefficient).
Before harmonization, almost all radiomic features had significantly different distributions between labels. These differences were successfully removed with all ComBat versions. The predictive ability of the radiomic models was always improved with harmonization, and the modified ComBat versions provided the best results. This was observed consistently in both datasets, through all machine learning pipelines and performance metrics. The proposed modifications allow for more flexibility and robustness in the estimation. They also slightly but consistently improve the predictive power of the resulting radiomic models.
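The idea of aligning every label to a chosen reference, as M-ComBat does, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain per-feature location and scale alignment that omits ComBat's empirical Bayes estimation, any covariates, and the bootstrap and Monte Carlo steps of B-ComBat. The function name `harmonize_to_reference` and its parameters are hypothetical and chosen only for this illustration.

```python
import numpy as np


def harmonize_to_reference(features, labels, reference):
    """Align each label's feature distributions to a reference label.

    Simplified illustration only (assumption): per-feature mean and
    standard deviation matching, without empirical Bayes shrinkage
    or covariate adjustment as used in ComBat and M-ComBat.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)

    # Per-feature location and scale of the chosen reference label.
    ref = features[labels == reference]
    ref_mean = ref.mean(axis=0)
    ref_std = ref.std(axis=0, ddof=1)

    out = features.copy()
    for label in np.unique(labels):
        if label == reference:
            continue  # the reference label is left untouched
        mask = labels == label
        mean = features[mask].mean(axis=0)
        std = features[mask].std(axis=0, ddof=1)
        std = np.where(std > 0, std, 1.0)  # guard against constant features
        # Standardize within the label, then map onto the reference.
        out[mask] = (features[mask] - mean) / std * ref_std + ref_mean
    return out


if __name__ == "__main__":
    # Synthetic data: two labels with different feature distributions.
    rng = np.random.default_rng(0)
    x = np.vstack([rng.normal(0, 1, (30, 4)), rng.normal(5, 3, (40, 4))])
    y = np.array(["A"] * 30 + ["B"] * 40)
    harmonized = harmonize_to_reference(x, y, reference="A")
    print(harmonized[y == "B"].mean(axis=0))
```

After the transform, the per-feature mean and standard deviation of each non-reference label match those of the reference, while the reference data are unchanged. That is the flexibility in choosing a reference described above, in its simplest form.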

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Workflow for the analysis in LACC and LALC datasets.
Figure 2
PCA and summary distribution in LACC: Scatter plots of the top 2 principal components of the radiomic features across the three labels (centers) using untransformed data or data transformed with the 4 versions of ComBat (using R (3.5.1) and R Studio (1.1.456, R Studios Inc., Boston, MA, https://cran.r-project.org/)).
Figure 3
PCA and summary distribution in LALC: Scatter plots of the top 2 principal components of the radiomic features across the two labels (clusters) using untransformed data or data transformed with the 4 versions of ComBat (using R (3.5.1) and R Studio (1.1.456, R Studios Inc., Boston, MA, https://cran.r-project.org/)).

Source: PubMed
