Radiomics Repeatability Pitfalls in a Scan-Rescan MRI Study of Glioblastoma

Katharina V Hoebel, Jay B Patel, Andrew L Beers, Ken Chang, Praveer Singh, James M Brown, Marco C Pinho, Tracy T Batchelor, Elizabeth R Gerstner, Bruce R Rosen, Jayashree Kalpathy-Cramer

Abstract

Purpose: To determine the influence of preprocessing on the repeatability and redundancy of radiomics features extracted using a popular open-source radiomics software package in a scan-rescan glioblastoma MRI study.

Materials and methods: This secondary analysis included T2-weighted fluid-attenuated inversion recovery (FLAIR) and T1-weighted postcontrast images from 48 patients (mean age, 56 years [range, 22-77 years]) diagnosed with glioblastoma, drawn from two prospective studies (ClinicalTrials.gov NCT00662506 [2009-2011] and NCT00756106 [2008-2011]). All patients underwent two baseline scans 2-6 days apart using identical imaging protocols on 3-T MRI systems. No treatment occurred between scan and rescan, and tumors were essentially unchanged visually. Radiomic features were extracted by using PyRadiomics (https://pyradiomics.readthedocs.io/) under varying conditions, including normalization strategies and intensity quantization. Subsequently, intraclass correlation coefficients were determined between feature values of the scan and rescan.
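The abstract does not state which form of the intraclass correlation coefficient was used. As a minimal sketch, assuming a two-way random-effects, absolute-agreement, single-measurement ICC (often written ICC(2,1)), the scan-rescan agreement of one feature across patients could be computed as below. The function name `icc_2_1` and the toy values are illustrative and are not taken from the study.

```python
import numpy as np


def icc_2_1(values):
    """ICC(2,1) for an (n_subjects, k_measurements) array.

    Assumption: two-way random effects, absolute agreement, single
    measurement. For scan-rescan data, k = 2 (column 0 = scan,
    column 1 = rescan).
    """
    x = np.asarray(values, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    row_means = x.mean(axis=1)  # per-patient means
    col_means = x.mean(axis=0)  # per-session means

    # Sums of squares of the two-way ANOVA without replication
    ss_rows = k * np.sum((row_means - grand_mean) ** 2)
    ss_cols = n * np.sum((col_means - grand_mean) ** 2)
    ss_total = np.sum((x - grand_mean) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy feature values for three hypothetical patients (scan, rescan)
perfect = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
shifted = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]  # constant offset at rescan
print(icc_2_1(perfect), icc_2_1(shifted))
```

Because this form measures absolute agreement, a systematic offset between scan and rescan lowers the ICC even when the patient ranking is preserved. That property is one reason intensity normalization can matter for repeatability.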

Results: Shape features showed a higher repeatability than intensity (adjusted P < .001) and texture features (adjusted P < .001) for both T2-weighted FLAIR and T1-weighted postcontrast images. Normalization improved the overlap between the region of interest intensity histograms of scan and rescan (adjusted P < .001 for both T2-weighted FLAIR and T1-weighted postcontrast images), except in scans where brain extraction failed. Accordingly, normalization significantly improved the repeatability of intensity features from T2-weighted FLAIR scans (adjusted P = .003 [z score normalization] and adjusted P = .002 [histogram matching]). The use of a relative intensity binning strategy, as opposed to the default absolute intensity binning, reduced correlation between gray-level co-occurrence matrix features after normalization.
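The difference between absolute and relative intensity binning can be illustrated with a small NumPy sketch. PyRadiomics exposes the two strategies through its `binWidth` and `binCount` settings. The code below reimplements the general idea instead of calling the library, uses synthetic intensities, and is not the authors' analysis. The helper names `zscore`, `fixed_bin_width`, and `fixed_bin_count` are hypothetical.

```python
import numpy as np


def zscore(values):
    """Z score normalization over the given voxels (illustrative only)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()


def fixed_bin_width(values, width):
    """Absolute binning: every bin spans `width` intensity units."""
    values = np.asarray(values, dtype=float)
    shifted = np.floor(values / width) - np.floor(values.min() / width)
    return shifted.astype(int) + 1


def fixed_bin_count(values, count):
    """Relative binning: the ROI range is split into `count` equal bins.

    Assumes the ROI is not constant (max > min).
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    scaled = (values - lo) / (hi - lo) * count
    # The maximum voxel would land in bin count + 1, so clip it to `count`
    return np.minimum(np.floor(scaled).astype(int) + 1, count)


# Synthetic ROI intensities on an arbitrary scanner scale (assumed values)
rng = np.random.default_rng(0)
roi_raw = rng.normal(loc=600.0, scale=120.0, size=5000)
roi_norm = zscore(roi_raw)

for name, roi in [("raw", roi_raw), ("z-scored", roi_norm)]:
    n_width = len(np.unique(fixed_bin_width(roi, width=25)))
    n_count = len(np.unique(fixed_bin_count(roi, count=32)))
    print(f"{name}: {n_width} levels (bin width 25), {n_count} levels (32 bins)")
```

In this toy setting, a bin width of 25 gives many gray levels on the raw scale but collapses the z-scored ROI into one or two levels, whereas a fixed bin count keeps the number of levels independent of the intensity scale. Texture matrices built from very few gray levels carry little information, which is one plausible way for features derived from them to become strongly correlated.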

Conclusion: Both normalization and intensity quantization have an effect on the level of repeatability and redundancy of features, emphasizing the importance of both accurate reporting of methodology in radiomics articles and understanding the limitations of choices made in pipeline design. Supplemental material is available for this article. © RSNA, 2020. See also the commentary by Tiwari and Verma in this issue.

Conflict of interest statement

Disclosures of Conflicts of Interest: K.V.H. disclosed no relevant relationships. J.B.P. disclosed no relevant relationships. A.L.B. disclosed no relevant relationships. K.C. disclosed no relevant relationships. P.S. disclosed no relevant relationships. J.M.B. disclosed no relevant relationships. M.C.P. disclosed no relevant relationships. T.T.B. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: received research support from Champions Biotechnology, AstraZeneca, Pfizer, and Millennium; is on the advisory board for UpToDate; is a consultant for Genomicare, Merck, NXDC, Amgen, Roche, Oxigene, Foundation Medicine, and Proximagen; provided CME lectures or material for UpToDate, Research to Practice, Oakstone Medical Publishing, and Imedex. Other relationships: disclosed no relevant relationships. E.R.G. disclosed no relevant relationships. B.R.R. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: is on the advisory board for ARIA, Butterfly, DGMIF (Daegu-Gyeongbuk Medical Innovation Foundation), QMENTA, and Subtle Medical; is a consultant for Broadview Ventures, Janssen Scientific, ECRI Institute, GlaxoSmithKline, Hyperfine Research, Peking University, Wolf Greenfield, Superconducting Systems, Robins Kaplin, Millennium Pharmaceuticals, GE Healthcare, Siemens, Quinn Emanuel Trial Lawyers, Samsung, and Shenzhen Maternity and Child Health Care Hospital; is a founder of BLINKAI Technologies. Other relationships: disclosed no relevant relationships. J.K.C. Activities related to the present article: disclosed no relevant relationships. Activities not related to the present article: is a consultant and advisory board member for Infotech, Soft. Other relationships: disclosed no relevant relationships.

2020 by the Radiological Society of North America, Inc.

Figures

Figure 1:
Distribution of intraclass correlation coefficient (ICC) values per feature group under default feature extraction settings. Each boxplot represents the distribution of one radiomics feature group (shape, intensity, texture) between scan and rescan for the cohort of 48 patients. A, T2-weighted fluid-attenuated inversion recovery (T2W-FLAIR). B, T1-weighted (T1W) postcontrast. Features were extracted from nonnormalized images using the PyRadiomics default settings (no normalization, constant bin width for intensity quantization).
Figure 2:
Effect of normalization on the region of interest (ROI) intensity histograms. Intensity histograms of the ROI segmentations from the scan (blue) and rescan (orange) on both T2-weighted fluid-attenuated inversion recovery (T2W-FLAIR) and T1-weighted (T1W) postcontrast sequences for, A, a representative case and, C, a failure case. The first column shows ROI intensity histograms without preprocessing; the second column, after brain extraction and normalization via histogram matching; and the third column, after brain extraction and z score normalization. The overlap between the histograms is quantified by Jensen-Shannon divergence (JSD). B, D, Axial sections from the T2-weighted FLAIR and T1-weighted postcontrast scan and rescan after brain extraction of the corresponding cases, A, C, respectively.
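As a rough sketch of how the Jensen-Shannon divergence between two ROI intensity histograms might be computed, the example below bins both voxel samples on a shared intensity range and uses base-2 logarithms, so the result lies between 0 (identical histograms) and 1 (no overlap). The bin count of 64 and the function name `jsd_from_samples` are assumptions. The source does not specify the histogram settings used.

```python
import numpy as np


def jsd_from_samples(scan_voxels, rescan_voxels, bins=64):
    """Jensen-Shannon divergence (base 2) between two intensity samples."""
    a = np.asarray(scan_voxels, dtype=float)
    b = np.asarray(rescan_voxels, dtype=float)

    # Shared bin edges so both histograms are directly comparable
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    p, _ = np.histogram(a, bins=bins, range=(lo, hi))
    q, _ = np.histogram(b, bins=bins, range=(lo, hi))
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(x, y):
        # Kullback-Leibler divergence, skipping empty bins of x
        mask = x > 0
        return np.sum(x[mask] * np.log2(x[mask] / y[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Synthetic ROI samples: identical versus completely separated intensities
same = np.array([1.0, 2.0, 3.0, 4.0])
print(jsd_from_samples(same, same))
print(jsd_from_samples(np.zeros(10), np.full(10, 10.0)))
```

A lower value after normalization would indicate that the scan and rescan histograms overlap more closely.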
Figure 3:
Jensen-Shannon divergence (JSD) distributions with and without brain extraction. Distribution of the JSD between the region of interest intensity histograms of the scan and rescan for the entire cohort using T2-weighted fluid-attenuated inversion recovery (T2W-FLAIR) (left) and T1-weighted (T1W) postcontrast (right) for not-normalized, z score–normalized, and histogram-matched images, each with (blue) and without (orange) brain extraction performed before normalization. For each normalization approach (no normalization, z score normalization, histogram-matched), the absence of brain extraction before normalization did not have a significant effect on the JSD.
Figure 4:
Distribution of intensity and texture intraclass correlation coefficient (ICC) values under different conditions. ICC for, A, B, intensity and, C, D, texture features extracted from T2-weighted fluid-attenuated inversion recovery (T2W FLAIR) (left) and T1-weighted (T1W) postcontrast (right) using either z score normalization (z-score) or histogram matching (hist-m.) compared with features extracted from not-normalized (no norm) images. Significant differences in the feature group mean ICC between feature extraction strategies (paired Wilcoxon test) are indicated with brackets.
Figure 5:
Distribution of intensity and texture intraclass correlation coefficient (ICC) values depending on the region of interest (ROI) definition. ICC for intensity and texture features extracted from, A, T2-weighted fluid-attenuated inversion recovery (T2W FLAIR) and, B, T1-weighted (T1W) postcontrast using manual ROI masks separately outlined for scan and rescan (blue) or the union of both masks to extract features from the scan as well as rescan (orange). There is no statistically significant difference (paired Wilcoxon test) in the ICC distributions between the ROI definitions.

Source: PubMed
