Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates

Anders Eklund, Thomas E Nichols, Hans Knutsson

Abstract

The most widely used task functional magnetic resonance imaging (fMRI) analyses use parametric statistical methods that depend on a variety of assumptions. In this work, we use real resting-state data and a total of 3 million random task group analyses to compute empirical familywise error (FWE) rates for the fMRI software packages SPM, FSL, and AFNI, as well as for a nonparametric permutation method. For a nominal FWE rate of 5%, the parametric statistical methods are shown to be conservative for voxelwise inference and invalid for clusterwise inference. Our results suggest that the principal cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape. By comparison, the nonparametric permutation test is found to produce nominal results for both voxelwise and clusterwise inference. These findings underline the need to validate the statistical methods used in the field of neuroimaging.
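
The abstract contrasts parametric inference with a nonparametric permutation test. As a minimal sketch, assuming a one-sample group analysis in which each subject contributes one contrast value per voxel, the Python below builds a sign-flipping null distribution of the maximum t statistic and turns it into FWE-corrected voxelwise p-values for a one-sided (positive) test. The function names and the nested-list data layout are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import math
import random


def one_sample_t(values):
    """One-sample t statistic (mean divided by its standard error) for one voxel."""
    n = len(values)
    mean = math.fsum(values) / n
    var = math.fsum((x - mean) ** 2 for x in values) / (n - 1)
    if var == 0.0:
        return 0.0
    return mean / math.sqrt(var / n)


def t_map(data):
    """t statistic at every voxel; data[s][v] is subject s's contrast value at voxel v."""
    n_vox = len(data[0])
    return [one_sample_t([subj[v] for subj in data]) for v in range(n_vox)]


def sign_flip_fwe_pvalues(data, n_perm=1000, seed=0):
    """FWE-corrected voxelwise p-values from the max-t sign-flipping distribution.

    Assumes each subject's contrast is symmetric about zero under the null
    hypothesis, so flipping the sign of a whole subject gives an equally
    likely dataset.
    """
    rng = random.Random(seed)
    observed = t_map(data)
    max_null = [max(observed)]  # the unflipped data count as one permutation
    for _ in range(n_perm - 1):
        signs = [rng.choice((-1.0, 1.0)) for _ in data]
        flipped = [[s * x for x in subj] for s, subj in zip(signs, data)]
        max_null.append(max(t_map(flipped)))
    # A voxel's corrected p-value is the fraction of permutations whose
    # maximum t over all voxels is at least as large as its observed t.
    return [sum(m >= t for m in max_null) / n_perm for t in observed]


# Usage: 20 subjects, 10 voxels, a true effect only at voxel 0.
rng = random.Random(1)
data = [[rng.gauss(2.0 if v == 0 else 0.0, 1.0) for v in range(10)]
        for _ in range(20)]
p_fwe = sign_flip_fwe_pvalues(data, n_perm=500, seed=2)
```

Taking the maximum over all voxels in each permutation is what controls the familywise error: a voxel is declared significant only if its t exceeds what the largest null t typically reaches anywhere in the map.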

Keywords: cluster inference; fMRI; false positives; permutation test; statistics.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Results for one-sample t test, showing estimated FWE rates for (A) Beijing and (B) Cambridge data analyzed with 6 mm of smoothing and four different activity paradigms (B1, B2, E1, and E2), for SPM, FSL, AFNI, and a permutation test. These results are for a group size of 20. The estimated FWE rates are simply the number of analyses with any significant group activation divided by the number of analyses (1,000). From left to right: cluster inference using a cluster-defining threshold (CDT) of P = 0.01 and an FWE-corrected threshold of P = 0.05, cluster inference using a CDT of P = 0.001 and an FWE-corrected threshold of P = 0.05, and voxel inference using an FWE-corrected threshold of P = 0.05. Note that the default CDT is P = 0.001 in SPM and P = 0.01 in FSL (AFNI does not have a default setting).
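
The caption defines the estimated FWE rate as a plain proportion of null analyses. The sketch below, assuming each null analysis is summarised by its list of FWE-corrected p-values (a hypothetical layout), computes that proportion. The second helper is an addition not stated in the caption: a normal-approximation binomial range for judging whether an estimate from 1,000 analyses is compatible with the nominal 5%.

```python
import math


def empirical_fwe_rate(analyses, alpha=0.05):
    """Fraction of null analyses with at least one significant result.

    Each element of `analyses` is the list of FWE-corrected p-values
    (voxels or clusters) produced by one group analysis of null data.
    """
    hits = sum(1 for pvals in analyses if any(p <= alpha for p in pvals))
    return hits / len(analyses)


def nominal_interval(alpha=0.05, n_analyses=1000, z=1.96):
    """Approximate 95% range for the estimated FWE rate if the true rate is alpha.

    Uses the normal approximation to the binomial distribution.
    """
    half_width = z * math.sqrt(alpha * (1 - alpha) / n_analyses)
    return alpha - half_width, alpha + half_width


# Usage: 70 of 1,000 null analyses report at least one significant result.
analyses = [[0.01, 0.30]] * 70 + [[0.20, 0.90]] * 930
rate = empirical_fwe_rate(analyses)
low, high = nominal_interval()
print(f"FWE = {rate:.3f}, nominal range = ({low:.4f}, {high:.4f})")
```

With 1,000 analyses the range is roughly 3.6% to 6.4%, so an estimated FWE rate of 7% would fall outside it.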
Fig. 2.
Results for two-sample t test and ad hoc clusterwise inference, showing estimated FWE rates for 6 mm of smoothing and four different activity paradigms (B1, B2, E1, and E2), for SPM, FSL, and AFNI. These results were generated using the Beijing data and 20 subjects in each group analysis. Each statistic map was first thresholded using a CDT of P = 0.001 (uncorrected for multiple comparisons), and the surviving clusters were then compared with a cluster extent threshold of 80 mm³ (10 voxels for SPM and FSL, which used 2 × 2 × 2 mm³ voxels; three voxels for AFNI, which used 3 × 3 × 3 mm³ voxels). The estimated FWE rates are simply the number of analyses with a significant result divided by the number of analyses (1,000).
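
The ad hoc procedure in this caption combines a voxel threshold with a minimum cluster volume. The sketch below converts the 80 mm³ extent into a voxel count (reproducing the 10 and three voxels quoted above) and flags an analysis if any suprathreshold cluster reaches it. It assumes a z map stored as nested lists, a CDT of z > 3.09 (approximately the one-sided normal quantile for P = 0.001; a t map would need a threshold that depends on the degrees of freedom), and face (6-)connectivity for clusters, which may differ from the neighbourhood each package uses. All names are hypothetical.

```python
import math
from collections import deque

# Face neighbours only (6-connectivity); an assumption for this sketch.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def min_cluster_voxels(extent_mm3, voxel_size_mm):
    """Smallest number of voxels whose total volume reaches extent_mm3."""
    dx, dy, dz = voxel_size_mm
    return math.ceil(extent_mm3 / (dx * dy * dz))


def cluster_sizes(mask):
    """Sizes of connected clusters of True voxels in a 3D nested-list mask."""
    nx, ny, nz = len(mask), len(mask[0]), len(mask[0][0])
    seen = set()
    sizes = []
    for x in range(nx):
        for y in range(ny):
            for z in range(nz):
                if not mask[x][y][z] or (x, y, z) in seen:
                    continue
                # Breadth-first flood fill from an unvisited suprathreshold voxel.
                seen.add((x, y, z))
                queue = deque([(x, y, z)])
                size = 0
                while queue:
                    cx, cy, cz = queue.popleft()
                    size += 1
                    for ox, oy, oz in NEIGHBOURS:
                        a, b, c = cx + ox, cy + oy, cz + oz
                        if (0 <= a < nx and 0 <= b < ny and 0 <= c < nz
                                and mask[a][b][c] and (a, b, c) not in seen):
                            seen.add((a, b, c))
                            queue.append((a, b, c))
                sizes.append(size)
    return sizes


def ad_hoc_significant(z_map, z_cdt, extent_mm3, voxel_size_mm):
    """True if any cluster above the CDT is at least extent_mm3 in volume."""
    mask = [[[v > z_cdt for v in row] for row in plane] for plane in z_map]
    k = min_cluster_voxels(extent_mm3, voxel_size_mm)
    return any(size >= k for size in cluster_sizes(mask))


# Usage: a 2 x 2 x 2 block of z = 4 in an otherwise empty 5 x 5 x 5 map.
Z_CDT = 3.09  # approx. one-sided z for P = 0.001
z_map = [[[0.0] * 5 for _ in range(5)] for _ in range(5)]
for x in range(2):
    for y in range(2):
        for z in range(2):
            z_map[x][y][z] = 4.0
print(ad_hoc_significant(z_map, Z_CDT, 80, (2, 2, 2)))  # False: 8 voxels < 10
print(ad_hoc_significant(z_map, Z_CDT, 80, (3, 3, 3)))  # True: 8 voxels >= 3
```

The same eight-voxel cluster passes or fails depending only on voxel size, which illustrates why a fixed extent threshold gives different behaviour across packages that resample to different grids.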

Source: PubMed
