Circular analysis in systems neuroscience: the dangers of double dipping

Nikolaus Kriegeskorte, W Kyle Simmons, Patrick S F Bellgowan, Chris I Baker

Abstract

A neuroscientific experiment typically generates a large amount of data, of which only a small fraction is analyzed in detail and presented in a publication. However, selection among noisy measurements can render an otherwise appropriate analysis circular and invalidate results. Here we argue that systems neuroscience needs to adjust some widespread practices to avoid the circularity that can arise from selection. In particular, 'double dipping', the use of the same dataset for selection and selective analysis, will give distorted descriptive statistics and invalid statistical inference whenever the results statistics are not inherently independent of the selection criteria under the null hypothesis. To demonstrate the problem, we apply widely used analyses to noise data known not to contain the experimental effects in question. Spurious effects can appear in the context of both univariate activation analysis and multivariate pattern-information analysis. We suggest a policy for avoiding circularity.
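To make the abstract's point concrete, the following is a minimal Python sketch of double dipping on pure noise, not the paper's actual analysis: voxels are selected and then characterized using the same null data, which biases the effect estimate, whereas the same selection applied to independent data yields no effect. All sizes, thresholds, and variable names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n_voxels = 40, 1000

# Null data: two conditions drawn from the same Gaussian distribution,
# so the true effect at every voxel is exactly zero.
cond_a = rng.standard_normal((n_trials, n_voxels))
cond_b = rng.standard_normal((n_trials, n_voxels))

# Circular ('double dipping'): select voxels where A > B in the SAME data,
# then estimate the A-B effect on those voxels.
t, _ = stats.ttest_ind(cond_a, cond_b)   # t statistic per voxel
selected = t > 2.0
circular_effect = (cond_a[:, selected] - cond_b[:, selected]).mean()

# Noncircular: estimate the effect for the same selection in independent data.
cond_a2 = rng.standard_normal((n_trials, n_voxels))
cond_b2 = rng.standard_normal((n_trials, n_voxels))
independent_effect = (cond_a2[:, selected] - cond_b2[:, selected]).mean()

print(f"circular estimate:    {circular_effect:.3f}")    # biased well above 0
print(f"independent estimate: {independent_effect:.3f}") # ~0, as it should be
```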

Figures

Fig. 1. Intuitive diagrams for understanding circular analysis
(a) The top row serves to remind us that our results reflect our data indirectly: through the lens of an often complicated analysis, whose assumptions are not always fully explicit. The bottom row illustrates how the assumptions (and hypotheses) can interact with the data to shape the results. Ideally (bottom left), the results reflect some aspect of the data (blue) without distortion (although the assumptions will determine what aspect of the data is reflected in the results). But sometimes (bottom center) a close inspection of the analysis reveals that the data get lost in the process and the assumptions (red) predetermine the results. In that case the analysis is completely circular (red dotted line). More frequently in practice (bottom right), the assumptions tinge the results (magenta). The results are then distorted by circularity, but still reflect the data to some degree (magenta dotted lines). (b) Three diagrams illustrate the three most common causes of circularity: selection (left), weighting (center), and sorting (right). Selection, weighting, and sorting criteria reflect assumptions and hypotheses (red). Each of the three can tinge the results, distorting the estimates presented and invalidating statistical tests, if the results statistics are not independent of the criteria for selection, weighting, or sorting.
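Of the three mechanisms named in panel b, sorting is perhaps the easiest to demonstrate in isolation. The following toy Python sketch (all numbers are illustrative assumptions) shows how ordering noisy measurements by their observed values and then comparing the resulting groups on the same data manufactures an 'effect' out of pure noise, while sorting applied to an independent sample leaves the comparison unbiased:

```python
import numpy as np

rng = np.random.default_rng(3)
responses = rng.standard_normal(200)   # pure noise; true mean difference is 0

# Sort trials by their observed response and split into low/high halves.
order = np.argsort(responses)
low, high = responses[order[:100]], responses[order[100:]]
print(high.mean() - low.mean())        # large 'effect' from sorting alone

# Apply the same grouping to an independent measurement of the same trials:
fresh = rng.standard_normal(200)
print(fresh[order[100:]].mean() - fresh[order[:100]].mean())  # ~0, unbiased
```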
Fig. 2. Example 1: Data selection can bias pattern-information analysis
(a) To assess the extent to which human inferior-temporal activity patterns reflect bottom-up sensory signals and top-down task constraints, we measured activity patterns with fMRI while subjects viewed object images of different categories and judged either whether the object shown was “animate” (task 1) or whether it was “pleasant” (task 2). (b) We selected all inferior-temporal voxels for which any two-sided t test contrasting two conditions was significant at p<0.001 (uncorrected for multiple tests). We then cleanly divided the data by using odd runs for training and even runs for testing. We used a linear classifier to determine whether the activity pattern would allow us to decode the stimulus category (light gray bars) and the judgment task (dark gray bars). Results (top left) suggested that both stimulus and task can be decoded with high accuracy, significantly above chance. However, application of the same analysis to Gaussian random data (top right) also suggested high decoding accuracies significantly above chance. This shows that spurious effects can appear when data from the test set are used in the initial data-selection process. Such spurious effects can be avoided by performing selection using data independent of the test data (bottom row). Error bars indicate ±1 across-subject standard error of the mean. For details on experiment and analysis, see Example 1: Pattern-information analysis.
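A hedged Python re-implementation of the null demonstration follows. The run counts, voxel counts, t threshold, and the simple nearest-mean linear classifier are all stand-in assumptions rather than the paper's settings, but the contrast is the same: voxel selection on all data (including the test runs) versus selection on the training runs only, applied to pure Gaussian noise.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_runs, n_trials_per_run, n_voxels = 8, 10, 2000
labels = np.tile([0, 1], n_trials_per_run // 2)

# Noise "activity patterns": runs x trials x voxels, no real effect anywhere.
data = rng.standard_normal((n_runs, n_trials_per_run, n_voxels))
y = np.tile(labels, (n_runs, 1))

odd, even = slice(0, n_runs, 2), slice(1, n_runs, 2)  # odd runs train, even test

def accuracy(select_on_all_data: bool) -> float:
    # Voxel selection by a two-sample t test, as in the example analysis.
    if select_on_all_data:
        X, lab = data.reshape(-1, n_voxels), y.ravel()            # circular
    else:
        X, lab = data[odd].reshape(-1, n_voxels), y[odd].ravel()  # independent
    t, _ = stats.ttest_ind(X[lab == 0], X[lab == 1])
    voxels = np.abs(t) > 3.3  # rough stand-in for the p < 0.001 selection

    # Nearest-mean linear classifier: train on odd runs, test on even runs.
    Xtr, ytr = data[odd].reshape(-1, n_voxels)[:, voxels], y[odd].ravel()
    Xte, yte = data[even].reshape(-1, n_voxels)[:, voxels], y[even].ravel()
    w = Xtr[ytr == 0].mean(0) - Xtr[ytr == 1].mean(0)
    b = (Xtr[ytr == 0].mean(0) + Xtr[ytr == 1].mean(0)) @ w / 2
    pred = (Xte @ w < b).astype(int)
    return (pred == yte).mean()

print("circular selection:   ", accuracy(True))   # above chance on pure noise
print("independent selection:", accuracy(False))  # ~0.5, chance level
```

The circular variant appears to decode from noise because the selected voxels were chosen partly for noise fluctuations in the test runs themselves; selecting on the training runs alone removes that dependence.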
Fig. 3. Example 2: ROI definition can bias activation analysis
A simulated fMRI block-design experiment demonstrates that nonindependent ROI definition can distort effects and produce spuriously significant results, even when the ROI is defined by rigorous mapping procedures (accounting for multiple tests) and highlights a truly activated region. Error bars indicate ±1 standard error of the mean. (a) The layout of this panel matches the intuitive diagrams of Fig. 1a: the data in Fig. 1a correspond to the true effects (left); the assumptions, to the contrast hypothesis (top); and the results, to the ROI-average activation analyses (right). A 100-voxel region (blue contour in central slice map) was simulated to be active during conditions A and B, but not during conditions C and D (left). The t map for contrast A-D is shown for the central slice through the region (center). When thresholded at p<0.05 (corrected for multiple tests by a cluster threshold criterion), a cluster appears (magenta contour), which highlights the true activated region (blue contour). The ROI is somewhat affected by the noise in the data (difference between blue and magenta contours). The noise pushes some truly activated voxels below the threshold and lifts some nonactivated voxels above the threshold (white arrows). This can be interpreted as overfitting. The bar graph for the overfitted ROI (bottom right, same data as used for mapping) reflects the activation of the region during conditions A and B as well as the absence of activation during conditions C and D. However, in comparison to the true effects (left), it is substantially distorted by the selection contrast A-D (top). In particular, the contrast A-B (simulated to be zero) exhibits spurious significance (p<0.01). When we use independent data to define the ROI (green contour), no such distortion is observed (top right). For details on the simulation and analysis, see Example 2: Regional activation analysis in the text. (b) The simulation illustrates how data selection blends truth (left) and hypothesis (right) by distorting results (top) so as to better conform to the selection criterion.
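A rough Python analogue of this simulation follows. The voxel counts, noise level, and a simple voxelwise t threshold (standing in for the cluster-corrected mapping) are illustrative assumptions, not the paper's exact settings; the point reproduced is that the A-B contrast, simulated to be exactly zero, comes out positive in the overfitted ROI when estimated from the same data used to define it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_voxels, n_trials = 1000, 16
true = np.zeros((4, n_voxels))           # conditions A, B, C, D
true[0, :100] = true[1, :100] = 1.0      # 100 truly active voxels for A and B

def simulate():
    # trials x conditions x voxels: true effects plus Gaussian noise
    return true[None] + rng.standard_normal((n_trials, 4, n_voxels))

data = simulate()

# Define the ROI by the selection contrast A-D on the same data.
t, _ = stats.ttest_1samp(data[:, 0] - data[:, 3], 0.0)
roi = t > 3.0   # simple threshold in place of the cluster-corrected mapping

# ROI-average contrast A-B, which was simulated to be exactly zero.
same_data = (data[:, 0, roi] - data[:, 1, roi]).mean()
fresh = simulate()                        # independent data, same ROI
independent = (fresh[:, 0, roi] - fresh[:, 1, roi]).mean()

print(f"A-B in overfitted ROI, same data:  {same_data:.3f}")   # biased > 0
print(f"A-B in same ROI, independent data: {independent:.3f}") # ~0
```

The bias arises because noise in condition A contributed to the A-D selection statistic, so the selected voxels carry A-favoring noise, while B's noise is independent of the criterion.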
Fig. 4. A policy for noncircular analysis
This flow diagram suggests a procedure for choosing an appropriate analysis that avoids the pitfalls of circularity. Considering the most common errors (bottom left, red letter references) can help in recognizing circularity when assessing a given analysis. The gist of the policy is as follows: We first consider performing a nonselective analysis only. If selective analysis is needed and we can demonstrate that the results are independent of the selection criterion under the null hypothesis, then all data are used for selective analysis. If we cannot demonstrate this, then a split-data analysis can serve to ensure independence. (For details, see Supplementary Information, A policy for noncircular analysis.)
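As one way to operationalize the split-data branch of this policy, here is a minimal Python sketch; the run structure and the `select`/`analyze` callables are hypothetical placeholders, not an interface from the paper.

```python
import numpy as np

def split_data_analysis(runs: np.ndarray, select, analyze):
    """Selection on half the runs; selective analysis on the other half only.

    runs:    array of shape (n_runs, ...) holding per-run data
    select:  callable mapping data -> selection criterion (e.g. a voxel mask)
    analyze: callable mapping (data, criterion) -> results statistics
    """
    selection_set, validation_set = runs[0::2], runs[1::2]
    criterion = select(selection_set)   # e.g. mapping contrast plus threshold
    # The criterion never saw the validation runs, so the selective results
    # are independent of the selection under the null hypothesis.
    return analyze(validation_set, criterion)
```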

Source: PubMed
