A study of clustered data and approaches to its analysis

Sally Galbraith, James A Daniel, Bryce Vissel

Abstract

Statistical analysis is critical to the interpretation of experimental data across the life sciences, including neuroscience, and the nature of the data collected largely determines the best statistical approach to take. One particularly prevalent type of data is "clustered data": data that can be classified into a number of distinct groups, or "clusters," within a particular study. In neuroscience, clustered data arise most commonly when data are compiled across multiple experiments, for example in electrophysiological or optical recordings taken from synaptic terminals, with each experiment providing a distinct cluster of data. However, many other types of experimental design can also yield clustered data. Here, we provide a statistical model for intracluster correlation and systematically investigate a range of methods for analyzing clustered data. Our analysis reveals that it is critical to take data clustering into account and suggests appropriate statistical approaches for doing so.
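The abstract does not spell out the form of the intracluster correlation model. As a hedged illustration only, the Python sketch below assumes a common random-intercept model, in which each observation is a shared cluster-level offset plus independent noise, so that observations in the same cluster are correlated with intracluster correlation ρ = σ_b² / (σ_b² + σ_w²). It simulates equal-sized clusters and recovers ρ with the standard one-way ANOVA estimator. The function names `simulate_clusters` and `anova_icc`, and all parameter values, are hypothetical choices for this sketch, not the authors' model or code.

```python
import random
import statistics


def simulate_clusters(n_clusters, n_per_cluster, sigma_between, sigma_within, rng):
    """Draw equal-sized clusters from an ASSUMED random-intercept model:
    y_ij = b_i + e_ij, with b_i ~ N(0, sigma_between**2) shared by every
    observation in cluster i and e_ij ~ N(0, sigma_within**2) independent noise."""
    clusters = []
    for _ in range(n_clusters):
        b = rng.gauss(0.0, sigma_between)  # cluster-level offset shared within the cluster
        clusters.append([b + rng.gauss(0.0, sigma_within) for _ in range(n_per_cluster)])
    return clusters


def anova_icc(clusters):
    """One-way ANOVA estimate of the intracluster correlation for equal-sized
    clusters: (MSB - MSW) / (MSB + (m - 1) * MSW). It can come out slightly
    negative when the true correlation is near zero."""
    k = len(clusters)        # number of clusters
    m = len(clusters[0])     # observations per cluster
    means = [statistics.fmean(c) for c in clusters]
    grand = statistics.fmean(means)  # equals the overall mean for equal-sized clusters
    ssb = m * sum((mu - grand) ** 2 for mu in means)                    # between-cluster SS
    ssw = sum((x - mu) ** 2 for c, mu in zip(clusters, means) for x in c)  # within-cluster SS
    msb = ssb / (k - 1)
    msw = ssw / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


rng = random.Random(1)
data = simulate_clusters(200, 10, sigma_between=1.0, sigma_within=1.0, rng=rng)
# The true ICC here is 1.0 / (1.0 + 1.0) = 0.5; the estimate should land near it.
print(round(anova_icc(data), 3))
```

Setting `sigma_between` to 0 removes the shared offset, and the estimate then falls close to zero, which is the independent-data case that standard tests assume.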

Figures

Figure 1.
A sample dataset generated from model 1. For illustrative purposes, a single dataset generated under model 1 is shown. The left panel shows the individual observations, with a different color for each cluster within each group. In this simulation, there is no true difference between group 1 and group 2. However, the similarity of observations within a cluster, which induces intracluster correlation, is apparent: observations within each cluster lie close to one another. The right panel shows the data reduced to cluster means; the color of each mean indicates the cluster to which it belongs.
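Why the clustering matters for a group comparison can be checked with a small simulation. The Python sketch below is hypothetical: it assumes model 1 behaves like a normal random-intercept model with two groups, no group effect, 6 clusters of 10 observations per group, and equal between- and within-cluster variances (all illustrative values, not taken from the paper). It repeatedly compares the groups with a pooled two-sample t-test, once treating every observation as independent and once on the cluster means (as in the right panel), and reports how often each test rejects at the nominal 5% level. The names `pooled_t`, `simulate_group`, and `false_positive_rates` are introduced only for this sketch.

```python
import math
import random

K, M = 6, 10          # clusters per group, observations per cluster (illustrative)
T_CRIT_OBS = 1.980    # two-sided 5% t critical value, df = 2*K*M - 2 = 118
T_CRIT_MEANS = 2.228  # two-sided 5% t critical value, df = 2*K - 2 = 10


def pooled_t(a, b):
    """Student's two-sample t statistic with a pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    sp2 = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))


def simulate_group(sigma_between, sigma_within, rng):
    """One group of K clusters with M observations each under an ASSUMED
    random-intercept model; the group mean is 0, so two groups drawn this way
    have no true difference."""
    clusters = []
    for _ in range(K):
        b = rng.gauss(0.0, sigma_between)  # shared cluster offset
        clusters.append([b + rng.gauss(0.0, sigma_within) for _ in range(M)])
    return clusters


def false_positive_rates(n_sims=2000, sigma_between=1.0, sigma_within=1.0, seed=0):
    """Fraction of simulations in which each analysis rejects the (true) null."""
    rng = random.Random(seed)
    naive_hits = mean_hits = 0
    for _ in range(n_sims):
        g1 = simulate_group(sigma_between, sigma_within, rng)
        g2 = simulate_group(sigma_between, sigma_within, rng)
        # Naive analysis: pool all observations as if they were independent.
        flat1 = [x for c in g1 for x in c]
        flat2 = [x for c in g2 for x in c]
        if abs(pooled_t(flat1, flat2)) > T_CRIT_OBS:
            naive_hits += 1
        # Cluster-level analysis: reduce each cluster to its mean first.
        means1 = [sum(c) / len(c) for c in g1]
        means2 = [sum(c) / len(c) for c in g2]
        if abs(pooled_t(means1, means2)) > T_CRIT_MEANS:
            mean_hits += 1
    return naive_hits / n_sims, mean_hits / n_sims


naive, by_means = false_positive_rates()
print(f"naive: {naive:.3f}, cluster means: {by_means:.3f}")
```

Under these assumptions the naive test rejects far more often than 5%, while the test on cluster means stays close to 5%, because the cluster means are independent of one another. The critical values are hard-coded from a standard t table for the chosen `K` and `M`, so they must be updated if those constants change.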

Source: PubMed
