Parallel but not equivalent: challenges and solutions for repeated assessment of cognition over time

Alden L Gross, Sharon K Inouye, George W Rebok, Jason Brandt, Paul K Crane, Jeanine M Parisi, Doug Tommet, Karen Bandeen-Roche, Michelle C Carlson, Richard N Jones

Abstract

Objective: Analyses of individual differences in change may be unintentionally biased when versions of a neuropsychological test used at different follow-ups are not of equivalent difficulty. This study's objective was to compare mean, linear, and equipercentile equating methods and demonstrate their utility in longitudinal research.

Study design and setting: The Advanced Cognitive Training for Independent and Vital Elderly (ACTIVE, N = 1,401) study is a longitudinal randomized trial of cognitive training. The Alzheimer's Disease Neuroimaging Initiative (ADNI, n = 819) is an observational cohort study. Nonequivalent alternate versions of the Auditory Verbal Learning Test (AVLT) were administered in both studies.

Results: In visual displays, raw and mean-equated AVLT scores in both studies showed obvious nonlinear trajectories and poor equivalence over time (ps ≤ .001) in reference groups that should show minimal change, and raw scores fit poorly in models of within-person change (root mean square errors of approximation, RMSEAs > 0.12). Linear and equipercentile equating produced more similar means in reference groups (ps ≥ .09) and performed better in growth models (RMSEAs < 0.05).

Conclusion: Equipercentile equating is the preferred equating method because it accommodates tests more difficult than a reference test at different percentiles of performance and performs well in models of within-person trajectory. The method has broad applications in both clinical and research settings to enhance the ability to use nonequivalent test forms.
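The three equating methods compared in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the mid-percentile rank definition, and the linear interpolation used by `np.quantile` are assumptions made for illustration. Each function rescales scores from one test form (`x`) onto the scale of a reference form (`ref`), assuming both were taken by groups of comparable ability.

```python
import numpy as np

def mean_equate(x, ref):
    """Shift scores so their mean matches the reference form's mean."""
    x = np.asarray(x, dtype=float)
    return x - x.mean() + np.mean(ref)

def linear_equate(x, ref):
    """Match both the mean and the standard deviation of the reference form."""
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)  # standardize within the form being equated
    return ref.mean() + ref.std(ddof=1) * z

def equipercentile_equate(x, ref):
    """Map each score to the reference score at the same percentile rank."""
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # Mid-percentile rank (an assumed convention): share of scores below,
    # plus half the share of scores tied with the value.
    ranks = np.array(
        [(np.sum(x < v) + 0.5 * np.sum(x == v)) / x.size for v in x]
    )
    return np.quantile(ref, ranks)

# Hypothetical recall sum scores on a harder form and on the reference form.
harder_form = np.array([10, 20, 30, 40, 50])
reference_form = np.array([20, 30, 40, 50, 60])

mean_scores = mean_equate(harder_form, reference_form)
linear_scores = linear_equate(harder_form, reference_form)
equipercentile_scores = equipercentile_equate(harder_form, reference_form)
```

Mean equating applies one constant shift and linear equating one shift plus one rescaling, whereas the equipercentile mapping can differ at each percentile, which is the property the conclusion cites in its favor.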

Figures

Figure A1
Parallel but Nonequivalent Forms: Cumulative Probability Plots of Raw and Equated AVLT Scores in ACTIVE. Legend: Cumulative probability plots overlay distributions of raw or equated AVLT recall sum scores among control participants from each ACTIVE study visit (n=698). Individual visits are not labeled because the purpose of this diagnostic plot is to assess how closely the visits' cumulative distributions overlap; visits are plotted in the following colors (baseline: gray; immediate post-training: black; first annual: red; second annual: blue; third annual: green; fifth annual: orange). Results suggest linear and equipercentile equating produce more overlap than other methods. As an example of how to interpret this plot, in the raw scores panel, the blue line shows that about 30% of ACTIVE control participants recalled up to 39 words in year 1, and 100% of participants at all waves recalled 75 words or fewer (the test's ceiling).
Figure A2
Parallel but Nonequivalent Forms: Cumulative Probability Plots of Raw and Equated AVLT Scores in ADNI. Legend: Cumulative probability plots overlay distributions of raw or equated AVLT recall sum scores among MCI patients from each ADNI study visit (n=397). Individual visits are not labeled because the purpose of this diagnostic plot is to assess how closely the visits' cumulative distributions overlap; visits are plotted in the following colors (baseline: black; 6 month: red; 12 month: blue; 18 month: green; 24 month: orange; 30 month: yellow; 36 month: gray). Results suggest equipercentile equating produces more overlap than other methods. As an example of how to interpret this plot, in the raw scores panel, the gray line shows that about 10% of ADNI MCI participants recalled up to 12 words at 36 months, and 100% of participants at all waves recalled 75 words or fewer (the test's ceiling).
Figure 1
Parallel but Nonequivalent Forms: Plots of Raw and Equated AVLT Scores Over Time in ACTIVE (N=1,401). Legend: Time trend plots present means of AVLT scores by study visit in the ACTIVE control and memory-trained groups. Means in the time trend plots are adjusted for selective attrition using random effects models that assume data are missing at random conditional on indicators for time and group. Letters correspond to AVLT list versions administered at a visit: in ACTIVE, the baseline and year 3 visits used the same AVLT form, as did the post-training and year 5 visits.
Figure 2
Parallel but Nonequivalent Forms: Plots of Raw and Equated AVLT Scores Over Time in ADNI MCI Participants (N=397). Legend: Time trend plots present means of AVLT scores by study visit in the ADNI MCI group. Means in the time trend plots are adjusted for selective attrition using random effects models that assume data are missing at random conditional on indicators for time and group. Letters correspond to AVLT list versions administered at a visit: in ADNI, the baseline, 12 month, and 24 month visits used the same form and the 6 month, 18 month, and 36 month visits used a different form.

Source: PubMed
