Comparing Web-Based and Lab-Based Cognitive Assessment Using the Cambridge Neuropsychological Test Automated Battery: A Within-Subjects Counterbalanced Study

Rosa Backx, Caroline Skirrow, Pasquale Dente, Jennifer H Barnett, Francesca K Cormack

Abstract

Background: Computerized assessments are already used to derive accurate and reliable measures of cognitive function. Web-based cognitive assessment could improve the accessibility and flexibility of research and clinical assessment, widen participation, and promote research recruitment while simultaneously reducing costs. However, differences in context may influence task performance.

Objective: This study aims to determine the comparability of an unsupervised, web-based administration of the Cambridge Neuropsychological Test Automated Battery (CANTAB) against a typical in-person, lab-based assessment, using a within-subjects counterbalanced design. Specifically, it tests (1) reliability, quantifying the relationship between measurements across settings using correlational approaches; (2) equivalence, the extent to which test results in different settings produce similar overall results; and (3) agreement, quantifying acceptable limits to bias and differences between measurement environments.

Methods: A total of 51 healthy adults (32 women and 19 men; mean age 36.8, SD 15.6 years) completed 2 testing sessions, held on average 1 week apart (SD 4.5 days). Assessments included equivalent tests of emotion recognition (emotion recognition task [ERT]), visual recognition (pattern recognition memory [PRM]), episodic memory (paired associate learning [PAL]), working memory and spatial planning (spatial working memory [SWM] and one touch stockings of Cambridge), and sustained attention (rapid visual information processing [RVP]). Participants were randomly allocated to one of two groups, either assessed in person in the laboratory first (n=33) or with unsupervised web-based assessments on their personal computing systems first (n=18). Performance indices (errors, correct trials, and response sensitivity) and median reaction times were extracted. Intraclass and bivariate correlations examined intersetting reliability, linear mixed models and Bayesian paired sample t tests tested for equivalence, and Bland-Altman plots examined agreement.
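For readers wanting a concrete view of the reliability and agreement statistics named above, the following is a minimal sketch in Python (not the authors' code; the article's analyses were run in standard statistical software). It computes a Shrout-Fleiss ICC(2,1) (two-way random effects, absolute agreement, single measures) and the Bland-Altman bias with 95% limits of agreement for paired scores; the `lab` and `web` arrays are illustrative placeholders, not study data.

```python
import numpy as np

def icc_2_1(lab, web):
    """ICC(2,1): two-way random effects, absolute agreement, single measures."""
    y = np.column_stack([lab, web]).astype(float)
    n, k = y.shape                      # n subjects, k = 2 settings
    grand = y.mean()
    ms_rows = k * np.sum((y.mean(axis=1) - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((y.mean(axis=0) - grand) ** 2) / (k - 1)
    ss_err = (np.sum((y - grand) ** 2)
              - (n - 1) * ms_rows - (k - 1) * ms_cols)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

def bland_altman(lab, web):
    """Mean bias and 95% limits of agreement (bias +/- 1.96 SD of differences)."""
    diffs = np.asarray(web, float) - np.asarray(lab, float)
    bias = diffs.mean()
    half_width = 1.96 * diffs.std(ddof=1)
    return bias, bias - half_width, bias + half_width

# Illustrative paired scores for 51 participants (not study data).
rng = np.random.default_rng(0)
lab = rng.normal(20, 5, 51)
web = lab + rng.normal(0, 3, 51)
print(f"ICC(2,1) = {icc_2_1(lab, web):.2f}")
print("Bland-Altman bias and limits of agreement:", bland_altman(lab, web))
```

ICC(2,1) is chosen here because it penalizes systematic differences between settings (absolute agreement), which is the property at stake when the same person is tested in two environments.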

Results: Intraclass correlation (ICC) coefficients ranged from ρ=0.23 to ρ=0.67, with high correlations in 3 performance indices (from the PAL, SWM, and RVP tasks; ρ≥0.60). High ICC values were also seen for reaction time measures from 2 tasks (PRM and ERT; ρ≥0.60). However, reaction times were slower during web-based assessments, which undermined both equivalence and agreement for reaction time measures. Performance indices did not differ between assessment settings and generally showed satisfactory agreement.
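The finding that performance indices did not differ between settings rests on the Bayesian paired sample t tests described in the Methods. A minimal sketch of how such a test could be run in Python, assuming the third-party pingouin library (an assumption; it is not named in the article), which reports a Bayes factor (BF10) alongside the frequentist statistics. BF10 values well below 1 are commonly read as evidence for the null hypothesis of no difference between settings.

```python
import numpy as np
import pingouin as pg  # assumed dependency: pip install pingouin

# Illustrative paired scores only, not study data.
rng = np.random.default_rng(1)
lab = rng.normal(20, 5, 51)        # in-person performance index
web = lab + rng.normal(0, 3, 51)   # web-based index with no systematic shift

# Paired t test with pingouin's default Cauchy prior; BF10 < 1/3 is often
# interpreted as moderate evidence that the two settings are equivalent.
res = pg.ttest(web, lab, paired=True)
print(res[["T", "p-val", "BF10"]])
```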

Conclusions: Our findings support the comparability of CANTAB performance indices (errors, correct trials, and response sensitivity) obtained from unsupervised, web-based assessments with those from in-person, laboratory-based tests. Reaction times do not translate as readily from in-person to web-based testing, likely because of variation in computer hardware. The results underline the importance of examining more than one index to ascertain comparability, as high correlations can coexist with systematic differences introduced by the measurement environment. Further work is now needed to examine web-based assessments in clinical populations and in larger samples to improve sensitivity for detecting subtler differences between test settings.

Keywords: CANTAB; cognition; mobile health; neuropsychological tests; reliability.

Conflict of interest statement

Conflicts of Interest: All authors are employed by Cambridge Cognition and have no other conflicts of interest to declare.

©Rosa Backx, Caroline Skirrow, Pasquale Dente, Jennifer H Barnett, Francesca K Cormack. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 04.08.2020.

Figures

Figure 1
Screenshots of Cambridge Neuropsychological Test Automated Battery tests administered: (A) Paired Associate Learning, (B) One Touch Stockings of Cambridge, (C) Pattern Recognition Memory, (D) Spatial Working Memory, (E) Emotion Recognition Task, and (F) Rapid Visual Information Processing.
Figure 2
Comparability of Paired Associate Learning Total Errors Adjusted across test settings. Density plots for (A) web-based assessment and (B) in-person assessment show similar distributions; (C) scatterplot with reference line showing a linear relationship between assessment settings (ρ=0.54); (D) Bland-Altman plot: the mean difference (solid black line) is close to zero, showing no bias; dashed lines delimit the limits of agreement. Comparable magnitudes of difference are seen throughout the range of measurements, with 96% of the data within the limits of agreement.
Figure 3
Comparison of Paired Associate Learning First Attempt Memory Score across test settings. Density plots for (A) web-based assessment and (B) in-person assessment show similar distributions; (C) scatterplot with reference line showing a linear relationship between assessment settings (ρ=0.45); (D) Bland-Altman plot: the mean difference (solid black line) is close to zero, showing no bias; dashed lines delimit the limits of agreement. Proportional bias is seen, with greater differences at lower mean measurements, and 94% of the data within the limits of agreement.
Figure 4
Comparability of Emotion Recognition Task median correct reaction time (in ms) across test settings. Density plots for (A) web-based assessment, showing a broader distribution of timings (range 500-3000 ms) and slower overall timings, and (B) in-person assessment (range 500-2500 ms); (C) scatterplot with reference line showing a strong linear relationship between assessment settings (ρ=0.73); (D) Bland-Altman plot: the mean difference (solid black line) is shifted above zero, demonstrating bias; dashed lines show the limits of agreement. Comparable magnitudes of difference are seen throughout the range of measurements, with 94% of the data within the limits of agreement.
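The Bland-Altman panels in Figures 2-4 follow the standard construction: between-setting differences are plotted against within-pair means, with horizontal lines at the mean difference (bias) and at bias ± 1.96 SD (the limits of agreement). A self-contained Python sketch of that construction, on illustrative data with a small positive bias built in (again, not the authors' code or data):

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative paired scores only, not study data.
rng = np.random.default_rng(0)
lab = rng.normal(20, 5, 51)        # in-person measurements
web = lab + rng.normal(1, 3, 51)   # web-based measurements with a small bias

means = (lab + web) / 2
diffs = web - lab
bias = diffs.mean()
loa = 1.96 * diffs.std(ddof=1)     # half-width of the 95% limits of agreement

fig, ax = plt.subplots()
ax.scatter(means, diffs, alpha=0.7)
ax.axhline(bias, color="black")                        # mean difference (solid)
ax.axhline(bias - loa, color="black", linestyle="--")  # lower limit of agreement
ax.axhline(bias + loa, color="black", linestyle="--")  # upper limit of agreement
ax.set_xlabel("Mean of web-based and in-person measurements")
ax.set_ylabel("Difference (web-based minus in-person)")
ax.set_title("Bland-Altman plot (illustrative data)")
plt.show()
```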
