Sample size used to validate a scale: a review of publications on newly-developed patient reported outcomes measures

Emmanuelle Anthoine, Leïla Moret, Antoine Regnault, Véronique Sébille, Jean-Benoit Hardouin

Abstract

Purpose: New patient-reported outcome (PRO) measures are regularly developed to assess various aspects of the patients' perspective on their disease and treatment. For these instruments to be useful in clinical research, they must undergo a proper psychometric validation, including demonstration of cross-sectional and longitudinal measurement properties. This quantitative evaluation requires a study to be conducted on a sample of appropriate size. The aim of this research was to list and describe practices in PRO and proxy PRO primary psychometric validation studies, focusing primarily on the practices used to determine sample size.

Methods: A literature review of articles indexed in PubMed and published between January 2009 and September 2011 was conducted. The review proceeded in three steps: a search strategy, an article selection strategy, and data extraction. Agreement between authors was assessed, and validation practices were described.

Results: Data were extracted from 114 relevant articles. Among these, sample size determination was rarely reported (9.6%, 11/114); when it was, it was based on an arbitrary minimum sample size (n = 2) or a subject to item ratio (n = 4), or the method was not explicitly stated (n = 5). Very few articles (4%, 5/114) compared their sample size a posteriori to a subject to item ratio. Content validity, construct validity, criterion validity and internal consistency were the measurement properties most frequently assessed in the validation studies. Approximately 92% of the articles reported a subject to item ratio greater than or equal to 2, whereas 25% had a ratio greater than or equal to 20. About 90% of articles had a sample size greater than or equal to 100, whereas 7% had a sample size greater than or equal to 1000.
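As an illustration of the two quantities summarised above, the following minimal sketch computes a subject to item ratio (number of respondents divided by number of items in the scale) and checks it, along with the raw sample size, against the thresholds quoted in the Results. The function names and the example study (300 respondents, 20 items) are hypothetical and are not taken from the reviewed articles; only the threshold values (2 and 20 for the ratio, 100 and 1000 for the sample size) come from the abstract.

```python
def subject_to_item_ratio(n_subjects: int, n_items: int) -> float:
    """Return the number of respondents per item of the scale."""
    if n_subjects < 0 or n_items <= 0:
        raise ValueError("n_subjects must be >= 0 and n_items must be > 0")
    return n_subjects / n_items


def thresholds_met(value: float, thresholds: list[float]) -> list[float]:
    """Return the thresholds that `value` reaches (value >= threshold)."""
    return [t for t in thresholds if value >= t]


# Threshold values quoted in the abstract; this is not an exhaustive list
# of the thresholds recommended in the literature.
RATIO_THRESHOLDS = [2, 20]
SAMPLE_SIZE_THRESHOLDS = [100, 1000]

# Hypothetical validation study, for illustration only.
n_subjects, n_items = 300, 20
ratio = subject_to_item_ratio(n_subjects, n_items)

print(f"subject to item ratio: {ratio:.1f}")
print("ratio thresholds met:", thresholds_met(ratio, RATIO_THRESHOLDS))
print("sample size thresholds met:",
      thresholds_met(n_subjects, SAMPLE_SIZE_THRESHOLDS))
```

In this hypothetical case the study would count among the articles with a ratio of at least 2 and a sample size of at least 100, but not among those reaching the stricter thresholds of 20 and 1000.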

Conclusions: Sample size determination for psychometric validation studies is rarely justified a priori. This highlights the lack of clear, scientifically sound recommendations on this topic. Existing methods to determine the sample size needed to assess the various measurement properties of interest should be made more easily available.

Figures

Figure 1
Flow chart of selection process.
Figure 2
Distribution of the articles according to thresholds recommended in the literature. a: According to thresholds of subject to item ratio. b: According to thresholds of sample size.

Source: PubMed