A tutorial on sample size calculation for multiple-period cluster randomized parallel, cross-over and stepped-wedge trials using the Shiny CRT Calculator

Karla Hemming, Jessica Kasza, Richard Hooper, Andrew Forbes, Monica Taljaard

Abstract

It has long been recognized that sample size calculations for cluster randomized trials require consideration of the correlation between multiple observations within the same cluster. When measurements are taken at anything other than a single point in time, these correlations depend not only on the cluster but also on the time separation between measurements and, additionally, on whether different participants (cross-sectional designs) or the same participants (cohort designs) are repeatedly measured. This is particularly relevant in trials with multiple periods of measurement, such as the cluster cross-over and stepped-wedge designs, but also to some degree in parallel designs. Several papers describing sample size methodology for these designs have been published, but this methodology might not be accessible to all researchers. In this article we provide a tutorial on sample size calculation for cluster randomized designs, with particular emphasis on designs with multiple periods of measurement, and provide a web-based tool, the Shiny CRT Calculator, to allow researchers to easily conduct these sample size calculations. We consider both cross-sectional and cohort designs and allow for a variety of assumed within-cluster correlation structures. We consider cluster heterogeneity in treatment effects (for designs where treatment is crossed with cluster), as well as individually randomized group-treatment trials with differential clustering between arms, for example designs where clustering arises from interventions being delivered in groups. The calculator will compute power or precision, as a function of cluster size or number of clusters, for a wide variety of designs and correlation structures. We illustrate the methodology and the flexibility of the Shiny CRT Calculator using a range of examples.
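To make the role of the within-cluster correlations concrete, the following is a minimal sketch of a design-effect power calculation for a two-arm parallel design and a two-period cross-sectional cluster cross-over design with a binary outcome. It is not the Shiny CRT Calculator's implementation: it assumes a normal approximation with unpooled variances, equal cluster sizes, and the commonly used design effects 1 + (m - 1) × ICC (parallel) and 1 + (m - 1) × ICC - m × ICC × CAC (cross-over, with m the cluster-period size). All function and parameter names are hypothetical. The inputs in the usage lines are taken from the Figure 3 scenarios.

```python
from statistics import NormalDist


def design_effect_parallel(m, icc):
    """Design effect for a parallel CRT with m participants per cluster."""
    return 1 + (m - 1) * icc


def design_effect_crxo(m, icc, cac):
    """Design effect for a two-period cross-sectional cluster cross-over
    design; m is the cluster-period size and icc * cac is the assumed
    between-period correlation."""
    return 1 + (m - 1) * icc - m * icc * cac


def power_two_proportions(p0, p1, n_per_arm, deff, alpha=0.05):
    """Approximate two-sided power (normal approximation, unpooled
    variance), deflating the per-arm sample size by the design effect.
    The negligible contribution of the opposite tail is ignored."""
    std_normal = NormalDist()
    n_eff = n_per_arm / deff
    se = ((p0 * (1 - p0) + p1 * (1 - p1)) / n_eff) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(abs(p1 - p0) / se - z_crit)


# Figure 3(a): 25 clusters per arm, cluster size 5000, ICC 0.005.
power_parallel = power_two_proportions(
    0.010, 0.007, n_per_arm=25 * 5000,
    deff=design_effect_parallel(5000, 0.005))

# Figure 3(b): 25 clusters per sequence (50 clusters), 2500 per
# cluster-period, ICC 0.005, CAC 0.8. Each cluster contributes one
# period to each condition, so 50 * 2500 observations per condition.
power_crxo = power_two_proportions(
    0.010, 0.007, n_per_arm=50 * 2500,
    deff=design_effect_crxo(2500, 0.005, 0.8))

print(f"parallel: {power_parallel:.2f}, cross-over: {power_crxo:.2f}")
```

Under these assumptions the cross-over design effect is much smaller than the parallel one for the same total number of observations, which is why the cluster autocorrelation matters so much when choosing between the two designs.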

© The Author(s) 2020. Published by Oxford University Press on behalf of the International Epidemiological Association.

Figures

Figure 1.
Schematic representation of different multi-period cluster randomized trials.
Figure 2.
Illustration of the interface of the Shiny CRT Calculator.
Figure 3.
(a) Two-arm parallel CRT. Scenario includes 25 clusters per arm; proportion under control condition is 0.010 and proportion under intervention condition is 0.007; significance level is 0.05; within period ICC is 0.005 (lower value 0.001 and higher value 0.01). Expected cluster size (over 2 years) is 5000. (b) Two-period CRXO design. Scenario includes 25 clusters per sequence; proportion under control condition is 0.010 and proportion under intervention condition is 0.007; significance level is 0.05; within period ICC is 0.005 (lower value 0.001 and higher value 0.01); CAC is 0.8 (lower and higher values 80% and 120% of base case CAC, i.e. 0.64 and 0.96). Expected cluster size per period (1 year) is 2500.
Figure 4.
(a) Binary outcome. Scenario includes four clusters per sequence and five sequences in a stepped-wedge design; proportion under control condition is 0.28 and proportion under intervention condition is 0.38; significance level is 0.025; a two-period correlation structure; within period ICC is 0.025 (lower value 0.01 and higher value 0.06); CAC is 0.92 (lower and higher values are 0.74 (80% of base-case) and 1). The expected average cluster-period size is 20. (b) Continuous outcome. Scenario includes four clusters per sequence and five sequences in a stepped-wedge design; to detect a standardized mean difference of 0.25; significance level is 0.025; a two-period correlation structure; within period ICC is 0.056 (lower value 0.023 and higher value 0.13); CAC is 0.08 (lower and higher values 80% and 120% of base case CAC, i.e. 0.064 and 0.096). The expected average cluster-period size is 10. (c) Continuous outcome (high CAC). Scenario includes four clusters per sequence and five sequences in a stepped-wedge design; to detect a standardized mean difference of 0.25; significance level is 0.025; a two-period correlation structure; within period ICC is 0.056 (lower value 0.023 and higher value 0.13); CAC is 0.8 (lower and higher values 80% and 120% of base case CAC, i.e. 0.64 and 0.96). The expected average cluster-period size is 10. (d) Binary outcome, discrete time decay. Scenario includes four clusters per sequence and five sequences in a stepped-wedge design; proportion under control condition is 0.28 and proportion under intervention condition is 0.38; significance level is 0.025; within period ICC is 0.03 (lower value 0.01 and higher value 0.1); CAC is 0.9 (lower and higher values are 0.74 (80% of base-case) and 1). The expected average cluster-period size is 20.
Figure 5.
(a) Example 3: power as a function of cluster size in treatment arm for a trial with clustering in one arm only (30 clusters in treatment arm; 400 in control arm). Scenario includes individuals randomized to one of two arms. Assumes 400 individuals are randomized to the control arm [intra-cluster correlation (ICC) 0]; and that there are 30 clusters in the intervention arm (ICC 0.1; lower value 0.05 and higher value 0.15); proportion under control condition is 0.5 and proportion under intervention condition is 0.6; significance level is 0.05. X-axis is cluster size under treatment condition. (b) Example 3: power as a function of cluster size in treatment arm for a trial with clustering in one arm only (30 clusters in treatment arm; 700 in control arm). Scenario includes individuals randomized to one of two arms. Assumes 700 individuals are randomized to the control arm [intra-cluster correlation (ICC) 0]; and that there are 30 clusters in the intervention arm (ICC 0.1; lower value 0.05 and higher value 0.15); proportion under control condition is 0.5 and proportion under intervention condition is 0.6; significance level is 0.05. X-axis is cluster size under treatment condition. (c) Example 3: power as a function of cluster size in treatment arm for a trial with clustering in one arm only (40 clusters in treatment arm; 400 in control arm). Scenario includes individuals randomized to one of two arms. Assumes 400 individuals are randomized to the control arm [intra-cluster correlation (ICC) 0]; and that there are 40 clusters in the intervention arm (ICC 0.1; lower value 0.05 and higher value 0.15); proportion under control condition is 0.5 and proportion under intervention condition is 0.6; significance level is 0.05. X-axis is cluster size under treatment condition. (d) Example 3: power as a function of cluster size in treatment arm for a trial with clustering in one arm only (40 clusters in treatment arm; 700 in control arm). Scenario includes individuals randomized to one of two arms. Assumes 700 individuals are randomized to the control arm [intra-cluster correlation (ICC) 0]; and that there are 40 clusters in the intervention arm (ICC 0.1; lower value 0.05 and higher value 0.15); proportion under control condition is 0.5 and proportion under intervention condition is 0.6; significance level is 0.05. X-axis is cluster size under treatment condition.
Figure 6.
Example 3: power as a function of cluster size in treatment arm, number in control arm and total sample size for a trial with clustering in one arm only (cluster size 20 in treatment arm). Scenario includes individuals randomized to one of two arms [intra-cluster correlation (ICC) 0.1; lower and higher values not shown for ease of presentation]; proportion under control condition is 0.5 and proportion under intervention condition is 0.6; significance level is 0.05. Axes show number of clusters under treatment condition, number of individuals randomized to the control condition and the resulting total sample size (TSS). Power is 80%.
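The Example 3 scenarios (clustering in the treatment arm only) can be approximated with a short calculation. This is a rough sketch, not the calculator's own method: it assumes a normal approximation, equal cluster sizes, and that only the treatment-arm variance is inflated by the design effect 1 + (m - 1) × ICC. The function and parameter names are hypothetical, and the inputs in the usage lines are taken from the Figure 5(a) caption with an illustrative cluster size of 20.

```python
from statistics import NormalDist


def power_one_arm_clustered(p0, p1, n_control, k, m, icc, alpha=0.05):
    """Approximate two-sided power when the control arm is unclustered
    (n_control individuals) and the treatment arm has k clusters of
    size m with intra-cluster correlation icc."""
    std_normal = NormalDist()
    # Control arm: independent individuals, no variance inflation.
    var_control = p0 * (1 - p0) / n_control
    # Treatment arm: variance inflated by the usual design effect.
    var_treat = p1 * (1 - p1) / (k * m) * (1 + (m - 1) * icc)
    se = (var_control + var_treat) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf(abs(p1 - p0) / se - z_crit)


# 400 controls, 30 treatment clusters of (illustrative) size 20.
base = power_one_arm_clustered(0.5, 0.6, n_control=400, k=30, m=20, icc=0.1)
# Sensitivity to the lower and higher ICC values quoted in the caption.
low_icc = power_one_arm_clustered(0.5, 0.6, 400, 30, 20, icc=0.05)
high_icc = power_one_arm_clustered(0.5, 0.6, 400, 30, 20, icc=0.15)

print(f"ICC 0.05: {low_icc:.2f}, ICC 0.10: {base:.2f}, ICC 0.15: {high_icc:.2f}")
```

Because only one variance term carries the design effect, adding control participants or treatment clusters changes power in different ways, which is the trade-off the Figure 5 and Figure 6 panels explore.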

Source: PubMed
