Sample size calculation for a stepped wedge trial

Gianluca Baio, Andrew Copas, Gareth Ambler, James Hargreaves, Emma Beard, Rumana Z Omar

Abstract

Background: Stepped wedge trials (SWTs) can be considered as a variant of a clustered randomised trial, although in many ways they embed additional complications from the point of view of statistical design and analysis. While the literature is rich for standard parallel or clustered randomised clinical trials (CRTs), it is much less so for SWTs. The specific features of SWTs need to be addressed properly in the sample size calculations to ensure valid estimates of the intervention effect.

Methods: We critically review the available literature on analytical methods for sample size and power calculations in a SWT. In particular, we highlight the specific assumptions underlying currently used methods and comment on their validity and potential for extension. Finally, we propose the use of simulation-based methods to overcome some of the limitations of analytical formulae. We performed a simulation exercise in which we compared simulation-based sample size computations with analytical methods and assessed the impact of varying the basic parameters on the resulting sample size/power, for continuous and binary outcomes and assuming both cross-sectional data and the closed cohort design.
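As an illustration of the simulation-based approach described above, the following is a minimal sketch, not the authors' implementation. It assumes a cross-sectional design with a continuous outcome standardised to total variance 1, simulates cluster-period means directly, and analyses each simulated dataset by ordinary least squares with cluster and period fixed effects and a normal-approximation test (a simplification of a mixed-model analysis). The function names `sw_design` and `simulate_power`, and all default values, are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def sw_design(n_clusters, n_steps):
    """0/1 treatment matrix (clusters x periods): one baseline period,
    then clusters cross over in n_steps roughly equal groups."""
    per_step = -(-n_clusters // n_steps)  # integer ceiling
    step = np.repeat(np.arange(1, n_steps + 1), per_step)[:n_clusters]
    periods = np.arange(n_steps + 1)
    return (periods[None, :] >= step[:, None]).astype(float)


def simulate_power(effect, n_clusters=25, n_steps=5, m=20, icc=0.1,
                   time_slope=0.0, n_sims=1000, alpha=0.05, seed=1):
    """Estimate power as the proportion of simulated trials that reject H0.

    Assumptions (illustrative only): total outcome variance 1, random
    cluster intercepts, linear underlying time trend, m subjects per
    cluster-period, analysis on cluster-period means.
    """
    rng = np.random.default_rng(seed)
    X = sw_design(n_clusters, n_steps)
    I, T = X.shape
    tau = np.sqrt(icc)                    # between-cluster SD
    sd_mean = np.sqrt((1.0 - icc) / m)    # SD of a cluster-period mean

    # Analysis design matrix: cluster dummies, period dummies, treatment.
    cluster_d = np.kron(np.eye(I), np.ones((T, 1)))
    period_d = np.kron(np.ones((I, 1)), np.eye(T))[:, 1:]
    D = np.hstack([cluster_d, period_d, X.reshape(-1, 1)])
    DtD_inv = np.linalg.inv(D.T @ D)
    df = I * T - D.shape[1]
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    trend = time_slope * np.arange(T)

    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, tau, size=(I, 1))       # cluster effects
        e = rng.normal(0.0, sd_mean, size=(I, T))   # sampling error of means
        y = (trend[None, :] + effect * X + a + e).reshape(-1)
        beta = DtD_inv @ (D.T @ y)
        resid = y - D @ beta
        se = np.sqrt(resid @ resid / df * DtD_inv[-1, -1])
        rejections += int(abs(beta[-1] / se) > z_crit)
    return rejections / n_sims


if __name__ == "__main__":
    for eff in (0.1, 0.2, 0.3):
        print(eff, simulate_power(eff))
```

Because the data-generating step and the analysis step are separate, either can be swapped for something closer to the study at hand (for example, a binary outcome or a closed cohort with participant-level random effects) without changing the overall loop.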

Results: We compared the sample size requirements of a SWT with those of CRTs based on a comparable number of measurements in each cluster. In line with the existing literature, we found that when the level of correlation within the clusters is relatively high (for example, greater than 0.1), the SWT requires a smaller number of clusters. For low values of the intracluster correlation, the two designs produce more similar requirements in terms of the total number of clusters. We validated our simulation-based approach and compared the results of sample size calculations to analytical methods; the simulation-based procedures perform well, producing results that are extremely similar to those of the analytical methods. We found that the SWT is usually relatively insensitive to variations in the intracluster correlation, and that failure to account for a potential time effect will artificially and grossly overestimate the power of a study.
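For reference, one widely used analytical method for a cross-sectional SWT with a continuous outcome is the closed-form variance of Hussey and Hughes (2007). The sketch below is an illustration under that formula, assuming a total outcome variance of 1, random cluster intercepts and fixed period effects; the function names `sw_design`, `hh_variance` and `hh_power` are hypothetical and are not taken from the paper.

```python
import numpy as np
from statistics import NormalDist


def sw_design(n_clusters, n_steps):
    """0/1 treatment matrix (clusters x periods): one baseline period,
    then clusters cross over in n_steps roughly equal groups."""
    per_step = -(-n_clusters // n_steps)  # integer ceiling
    step = np.repeat(np.arange(1, n_steps + 1), per_step)[:n_clusters]
    periods = np.arange(n_steps + 1)
    return (periods[None, :] >= step[:, None]).astype(float)


def hh_variance(X, m, icc, sigma_total=1.0):
    """Variance of the treatment-effect estimator (Hussey and Hughes form).

    X   : clusters x periods treatment indicator matrix
    m   : subjects measured per cluster-period
    icc : intracluster correlation (assumed known)
    """
    I, T = X.shape
    tau2 = icc * sigma_total ** 2                # between-cluster variance
    sigma2 = (1 - icc) * sigma_total ** 2 / m    # variance of a cluster-period mean
    U = X.sum()                                  # total treated cluster-periods
    W = (X.sum(axis=0) ** 2).sum()               # sum of squared period totals
    V = (X.sum(axis=1) ** 2).sum()               # sum of squared cluster totals
    num = I * sigma2 * (sigma2 + T * tau2)
    den = (I * U - W) * sigma2 + (U ** 2 + I * T * U - T * W - I * V) * tau2
    return num / den


def hh_power(effect, X, m, icc, sigma_total=1.0, alpha=0.05):
    """Approximate two-sided power from the normal approximation."""
    z = NormalDist()
    se = np.sqrt(hh_variance(X, m, icc, sigma_total))
    return z.cdf(abs(effect) / se - z.inv_cdf(1 - alpha / 2))


if __name__ == "__main__":
    X = sw_design(25, 5)
    for icc in (0.01, 0.1, 0.5):
        print(icc, hh_power(0.2, X, m=20, icc=icc))
```

Evaluating `hh_power` over a grid of ICC values is a quick way to explore how sensitive a given design is to the assumed intracluster correlation, and the result can be set against a simulation-based estimate for the same scenario.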

Conclusions: We provide a framework for handling the sample size and power calculations of a SWT and suggest that simulation-based procedures may be more effective, especially in dealing with the specific features of the study at hand. In selected situations and depending on the level of intracluster correlation and the cluster size, SWTs may be more efficient than comparable CRTs. However, the decision about the design to be implemented will be based on a wide range of considerations, including the cost associated with the number of clusters, number of measurements and the trial duration.

Figures

Fig. 1
Power curves for a continuous outcome assuming 25 clusters, each with 20 subjects, and 6 time points including one baseline. We varied the intervention effect size and the ICC. Panel (a) shows the analysis for a repeated cross-sectional design, while panel (b) depicts the results for a closed cohort design. In panel (b) the selected ICCs are reported for cluster and participant level.
Fig. 2
Power curves for a binary outcome assuming 25 clusters, each with 20 subjects, and 6 time points including one baseline. We varied the intervention effect size and the ICC. Panel (a) shows the analysis for a repeated cross-sectional design, while panel (b) depicts the results for a closed cohort design. In panel (b) the selected ICCs are reported for cluster and participant level.
Fig. 3
Power curves for a continuous outcome assuming 24 clusters, each with 20 subjects. We varied the ICC and the number of randomisation crossover points. Panel (a) shows the analysis for a repeated cross-sectional design, while panel (b) depicts the results for a closed cohort design (assuming an individual-level ICC of 0.0016).
Fig. 4
Power curves for a binary outcome assuming 24 clusters, each with 20 subjects. We varied the ICC and the number of randomisation crossover points. Panel (a) shows the analysis for a repeated cross-sectional design, while panel (b) depicts the results for a closed cohort design (assuming an individual-level ICC of 0.0016).
Fig. 5
Power curves for a continuous outcome assuming 25 clusters, each with 20 subjects and 6 time points at which measurements are taken (including one baseline time). We varied the way in which the assumed linear time effect is included in the model (if at all). Panel (a) shows the results for a repeated cohort design; panel (b) shows the results for the closed cohort design, assuming a cluster-level ICC of 0.1 and varying the participant-level ICC; panel (c) shows the results for the closed cohort design, assuming a cluster-level ICC of 0.5 and varying the participant-level ICC
Fig. 6
Power curves for a binary outcome assuming 25 clusters, each with 20 subjects and 6 time points at which measurements are taken (including one baseline time). We varied the way in which the assumed linear time effect is included in the model (if at all). Panel (a) shows the results for a repeated cohort design; panel (b) shows the results for the closed cohort design, assuming a cluster-level ICC of 0.1 and varying the participant-level ICC; panel (c) shows the results for the closed cohort design, assuming a cluster-level ICC of 0.5 and varying the participant-level ICC

Source: PubMed