Sample size requirements to estimate key design parameters from external pilot randomised controlled trials: a simulation study

M Dawn Teare, Munyaradzi Dimairo, Neil Shephard, Alex Hayman, Amy Whitehead, Stephen J Walters

Abstract

Background: External pilot or feasibility studies can be used to estimate key unknown parameters to inform the design of the definitive randomised controlled trial (RCT). However, there is little consensus on how large pilot studies need to be, and some suggest inflating estimates to adjust for the lack of precision when planning the definitive RCT.

Methods: We use a simulation approach to illustrate the sampling distribution of the standard deviation for continuous outcomes and the event rate for binary outcomes. We present the impact of increasing the pilot sample size on the precision and bias of these estimates, and predicted power under three realistic scenarios. We also illustrate the consequences of using a confidence interval argument to inflate estimates so the required power is achieved with a pre-specified level of confidence. We limit our attention to external pilot and feasibility studies prior to a two-parallel-balanced-group superiority RCT.
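The core of such a simulation can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes normally distributed outcomes with a true SD of 1 and a balanced two-arm pilot, and draws the pooled standard deviation (SDp) repeatedly for a given per-group size.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_sdp(n_per_group, n_sims=10_000, true_sd=1.0):
    """Simulate the sampling distribution of the pooled SD (SDp)
    from a balanced two-arm pilot with n_per_group subjects per arm."""
    a = rng.normal(0.0, true_sd, size=(n_sims, n_per_group))
    b = rng.normal(0.0, true_sd, size=(n_sims, n_per_group))
    var_a = a.var(axis=1, ddof=1)
    var_b = b.var(axis=1, ddof=1)
    # For two equal-sized groups the pooled variance is the mean
    # of the two sample variances.
    return np.sqrt((var_a + var_b) / 2.0)

sdp = simulate_sdp(35)  # 70 subjects in total
print(sdp.mean(), sdp.std())
```

Plotting such draws against a grid of pilot sizes reproduces the kind of box-and-whisker summary shown in Figure 1.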

Results: For normally distributed outcomes, the relative gain in precision of the pooled standard deviation (SDp) is less than 10% (for each five subjects added per group) once the total sample size is 70. For true proportions between 0.1 and 0.5, we find the gain in precision for each five subjects added to the pilot sample is less than 5% once the sample size is 60. Adjusting the required sample sizes for the imprecision in the pilot study estimates can result in excessively large definitive RCTs and also requires a pilot sample size of 60 to 90 for the true effect sizes considered here.
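The precision-gain calculation behind these results can be approximated by comparing the spread of simulated SDp estimates at adjacent pilot sizes. A minimal sketch, assuming a true SD of 1 and using the central 95% range of simulated estimates as the precision measure (the paper's exact confidence-interval construction may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def sdp_interval_width(n_per_group, n_sims=20_000, true_sd=1.0):
    """Width of the central 95% range of simulated pooled-SD estimates,
    used here as a simple measure of precision."""
    var_a = rng.normal(0, true_sd, (n_sims, n_per_group)).var(axis=1, ddof=1)
    var_b = rng.normal(0, true_sd, (n_sims, n_per_group)).var(axis=1, ddof=1)
    sdp = np.sqrt((var_a + var_b) / 2)
    lo, hi = np.percentile(sdp, [2.5, 97.5])
    return hi - lo

def sdp_relative_gain(n_per_group):
    """Relative reduction in width when five subjects are added per group."""
    w_n = sdp_interval_width(n_per_group)
    w_n5 = sdp_interval_width(n_per_group + 5)
    return (w_n - w_n5) / w_n

print(sdp_relative_gain(35))  # below 0.10 at 70 subjects in total
```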

Conclusions: We recommend that an external pilot study has at least 70 measured subjects (35 per group) when estimating the SDp for a continuous outcome. If the event rate in an intervention group needs to be estimated by the pilot, then a total of 60 to 100 subjects is required. Hence, if the primary outcome is binary, a total of at least 120 subjects (60 in each group) may be required in the pilot trial. It is much more efficient to run a larger pilot study than to guard against the lack of precision by using inflated estimates.
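The inflation argument can be illustrated with a short calculation. This sketch is an assumption-laden reconstruction: it uses the standard two-group sample-size formula, a Browne-style upper confidence limit for the SD as the inflation adjustment, and a Wilson-Hilferty approximation to the chi-square quantile so that only the standard library is needed.

```python
import math
from statistics import NormalDist

def chi2_quantile(p, df):
    """Wilson-Hilferty approximation to the chi-square quantile
    (adequate for the moderate df arising in pilot studies)."""
    z = NormalDist().inv_cdf(p)
    return df * (1 - 2 / (9 * df) + z * math.sqrt(2 / (9 * df))) ** 3

def n_per_group(sd, delta, alpha=0.05, power=0.9):
    """Standard per-group sample size for a two-arm superiority RCT
    with a continuous outcome."""
    za = NormalDist().inv_cdf(1 - alpha / 2)
    zb = NormalDist().inv_cdf(power)
    return math.ceil(2 * (za + zb) ** 2 * (sd / delta) ** 2)

def inflated_sd(sd_hat, n_pilot_per_group, confidence=0.8):
    """Browne-style adjustment: replace the pilot SDp by its one-sided
    upper 100*confidence% limit, so the planned power is achieved
    with that level of confidence."""
    df = 2 * (n_pilot_per_group - 1)
    return sd_hat * math.sqrt(df / chi2_quantile(1 - confidence, df))

# Illustration: pilot SDp = 1.0 from 35 per group, target effect size 0.2
print(n_per_group(1.0, 0.2))                   # crude plan
print(n_per_group(inflated_sd(1.0, 35), 0.2))  # inflated plan is larger
```

Comparing the two printed sizes shows directly how the confidence-based inflation enlarges the definitive RCT, which is the trade-off the conclusion refers to.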

Figures

Figure 1
Multiple box and whisker plot of SDp estimates by pooled sample size of the pilot study. The vertical axis shows the value of the SDp estimate for 10,000 simulations per pilot study size. The horizontal axis is graduated by the pooled pilot study size.
Figure 2
Percentage gain in precision of SDp on increasing the pooled sample size. This shows the relative reduction in the average width of the confidence interval when an additional five subjects are added to a group.
Figure 3
Distribution of planned RCT study power when using the SDp estimate derived from the pilot study. The planned study size is used to calculate the true power, assuming a true SD of 1. The graph shown is for a true effect size of 0.2. The vertical axis is true power; the horizontal axis shows the size of the two-arm pilot study.
Figure 4
Distribution of planned sample sizes using crude SDp estimates and adjusting for a specified level of confidence. (a) Effect size = 0.2. (b) Effect size = 0.35. (c) Effect size = 0.5. The upper part of each graph shows the distribution of planned sample sizes by pilot study size. The lower part shows the same but using the inflation adjustment to guarantee the specified power with 80% confidence. The horizontal axis shows the planned sample size and the vertical axis the pilot study size. The dashed vertical line shows the sample size associated with a true power of 90%, and the dotted line that for 80%.
Figure 5
Distribution of total sample size required when using pilot sample derived SDp estimated with and without inflation. (a) Effect size = 0.2. (b) Effect size = 0.35. (c) Effect size = 0.5. This figure is similar to Figure 4; however, now the total sample size includes the pilot study size. The dashed and dotted vertical lines represent the sample size required for 90% and 80% power, respectively, if the true SD were known and the pilot study were not necessary.
Figure 6
Distribution of estimated event rates on increasing sample size. Distributions for a true event rate of 0.1 (a) and a true event rate of 0.5 (b).
Figure 7
Distribution of relative gain in precision for binary outcomes as pilot study size increases. This graph compares the widths of the confidence intervals for n + 5 subjects and for n subjects, scaled by the width of the interval when there are n subjects.
Figure 8
Distribution of mean coverage probability by true proportion and pilot sample size.
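The binary-outcome precision calculations summarised in Figures 6 to 8 can be approached in the same way. A minimal sketch, assuming the Agresti-Coull interval for the event rate (the exact interval used in the paper may differ):

```python
import numpy as np
from statistics import NormalDist

def agresti_coull_width(k, n, level=0.95):
    """Width of the Agresti-Coull interval for a binomial proportion,
    vectorised over an array of event counts k."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    n_adj = n + z ** 2
    p_adj = (k + z ** 2 / 2) / n_adj
    return 2 * z * np.sqrt(p_adj * (1 - p_adj) / n_adj)

def mean_width(p, n, n_sims=20_000, rng=np.random.default_rng(1)):
    """Average interval width over simulated pilot groups of size n
    with true event rate p."""
    k = rng.binomial(n, p, size=n_sims)
    return agresti_coull_width(k, n).mean()

def binary_relative_gain(p, n):
    """Relative reduction in mean interval width when five subjects
    are added to the pilot group."""
    w_n, w_n5 = mean_width(p, n), mean_width(p, n + 5)
    return (w_n - w_n5) / w_n

print(binary_relative_gain(0.3, 60))  # below 0.05 once n reaches 60
```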


Source: PubMed
