Sample size considerations for the external validation of a multivariable prognostic model: a resampling study

Gary S Collins, Emmanuel O Ogundimu, Douglas G Altman

Abstract

After developing a prognostic model, it is essential to evaluate the performance of the model in samples independent from those used to develop the model, which is often referred to as external validation. However, despite its importance, very little is known about the sample size requirements for conducting an external validation. Using a large real data set and resampling methods, we investigate the impact of sample size on the performance of six published prognostic models. Focussing on unbiased and precise estimation of performance measures (e.g. the c-index, D statistic and calibration), we provide guidance on sample size for investigators designing an external validation study. Our study suggests that externally validating a prognostic model requires a minimum of 100 events and ideally 200 (or more) events.
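The resampling idea can be illustrated with a minimal sketch. It is not the authors' code or data: it assumes a simulated cohort with a binary outcome rather than the survival outcomes and published models used in the study, and it tracks only the c-index. The names `simulate_cohort`, `resample_c_index`, the intercept of -2.5, the cohort size of 20000 and the 200 replications are all hypothetical choices for illustration. For each target number of events, repeated subsamples are drawn from the cohort and the bias and spread of the c-index estimates are compared against the value obtained from the whole cohort.

```python
import math
import random
import statistics


def c_index(scores, outcomes):
    """Concordance (AUC) for a binary outcome via the rank-sum identity; ties get mid-ranks."""
    n = len(scores)
    n_events = sum(outcomes)
    n_nonevents = n - n_events
    if n_events == 0 or n_nonevents == 0:
        return float("nan")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    rank_sum = sum(r for r, y in zip(ranks, outcomes) if y == 1)
    return (rank_sum - n_events * (n_events + 1) / 2) / (n_events * n_nonevents)


def simulate_cohort(n, rng, intercept=-2.5):
    """Hypothetical cohort: score ~ N(0, 1), outcome ~ Bernoulli(logistic(intercept + score))."""
    scores = [rng.gauss(0.0, 1.0) for _ in range(n)]
    outcomes = [
        1 if rng.random() < 1 / (1 + math.exp(-(intercept + s))) else 0
        for s in scores
    ]
    return scores, outcomes


def resample_c_index(scores, outcomes, n_events, n_reps, rng):
    """Draw n_reps subsamples with exactly n_events events, keeping the cohort's event fraction."""
    event_ids = [i for i, y in enumerate(outcomes) if y == 1]
    nonevent_ids = [i for i, y in enumerate(outcomes) if y == 0]
    n_nonevents = round(n_events * len(nonevent_ids) / len(event_ids))
    estimates = []
    for _ in range(n_reps):
        ids = rng.sample(event_ids, n_events) + rng.sample(nonevent_ids, n_nonevents)
        estimates.append(c_index([scores[i] for i in ids], [outcomes[i] for i in ids]))
    return estimates


rng = random.Random(2015)  # fixed seed so the sketch is reproducible
cohort_scores, cohort_outcomes = simulate_cohort(20000, rng)
reference_c = c_index(cohort_scores, cohort_outcomes)  # treated as the "true" value

results = {}
for n_events in (25, 50, 100, 200):
    est = resample_c_index(cohort_scores, cohort_outcomes, n_events, 200, rng)
    results[n_events] = (statistics.mean(est) - reference_c, statistics.stdev(est))
    print(f"{n_events:>4} events: bias={results[n_events][0]:+.4f}, SD={results[n_events][1]:.4f}")
```

In this simplified setting the spread of the estimates should shrink as the number of events grows, which is the kind of precision argument behind an events-based sample size recommendation. The specific thresholds of 100 and 200 events come from the study itself, not from this sketch.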

Keywords: external validation; prognostic model; sample size.

© 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

Figures

Figure 1
Empirical performance of QRISK2 (women), measured using the c‐index, D statistic, R²_D, ρ²_OXS, Brier score and calibration slope.
Figure 2
Mean percent bias, standardized bias and RMSE of the c‐index, D statistic, R²_D, ρ²_OXS, Brier score and calibration slope.
Figure 3
Coverage rates and 95% confidence interval widths for the c‐index, D statistic, R²_D, ρ²_OXS and calibration slope. [Bootstrap standard errors for ρ²_OXS based on 1000 simulations and 200 bootstrap replications.]
Figure 4
Width of the 95% confidence interval of the c‐index, D statistic, R²_D, ρ²_OXS and calibration slope (QRISK2 women). [Bootstrap standard errors for ρ²_OXS based on 1000 simulations and 200 bootstrap replications.]
Figure 5
Calibration plots for QRISK2 (women). The red dashed line denotes perfect prediction. The blue line is the model calibration using the entire data set.
Figure 6
Proportion of estimates within 0.5, 2.5, 5 and 10% of the true value for QRISK2 (women).

Source: PubMed
