Sample size for binary logistic prediction models: Beyond events per variable criteria

Maarten van Smeden, Karel GM Moons, Joris AH de Groot, Gary S Collins, Douglas G Altman, Marinus JC Eijkemans, Johannes B Reitsma

Abstract

Binary logistic regression is one of the most frequently applied statistical approaches for developing clinical prediction models. Developers of such models often rely on an Events Per Variable criterion (EPV), notably EPV ≥10, to determine the minimal sample size required and the maximum number of candidate predictors that can be examined. We present an extensive simulation study in which we studied the influence of EPV, events fraction, number of candidate predictors, the correlations and distributions of candidate predictor variables, area under the ROC curve, and predictor effects on out-of-sample predictive performance of prediction models. The out-of-sample performance (calibration, discrimination and probability prediction error) of developed prediction models was studied before and after regression shrinkage and variable selection. The results indicate that EPV does not have a strong relation with metrics of predictive performance, and is not an appropriate criterion for (binary) prediction model development studies. We show that out-of-sample predictive performance can better be approximated by considering the number of predictors, the total sample size and the events fraction. We propose that the development of new sample size criteria for prediction models should be based on these three parameters, and provide suggestions for improving sample size determination.
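As a minimal illustration of why EPV alone leaves the design underdetermined, the arithmetic below shows that a fixed EPV of 10 with the same number of candidate predictors implies very different total sample sizes once the events fraction changes. The function names and the example settings are hypothetical, and the sketch assumes that the number of events equals the total sample size times the events fraction (with events fraction at most 1/2). It does not reproduce the simulation study or its metamodels.

```python
def events_per_variable(n_total, events_fraction, n_predictors):
    """EPV = (number of events) / (number of candidate predictors).

    Assumption: number of events = n_total * events_fraction.
    """
    n_events = n_total * events_fraction
    return n_events / n_predictors


def total_n_for_epv(epv, events_fraction, n_predictors):
    """Total sample size implied by a target EPV (inverse of the above)."""
    return epv * n_predictors / events_fraction


# Hypothetical settings: 8 candidate predictors, target EPV of 10,
# three different events fractions.
n_predictors = 8
target_epv = 10
for events_fraction in (0.5, 0.25, 0.125):
    n_total = total_n_for_epv(target_epv, events_fraction, n_predictors)
    epv = events_per_variable(n_total, events_fraction, n_predictors)
    print(f"events fraction={events_fraction}: N={n_total:.0f}, EPV={epv:.0f}")
```

All three settings satisfy EPV = 10, yet they differ in total sample size by a factor of four, which is why the number of predictors, the total sample size and the events fraction are worth considering separately.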

Keywords: EPV; Logistic regression; prediction models; predictive performance; sample size; simulations.

Figures

Figure 1. Marginal out-of-sample predictive performance.
Figure 2. Boxplot distribution of out-of-sample predictive performance outcomes (restricted to conditions with events fraction = 1/2).
Figure 3. Average relative out-of-sample performances of modeling strategies per simulation factor level.
Figure 4. Relation required sample size and events fraction. Calculations based on metamodels with criterion values that were kept constant. For illustration purposes, the criterion values were chosen such that they would intersect at events fraction = 1/2.

Source: PubMed
