Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models

Peter C Austin, Ewout W Steyerberg

Abstract

We conducted an extensive set of empirical analyses to examine the effect of the number of events per variable (EPV) on the relative performance of three different methods for assessing the predictive accuracy of a logistic regression model: apparent performance in the analysis sample, split-sample validation, and optimism correction using bootstrap methods. Using a single dataset of patients hospitalized with heart failure, we compared the estimates of discriminatory performance from these methods to those for a very large independent validation sample arising from the same population. As anticipated, the apparent performance was optimistically biased, with the degree of optimism diminishing as the number of events per variable increased. Differences between the bootstrap-corrected approach and the use of an independent validation sample were minimal once the number of events per variable was at least 20. Split-sample assessment resulted in overly pessimistic and highly uncertain estimates of model performance. Apparent performance estimates had lower mean squared error than split-sample estimates, but the lowest mean squared error was obtained by bootstrap-based optimism-corrected estimates. For bias, variance, and mean squared error of the performance estimates, the penalty incurred by using split-sample validation was equivalent to reducing the sample size by the proportion of the sample that was withheld for model validation. In conclusion, split-sample validation is inefficient and apparent performance is too optimistic for internal validation of regression-based prediction models. Modern validation methods, such as bootstrap-based optimism correction, are preferable. While these findings may be unsurprising to many statisticians, the results of the current study reinforce what should be considered good statistical practice in the development and validation of clinical prediction models.
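The three validation strategies compared above can be illustrated with a minimal Python sketch. This is not the authors' code: the simulated data, the gradient-ascent fitting routine, the two-thirds training fraction, the number of bootstrap replicates, and all function names (`fit_logistic`, `c_statistic`, `bootstrap_corrected_c`, `split_sample_c`, `simulate`) are assumptions chosen for illustration. The bootstrap step follows the usual optimism-correction recipe: refit the model in each bootstrap sample, subtract its c-statistic in the original sample from its c-statistic in the bootstrap sample, and subtract the average difference from the apparent c-statistic.

```python
import math
import random


def fit_logistic(X, y, iters=100, lr=1.0):
    """Fit an unpenalized logistic model by gradient ascent (illustrative only)."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # w[0] is the intercept
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


def linear_predictor(w, X):
    return [w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)) for xi in X]


def c_statistic(scores, y):
    """Share of event/non-event pairs ranked concordantly; ties count one half."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    conc = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return conc / (len(pos) * len(neg))


def bootstrap_corrected_c(X, y, n_boot=20, rng=None):
    """Return (apparent c, optimism-corrected c)."""
    rng = rng or random.Random(0)
    n = len(y)
    apparent = c_statistic(linear_predictor(fit_logistic(X, y), X), y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        if len(set(yb)) < 2:  # skip degenerate resamples with a single class
            continue
        wb = fit_logistic(Xb, yb)
        c_boot = c_statistic(linear_predictor(wb, Xb), yb)  # in bootstrap sample
        c_orig = c_statistic(linear_predictor(wb, X), y)    # in original sample
        optimism.append(c_boot - c_orig)
    return apparent, apparent - sum(optimism) / len(optimism)


def split_sample_c(X, y, frac_train=2 / 3, rng=None):
    """Fit on a random training part, evaluate on the withheld part."""
    rng = rng or random.Random(0)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    cut = int(frac_train * len(y))
    train, test = idx[:cut], idx[cut:]
    w = fit_logistic([X[i] for i in train], [y[i] for i in train])
    return c_statistic(linear_predictor(w, [X[i] for i in test]),
                       [y[i] for i in test])


def simulate(n, p, rng):
    """Hypothetical data: p standard-normal predictors, only the first matters."""
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < 1 / (1 + math.exp(1 - 0.5 * x[0])) else 0 for x in X]
    return X, y


rng = random.Random(2014)  # arbitrary seed
n_predictors = 5
X, y = simulate(150, n_predictors, rng)
# EPV taken here as the count of the less frequent outcome per predictor.
epv = min(sum(y), len(y) - sum(y)) / n_predictors
apparent, corrected = bootstrap_corrected_c(X, y, n_boot=20, rng=rng)
split = split_sample_c(X, y, rng=rng)
print(f"EPV={epv:.1f} apparent={apparent:.3f} "
      f"bootstrap-corrected={corrected:.3f} split-sample={split:.3f}")
```

A single run like this shows only one draw from each estimator. Reproducing the bias, variance, and mean squared error comparisons described in the abstract would require repeating the procedure over many samples at each EPV and comparing against performance in a large independent validation sample.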

Keywords: bootstrap; c-statistic; clinical prediction models; data splitting; discrimination; logistic regression; model validation; receiver operating characteristic curve.

Figures

Figure 1. Mean estimated c-statistic for different validation methods.
Figure 2. Standard deviation of estimated c-statistic for different validation methods.
Figure 3. Mean squared error (MSE) of different estimation methods.

Source: PubMed
