How to develop a more accurate risk prediction model when there are few events

Menelaos Pavlou, Gareth Ambler, Shaun R Seaman, Oliver Guttmann, Perry Elliott, Michael King, Rumana Z Omar

Abstract

When the number of events is low relative to the number of predictors, standard regression can produce overfitted risk models that make inaccurate predictions. Use of penalised regression may improve the accuracy of risk prediction.
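The contrast between standard and penalised regression can be sketched in a few lines of Python (standard library only). This is not the authors' code. The function names `fit_logistic` and `predict_risk`, the simulated data (120 patients, 8 candidate predictors, only two of which truly affect the outcome), the optimiser (proximal gradient descent) and the penalty strengths `lam` are all illustrative assumptions. Ridge penalises the sum of squared coefficients and shrinks all of them towards zero. Lasso penalises the sum of absolute coefficients and can set some of them exactly to zero, which removes those predictors from the model.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lam=0.0, penalty="ridge", lr=0.1, n_iter=500):
    """Fit (penalised) logistic regression by proximal gradient descent.

    Minimises -(1/n) * log-likelihood + lam * P(beta), where P(beta) is
    0.5 * sum(beta_j**2) for ridge and sum(|beta_j|) for lasso.
    lam = 0 gives standard (unpenalised) maximum likelihood.
    The intercept b0 is never penalised.
    """
    if penalty not in ("ridge", "lasso"):
        raise ValueError("penalty must be 'ridge' or 'lasso'")
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Gradient of the average negative log-likelihood at the current fit.
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            resid = sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) - yi
            g0 += resid
            for j in range(p):
                g[j] += resid * xi[j]
        b0 -= lr * g0 / n
        for j in range(p):
            step = beta[j] - lr * g[j] / n
            if penalty == "ridge":
                # Gradient step that also includes the ridge term lam * beta_j.
                beta[j] = step - lr * lam * beta[j]
            else:
                # Soft-thresholding, the proximal step for the L1 (lasso)
                # penalty: small coefficients are set exactly to zero.
                beta[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    return b0, beta


def predict_risk(b0, beta, X):
    """Predicted event probability for each row of X."""
    return [sigmoid(b0 + sum(bj * xj for bj, xj in zip(beta, xi))) for xi in X]


# Hypothetical simulated data with few events (an assumption for illustration):
# 8 standard-normal predictors, only the first two related to the outcome.
rng = random.Random(2016)
n, p = 120, 8
true_beta = [1.5, 1.5] + [0.0] * (p - 2)
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(-2.5 + sum(b * x for b, x in zip(true_beta, xi))) else 0
     for xi in X]

b_std = fit_logistic(X, y, lam=0.0)                       # standard regression
b_ridge = fit_logistic(X, y, lam=0.1, penalty="ridge")    # illustrative lam
b_lasso = fit_logistic(X, y, lam=0.1, penalty="lasso")    # illustrative lam

print("events:", sum(y), "of", n)
for name, (b0, beta) in [("standard", b_std), ("ridge", b_ridge), ("lasso", b_lasso)]:
    print(f"{name:>8}: " + " ".join(f"{b:+.2f}" for b in beta))
```

Here `lam` is fixed by hand; in practice it would typically be tuned, for example by cross-validation. Because the penalty acts on coefficient size, predictors are usually standardised before fitting; the simulated predictors above are already on a common scale.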

Conflict of interest statement

Competing interests: We have read and understood the BMJ Group policy on declaration of interests and declare no competing interests.

Figures

Fig 1 Distribution of predicted risk scores estimated using standard, ridge, and lasso regression
Fig 2 Observed proportions versus average predicted risk of the event (using standard, ridge, and lasso regression). Overestimation of risk for high-risk patients can be seen when standard regression is used
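A comparison of observed proportions with average predicted risk can be sketched by ranking patients by predicted risk, splitting them into groups of roughly equal size, and comparing each group's average predicted risk with its observed event proportion. The function name `calibration_table`, the default of ten groups and the toy data are assumptions for illustration; the exact grouping used for the figure is not stated here.

```python
def calibration_table(risks, outcomes, n_groups=10):
    """Group patients by predicted risk and compare predicted with observed.

    Returns a list of (average predicted risk, observed proportion, group size),
    one tuple per non-empty group, ordered from lowest to highest predicted risk.
    """
    if len(risks) != len(outcomes):
        raise ValueError("risks and outcomes must have the same length")
    # Patient indices sorted by predicted risk (stable for ties).
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    n = len(order)
    table = []
    for g in range(n_groups):
        # Integer arithmetic gives groups whose sizes differ by at most one.
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue  # more groups than patients
        avg_pred = sum(risks[i] for i in idx) / len(idx)
        observed = sum(outcomes[i] for i in idx) / len(idx)
        table.append((avg_pred, observed, len(idx)))
    return table


# Hypothetical toy data: in the high-risk group fewer events occur than
# predicted, so risk is overestimated for those patients.
risks = [0.1] * 5 + [0.9] * 5
outcomes = [0, 0, 0, 0, 1] + [1, 1, 1, 0, 0]
for avg_pred, observed, size in calibration_table(risks, outcomes, n_groups=2):
    print(f"predicted {avg_pred:.2f}  observed {observed:.2f}  (n={size})")
```

For the high-risk group the observed proportion (0.60) falls below the average predicted risk (0.90); plotting observed against predicted for each group, a point below the diagonal indicates overestimation.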


