External validation of prognostic models: what, why, how, when and where?

Chava L Ramspek, Kitty J Jager, Friedo W Dekker, Carmine Zoccali, Merel van Diepen

Abstract

Prognostic models that aim to improve the prediction of clinical events, individualized treatment and decision-making are increasingly being developed and published. However, relatively few models are externally validated and validation by independent researchers is rare. External validation is necessary to determine a prediction model's reproducibility and generalizability to new and different patients. Various methodological considerations are important when assessing or designing an external validation study. In this article, an overview is provided of these considerations, starting with what external validation is, what types of external validation can be distinguished and why such studies are a crucial step towards the clinical implementation of accurate prediction models. Statistical analyses and interpretation of external validation results are reviewed in an intuitive manner and considerations for selecting an appropriate existing prediction model and external validation population are discussed. This study enables clinicians and researchers to gain a deeper understanding of how to interpret model validation results and how to translate these results to their own patient population.

Keywords: educational; external validation; methodology; prediction models.

© The Author(s) 2020. Published by Oxford University Press on behalf of ERA-EDTA.

Figures

FIGURE 1
Illustration of different validation types. A developed prediction model can be validated in various ways and in populations that differ from the development cohort to varying degrees. Internal validation uses the patients from the development population and can therefore always be performed. As internal validation does not include new patients, it mainly provides information on the reproducibility of the prediction model. Temporal validation is often considered to lie midway between internal and external validation. It entails validating the model on new patients who were included in the same study as patients from the development cohort but sampled at an earlier or later time point. It provides some information on both the reproducibility and generalizability of a model. External validation mainly provides evidence on the generalizability to various different patient populations. Patients included in external validation studies may differ from the development population in various ways: they may be from different countries (geographic validation), from different types of care facilities or have different general characteristics (e.g. frail older patients versus fit young patients). Not every model needs to be validated in all the ways depicted. In certain cases, internal validation or only geographic external validation may be sufficient; this is dependent on the research question and size of the development cohort.
FIGURE 2
Cumulative histogram of the number of hits on PubMed when using a simple search strategy of prediction models and adding external validation to this search. Search strategies are given in Appendix A. PubMed was searched from 1961 to 2019. The total number of prediction model studies retrieved was 84 032, of which 4309 were found when adding an external validation search term. The percentage of studies with external validation increased over the years; in 1990, 0.5% of published prediction studies mentioned external validation, while in 2019 this was 7%.
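The overall share of prediction-model studies mentioning external validation follows directly from the counts in the caption; a minimal check (numbers taken from the caption, nothing else assumed):

```python
# Hit counts reported in the Figure 2 caption (PubMed, 1961-2019)
total_model_studies = 84032
with_external_validation = 4309

# Overall percentage of prediction-model studies that mention
# external validation across the whole search period
overall_share = 100 * with_external_validation / total_model_studies
print(f"{overall_share:.1f}% of prediction-model studies mention external validation")
```

Note that this overall figure averages over the whole period; as the caption states, the yearly share rose from 0.5% in 1990 to 7% in 2019.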
FIGURE 3
Example of a calibration plot. The dotted line at 45 degrees indicates perfect calibration, as predicted and observed probabilities are equal. The 10 dots represent tenths of the population, divided based on predicted probability. The 10% of patients with the lowest predicted probability are grouped together; within this group, the average predicted risk and the proportion of patients who experience the outcome (observed probability) are computed. This is repeated for subsequent tenths of the patient population. The blue line is a smoothed lowess line. For a logistic model this is computed by plotting each patient individually according to their predicted probability and outcome (0 or 1) and fitting a flexible averaged line through these points. In this example calibration plot we can see that the model overpredicts risk; when the predicted risk is 60%, the observed risk is ∼35%. This overprediction is more extreme at the high-risk end of the x-axis. If a prediction model has suggested cut-off points for risk groups, then we recommend plotting these risk groups in the calibration plot (instead of tenths of the population).
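The decile-based calibration points described in the caption can be sketched in a few lines. This is a minimal illustration on simulated data (the overpredicting model and all numbers here are invented for demonstration, not taken from the study):

```python
import numpy as np

def calibration_points(y_true, y_pred, groups=10):
    """Mean predicted probability and observed event proportion per risk
    group, with groups formed as tenths of the population ranked by
    predicted probability (as in the caption)."""
    order = np.argsort(y_pred)                 # rank patients by predicted risk
    bins = np.array_split(order, groups)       # tenths of the ranked population
    mean_pred = np.array([y_pred[b].mean() for b in bins])
    obs_prop = np.array([y_true[b].mean() for b in bins])
    return mean_pred, obs_prop

# Illustrative data: a model whose predictions are systematically too
# high, so the true event probability is lower than the predicted one.
rng = np.random.default_rng(0)
pred = rng.uniform(0.05, 0.9, size=2000)       # predicted probabilities
y = rng.binomial(1, 0.6 * pred)                # observed outcomes (0 or 1)

mp, op = calibration_points(y, pred)
# On a calibration plot, these (mp, op) points would lie below the
# 45-degree line, indicating overprediction of risk.
```

A lowess smoother fitted through the individual (predicted probability, outcome) pairs, as the caption describes for the blue line, would show the same pattern without the arbitrary grouping into tenths.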

