Sample size calculation to externally validate scoring systems based on logistic regression models

Antonio Palazón-Bru, David Manuel Folgado-de la Rosa, Ernesto Cortés-Castell, María Teresa López-Cascales, Vicente Francisco Gil-Guillén

Abstract

Background: A sample size of at least 100 events and 100 non-events has been suggested for validating a predictive model, regardless of which model is being validated and despite the fact that certain factors (discrimination, parameterization and incidence) can influence its calibration. Scoring systems based on binary logistic regression models are a specific type of predictive model.

Objective: The aim of this study was to develop an algorithm to determine the sample size for validating a scoring system based on a binary logistic regression model and to apply it to a case study.

Methods: The algorithm was based on bootstrap samples in which the area under the ROC curve, the observed event probabilities (estimated through smooth calibration curves) and a measure of the lack of calibration (the estimated calibration index) were calculated. To illustrate its use for interested researchers, the algorithm was applied to a scoring system, based on a binary logistic regression model, for determining mortality in intensive care units.
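
The following Python sketch illustrates the general bootstrap idea described above under stated assumptions; it is not the authors' implementation. A simulated validation pool stands in for real data, the scoring system's predicted probabilities are taken as given, the estimated calibration index is approximated with a quadratic logistic recalibration smoother rather than the spline/loess smoothers cited in the paper, and the stopping thresholds are arbitrary illustrative choices.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(42)

    # Simulated validation pool standing in for real data (assumption for this demo).
    n_pool = 5000
    x = rng.normal(size=n_pool)
    y = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + 1.2 * x))))  # observed outcomes, low incidence
    p_pred = 1 / (1 + np.exp(-(-2.1 + 1.0 * x)))              # scoring system's predictions (0 < p < 1)

    def eci(p, y_obs):
        """Estimated calibration index: mean squared gap between predicted and
        smoothed observed probabilities. The smoother here is a flexible logistic
        recalibration on the logit and its square, an illustrative stand-in for
        the spline-based smoother; the x100 scaling is also a convention choice."""
        logit = np.log(p / (1 - p))
        design = np.column_stack([logit, logit ** 2])
        smooth = LogisticRegression(C=1e6, max_iter=1000).fit(design, y_obs).predict_proba(design)[:, 1]
        return 100 * np.mean((p - smooth) ** 2)

    def bootstrap_metrics(n_events, n_boot=300):
        """Draw bootstrap samples with a fixed number of events and a matching
        number of non-events (the pool's incidence is preserved); return 95%
        percentile intervals for the AUC and the ECI."""
        idx_ev, idx_ne = np.where(y == 1)[0], np.where(y == 0)[0]
        n_nonev = int(round(n_events * len(idx_ne) / len(idx_ev)))
        aucs, ecis = [], []
        for _ in range(n_boot):
            take = np.concatenate([rng.choice(idx_ev, n_events, replace=True),
                                   rng.choice(idx_ne, n_nonev, replace=True)])
            aucs.append(roc_auc_score(y[take], p_pred[take]))
            ecis.append(eci(p_pred[take], y[take]))
        return np.percentile(aucs, [2.5, 97.5]), np.percentile(ecis, [2.5, 97.5])

    # Increase the number of events until the AUC confidence interval is narrow
    # enough and the upper ECI limit is small; 0.10 and 1.0 are illustrative
    # stopping rules, not values taken from the paper.
    for n_events in range(20, 201, 10):
        auc_ci, eci_ci = bootstrap_metrics(n_events)
        if auc_ci[1] - auc_ci[0] <= 0.10 and eci_ci[1] <= 1.0:
            print(f"{n_events} events: AUC 95% CI ({auc_ci[0]:.2f}, {auc_ci[1]:.2f}), "
                  f"ECI 95% CI ({eci_ci[0]:.2f}, {eci_ci[1]:.2f})")
            break

In practice the simulated pool would be replaced by pilot or published data reflecting the expected incidence and discrimination of the model to be validated, and the precision thresholds would be set by the investigator.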

Results: In the case study provided, the algorithm yielded a required sample size containing 69 events, which is lower than the value suggested in the literature.

Conclusion: An algorithm is provided for finding the appropriate sample size to validate scoring systems based on binary logistic regression models. This could be applied to determine the sample size in other similar cases.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Confidence intervals for the area under the ROC curve according to the number of events in the sample. AUC, area under the ROC curve.

Fig 2. Estimated calibration index values according to the number of events in the sample. ECI, estimated calibration index.

Fig 3. Smooth calibration plots (linear splines) for several sample sizes. The dashed lines denote the confidence intervals; the central line denotes perfect prediction.

