External validity of risk models: Use of benchmark values to disentangle a case-mix effect from incorrect coefficients

Yvonne Vergouwe, Karel G M Moons, Ewout W Steyerberg

Abstract

Various performance measures related to calibration and discrimination are available for the assessment of risk models. When the validity of a risk model is assessed in a new population, estimates of the model's performance can be influenced in several ways. The regression coefficients can be incorrect, which indeed results in an invalid model. However, the distribution of patient characteristics (case mix) may also influence the performance of the model. Here the authors consider a number of typical situations that can be encountered in external validation studies. Theoretical relations between differences in development and validation samples and performance measures are studied by simulation. Benchmark values for the performance measures are proposed to disentangle a case-mix effect from incorrect regression coefficients, when interpreting the model's estimated performance in validation samples. The authors demonstrate the use of the benchmark values using data on traumatic brain injury obtained from the International Tirilazad Trial and the North American Tirilazad Trial (1991-1994).

Figures

**Figure 1.**
Calibration, discrimination, and overall performance of the 2 risk models when applied in the development sample or in similar validation samples. Model 1 (panel A) contained 1 predictor; model 2 (panel B) contained 10 predictors (for details, see text). The triangles represent deciles of subjects grouped by similar predicted risk. The distribution of subjects is indicated with spikes at the bottom of the graph, separately for persons with and without the outcome. Brier, Brier score; c stat, c statistic (indicating discriminative ability); H-L, Hosmer-Lemeshow (P value corresponding to Hosmer-Lemeshow test); interc, intercept (given that the calibration slope equals 1); R², Nagelkerke's R²; slope, calibration slope in a model y ∼ linear predictor.
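
Several of the measures named in this legend can be computed directly from the observed outcomes and the predicted risks. The following is a minimal sketch in plain Python, not the authors' code: all function names are hypothetical, the logistic fits use a hand-written Newton-Raphson loop, `demo_sample` assumes a single standard-normal predictor with coefficient 1, and the Hosmer-Lemeshow test and Nagelkerke's measure are left out for brevity.

```python
import math
import random


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def brier_score(y, p):
    """Mean squared difference between outcomes (0/1) and predicted risks."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def c_statistic(y, p):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied pairs count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


def calibration_slope(y, lp, iterations=25):
    """Fit logit P(y=1) = a + b * lp by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 1.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for yi, xi in zip(y, lp):
            mu = _sigmoid(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu            # gradient, intercept
            g1 += (yi - mu) * xi     # gradient, slope
            h00 += w                 # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def calibration_intercept(y, lp, iterations=25):
    """Fit logit P(y=1) = a + lp, i.e. the slope is fixed at 1 (lp is an offset)."""
    a = 0.0
    for _ in range(iterations):
        g = h = 0.0
        for yi, xi in zip(y, lp):
            mu = _sigmoid(a + xi)
            g += yi - mu
            h += mu * (1.0 - mu)
        a += g / h
    return a


def demo_sample(n=2000, seed=42):
    """Hypothetical data: one standard-normal predictor, true coefficient 1."""
    rng = random.Random(seed)
    lp = [rng.gauss(0.0, 1.0) for _ in range(n)]
    p = [_sigmoid(v) for v in lp]
    y = [1 if rng.random() < pi else 0 for pi in p]
    return y, lp, p
```

When the model is applied to data generated by the same mechanism, as in `demo_sample`, the calibration slope should be close to 1 and the intercept close to 0, up to sampling error.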

**Figure 2.**
Influence on the performance of model 1 (1 predictor included), when more or less severe cases are selected (panels A and B) and more or less heterogeneous cases are selected (panels C and D) according to observed predictor values (“x”). Panels A and B: 50% of the subjects were selected, with higher or lower likelihood of selection with higher x values. Panels C and D: approximately 35% of the subjects were selected, with higher or lower likelihood of selection with more extreme x values. See the legend of Figure 1 for explanation of lines and symbols.
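
The effect of selecting more or less heterogeneous cases (panels C and D) can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reproduction of the authors' simulation: the predictor is standard normal, the true coefficient is 1 with intercept 0, and the two selection rules (`min(1, |x|/2)` and `exp(-x²)`) are hypothetical choices that do not match the approximately 35% sampling fraction in the legend.

```python
import math
import random


def c_statistic(y, score):
    """Concordance from the rank-sum (Mann-Whitney) formula; ties get mid-ranks."""
    order = sorted(range(len(y)), key=lambda i: score[i])
    ranks = [0.0] * len(y)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and score[order[j + 1]] == score[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    n1 = sum(y)
    n0 = len(y) - n1
    rank_sum = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum - n1 * (n1 + 1) / 2) / (n1 * n0)


def simulate_selection(n=20000, seed=1):
    """Return the c statistic in the full sample and in two selected samples.

    The regression coefficient is correct in all three samples; only the
    distribution of x (the case mix) differs.
    """
    rng = random.Random(seed)
    samples = {"full": [], "more_heterogeneous": [], "less_heterogeneous": []}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        risk = 1.0 / (1.0 + math.exp(-x))        # assumed model: logit(risk) = x
        y = 1 if rng.random() < risk else 0
        samples["full"].append((y, x))
        # Assumed rule: extreme x values are more likely to be selected.
        if rng.random() < min(1.0, abs(x) / 2.0):
            samples["more_heterogeneous"].append((y, x))
        # Assumed rule: x values near the mean are more likely to be selected.
        if rng.random() < math.exp(-x * x):
            samples["less_heterogeneous"].append((y, x))
    return {
        name: c_statistic([y for y, _ in rows], [x for _, x in rows])
        for name, rows in samples.items()
    }
```

Under these assumptions the c statistic rises in the more heterogeneous sample and falls in the less heterogeneous one, even though the model itself is unchanged.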

**Figure 3.**
Influence on the performance of model 1 (1 predictor included), when more or less severe cases are selected (panels A and B) and more or less heterogeneous cases are selected (panels C and D) according to an omitted predictor (“z”). Panels A and B: 50% of the subjects were selected, with higher or lower likelihood of selection with higher z values. Panels C and D: approximately 35% of the subjects were selected, with higher or lower likelihood of selection with more extreme z values. See the legend of Figure 1 for explanation of lines and symbols.

**Figure 4.**
Influence on the performance of model 2 (10 predictors included) when regression coefficients are different. Predictor effects were different in the validation sample, that is, 0.5 or 1.5 times as large (panel A), or the model was overfitted in the development phase (panel B). See the legend of Figure 1 for explanation of lines and symbols.

**Figure 5.**
Performance of a risk model for traumatic brain injury. The model was developed in the International Tirilazad Trial and validated in the North American Tirilazad Trial, 1991–1994. The 4 panels show model performance A) in the development sample; B) in the validation sample; C) in the validation sample with outcome values generated such that the model calibrates perfectly; and D) in the validation sample with refitted regression coefficients. See the legend of Figure 1 for explanation of lines and symbols.
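
The idea behind panel C can be sketched as follows: keep the validation subjects' predicted risks, generate outcomes from those risks so that the model calibrates perfectly by construction, and compute the performance measure on the generated outcomes. The result is the value expected from the validation case mix alone, which can then be compared with the observed value. The code below is a hedged illustration of that idea for the c statistic only; the function names, the number of repetitions, and the averaging over repetitions are assumptions, not details taken from the source.

```python
import random


def c_statistic(y, p):
    """Share of (event, non-event) pairs ranked correctly; ties count one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))


def case_mix_benchmark_c(predicted_risks, repetitions=100, seed=0):
    """Average c statistic over outcomes simulated from the predicted risks.

    Each simulated outcome is Bernoulli(predicted risk), so the model is
    correct by construction and only the case mix drives the result.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(repetitions):
        y_sim = [1 if rng.random() < p else 0 for p in predicted_risks]
        # Skip degenerate draws in which everyone or no one has the outcome.
        if 0 < sum(y_sim) < len(y_sim):
            values.append(c_statistic(y_sim, predicted_risks))
    if not values:
        raise ValueError("no simulated sample contained both outcome classes")
    return sum(values) / len(values)
```

An observed c statistic in the validation sample that is close to this benchmark points toward a case-mix effect, whereas an observed value well below it points toward incorrect regression coefficients.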

Source: PubMed
