Assessing risk prediction models using individual participant data from multiple studies

Lisa Pennells, Stephen Kaptoge, Ian R White, Simon G Thompson, Angela M Wood, Emerging Risk Factors Collaboration, Robert W Tipping, Aaron R Folsom, David J Couper, Christie M Ballantyne, Josef Coresh, S Goya Wannamethee, Richard W Morris, Stefan Kiechl, Johann Willeit, Peter Willeit, Georg Schett, Shah Ebrahim, Debbie A Lawlor, John W Yarnell, John Gallacher, Mary Cushman, Bruce M Psaty, Russ Tracy, Anne Tybjærg-Hansen, Jackie F Price, Amanda J Lee, Stela McLachlan, Kay-Tee Khaw, Nicholas J Wareham, Hermann Brenner, Ben Schöttker, Heiko Müller, Jan-Håkan Jansson, Patrik Wennberg, Veikko Salomaa, Kennet Harald, Pekka Jousilahti, Erkki Vartiainen, Mark Woodward, Ralph B D'Agostino, Else-Marie Bladbjerg, Torben Jørgensen, Yutaka Kiyohara, Hisatomi Arima, Yasufumi Doi, Toshiharu Ninomiya, Jacqueline M Dekker, Giel Nijpels, Coen D A Stehouwer, Jussi Kauhanen, Jukka T Salonen, Tom W Meade, Jackie A Cooper, Mary Cushman, Aaron R Folsom, Bruce M Psaty, Steven Shea, Angela Döring, Lewis H Kuller, Greg Grandits, Richard F Gillum, Michael Mussolino, Eric B Rimm, Sue E Hankinson, Joann E Manson, Jennifer K Pai, Susan Kirkland, Jonathan A Shaffer, Daichi Shimbo, Stephan J L Bakker, Ron T Gansevoort, Hans L Hillege, Philippe Amouyel, Dominique Arveiler, Alun Evans, Jean Ferrières, Naveed Sattar, Rudi G Westendorp, Brendan M Buckley, Bernard Cantin, Benoît Lamarche, Elizabeth Barrett-Connor, Deborah L Wingard, Richele Bettencourt, Vilmundur Gudnason, Thor Aspelund, Gunnar Sigurdsson, Bolli Thorsson, Maryam Kavousi, Jacqueline C Witteman, Albert Hofman, Oscar H Franco, Barbara V Howard, Ying Zhang, Lyle Best, Jason G Umans, Altan Onat, Johan Sundström, J Michael Gaziano, Meir Stampfer, Paul M Ridker, J Michael Gaziano, Paul M Ridker, Michael Marmot, Robert Clarke, Rory Collins, Astrid Fletcher, Eric Brunner, Martin Shipley, Mika Kivimäki, Paul M Ridker, Julie Buring, Nancy Cook, Ian Ford, James Shepherd, Stuart M Cobbe, Michele Robertson, Matthew Walker, Sarah Watson, Myriam 
Alexander, Adam S Butterworth, Emanuele Di Angelantonio, Pei Gao, Philip Haycock, Stephen Kaptoge, Lisa Pennells, Simon G Thompson, Matthew Walker, Sarah Watson, Ian R White, Angela M Wood, David Wormser, John Danesh

Abstract

Individual participant time-to-event data from multiple prospective epidemiologic studies enable detailed investigation into the predictive ability of risk models. Here we address the challenges in appropriately combining such information across studies. Methods are exemplified by analyses of log C-reactive protein and conventional risk factors for coronary heart disease in the Emerging Risk Factors Collaboration, a collation of individual data from multiple prospective studies with an average follow-up duration of 9.8 years (dates varied). We derive risk prediction models using Cox proportional hazards regression analysis stratified by study and obtain estimates of risk discrimination (Harrell's concordance index and Royston's discrimination measure) within each study; we then combine the estimates across studies using a weighted meta-analysis. Various weighting approaches are compared and lead us to recommend weighting by the number of events in each study. We also discuss the calculation of measures of reclassification for multiple studies. We further show that comparison of differences in predictive ability across subgroups should be based only on within-study information and that combining measures of risk discrimination from case-control studies and prospective studies is problematic. The concordance index and discrimination measure gave qualitatively similar results throughout. While the concordance index was very heterogeneous between studies, principally because of differing age ranges, the increments in the concordance index from adding log C-reactive protein to conventional risk factors were more homogeneous.

Keywords: C index; D measure; coronary heart disease; individual participant data; inverse variance; meta-analysis; risk prediction; weighting.
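The two-stage pooling summarized in the abstract can be illustrated with a minimal sketch: compute a concordance index within each study, then take a weighted average with weights equal to each study's number of events. This is not the authors' implementation. The function names, the toy data, and the simplified handling of tied times are assumptions made for illustration, and the risk scores would in practice come from the study-stratified Cox model.

```python
from itertools import combinations


def harrell_c(times, events, scores):
    """Simplified Harrell's C index for right-censored data.

    A pair is usable when the subject with the shorter follow-up time had an
    event. Pairs with tied times are skipped (a simplification assumed here).
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if scores[a] > scores[b]:
            concordant += 1.0  # higher predicted risk failed earlier
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied scores count as half
    return concordant / usable


def pool_by_events(studies):
    """Weighted average of study-specific C indices, weights = number of events."""
    weights = [sum(s["events"]) for s in studies]
    c_values = [harrell_c(s["times"], s["events"], s["scores"]) for s in studies]
    pooled = sum(w * c for w, c in zip(weights, c_values)) / sum(weights)
    return pooled, c_values, weights


# Hypothetical toy data: two small studies with follow-up time, event flag,
# and a risk score for each participant.
toy_studies = [
    {"times": [2, 4, 6, 8], "events": [1, 1, 0, 1], "scores": [0.9, 0.7, 0.2, 0.4]},
    {"times": [1, 3, 5], "events": [1, 0, 1], "scores": [0.6, 0.8, 0.5]},
]
pooled_c, study_c, study_w = pool_by_events(toy_studies)
print(study_c, study_w, pooled_c)
```

Swapping the weights for inverse variances would only change the `weights` line, which makes it straightforward to compare weighting schemes on the same study-specific estimates.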

Figures

Figure 1.
Overall schemes for model derivation and testing of predictive ability over multiple studies. In the model derivation process, study-specific data sets are used to estimate the pooled vector of coefficients for the included risk predictors, either by means of a 1-stage stratified model or by a 2-stage approach applying meta-analysis of study-specific estimates. In assessment of predictive ability, the pooled coefficient vector is used to calculate the pooled discrimination statistic, either using a 1-stage stratified approach or by meta-analyzing study-specific estimates in a 2-stage approach. w_s represents the study-specific weights applied in meta-analysis approaches; possible choices are described in the text.
Figure 2.
Meta-regression of study-specific concordance index (C index) and discrimination measure (D measure) for model 1, and subsequent changes upon addition of log C-reactive protein, on the study-specific standard deviation (SD) of age. Model 1 included conventional risk factors: age, smoking status, systolic blood pressure, history of diabetes, total cholesterol, and high-density lipoprotein cholesterol, and results are stratified by sex. The size of each circle represents the inverse variance weight applied to each study in the meta-regression.
Figure 3.
Changes in the concordance index (C index) (section A) and the discrimination measure (D measure) (section B) upon movement from model 1 to model 2 within various population subgroups. Model 1 included conventional risk factors: age, smoking status, systolic blood pressure, history of diabetes, total cholesterol, and high-density lipoprotein cholesterol, and results are stratified by sex. Model 2 additionally included log C-reactive protein and an interaction term between this predictor and each subgroup factor. Bars, 95% confidence intervals (CIs). CHD, coronary heart disease; CVD, cardiovascular disease; NA, not applicable.
Figure 4.
Comparison of C statistics for the cohort and case-control study designs. The concordance index (C index) is shown for cohort studies, and the area under the receiver operating characteristic curve is shown for case-control studies. Section A shows values for base models with the progressive addition of conventional risk factors for coronary heart disease (CHD), and section B shows the resulting change in the C statistic upon addition of log C-reactive protein (CRP) to each base model. Bars, 95% confidence intervals (CIs). BP, blood pressure; HDL, high-density lipoprotein.
Figure 5.
Study-specific estimates of nonevent Net Reclassification Index (NRI) upon application of model 2 versus model 1 and overall estimates obtained using a 1-stage approach and by meta-analysis using 3 alternative weighting schemes in the Emerging Risk Factors Collaboration. Model 1 included conventional risk factors: age, smoking status, systolic blood pressure, history of diabetes, total cholesterol, and high-density lipoprotein cholesterol, and results are stratified by sex. Model 2 additionally included log C-reactive protein. The 3 weighting schemes illustrated are 1) number of contributing events occurring before 10 years (Event 10), 2) inverse-variance weights assuming fixed effects (IV-FE), and 3) inverse-variance weights assuming random effects (IV-RE). There was no reclassification observed among nonevents in the Hoorn Study (shown at the bottom), and therefore it does not contribute to the inverse-variance-weighted pooled estimates due to undefined weight. Bars, 95% confidence intervals (CIs). CHD, coronary heart disease; NA, not applicable; SE, standard error; WT, weight. Definitions of study names are given in Web Table 1.
Figure 6.
Study-specific estimates of event Net Reclassification Index (NRI) upon application of model 2 versus model 1 and overall estimates obtained using a 1-stage approach and by meta-analysis using 3 alternative weighting schemes in the Emerging Risk Factors Collaboration. See the legend of Figure 5 for explanations. There was no reclassification observed among events in the MONICA Göteborg Study (shown at the bottom), and therefore it does not contribute to the inverse-variance-weighted pooled estimates due to undefined weight. Bars, 95% confidence intervals (CIs). CHD, coronary heart disease; NA, not applicable; SE, standard error; WT, weight. Definitions of study names are given in Web Table 1.
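The legends of Figures 5 and 6 note that a study with no reclassification cannot receive an inverse-variance weight. The sketch below illustrates why, under stated assumptions: it computes the event and nonevent NRI from risk categories for a binary outcome (ignoring censoring, unlike the time-to-event setting of the paper) and uses a simple approximate variance, (proportion moving up + proportion moving down) / n, which is assumed here and is not necessarily the estimator the authors used. The function name and toy data are hypothetical.

```python
def nri_component(old_cat, new_cat, up_is_improvement):
    """NRI among one group (events or nonevents), plus an inverse-variance weight.

    old_cat and new_cat are ordered risk categories (0 = lowest) under the old
    and new model. For events, moving up is an improvement; for nonevents,
    moving down is. The weight is None when nobody is reclassified, because
    the assumed approximate variance is then zero.
    """
    n = len(old_cat)
    up = sum(new > old for old, new in zip(old_cat, new_cat))
    down = sum(new < old for old, new in zip(old_cat, new_cat))
    nri = (up - down) / n if up_is_improvement else (down - up) / n
    variance = (up + down) / n**2  # assumed approximation: (p_up + p_down) / n
    iv_weight = 1.0 / variance if variance > 0 else None
    return nri, iv_weight


# Hypothetical toy study: five events and five nonevents.
event_nri, event_w = nri_component([0, 0, 1, 1, 2], [1, 0, 2, 1, 1], True)
nonevent_nri, nonevent_w = nri_component([0, 1, 1, 2, 0], [0, 0, 1, 2, 0], False)

# Hypothetical study in which the new model moves nobody between categories.
static_nri, static_w = nri_component([0, 1, 2], [0, 1, 2], True)
print(event_nri, nonevent_nri, static_nri, static_w)
```

Weighting by the number of events (or nonevents) stays defined for such a study, whereas its inverse-variance weight does not.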


Source: PubMed
