Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls

Jonathan A C Sterne, Ian R White, John B Carlin, Michael Spratt, Patrick Royston, Michael G Kenward, Angela M Wood, James R Carpenter

Abstract

Most studies have some missing data. Jonathan Sterne and colleagues describe the appropriate use and reporting of the multiple imputation approach to dealing with them.
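The abstract gives no formulas. As background, here is a minimal sketch of the pooling step that multiple imputation relies on. It is not taken from the article. Each of the m imputed datasets is analysed separately, and the m results are then combined using Rubin's rules. The function name `pool_rubin` and its inputs are illustrative assumptions. The function expects one point estimate and one squared standard error per imputed dataset, and it uses only Python's standard `statistics` module.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation results with Rubin's rules (illustrative sketch).

    estimates: point estimates Q_1..Q_m, one from each imputed dataset
    variances: their squared standard errors U_1..U_m
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    if u_bar <= 0:
        raise ValueError("within-imputation variances must be positive")
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance of the pooled estimate
    if b == 0:
        df = math.inf                # imputations agree exactly: no extra uncertainty
    else:
        r = (1 + 1 / m) * b / u_bar  # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, t, df


if __name__ == "__main__":
    q, t, df = pool_rubin([1.0, 1.2, 1.1, 0.9, 1.3], [0.04] * 5)
    print(f"estimate={q:.3f}  SE={math.sqrt(t):.3f}  df={df:.1f}")
```

Consider five imputations that give estimates of 1.0, 1.2, 1.1, 0.9 and 1.3, each with variance 0.04. These pool to an estimate of 1.1 with a total variance of 0.07 (standard error about 0.26) and about 21.8 degrees of freedom. The between-imputation term `(1 + 1 / m) * b` keeps the pooled standard error from understating the uncertainty that the missing values introduce. Treating a single imputed dataset as if it were complete would omit this term.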

Conflict of interest statement

Competing interests: None declared.
