Combining multiple imputation and inverse-probability weighting

Shaun R Seaman, Ian R White, Andrew J Copas, Leah Li

Abstract

Two approaches commonly used to deal with missing data are multiple imputation (MI) and inverse-probability weighting (IPW). IPW is also used to adjust for unequal sampling fractions. MI is generally more efficient than IPW but more complex. Whereas IPW requires only a model for the probability that an individual has complete data (a univariate outcome), MI needs a model for the joint distribution of the missing data (a multivariate outcome) given the observed data. Inadequacies in either model may lead to important bias if large amounts of data are missing. A third approach combines MI and IPW to give a doubly robust estimator. A fourth approach (IPW/MI) combines MI and IPW but, unlike doubly robust methods, imputes only isolated missing values and uses weights to account for remaining larger blocks of unimputed missing data, such as would arise, e.g., in a cohort study subject to sample attrition, and/or unequal sampling fractions. In this article, we examine the performance, in terms of bias and efficiency, of IPW/MI relative to MI and IPW alone and investigate whether the Rubin's rules variance estimator is valid for IPW/MI. We prove that the Rubin's rules variance estimator is valid for IPW/MI for linear regression with an imputed outcome, we present simulations supporting the use of this variance estimator in more general settings, and we demonstrate that IPW/MI can have advantages over alternatives. IPW/MI is applied to data from the National Child Development Study.

© 2011, The International Biometric Society.
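The IPW/MI idea described in the abstract (impute isolated missing values, weight for the remaining block of missing individuals, then pool with Rubin's rules) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the data are simulated, the function names `wls`, `rubins_rules` and `ipw_mi_slope` are hypothetical, response probabilities are estimated as simple stratum proportions, the imputation model is an unweighted normal linear regression that includes the variable driving attrition, and each within-imputation variance is a sandwich estimate that treats the weights as known.

```python
import numpy as np


def wls(X, y, w):
    """Weighted least squares with a sandwich (robust) variance estimate."""
    bread = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = bread @ (X.T @ (w * y))
    resid = y - X @ beta
    meat = X.T @ (((w * resid) ** 2)[:, None] * X)
    return beta, bread @ meat @ bread


def rubins_rules(estimates, variances):
    """Pool M point estimates and their variances with Rubin's rules."""
    estimates = np.asarray(estimates, dtype=float)
    m = len(estimates)
    within = float(np.mean(variances))           # average within-imputation variance
    between = float(np.var(estimates, ddof=1))   # between-imputation variance
    return float(estimates.mean()), within + (1.0 + 1.0 / m) * between


def ipw_mi_slope(n=2000, m=20, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical data: z drives attrition, the analysis model is y ~ x.
    z = rng.integers(0, 2, n)
    x = rng.normal(size=n) + z
    y = 1.0 + 2.0 * x + rng.normal(size=n)
    responder = rng.random(n) < np.where(z == 1, 0.5, 0.9)  # block missingness
    y_isolated_missing = rng.random(n) < 0.2                # isolated missing y

    # IPW step: weight each responder by 1 / estimated response probability,
    # estimated here as the responder proportion within each stratum of z.
    p_hat = np.array([responder[z == k].mean() for k in (0, 1)])[z]
    keep = responder
    w = 1.0 / p_hat[keep]
    xr, zr, yr = x[keep], z[keep], y[keep]
    mis = y_isolated_missing[keep]

    # MI step: normal linear imputation model for y given (x, z), fitted to
    # responders whose y is observed.
    Z = np.column_stack([np.ones(keep.sum()), xr, zr])
    Zo, yo = Z[~mis], yr[~mis]
    ZtZ_inv = np.linalg.inv(Zo.T @ Zo)
    gamma_hat = ZtZ_inv @ Zo.T @ yo
    rss = float(np.sum((yo - Zo @ gamma_hat) ** 2))
    df = Zo.shape[0] - Zo.shape[1]

    X = np.column_stack([np.ones(keep.sum()), xr])
    slopes, variances = [], []
    for _ in range(m):
        # Draw imputation-model parameters from their approximate posterior,
        # then draw the missing outcomes given those parameters.
        sigma2 = rss / rng.chisquare(df)
        gamma = rng.multivariate_normal(gamma_hat, sigma2 * ZtZ_inv)
        y_imp = yr.copy()
        y_imp[mis] = Z[mis] @ gamma + rng.normal(scale=np.sqrt(sigma2), size=mis.sum())
        # Weighted analysis of the completed responder data.
        beta, cov = wls(X, y_imp, w)
        slopes.append(beta[1])
        variances.append(cov[1, 1])
    return rubins_rules(slopes, variances)


slope, total_var = ipw_mi_slope()
print(f"pooled slope = {slope:.3f}, SE = {total_var ** 0.5:.3f}")
```

In this sketch only the responders' isolated missing outcomes are imputed. Individuals lost to attrition are never imputed and are represented solely through the weights, which is the feature that distinguishes IPW/MI from imputing everything.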

Source: PubMed
