Imputing missing covariate values for the Cox model

Ian R White, Patrick Royston

Abstract

Multiple imputation is commonly used to impute missing data, and is typically more efficient than complete-case analysis in regression analysis when covariates have missing values. Imputation may be performed using a regression model for the incomplete covariates on other covariates and, importantly, on the outcome. With a survival outcome, it is common practice to use the event indicator D and the log of the observed event or censoring time T in the imputation model, but the rationale is not clear.

We assume that the survival outcome follows a proportional hazards model given covariates X and Z. We show that a suitable model for imputing binary or Normal X is a logistic or linear regression on the event indicator D, the cumulative baseline hazard H0(T), and the other covariates Z. This result is exact in the case of a single binary covariate; in other cases, it is approximately valid for small covariate effects and/or small cumulative incidence. If we do not know H0(T), we approximate it by the Nelson-Aalen estimator of H(T) or estimate it by Cox regression.

We compare the methods using simulation studies. We find that using log T biases covariate-outcome associations towards the null, while the new methods have lower bias. Overall, we recommend including the event indicator and the Nelson-Aalen estimator of H(T) in the imputation model.
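As an illustration of the recommended predictors, the following minimal Python sketch computes the Nelson-Aalen estimate of H(T) at each subject's observed time and stacks it with D and Z. It is not the authors' code: the function names `nelson_aalen` and `imputation_predictors` are hypothetical, ties are handled by counting all events at a given time against the full risk set at that time, and the sketch stops at building the predictor matrix rather than performing the imputation itself.

```python
import numpy as np

def nelson_aalen(time, event):
    """Nelson-Aalen estimate of the cumulative hazard H(T),
    evaluated at each subject's own observed time T."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # Distinct observed times (ascending); `inverse` maps subjects to them.
    uniq, inverse = np.unique(time, return_inverse=True)
    # d_j: number of events at each distinct time.
    d = np.bincount(inverse, weights=event, minlength=len(uniq))
    # n_j: number still at risk at each distinct time (T >= t_j).
    counts = np.bincount(inverse, minlength=len(uniq))
    at_risk = counts[::-1].cumsum()[::-1]
    # H(t) is the running sum of d_j / n_j over event times up to t.
    H = np.cumsum(d / at_risk)
    return H[inverse]

def imputation_predictors(time, event, Z):
    """Predictor matrix [D, H(T), Z] for the imputation model of X
    (hypothetical helper; the intercept is left to the fitting routine)."""
    event = np.asarray(event, dtype=float)
    Z = np.asarray(Z, dtype=float).reshape(len(event), -1)
    return np.column_stack([event, nelson_aalen(time, event), Z])

# Toy data: four subjects, the second one censored.
T = [1.0, 2.0, 3.0, 4.0]
D = [1, 0, 1, 1]
Z = [5.0, 6.0, 7.0, 8.0]
predictors = imputation_predictors(T, D, Z)
```

The resulting matrix would then be passed to whichever routine fits the logistic (binary X) or linear (Normal X) imputation model; log T does not appear among the predictors.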

Copyright 2009 John Wiley & Sons, Ltd.

Figures

Figure 1
Smoothed mean and SD of X|T, D with βX = 0.7, h0(t) = 1.

Source: PubMed
