Heckman imputation models for binary or continuous MNAR outcomes and MAR predictors

Jacques-Emmanuel Galimard, Sylvie Chevret, Emmanuel Curis, Matthieu Resche-Rigon

Abstract

Background: Multiple imputation by chained equations (MICE) requires specifying a suitable conditional imputation model for each incomplete variable and then iteratively imputes the missing values. In the presence of missing not at random (MNAR) outcomes, valid statistical inference often requires joint models for the missing observations and their missingness indicators. In this study, we derived an imputation model for binary data missing under an MNAR mechanism from Heckman's model, using a one-step maximum likelihood estimator. We also applied this estimator to improve a previously developed approach for MNAR continuous outcomes, which used Heckman's model with a two-step estimator. Because these models fit within a MICE process, they can also handle missing at random (MAR) predictors in the same procedure.
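For readers unfamiliar with Heckman's model, the standard sample selection formulation can be written as follows. The notation here (X, X^s, β, β^s, ρ, σ, λ) is assumed for illustration and is not taken from the abstract; the full paper may use different symbols.

```latex
\begin{align}
Y_i^{*} &= X_i \beta + \varepsilon_i, \\
R_i^{*} &= X_i^{s} \beta^{s} + \varepsilon_i^{s}, \qquad R_i = \mathbf{1}(R_i^{*} > 0), \\
\begin{pmatrix} \varepsilon_i \\ \varepsilon_i^{s} \end{pmatrix}
  &\sim \mathcal{N}\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix} \sigma^{2} & \rho\sigma \\ \rho\sigma & 1 \end{pmatrix}
  \right).
\end{align}
% The outcome is observed only when R_i = 1.
% Continuous outcome: Y_i = Y_i^*.
% Binary outcome:     Y_i = 1(Y_i^* > 0) with sigma = 1 (probit with sample selection).
% Conditional mean used by the two-step estimator (continuous case):
\begin{equation}
E\left[ Y_i \mid X_i, X_i^{s}, R_i = 1 \right]
  = X_i \beta + \rho\sigma\, \lambda\!\left( X_i^{s} \beta^{s} \right),
\qquad \lambda(u) = \frac{\phi(u)}{\Phi(u)} .
\end{equation}
```

Under this formulation, ρ = 0 means that missingness depends only on the covariates of the selection equation, which corresponds to a MAR mechanism when those covariates are observed. A nonzero ρ makes the mechanism MNAR, and the inverse Mills ratio term λ corrects the outcome mean for the selection.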

Methods: We simulated 1000 datasets of 500 cases. We generated missing values on 30% of the outcomes under three mechanisms: MAR, weak MNAR, and strong MNAR. We then repeated these three scenarios with an additional 30% of MAR missing data on a predictor, resulting in 50% complete cases. We evaluated the performance of the developed approach and compared it with that of a complete case approach and of classical Heckman model estimates.
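The kind of data-generating process described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' simulation code: the function name `simulate_heckman`, the coefficient values, and the choice of predictors are hypothetical, and the correlation `rho` between the error terms is the only thing that switches the mechanism between MAR (`rho = 0`) and MNAR (`rho != 0`).

```python
import numpy as np
from statistics import NormalDist


def simulate_heckman(n=500, rho=0.6, binary=True, miss_rate=0.30, seed=0):
    """Simulate one dataset whose outcome is missing under a Heckman-type mechanism.

    All coefficients below are illustrative assumptions, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)  # predictor used in both equations
    x2 = rng.normal(size=n)  # predictor used in the selection equation only

    # Correlated errors: rho = 0 gives MAR, rho != 0 gives MNAR.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    eps = rng.multivariate_normal(np.zeros(2), cov, size=n)

    # Outcome equation (latent scale); dichotomised for a binary outcome.
    y_star = 0.5 + 1.0 * x1 + eps[:, 0]
    y = (y_star > 0).astype(float) if binary else y_star

    # Selection equation. The latent variable has variance 0.25 + 1 + 1 = 2.25,
    # so the intercept below gives P(observed) = 1 - miss_rate.
    intercept = 1.5 * NormalDist().inv_cdf(1.0 - miss_rate)
    r_star = intercept + 0.5 * x1 + 1.0 * x2 + eps[:, 1]
    observed = r_star > 0

    # Outcome as the analyst sees it: NaN where the selection indicator is 0.
    y_obs = np.where(observed, y, np.nan)
    return x1, x2, y, y_obs, observed


# One replicate of 500 cases under a strong MNAR mechanism (rho is an assumed value).
x1, x2, y, y_obs, observed = simulate_heckman(n=500, rho=0.6, seed=42)
print(f"missing outcomes: {np.mean(~observed):.1%}")
```

Repeating the call with different seeds gives the replicated datasets; setting values of `x1` to missing with a probability that depends only on `x2` would add a MAR predictor in the same spirit as the second set of scenarios.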

Results: With MNAR outcomes, only methods using Heckman's model were unbiased, and with a MAR predictor, the developed imputation approach outperformed all the other approaches.

Conclusions: In the presence of MAR predictors, we proposed a simple approach to address MNAR binary or continuous outcomes under a Heckman assumption in a MICE procedure.

Trial registration: ClinicalTrials.gov NCT00799760.

Keywords: Heckman’s model; Missing data; Missing not at random (MNAR); Multiple imputation by chained equations (MICE); Sample selection method.

Conflict of interest statement

Ethics approval and consent to participate

All the data have already been published in: “Efficacy of oseltamivir-zanamivir combination compared to each monotherapy for seasonal influenza: a randomized placebo-controlled trial.” (http://dx.doi.org/10.1371/journal.pmed.1000362). This study was approved on July 18, 2008 by the Ethics Committee of Ile de France 1 (“CPP Ile de France 1”) and the French drug administration (AFSSAPS). We used already analysed data and a pre-specified secondary outcome on compliance to antiviral treatment (Trial registration: ClinicalTrials.gov NCT00799760).

Consent for publication

All the data have already been published in: “Efficacy of oseltamivir-zanamivir combination compared to each monotherapy for seasonal influenza: a randomized placebo-controlled trial.” (http://dx.doi.org/10.1371/journal.pmed.1000362). This study was approved on July 18, 2008 by the Ethics Committee of Ile de France 1 (“CPP Ile de France 1”) and the French drug administration (AFSSAPS). We used already analysed data and a pre-specified secondary outcome on compliance to antiviral treatment (Trial registration: ClinicalTrials.gov NCT00799760).

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1. Binary outcome. Boxplots of β1 estimates over the 1000 simulations associated with Table 1 (plot a), Table 3 (plot b), Table 5 left (plot c) and Table 5 right (plot d)
Fig. 2. Continuous outcome. Boxplots of β1 estimates over the 1000 simulations associated with Table 2 (plot a), Table 4 (plot b), Table 6 left (plot c) and Table 6 right (plot d)


Source: PubMed
