Imputation methods for missing data for polygenic models

Brooke Fridley, Kari Rabe, Mariza de Andrade

Abstract

Methods for handling missing data have been an area of statistical research for many years, but little has been done within the context of pedigree analysis. In this paper we present two methods for imputing missing data for polygenic models using family data. Both imputation schemes take familial relationships into account and use the observed familial information for the imputation. The first is a traditional multiple imputation approach; the second is a multiple imputation, or data augmentation, approach within a Gibbs sampler. We used both the simulated missing-phenotype and the complete-phenotype data sets from Genetic Analysis Workshop 13 to illustrate the two methods. We analyzed the phenotypic trait systolic blood pressure and the covariate gender at time point 11 (1970) for Cohort 1 and time point 1 (1971) for Cohort 2. Comparing the results for three replicates of the complete data with those of the missing data analyzed using multiple imputation, we find that multiple imputation via a Gibbs sampler produces more accurate results. Thus, we recommend the Gibbs sampler for imputation because of the ease with which it can be extended to more complicated models, the consistency of its results, and the way it accounts for the variation due to imputation.
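As an illustration only, and not the authors' implementation, the following Python sketch shows one way an imputation draw could use observed relatives under a polygenic model. It assumes the usual variance-components form of the phenotypic covariance, 2 × kinship × sigma2_g + I × sigma2_e, and treats the mean and variance components as known; in a full data augmentation scheme these parameters would themselves be redrawn at each Gibbs iteration. The function names (`polygenic_cov`, `impute_family`, `rubin_combine`) and all numeric values in the usage note are hypothetical.

```python
import numpy as np


def polygenic_cov(kinship, sigma2_g, sigma2_e):
    """Phenotypic covariance for one family: 2 * kinship * sigma2_g + I * sigma2_e."""
    kinship = np.asarray(kinship, dtype=float)
    return 2.0 * kinship * sigma2_g + sigma2_e * np.eye(kinship.shape[0])


def impute_family(y, mean, cov, rng):
    """Replace NaN entries of y with one draw from their conditional
    normal distribution given the observed family members."""
    y = np.asarray(y, dtype=float).copy()
    mean = np.asarray(mean, dtype=float)
    miss = np.isnan(y)
    if not miss.any():
        return y
    obs = ~miss
    c_mm = cov[np.ix_(miss, miss)]
    if not obs.any():
        # Nobody observed in this family: draw from the marginal distribution.
        y[miss] = rng.multivariate_normal(mean[miss], c_mm)
        return y
    c_mo = cov[np.ix_(miss, obs)]
    c_oo = cov[np.ix_(obs, obs)]
    # Regression weights c_mo @ inv(c_oo), computed without an explicit inverse.
    w = np.linalg.solve(c_oo, c_mo.T).T
    cond_mean = mean[miss] + w @ (y[obs] - mean[obs])
    cond_cov = c_mm - w @ c_mo.T
    cond_cov = (cond_cov + cond_cov.T) / 2.0  # remove rounding asymmetry
    y[miss] = rng.multivariate_normal(cond_mean, cond_cov)
    return y


def rubin_combine(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.
    Returns the pooled estimate and its total variance."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    between = q.var(ddof=1)          # between-imputation variance
    within = u.mean()                # average within-imputation variance
    return q.mean(), within + (1.0 + 1.0 / m) * between
```

A hypothetical usage for a nuclear family (father, mother, two children) with one child's systolic blood pressure missing:

```python
rng = np.random.default_rng(0)
kinship = np.array([[0.50, 0.00, 0.25, 0.25],
                    [0.00, 0.50, 0.25, 0.25],
                    [0.25, 0.25, 0.50, 0.25],
                    [0.25, 0.25, 0.25, 0.50]])
cov = polygenic_cov(kinship, sigma2_g=100.0, sigma2_e=150.0)  # made-up values
y = np.array([130.0, 118.0, np.nan, 125.0])
completed = [impute_family(y, np.full(4, 125.0), cov, rng) for _ in range(5)]
```

Each completed data set would then be analyzed separately and the resulting estimates pooled with `rubin_combine`, so that the between-imputation variance is carried into the final standard errors.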


Source: PubMed
