Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models

Han Chen, Chaolong Wang, Matthew P Conomos, Adrienne M Stilp, Zilin Li, Tamar Sofer, Adam A Szpiro, Wei Chen, John M Brehm, Juan C Celedón, Susan Redline, George J Papanicolaou, Timothy A Thornton, Cathy C Laurie, Kenneth Rice, Xihong Lin

Abstract

Linear mixed models (LMMs) are widely used in genome-wide association studies (GWASs) to account for population structure and relatedness, for both continuous and binary traits. Motivated by the failure of LMMs to control type I errors in a GWAS of asthma, a binary trait, we show that LMMs are generally inappropriate for analyzing binary traits when population stratification leads to violation of the LMM's constant-residual variance assumption. To overcome this problem, we develop a computationally efficient logistic mixed model approach for genome-wide analysis of binary traits, the generalized linear mixed model association test (GMMAT). This approach fits a logistic mixed model once per GWAS and performs score tests under the null hypothesis of no association between a binary trait and individual genetic variants. We show in simulation studies and real data analysis that GMMAT effectively controls for population structure and relatedness when analyzing binary traits in a wide variety of study designs.
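The "fit the null model once, then score-test each variant" pattern described above can be illustrated with a deliberately simplified sketch. It assumes an intercept-only logistic null model with no covariates and no random effects, so it does not account for population structure or relatedness and is not the GMMAT implementation. The function names and the toy data are hypothetical.

```python
import math

def fit_null_intercept_only(y):
    """Fit the null model once. With only an intercept, the fitted
    mean of a binary trait is simply the case proportion."""
    mu = sum(y) / len(y)
    if mu <= 0.0 or mu >= 1.0:
        raise ValueError("need both cases and controls")
    return mu

def score_test_pvalue(g, y, mu):
    """One-degree-of-freedom score test for a single variant,
    reusing the null fit (mu) instead of refitting per variant."""
    n = len(y)
    g_bar = sum(g) / n
    # Score: genotype-weighted sum of null-model residuals.
    score = sum(gi * (yi - mu) for gi, yi in zip(g, y))
    # Variance of the score after adjusting for the intercept.
    var = mu * (1.0 - mu) * sum((gi - g_bar) ** 2 for gi in g)
    if var == 0.0:
        return float("nan")  # monomorphic variant: test undefined
    stat = score * score / var
    # Upper tail of a chi-square distribution with 1 df.
    return math.erfc(math.sqrt(stat / 2.0))

# Toy data (hypothetical): 4 individuals, 2 variants.
y = [1, 0, 1, 0]
mu = fit_null_intercept_only(y)  # fitted once for all variants
p_unrelated = score_test_pvalue([1, 1, 0, 0], y, mu)
p_related = score_test_pvalue([2, 0, 2, 0], y, mu)
```

Because only the per-variant score and its variance are recomputed, the cost of model fitting is paid once per analysis rather than once per variant. In this toy data the first variant is unrelated to the trait and the second tracks it exactly.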

Copyright © 2016 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

Figures

Figure 1
Quantile-Quantile Plot of Association Test p Values from the Asthma GWAS Analysis in HCHS/SOL. (A) All SNPs. (B) Category 1: SNPs with the ratio of expected variances in Puerto Ricans over non-Puerto Ricans less than 0.8. (C) Category 2: SNPs with the ratio of expected variances in Puerto Ricans over non-Puerto Ricans between 0.8 and 1.25. (D) Category 3: SNPs with the ratio of expected variances in Puerto Ricans over non-Puerto Ricans greater than 1.25. Abbreviations are as follows: LMM, a joint analysis using LMM on the combined samples; LMM meta, an inverse-variance weighted fixed effects meta-analysis approach to combine LMM results from analyzing Puerto Ricans and non-Puerto Ricans separately.
Figure 2
True Mean-Variance Relationship for a Binary Trait and the Constant Mean-Variance Relationship Assumed by Linear Models, Illustrated by the Example from the Asthma Data in HCHS/SOL. For a binary trait with mean π, the variance is π(1 − π), which varies with the mean. This heteroscedasticity is properly accounted for by logistic regression. Linear models inappropriately assume that the variance of the binary trait is constant and does not change with the mean (homoscedasticity). For example, the variance of the binary trait (asthma status) in Puerto Ricans is considerably larger than the variances in the other five populations, because Puerto Ricans have a much higher asthma disease proportion than the other populations. This heteroscedasticity, caused by population stratification, makes p values calculated from LMMs likely to be incorrect, but it is properly taken into account by logistic mixed models using GMMAT.
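The mean-variance relationship in this caption can be checked with a few lines of arithmetic. The sketch below uses hypothetical case proportions chosen only for illustration; they are not the HCHS/SOL estimates, and the function name is an assumption.

```python
def bernoulli_variance(pi):
    """Variance of a binary trait whose mean (case proportion) is pi."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must be in [0, 1]")
    return pi * (1.0 - pi)

# Hypothetical case proportions for two populations (illustration only).
proportions = {"high_risk": 0.25, "low_risk": 0.05}

variances = {name: bernoulli_variance(p) for name, p in proportions.items()}

# A linear model would assume this ratio is 1 (constant residual variance).
variance_ratio = variances["high_risk"] / variances["low_risk"]
```

With these assumed proportions the trait variance in the high-risk group is several times that in the low-risk group, which is the kind of heteroscedasticity a constant-variance linear model cannot represent.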
Figure 3
A Simulated Cohort Study with 10,000 Related Individuals. Quantile-quantile plots of association test p values from 3,200 simulation replicates under the null hypothesis of no genetic association, each with 625,583 common SNPs, were combined to get more than 2 billion null p values. (A) All SNPs. (B) Category 1: SNPs with the ratio of expected variances in population 1 (high risk) over population 2 (low risk) less than 0.8. (C) Category 2: SNPs with the ratio of expected variances in population 1 (high risk) over population 2 (low risk) between 0.8 and 1.25. (D) Category 3: SNPs with the ratio of expected variances in population 1 (high risk) over population 2 (low risk) greater than 1.25.

Source: PubMed
