Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes

Yu Jiang, Sai Chen, Daniel McGuire, Fang Chen, Mengzhen Liu, William G Iacono, John K Hewitt, John E Hokanson, Kenneth Krauter, Markku Laakso, Kevin W Li, Sharon M Lutz, Matthew McGue, Anita Pandit, Gregory J M Zajac, Michael Boehnke, Goncalo R Abecasis, Scott I Vrieze, Xiaowei Zhan, Bibo Jiang, Dajiang J Liu

Abstract

Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. 
Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.
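
For readers unfamiliar with conditional analysis from summary statistics, the following is a minimal sketch of the textbook complete-data calculation, in which joint standardized effects are obtained from marginal variant-phenotype correlations and the variant-variant correlation (LD) matrix. It is not the PCBS estimator, whose formula the abstract does not give; the function name `conditional_effect` and the toy numbers are assumptions made for illustration only. When studies contribute different subsets of variants, the entries of these inputs come from different samples, which is the setting the abstract addresses.

```python
import numpy as np

def conditional_effect(r_y, R, target, conditioned):
    """Standardized effect of `target` after adjusting for `conditioned`.

    r_y : marginal correlations between each variant and the phenotype
    R   : variant-by-variant correlation (LD) matrix
    Hypothetical helper for illustration; assumes complete data.
    """
    r_y = np.asarray(r_y, dtype=float)
    R = np.asarray(R, dtype=float)
    idx = [target] + list(conditioned)
    # Joint standardized effects solve R_sub @ beta = r_sub.
    beta = np.linalg.solve(R[np.ix_(idx, idx)], r_y[idx])
    return beta[0]

# Toy inputs (made up): two variants in strong LD (r = 0.8).
r_y = np.array([0.10, 0.08])
R = np.array([[1.0, 0.8],
              [0.8, 1.0]])

# Variant 1 conditional on variant 0, and variant 0 conditional on variant 1.
beta_1_given_0 = conditional_effect(r_y, R, target=1, conditioned=[0])
beta_0_given_1 = conditional_effect(r_y, R, target=0, conditioned=[1])
print(beta_1_given_0, beta_0_given_1)
```

In this toy example the marginal signal at variant 1 equals the LD-weighted signal at variant 0 (0.8 × 0.10 = 0.08), so its conditional effect is essentially zero, while variant 0 retains an effect of about 0.1 after adjustment.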

Trial registration: ClinicalTrials.gov NCT00608764.

Conflict of interest statement

The authors have declared that no competing interests exist.


Source: PubMed
