Bayesian network analysis incorporating genetic anchors complements conventional Mendelian randomization approaches for exploratory analysis of causal relationships in complex data
Richard Howey, So-Youn Shin, Caroline Relton, George Davey Smith, Heather J Cordell, Richard Howey, So-Youn Shin, Caroline Relton, George Davey Smith, Heather J Cordell
Abstract
Mendelian randomization (MR) implemented through instrumental variables analysis is an increasingly popular causal inference tool used in genetic epidemiology. But it can have limitations for evaluating simultaneous causal relationships in complex data sets that include, for example, multiple genetic predictors and multiple potential risk factors associated with the same genetic variant. Here we use real and simulated data to investigate Bayesian network analysis (BN) with the incorporation of directed arcs, representing genetic anchors, as an alternative approach. A Bayesian network describes the conditional dependencies/independencies of variables using a graphical model (a directed acyclic graph) with an accompanying joint probability. In real data, we found BN could be used to infer simultaneous causal relationships that confirmed the individual causal relationships suggested by bi-directional MR, while allowing for the existence of potential horizontal pleiotropy (that would violate MR assumptions). In simulated data, BN with two directional anchors (mimicking genetic instruments) had greater power for a fixed type 1 error than bi-directional MR, while BN with a single directional anchor performed better than or as well as bi-directional MR. Both BN and MR could be adversely affected by violations of their underlying assumptions (such as genetic confounding due to unmeasured horizontal pleiotropy). BN with no directional anchor generated inference that was no better than by chance, emphasizing the importance of directional anchors in BN (as in MR). Under highly pleiotropic simulated scenarios, BN outperformed both MR (and its recent extensions) and two recently-proposed alternative approaches: a multi-SNP mediation intersection-union test (SMUT) and a latent causal variable (LCV) test. We conclude that BN incorporating genetic anchors is a useful complementary method to conventional MR for exploring causal relationships in complex data sets such as those generated from modern "omics" technologies.
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
References
- Davey Smith G, Ebrahim S. Epidemiology—is it time to call it a day? Int J Epidemiology. 2001;30:1–11. 10.1093/ije/30.1.1
- Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. 10.1016/0270-0255(86)90088-6
- Robins JM, Hernán MA. Estimation of the causal effects of time-varying exposures In: Longitudinal Data Analysis. New York: Chapman & Hall/CRC Press; 2009. p. 553–599.
- Davey Smith G, Ebrahim S. ‘Mendelian randomization’: can genetic epidemiology contribute to understanding environmental determinants of disease? Int J Epidemiology. 2003;32:1–22. 10.1093/ije/dyg070
- Evans DM, Davey Smith G. Mendelian Randomization: New Applications in the Coming Age of Hypothesis-Free Causality. Annu Rev Genomics Hum Genet. 2015;16:327–350. 10.1146/annurev-genom-090314-050016
- Lawlor DA, Windmeijer F, Davey Smith G. Is Mendelian randomization ‘lost in translation?’: Comments on ‘Mendelian randomization equals instrumental variable analysis with genetic instruments’ by Wehby et al. Statistics in Medicine. 2008;27:2750–2755. 10.1002/sim.3308
- Didelez V, Sheehan N. Mendelian randomization as an instrumental variable approach to causal inference. Stat Methods Med Res. 2007;16:309–330. 10.1177/0962280206077743
- Davies NM, Holmes MV, Davey Smith G. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ. 2018;362.
- Voight BF, Peloso GM, Orho-Melander M, Frikke-Schmidt R, Barbalic M, Jensen MK, et al. Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study. Lancet. 2012;380:572–580. 10.1016/S0140-6736(12)60312-2
- Weng LC, Roetker NS, Lutsey PL, Alonso A, Guan W, Pankow JS, et al. Evaluation of the relationship between plasma lipids and abdominal aortic aneurysm: A Mendelian randomization study. PLoS One. 2018;13(4):e0195719 10.1371/journal.pone.0195719
- Richmond RC, Sharp GC, Ward ME, Fraser A, Lyttleton O, McArdle WL, et al. DNA Methylation and BMI: Investigating Identified Methylation Sites at HIF3A in a Causal Framework. Diabetes. 2016;65(5):1231–1244. 10.2337/db15-0996
- Richardson TG, Haycock PC, Zheng J, Timpson NJ, Gaunt TR, Davey Smith G, et al. Systematic Mendelian randomization framework elucidates hundreds of CpG sites which may mediate the influence of genetic variants on disease. Hum Molec Genet. 2018;27:3293–3304. 10.1093/hmg/ddy210
- Yao C, Chen G, Song C, Keefe J, Mendelson M, Huan T, et al. Genome-wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease. Nature Communications. 2018;9:3268 10.1038/s41467-018-05512-x
- Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis with multiple genetic variants using summarized data. Genet Epidemiol. 2013;37:658–665. 10.1002/gepi.21758
- Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Molec Genet. 2014;23(R1):R89–98. 10.1093/hmg/ddu328
- Burgess S, Scott RA, Timpson NJ, Davey Smith G, Thompson SG, EPIC-InterAct Consortium. Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur J Epidemiol. 2015;30:543–552. 10.1007/s10654-015-0011-z
- Hartwig FP, Davies NM, Hemani G, Davey Smith G. Two-sample Mendelian randomization: avoiding the downsides of a powerful, widely applicable but potentially fallible technique. Int J Epidemiol. 2016;45:1717–1726. 10.1093/ije/dyx028
- Relton C, Davey Smith G. Two-step epigenetic Mendelian randomization: a strategy for establishing the causal role of epigenetic processes in pathways to disease. Int J Epidemiol. 2012;41:161–176. 10.1093/ije/dyr233
- Bowden J, Davey Smith G, S B. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int J Epidemiol. 2015;44:512–525. 10.1093/ije/dyv080
- Burgess S, Thompson SG. Multivariable Mendelian randomization: the use of pleiotropic genetic variants to estimate causal effects. Am J Epidemiol. 2015;181:251–260. 10.1093/aje/kwu283
- Burgess S, Daniel RM, Butterworth AS, Thompson SG, EPIC-InterAct Consortium. Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways. Int J Epidemiol. 2015;44:484–495. 10.1093/ije/dyu176
- Bowden J, Del Greco MF, Minelli C, Davey Smith G, Sheehan N, Thompson J. A framework for the investigation of pleiotropy in two-sample summary data Mendelian randomization. Statistics in Medicine. 2017;36:1783–1802. 10.1002/sim.7221
- Bowden J, Hemani G, Davey Smith G. Detecting individual and global horizontal pleiotropy in Mendelian randomization: a job for the humble heterogeneity statistic? Am J Epidemiol. 2018;187:2681–2685. 10.1093/aje/kwy185
- Verbanck M, Chen CY, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018;50:693–698. 10.1038/s41588-018-0099-7
- Zuber V, Colijn JM, Klaver C, Burgess S. Selecting causal risk factors from high-throughput experiments using multivariable Mendelian randomization. bioRxiv. 2018; 10.1101/396333.
- Porcu E, Rüeger S, Lepik K, eQTLGen Consortium, BIOS Consortium, Santoni FA, et al. Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nature Communications. 2019;10:3300 10.1038/s41467-019-10936-0
- Timpson NJ, Nordestgaard BG, Harbord RM, Zacho J, Frayling TM, Tybjærg-Hansen A, et al. C-reactive protein levels and body mass index: elucidating direction of causation through reciprocal Mendelian randomization. Int J Obes. 2011;35:300–308. 10.1038/ijo.2010.137
- Hemani G, Tilling K, Davey Smith G. Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLOS Genetics. 2017;13:e1007081 10.1371/journal.pgen.1007081
- O’Connor LJ, Price AL. Distinguishing genetic correlation from causation across 52 diseases and complex traits. Nat Genet. 2018;50:1726–1734.
- Pearl J. Bayesian networks: A model of self-activated memory for evidential reasoning. In: Proceedings, Cognitive Science Society. Irvine, CA; 1985. p. 329–334.
- Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann; 1988.
- Spirtes P. Introduction to Causal Inference. Journal of Machine Learning Research. 2010;11:1643–1662.
- Spirtes P, Glymour C, Scheines R. Causation, prediction, and search. Springer; 1993.
- Pearl J. Causality: models, reasoning, and inference, 2nd Ed Cambridge University Press; 2009.
- Scheines R. Computation and causation. Metaphilosophy. 2002;33(1-2):158–180. 10.1111/1467-9973.00223
- Lagani V, Triantafillou S, Ball G, Tegnér J, Tsamardinos I. Probabilistic Computational Causal Discovery for Systems Biology In: Geris L, Gomez-Cabrero D, editors. Uncertainty in Biology: A Computational Modeling Approach. Studies in Mechanobiology, Tissue Engineering and Biomaterials 17 Switzerland: Springer International Publishing; 2016. p. 33–73.
- Nagarajan R, Scutari M, Lébre S. Bayesian Networks in R. Springer-Verlag; New York; 2013.
- Hemani G, Bowden J, Davey Smith G. Evaluating the potential role of pleiotropy in Mendelian randomization studies. Hum Molec Genet. 2018;27:R195–R208. 10.1093/hmg/ddy163
- Scutari M, Denis JB. Bayesian Networks with Examples in R Texts in Statistical Science, Chapman & Hall/CRC; (US: ); 2014.
- Chickering DM, Heckerman D, Meek C. Large-Sample Learning of Bayesian Networks is NP-Hard. The Journal of Machine Learning Research. 2004;5:1287–1330.
- Hua L, Zheng WY, Xia H, Zhou P. Detecting the potential cancer association or metastasis by multi-omics data analysis. Genetic Molecular Research. 2016;15(3). 10.4238/gmr.15038987
- Myte R, Gylling B, Häggström J, Schneede J, Magne Ueland P, Hallmans G, et al. Untangling the role of one-carbon metabolism in colorectal cancer risk: a comprehensive Bayesian network analysis. Scientific Reports. 2017;7:43434 10.1038/srep43434
- Zhu J, Lum PY, Lamb J, GuhaThakurta D, Edwards SW, Thieringer R, et al. An integrative genomics approach to the reconstruction of gene networks in segregating populations. Cytogenetic and Genome Research. 2004;105(2-4):363–374. 10.1159/000078209
- Zhu J, Sova P, Xu Q, Dombek KM, Xu EY, Vu H, et al. Stitching together multiple data dimensions reveals interacting metabolomic and transcriptomic networks that modulate cell regulation. PLoS Biology. 2012;10(4):e1001301 10.1371/journal.pbio.1001301
- Yazdani A, Yazdani A, Samiei A, Boerwinkle E. Generating a robust statistical causal structure over 13 cardiovascular disease risk factors using genomics data. Journal of Biomedical Informatics. 2016;60:114–119. 10.1016/j.jbi.2016.01.012
- Sedgewick AJ, Buschur K, Shi I, Ramsey JD, Raghu VK, Manatakis DV, et al. Mixed Graphical Models for Integrative Causal Analysis with Application to Chronic Lung Disease Diagnosis and Prognosis. Bioinformatics. 2019;35:1204–1212. 10.1093/bioinformatics/bty769
- Badsha MB, Fu AQ. Learning Causal Biological Networks With the Principle of Mendelian Randomization. Frontiers in Genetics. 2019;10:460 10.3389/fgene.2019.00460
- Zhong W, Spracklen CN, Mohlke KL, Zheng X, Fine J, Li Y. Multi-SNP mediation intersection-union test. Bioinformatics. 2019;35:4724–4729. 10.1093/bioinformatics/btz285
- Moayyeri A, Hammond CJ, Valdes AM, Spector TD. Cohort Profile: TwinsUK and healthy ageing twin study. Int J Epidemiol. 2013;42:76–85. 10.1093/ije/dyr207
- Shi SY, Fauman EB, Petersen AK, Krumsiek J, Santos R, Huang J, et al. An atlas of genetic influences on human blood metabolites. Nat Genet. 2014;46(6):543–550. 10.1038/ng.2982
- Speliotes EK, et al. Association analyses of 249,796 individuals reveal eighteen new loci associated with body mass index. Nature Genetics. 2010;42:937–948. 10.1038/ng.686
- Monda KL, et al. A meta-analysis identifies new loci associated with body mass index in individuals of African ancestry. Nature Genetics. 2013;45(6):690–696. 10.1038/ng.2608
- Boettcher SG, Dethlefsen C. deal: A Package for Learning Bayesian Networks. Journal of Statistical Software. 2003;8(20).
- Wasserstein RL, Lazar NA. The ASA’s Statement on p-Values: Context, Process, and Purpose. The American Statistician. 2016;70:129–133. 10.1080/00031305.2016.1154108
- Shih S, Huang YT, Yang HI. A multiple mediator analysis approach to quantify the effects of the ADH1B and ALDH2 genes on hepatocellular carcinoma risk. Genetic Epidemiology. 2018;42(4):394–404. 10.1002/gepi.22120
- Cho Y, Haycock PC, Sanderson E, Gaunt TR, Zheng J, Davey Smith APMG, et al. MR-TRYX: A Mendelian randomization framework that exploits horizontal pleiotropy to infer novel causal pathways. bioRxiv. 2019; 10.1101/476085.
- Greenland S, Senn SJ, Rothman KJ, Carlin JB, Poole C, Goodman SN, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016;31:337–350. 10.1007/s10654-016-0149-3
- Visscher PM, Brown MA, McCarthy MI, Yang J. Five years of GWAS discovery. Am J Hum Genet. 2012;90:7–24. 10.1016/j.ajhg.2011.11.029
- Brumpton B, Sanderson E, Pires Hartwig F, Harrison S, Åberge Vie G, Cho Y, et al. Within-family studies for Mendelian randomization: avoiding dynastic, assortative mating, and population stratification biases. bioRxiv. 2019; 10.1101/602516.
- Ainsworth HF, Shin SY, Cordell HJ. A comparison of methods for inferring causal relationships between genotype and phenotype using additional biological measurements. Genet Epidemiol. 2017;41(7):577–586. 10.1002/gepi.22061
- Bycroft C and Freeman C and Petkova D and Band G and Elliott L T and Sharp K and Motyer A and Vukcevic D and Delaneau O and O’Connell J and Cortes A and Welsh S and Young A and Effingham M and McVean G and Leslie S and Allen N and Donnelly P and Marchini J. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018;562:203–209. 10.1038/s41586-018-0579-z
- Lawlor DA, Tilling K, Davey Smith G. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45:1866–1886. 10.1093/ije/dyw314
- Munafò MR, Davey Smith G. Robust research needs many lines of evidence. Nature. 2018;553:399–401. 10.1038/d41586-018-01023-3
- Burgess S, Small DS, Thompson SG. A review of instrumental variable estimators for Mendelian randomization. Stat Methods Med Res. 2017;26:2333–2355. 10.1177/0962280215597579
- Kleiber C, Zeileis A. Applied Econometrics with R. New York: Springer-Verlag; 2008. Available from: .
- Howey R. BayesNetty. Computer program package obtainable from ; 2019.
- Csardi G, Nepusz T. The igraph software package for complex network research. InterJournal. 2006;Complex Systems:1695.
- Sanderson E, Davey Smith G, Windmeijer F, Bowden J. An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int J Epidemiol. 2019;in press. 10.1093/ije/dyy262
- Kettunen J, Demirkan A, Würtz P, Draisma HH, Haller T, Rawal R, et al. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA. Nature Communications. 2016;7:11122 10.1038/ncomms11122
- Do R, Willer CJ, Schmidt EM, Sengupta S, Gao C, Peloso GM, et al. Common variants associated with plasma triglycerides and risk for coronary artery disease. Nat Genet. 2013;45:1345–1352. 10.1038/ng.2795
Source: PubMed