Classification of common human diseases derived from shared genetic and environmental determinants

Kanix Wang, Hallie Gaitsch, Hoifung Poon, Nancy J Cox, Andrey Rzhetsky

Abstract

In this study, we used insurance claims for over one-third of the entire US population to create a subset of 128,989 families (481,657 unique individuals). We then used these data to (i) estimate the heritability and familial environmental patterns of 149 diseases and (ii) infer the genetic and environmental correlations for disease pairs from a set of 29 complex diseases. The majority (52 of 65) of our study's heritability estimates matched earlier reports, and 84 of our estimates appear to have been obtained for the first time. We used correlation matrices to compute environmental and genetic disease classifications and corresponding reliability measures. Among unexpected observations, we found that migraine, typically classified as a disease of the central nervous system, appeared to be most genetically similar to irritable bowel syndrome and most environmentally similar to cystitis and urethritis, all of which are inflammatory diseases.

Conflict of interest statement

Competing Financial Interests Statement

The authors declare no competing financial interests.

Figures

Figure 1
Information on study population, results of model selection, and analysis of heritability of 149 diseases. (A) Distribution of the study population across population density septiles; septile 1 corresponds to the most rural counties and septile 7 to the most urban. (B) Number of children in a family as a function of population density septile; septile notation is the same as in (A). (C) Parent/child age distribution in the studied families. (D) Model selection results for the univariate models GF, GS, GCF, GCSF, GC, and GCS, where G stands for additive genetics, F for common family environment, S for common sibling environment, and C for the environment common to the parental couple; the plot shows how often each model was ranked best (rank 1), second best (rank 2), and so on, as compared by the deviance information criterion (DIC); the GCS model ranks best in the majority of cases. (E) Disease heritability estimates with one standard deviation; diseases whose heritability appears to be measured for the first time are marked with an asterisk; heritability values are sorted in decreasing order; bar color indicates the biological system associated with the disease (see key in the upper right corner); keys to disease acronyms are given in Supplementary Tables 1 and 2. (F) Estimates of disease heritability plotted against estimates of disease prevalence; the linear correlation is significantly negative, Pearson’s r = −0.212 (95% CI [−0.36, −0.05]), p = 0.00915. (G) Comparison of our heritability estimates with previously published estimates; see Supplementary Table 3 for detailed numbers.
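The interval reported for panel (F) can be checked with a short calculation. The sketch below is only an illustration, not the authors' procedure: it assumes the Fisher z-transformation for the confidence interval and a normal approximation for the p-value, and it assumes that each of the 149 diseases contributes one point to the plot (n = 149). The function names are hypothetical.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transformation."""
    z = math.atanh(r)              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def fisher_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the normal approximation on the z scale."""
    z_stat = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z_stat) / math.sqrt(2.0))


# Values from the caption; n = 149 is an assumption (one point per disease).
r, n = -0.212, 149
low, high = fisher_ci(r, n)
p = fisher_p_value(r, n)
print(f"95% CI: [{low:.2f}, {high:.2f}], approximate p = {p:.4f}")
```

Under these assumptions the interval agrees with the reported [−0.36, −0.05] to two decimals, and the approximate p-value is of the same order as the reported 0.00915; small differences are expected because r is rounded and the test used in the paper is not stated in the caption.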
Figure 2
Genetic and environmental correlations between diseases. (A) Matrix of pairwise genetic correlations (upper half) and corresponding environmental interactions (lower half), colored by sign and magnitude (see legend). The disease color labels indicate the associated biological systems; the size of the squares indicates statistical significance (see key on the right). Cells with asterisks indicate pairwise interactions that remained significant at a false discovery rate of 1%. The colored boxes within the matrix indicate opposite-sign correlation values for the same pair of diseases. Posterior probabilities of the two correlation values (genetic and environmental) for the same pair of diseases having the same sign were 1.869 × 10^−14 (ADHD and benign skin neoplasm), 3.376 × 10^−14 (ADHD and non-melanoma skin cancer), 4.523 × 10^−9 (adjustment disorder and general hypertension), 8.715 × 10^−4 (migraine and type 1 diabetes mellitus), 9.251 × 10^−5 (benign skin neoplasm and type 1 diabetes mellitus), 6.401 × 10^−33 (benign skin neoplasm and general hypertension), 3.712 × 10^−17 (non-melanoma skin cancer and general hypertension), and 3.933 × 10^−4 (allergic rhinitis and type 1 diabetes mellitus). (B) Distribution of (genetic correlation − environmental correlation) values for the same pair of diseases. (C) Individual distributions of genetic and environmental correlations superimposed on the same plot. (D) Comparison of our family-based estimates of genetic correlations between diseases with previously published GWAS-based values; the complete data on values and references are provided in Supplementary Table 5. The linear fit, with a slope of 1.08 (SE = 0.167), is indicated by the dotted line.
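Panel (A) marks cells that remain significant at a false discovery rate of 1%. The caption does not say which FDR procedure was applied, so the following is only a minimal sketch, assuming the Benjamini-Hochberg step-up procedure. The function name and the p-values in the example are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, q=0.01):
    """Return one boolean per p-value: True if rejected at FDR level q (BH step-up)."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five disease pairs (illustration only).
example_p = [0.0001, 0.004, 0.03, 0.0005, 0.2]
flags = benjamini_hochberg(example_p, q=0.01)
print(flags)
```

In this toy input the three smallest p-values fall under their rank-scaled thresholds (0.002, 0.004, 0.006), so those three pairs would be starred, while the remaining two would not.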

Source: PubMed
