Systematic localization of common disease-associated variation in regulatory DNA

Matthew T Maurano, Richard Humbert, Eric Rynes, Robert E Thurman, Eric Haugen, Hao Wang, Alex P Reynolds, Richard Sandstrom, Hongzhu Qu, Jennifer Brody, Anthony Shafer, Fidencio Neri, Kristen Lee, Tanya Kutyavin, Sandra Stehling-Sun, Audra K Johnson, Theresa K Canfield, Erika Giste, Morgan Diegel, Daniel Bates, R Scott Hansen, Shane Neph, Peter J Sabo, Shelly Heimfeld, Antony Raubitschek, Steven Ziegler, Chris Cotsapas, Nona Sotoodehnia, Ian Glass, Shamil R Sunyaev, Rajinder Kaul, John A Stamatoyannopoulos

Abstract

Genome-wide association studies have identified many noncoding variants associated with common diseases and traits. We show that these variants are concentrated in regulatory DNA marked by deoxyribonuclease I (DNase I) hypersensitive sites (DHSs). Eighty-eight percent of such DHSs are active during fetal development and are enriched in variants associated with gestational exposure-related phenotypes. We identified distant gene targets for hundreds of variant-containing DHSs that may explain phenotype associations. Disease-associated variants systematically perturb transcription factor recognition sequences, frequently alter allelic chromatin states, and form regulatory networks. We also demonstrated tissue-selective enrichment of more weakly disease-associated variants within DHSs and the de novo identification of pathogenic cell types for Crohn's disease, multiple sclerosis, and an electrocardiogram trait, without prior knowledge of physiological mechanisms. Our results suggest pervasive involvement of regulatory DNA variation in common human disease and provide pathogenic insights into diverse disorders.

Figures

Fig. 1. Disease-associated variation is concentrated in DNase I hypersensitive sites
(A) Proportions of noncoding GWAS SNPs localizing within DHSs (green); in complete linkage disequilibrium (r² = 1) with a SNP in a DHS (blue); or neither (yellow). Note that 76.5% of GWAS SNPs are either within or in perfect LD with DHSs. (B) Proportions of GWAS SNPs overlapping DHSs after partitioning by degree of replication. (C) Representative DNase I hypersensitivity (tag density) patterns at diverse disease-associated variants. (D) Proportion of GWAS SNPs localizing in DHSs active in fetal tissues that persist in adult cells (salmon); fetal stage-specific DHSs (red); and adult stage DHSs (green). (E) GWAS SNPs in DHSs show phenotype-specific enrichment for fetal regulatory elements.
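The three-way partition in panel A can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the input layout (DHSs as sorted, non-overlapping half-open intervals per chromosome; a precomputed table of SNPs in perfect LD), and the toy identifiers `rsA` to `rsD` are all assumptions made for illustration.

```python
from bisect import bisect_right

def in_any_interval(pos, starts, ends):
    """True if pos falls in a half-open interval [start, end).
    Assumes the intervals are sorted and non-overlapping."""
    i = bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]

def partition_snps(snps, dhs, perfect_ld):
    """Count SNPs in a DHS, in perfect LD with a SNP in a DHS, or neither.

    snps:       dict snp_id -> (chrom, pos)
    dhs:        dict chrom -> sorted list of (start, end) intervals
    perfect_ld: dict snp_id -> list of (chrom, pos) for SNPs with r^2 = 1
    (all three structures are hypothetical inputs, not from the paper)
    """
    index = {c: ([s for s, _ in iv], [e for _, e in iv]) for c, iv in dhs.items()}

    def hits_dhs(chrom, pos):
        if chrom not in index:
            return False
        starts, ends = index[chrom]
        return in_any_interval(pos, starts, ends)

    counts = {"in_dhs": 0, "ld_with_dhs_snp": 0, "neither": 0}
    for snp_id, (chrom, pos) in snps.items():
        if hits_dhs(chrom, pos):
            counts["in_dhs"] += 1
        elif any(hits_dhs(c, p) for c, p in perfect_ld.get(snp_id, [])):
            counts["ld_with_dhs_snp"] += 1
        else:
            counts["neither"] += 1
    return counts

# Toy data only; coordinates and IDs are invented.
dhs = {"chr1": [(100, 200), (500, 650)]}
snps = {"rsA": ("chr1", 150), "rsB": ("chr1", 300),
        "rsC": ("chr1", 900), "rsD": ("chr2", 10)}
perfect_ld = {"rsB": [("chr1", 510)], "rsC": [("chr1", 700)]}
counts = partition_snps(snps, dhs, perfect_ld)
```

A SNP is assigned to the first category it satisfies, so a variant that lies in a DHS is never counted again under the LD category. Dividing each count by the total gives proportions of the kind plotted in the panel.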
Fig. 2. Candidate regulatory roles for GWAS SNPs
(A) GWAS variant associated with platelet count is connected with the JAK2 gene (myeloproliferative disorders) 222 kb away. Below, ChIA-PET tags (36) validate direct chromatin interactions between this DHS and the JAK2 promoter; red tags demonstrate an interaction between these DHSs. (B) Proportion of DHSs harboring GWAS variants that can be linked to target promoters at the indicated distance. (C) Examples of allele-specific DNase I sensitivity in cell types derived from heterozygous individuals for GWAS variants that alter TF recognition motifs within DHSs (also see table S9). Each cell type track shows DNase I cleavage density scaled by allelic imbalance at the GWAS variant and colored by variant nucleotide (blue = C, green = A, yellow = G, red = T). Total reads from each allele are also shown.
Fig. 3. Common disease-associated variants cluster in regulatory pathways
(A) SNPs in DHSs associated with diabetes (Type I and Type II), diabetic complications, and glucose homeostasis localize in recognition sites of transcriptional regulators (labeled ellipses) controlling glucose transport, glycolysis, and beta cell function that are structurally disrupted in the Mendelian phenotypes of maturity-onset diabetes of the young (MODY). Chromosome of each SNP associated with the indicated phenotype is listed (see table S2). (B) 24.4% of SNPs associated with autoimmune disorders that fall within DHSs localize in recognition sequences of TFs that interact with IRF9. Arrows indicate directionality of relationship, dotted lines represent indirect interactions (12). The complete network is shown in fig. S10.
Fig. 4. Common disease networks
GWAS SNPs from related diseases repeatedly perturb recognition sequences of common transcription factors. Shown are factors whose recognition sequences harbor ≥8 or ≥6 GWAS SNPs in inflammatory/autoimmune diseases (A) and cancer (B), respectively. Edge thickness represents number of associations between TF and disease in DHSs in relevant tissues. Both networks are significantly enriched for overlap with disease-relevant GWAS SNPs, and include many well-studied regulators.
Fig. 5. Identification of pathogenic cell types
GWAS SNPs are systematically enriched in the regulatory DNA of disease-specific cell types throughout the full range of significance. Shown are SNPs tested for association with the autoimmune disorders Crohn’s disease (A) and multiple sclerosis (B), and with the electrocardiogram trait QRS duration (C).

Source: PubMed
