An integrated encyclopedia of DNA elements in the human genome

ENCODE Project Consortium

Abstract

The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

Figures

Figure 1. Impact of Selection on ENCODE Functional Elements in Mammals and Human Populations
Panel A shows the levels of pan-mammalian constraint (mean GERP score; 24 mammals, x-axis) compared to diversity, a measure of negative selection in the human population (mean expected heterozygosity, inverted scale, y-axis) for ENCODE datasets. Each point is an average for a single dataset. The top right corners have the strongest evolutionary constraint and lowest diversity. Coding (C), UTR (U), genomic (G), intergenic (IG) and intronic (IN) averages are shown as filled squares. In each case the vertical and horizontal cross hairs show representative levels for the neutral expectation for mammalian conservation and human population diversity, respectively. Panel A also shows the spread over all non-exonic ENCODE elements greater than 2.5 kb from TSSs. The inner dashed box indicates that parts of the plot have been magnified for the surrounding outer panels, although the scales in the outer plots provide the exact regions and dimensions magnified. The spread for DHS sites (B) and RNA elements (D) is shown in the plots on the left. RNA elements are either long novel intronic (dark green) or long intergenic (light green) RNAs. The horizontal cross hairs are colour coded to the relevant dataset in panel D. Panel C shows the spread of TF motif instances either in regions bound by the TF (orange points) or the corresponding unbound motif matches in grey, with bound and unbound points connected with an arrow in each case, showing that bound sites are generally more constrained and less diverse. Panel E shows the derived allele frequency spectrum for primate-specific elements, with variations outside ENCODE elements in black and variations covered by ENCODE elements in red. The increase in low-frequency alleles compared to background is indicative of negative selection occurring in the set of variants annotated by the ENCODE data.
Panel F shows aggregation of mammalian constraint scores over the glucocorticoid receptor (GR) TF motif in bound sites, showing the expected correlation with the information content of bases in the motif.
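The two population-genetic summaries named in this caption can be illustrated with a minimal Python sketch. It assumes biallelic sites, computes expected heterozygosity as 2p(1-p), and bins derived allele frequencies into equal-width bins. The function names, the bin count and the toy frequencies are assumptions made for illustration and are not taken from the consortium's analysis.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def mean_expected_heterozygosity(freqs):
    """Average expected heterozygosity over the variant sites in one dataset."""
    return sum(expected_heterozygosity(p) for p in freqs) / len(freqs)


def daf_spectrum(derived_freqs, n_bins=10):
    """Fraction of variants in each equal-width derived allele frequency bin."""
    counts = [0] * n_bins
    for p in derived_freqs:
        # A frequency of exactly 1.0 is placed in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return [c / len(derived_freqs) for c in counts]


# Toy frequencies (made up): variants inside annotated elements are skewed
# towards rare derived alleles relative to the background set.
inside = [0.01, 0.02, 0.05, 0.35]
outside = [0.05, 0.25, 0.45, 0.65]
print(daf_spectrum(inside)[0], daf_spectrum(outside)[0])
```

A larger share of variants in the lowest-frequency bin for the annotated set than for the background is the pattern the caption describes as indicative of negative selection.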
Figure 2. Modelling Transcription Levels from Histone Modification and TF-Binding Patterns
Panels A and B show the correlative models between either histone modifications or TFs, respectively, and RNA production as measured by CAGE tag density at TSSs in K562. In each case the scatter plot shows the output of the correlation models (x-axis) compared to observed values (y-axis). The bar graphs show the most important histone modifications (A) or TFs (B) in both the initial classification phase (upper bar graph) and the quantitative regression phase (lower bar graph), with larger values indicating increasing importance of the variable in the model. Further analysis of other cell lines and RNA measurement types is reported elsewhere.
Figure 3. Patterns and Asymmetry of Chromatin Modification at Transcription Factor-binding Sites
Panel A shows the results of clustered aggregation of H3K27me3 modification signal around CTCF binding sites (a multi-functional protein involved with chromatin structure). The first three left-most plots show the signal behaviour of the histone modification over all sites (top) and then split into the high and low signal components. The high signal component is then decomposed further into six different shape classes on the right (see ref for details). The shape decomposition process is strand aware. Panel B summarises shape asymmetry for DNase1, nucleosome and histone modification signals by plotting an asymmetry ratio for each signal over all TF binding sites. All histone modifications measured in this study show predominantly asymmetric patterns at TF binding sites.
Figure 4. Co-association between Transcription Factors
Panel A shows significant co-associations of TF pairs using the GSC statistic across the entire genome in K562 cells. The colour strength represents the extent of association (red (strongest) through orange to yellow (weakest)), whereas the depth of colour represents the fit to the GSC model (white meaning that the statistical model is not appropriate) as indicated by the key. The majority of TFs have a non-random association with other TFs, and these associations are dependent on the genomic context, meaning that once the genome is separated into promoter-proximal and distal regions, the overall levels of co-association decrease, but more specific relationships are uncovered. Panel B illustrates three classes of behaviour. The first column shows a set of associations whose strength is independent of location in promoter and distal regions, while the second shows a set of TFs that have stronger associations in promoter-proximal regions. Both these examples are from data in K562 cells and are highlighted on the genome-wide co-association matrix (panel A) by the labelled boxes A and B, respectively. The third column shows a set of TFs that show stronger association in distal regions (in the H1 hESC cell line).
Figure 5. Integration of ENCODE Data by Genome-wide Segmentation
Panel A shows an illustrative region with the two segmentation methods (ChromHMM and Segway) in a dense view and the combined segmentation expanded to show each state in GM12878, beneath a compressed view of the GENCODE gene annotations. Note that at this level of zoom and genome browser resolution, some segments appear to overlap although they do not. Segmentation classes are named and coloured according to the scheme in Table 3. Beneath the segmentations are shown each of the normalised signals that were used as the input data for the segmentations. Open chromatin signals from the DNase 1-seq and FAIRE assays are shown in blue, signal from histone modification ChIP-seq in red and TF ChIP-seq signal for Pol II and CTCF in green. The mauve ChIP-seq control signal (“Input control”) at the bottom was also included as an input to the segmentation. Panel B shows the association of selected TF (left) and RNA (right) elements in the combined segmentation states (x-axis) expressed as an observed/expected ratio for each combination of TF or RNA element and segmentation class using the heatmap scale shown in the key beside each heatmap. Panel C shows the variability of states between cell lines, showing the distribution of occurrences of the state in the six cell lines at specific genome locations, from unique to one cell line to ubiquitous in all six cell lines, for five states (CTCF, E, T, TSS, and R). Panel D shows the distribution of the level of methylation at individual sites from RRBS analysis in GM12878 across the different states, showing the expected hypomethylation at TSSs and hypermethylation of gene bodies (T state) and repressed (R) regions.
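One way to read the observed/expected ratio in panel B is sketched below in Python. The sketch assumes that the expected overlap is what independence would give, i.e. proportional to the fraction of the genome covered by the segmentation state. That null model, the function name and all base-pair counts are illustrative assumptions; the caption does not specify how the expectation was computed.

```python
def observed_expected_ratio(overlap_bp, element_bp, state_bp, genome_bp):
    """Observed overlap divided by the overlap expected under independence.

    Assumed null model: an element set covering element_bp bases overlaps a
    state covering state_bp bases in proportion to state_bp / genome_bp.
    """
    expected_bp = element_bp * state_bp / genome_bp
    return overlap_bp / expected_bp


# Toy numbers (made up): one TF's binding sites against three states.
genome_bp = 1_000_000
element_bp = 10_000
state_bp = {"TSS": 20_000, "E": 50_000, "R": 400_000}
overlap_bp = {"TSS": 2_000, "E": 1_000, "R": 1_000}

ratios = {
    state: observed_expected_ratio(overlap_bp[state], element_bp, state_bp[state], genome_bp)
    for state in state_bp
}
print(ratios)
```

Ratios above 1 indicate enrichment of the element in that state and ratios below 1 indicate depletion, which is what the heatmap colour scale encodes.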
Figure 6. Experimental Characterisation of Segmentations
Randomly sampled E state segments (see Table 3) from the K562 segmentation were cloned for mouse- and fish-based transgenic enhancer assays. Panel A shows a representative LacZ-stained transgenic e11.5 mouse embryo obtained with construct hs2065 (EN167, chr10:46,052,882-46,055,670, GRCh37). Highly reproducible staining in the blood vessels was observed in 9 out of 9 embryos resulting from independent transgenic integration events. Panel B shows a representative green fluorescent protein reporter transgenic medaka fish obtained from a construct with a basal hsp70 promoter on meganuclease-based transfection. Reproducible transgenic expression in the circulating nucleated blood cells and the endothelial cell walls was seen in 81 out of 100 transgenic tests of this construct.
Figure 7. High-Resolution Segmentation of ENCODE Data by Self-Organising Maps (SOM)
The training of the self-organising map (panel A) and analysis of the results (panels B and C) are shown. Initially we arbitrarily placed genomic segments from the ChromHMM segmentation on to the toroidal map surface, although the SOM does not use the ChromHMM state assignments (panel A). We then trained the map using the signal of the 12 different ChIP-seq and DNase-seq assays in the six cell types analysed. Each unit of the SOM is represented here by a hexagonal cell in a planar two-dimensional view of the toroidal map. Curved arrows indicate that traversing the edges of the two-dimensional view leads back to the opposite edge. The resulting map can be overlaid with any class of ENCODE or other data to view the distribution of that data within this high-resolution segmentation. In panel A the distributions of genome bases across the untrained and trained map (left and right, respectively) are shown using heatmap colours for log10 values. Panel B shows the distribution of TSSs from CAGE experiments of GENCODE annotation on the planar representations of either the initial random organisation (left) or the final trained SOM (right) using heat maps coloured according to the accompanying scales. The bottom half of panel B expands the different distributions in the SOM for all expressed TSSs (left) or TSSs specifically expressed in two example cell lines, H1 hESC (centre) and HepG2 (right). Panel C shows the association of Gene Ontology (GO) terms on the same representation of the same trained SOM. We assigned genes that are within 20 kb of a genomic segment in a SOM unit to that unit, and then associated this set of genes with GO terms using a hypergeometric distribution after correcting for multiple testing.
Map units that are significantly associated with GO terms are coloured green, with increasing strength of colour reflecting increasing numbers of genes significantly associated with the GO terms for either immune response (left) or sequence-specific TF activity (centre). In each case, specific SOM units show association with these terms. The right-hand panel shows the distribution on the same SOM of all significantly associated GO terms, now coloured by GO term count per SOM unit. For sequence-specific TF activity, two example genomic regions are extracted at the bottom of panel C from neighbouring SOM units. These are regions around the DBX1 (from SOM unit 26,31, left panel) and IRX6 (SOM unit 27,30, right panel) genes, respectively, along with their H3K27me3 ChIP-seq signal for each of the Tier 1 and 2 cell types. For DBX1, representative of a set of primarily neuronal TFs associated with unit 26,31, there is a repressive H3K27me3 signal in both H1 hESC and HUVEC cells; for IRX6, representative of a set of body patterning TFs associated with SOM unit 27,30, the repressive mark is restricted largely to the embryonic stem cell.
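The GO association step in panel C (genes assigned to a SOM unit, then tested against GO terms with a hypergeometric distribution and corrected for multiple testing) can be sketched as follows in Python. The caption does not name the correction method, so the Bonferroni adjustment here is an assumption, as are the function names and the toy gene counts.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value (assumed correction; capped at 1)."""
    return min(1.0, p_value * n_tests)


# Toy example (made up): a SOM unit with 3 assigned genes, all annotated with
# a term carried by 5 of 10 background genes, tested against 5 GO terms.
p_raw = hypergeom_upper_tail(k=3, K=5, n=3, N=10)
p_adj = bonferroni(p_raw, n_tests=5)
print(p_raw, p_adj)
```

In practice a library routine such as a hypergeometric survival function would replace the explicit sum; the explicit form is used here only to keep the sketch dependency-free.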
Figure 8. Allele-Specific ENCODE Elements
Panel A shows representative allele-specific information from GM12878 cells for selected assays around the first exon of the NACC2 gene (genomic region chr9:138,950,000-138,995,000, GRCh37). Transcription signal is shown in green, and the three sections show allele-specific data for three datasets (POLR2A, H3K79me2 and H3K27me3 ChIP-seq). In each case the purple signal is the processed signal for all sequence reads for the assay, while the blue and red signals show sequence reads specifically assigned to either the paternal or maternal copies of the genome, respectively. The set of common SNPs from dbSNP, including the phased, heterozygous SNPs used to provide the assignment, are shown at the bottom of the panel. NACC2 has a statistically significant paternal bias for POLR2A and the transcription-associated mark H3K79me2, and has a significant maternal bias for the repressive mark H3K27me3. Panel B shows pairwise correlations of allele-specific signal within single genes (below the diagonal) or within individual ChromHMM segments across the whole genome for selected DNase-seq and histone modification and TF ChIP-seq assays. The extent of correlation is coloured according to the heatmap scale indicated from positive correlation (red) through to anti-correlation (blue).
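A parental bias of the kind reported for NACC2 is commonly assessed by asking whether reads assigned to the two haplotypes depart from an even split. The Python sketch below uses an exact two-sided binomial test against a 50:50 expectation. The caption does not state which test was used, so the choice of test, the function name and the read counts are all assumptions for illustration.

```python
from math import comb


def allelic_bias_p_value(paternal_reads, maternal_reads):
    """Exact two-sided binomial p-value against an even paternal/maternal split."""
    n = paternal_reads + maternal_reads
    k = max(paternal_reads, maternal_reads)
    # Upper tail P(X >= k) for X ~ Binomial(n, 0.5); doubled by symmetry.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


# Toy counts (made up): reads overlapping phased heterozygous SNPs in a region.
print(allelic_bias_p_value(10, 0))  # strongly paternal
print(allelic_bias_p_value(5, 5))   # balanced
```

A real analysis would also need to account for reference-mapping bias and for testing many regions at once, neither of which this sketch addresses.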
Figure 9. Examining ENCODE Elements on a per individual basis in the Normal and Cancer Genome
Panel A shows the breakdown of variants in a single genome (NA12878) by both frequency (common or rare, i.e., variants not present in the low-coverage sequencing of 179 individuals in the pilot 1 European panel of the 1000 Genomes project) and by ENCODE annotation, including protein-coding gene and non-coding elements (GENCODE annotations for protein-coding genes, pseudogenes, and other ncRNAs, as well as TF-binding sites from ChIP-seq datasets, excluding broad annotations such as histone modifications, segmentations, and RNA-seq). Annotation status is further subdivided by predicted functional effect, being non-synonymous and missense mutations for protein-coding regions and variants overlapping bound TF motifs for non-coding element annotations. A substantial proportion of variants are annotated as having predicted functional effects in the non-coding category. Panel B shows one of several relatively rare occurrences, where alignment to an individual genome sequence (paternal and maternal panels) shows a different readout from the reference genome. In this case, a paternal haplotype-specific CTCF peak is identified. Panel C shows the relative level of somatic variants from a whole-genome melanoma sample that occur in DHSs unique to different cell lines. The coloured bars show cases that are significantly enriched or suppressed in somatic mutations. Details of ENCODE cell types can be found at http://encodeproject.org/ENCODE/cellTypes.html.
Figure 10. Comparison of Genome-wide Association Study-identified Loci with ENCODE Data
Panel A shows overlap of lead SNPs in the NHGRI GWAS SNP catalog (June 2011) with DHSs (left) or TF-binding sites (right) as red bars compared to various control SNP sets in blue. The control SNP sets are: SNPs on the Illumina 2.5M chip as an example of a widely used GWAS SNP typing panel; SNPs from the 1000 Genomes project; and SNPs extracted from 24 personal genomes (see Personal Genome Variants track at http://main.genome-browser.bx.psu.edu), all shown as blue bars. In addition, a further control utilised 1,000 randomisations from the genotyping SNP panel, matching the SNPs with each NHGRI catalog SNP for allele frequency and distance to the nearest TSS (light blue bars with bounds at 1.5 times the interquartile range, and any outliers beyond shown as circles). For both DHSs and TF-binding regions, a larger proportion of overlaps with GWAS-implicated SNPs is found compared to any of the control sets. Panel B shows the aggregate overlap of phenotypes to selected TF-binding sites (left matrix) or DHSs in selected cell lines (right matrix), with a count of overlaps between the phenotype and the cell line/factor. Values in green squares pass an empirical p-value threshold ≤0.01 (based on the same analysis of overlaps between randomly chosen, GWAS-matched SNPs and these epigenetic features) and have at least a count of 3 overlaps. The p-value for the total number of phenotype-TF associations is <0.001. Panel C shows several SNPs associated with Crohn’s disease and other inflammatory diseases that reside in a large gene desert on chromosome 5, along with some epigenetic features suggestive of function. The SNP (rs11742570) strongly associated with Crohn’s disease overlaps a GATA2 TF-binding signal determined in HUVEC cells. This region is also DNaseI hypersensitive in HUVEC and T-helper Th1 and Th2 cells.
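The empirical p-value and the green-square criterion described for panel B can be illustrated with a short Python sketch. It compares an observed overlap count with counts from randomised, matched SNP sets and then applies the two thresholds given in the caption (p-value ≤0.01 and at least 3 overlaps). The add-one correction in the estimator, the function names and the toy counts are assumptions and are not stated in the source.

```python
def empirical_p_value(observed_count, null_counts):
    """Share of randomisations with an overlap count at least as large as observed.

    The +1 in numerator and denominator is an assumed, commonly used correction
    that keeps the estimate away from exactly zero.
    """
    at_least_as_large = sum(1 for c in null_counts if c >= observed_count)
    return (at_least_as_large + 1) / (len(null_counts) + 1)


def passes_green_square(observed_count, p_value, p_threshold=0.01, min_overlaps=3):
    """Criterion from the caption: empirical p-value <= 0.01 and >= 3 overlaps."""
    return p_value <= p_threshold and observed_count >= min_overlaps


# Toy null distribution (made up): overlap counts from 999 matched random SNP sets.
null_counts = [0] * 500 + [1] * 400 + [2] * 99
p = empirical_p_value(observed_count=4, null_counts=null_counts)
print(p, passes_green_square(4, p))
```

With this estimator the smallest attainable p-value is 1/(R + 1) for R randomisations, so the number of randomisations bounds how stringent a threshold can be applied.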

References

    1. ENCODE Project Consortium. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science. 2004;306:636–640. doi: 10.1126/science.1105136.
    1. Birney E, et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816. doi: 10.1038/nature05874.
    1. Myers RM, et al. A user’s guide to the encyclopedia of DNA elements (ENCODE). PLoS biology. 2011;9:e1001046. doi: 10.1371/journal.pbio.1001046.
    1. Waterston RH, et al. Initial sequencing and comparative analysis of the mouse genome. Nature. 2002;420:520–562. doi: 10.1038/nature01262.
    1. Chiaromonte F, et al. The share of human genomic DNA under selection estimated from human-mouse genomic alignments. Cold Spring Harbor symposia on quantitative biology. 2003;68:245–254.
    1. Cooper GM, et al. Distribution and intensity of constraint in mammalian genomic sequence. Genome research. 2005;15:901–913. doi: 10.1101/gr.3577405.
    1. Parker SC, Hansen L, Abaan HO, Tullius TD, Margulies EH. Local DNA topography correlates with functional noncoding regions of the human genome. Science. 2009;324:389–392. doi: 10.1126/science.1169050.
    1. Lindblad-Toh K, et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011;478:476–482. doi: 10.1038/nature10530.
    1. Pheasant M, Mattick JS. Raising the estimate of functional human sequences. Genome research. 2007;17:1245–1253. doi: 10.1101/gr.6406307.
    1. Ponting CP, Hardison RC. What fraction of the human genome is functional? Genome research. 2011;21:1769–1776. doi: 10.1101/gr.116814.110.
    1. Asthana S, et al. Widely distributed noncoding purifying selection in the human genome. Proceedings of the National Academy of Sciences of the United States of America. 2007;104:12410–12415. doi: 10.1073/pnas.0705140104.
    1. Landt SG, et al. ChIP-seq guidelines and practices used by the ENCODE and modENCODE consortia. Genome research. 2012;22(9) doi: 10.1101/gr.136184.111.
    1. Li Q, Brown JB, Huang H, Bickel PJ. Measuring reproducibility of high-throughput experiments. Annals of Applied Statistics. 5:1752–1779.
    1. Harrow J, et al. GENCODE: The reference human genome annotation for the ENCODE project. Genome research. 2012 manuscript submitted.
    1. Howald C, et al. Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome. Genome research. 2012;22(9) doi: 10.1101/gr.134478.111.
    1. Djebali S, et al. Landscape of transcription in human cells. Nature. 2012 in press.
    1. Derrien T, et al. The GENCODE v7 catalogue of human long non-coding RNAs: Analysis of their gene structure, evolution and expression. Genome research. 2012 in press.
    1. Pei B, et al. The GENCODE Pseudogene Resource: Integration of Functional Genomics Evidence Allows Comprehensive Annotation of Partial Activity. Genome biology. 2012 Manuscript under review.
    1. Gerstein MB, et al. Architecture of the human regulatory network derived from ENCODE data. Nature. 2012 In Press.
    1. Bickel PJ, Boley N, Brown JB, Huang HY, Zhang NR. Subsampling Methods for Genomic Inference. Annals of Applied Statistics. 2010;4:1660–1697. doi: 10.1214/10-Aoas363.
    1. Kaplan T, et al. Quantitative models of the mechanisms that control genome-wide patterns of transcription factor binding during early Drosophila development. PLoS genetics. 2011;7:e1001290. doi: 10.1371/journal.pgen.1001290.
    1. Li XY, et al. The role of chromatin accessibility in directing the widespread, overlapping patterns of Drosophila transcription factor binding. Genome biology. 2011;12:R34. doi: 10.1186/gb-2011-12-4-r34.
    1. Pique-Regi R, et al. Accurate inference of transcription factor binding from DNA sequence and chromatin accessibility data. Genome research. 2011;21:447–455. doi: 10.1101/gr.112623.110.
    1. Zhang Y, et al. Primary sequence and epigenetic determinants of in vivo occupancy of genomic DNA by GATA1. Nucleic acids research. 2009;37:7024–7038. doi: 10.1093/nar/gkp747.
    1. Neph S, et al. An expansive human regulatory lexicon encoded in transcription factor footprints. Nature. 2012 in press.
    1. Whitfield TW, et al. Functional analysis of transcription factor binding sites in human promoters. Genome biology. 2012 in press.
    1. Gross DS, Garrard WT. Nuclease hypersensitive sites in chromatin. Annual review of biochemistry. 1988;57:159–197. doi: 10.1146/annurev.bi.57.070188.001111.
    1. Urnov FD. Chromatin remodeling as a guide to transcriptional regulatory networks in mammals. Journal of cellular biochemistry. 2003;88:684–694. doi: 10.1002/jcb.10397.
    1. Thurman RE, et al. The accessible chromatin landscape of the human genome. Nature. 2012 in press.
    1. Kundaje A, et al. Ubiquitous heterogeneity and asymmetry of the chromatin environment at regulatory elements. Genome research. 2012;22 doi: 10.1101/gr.136366.111.
    1. Schultz DC, Ayyanathan K, Negorev D, Maul GG, Rauscher FJ 3rd. SETDB1: a novel KAP-1-associated histone H3, lysine 9-specific methyltransferase that contributes to HP1-mediated silencing of euchromatic genes by KRAB zinc-finger proteins. Genes & development. 2002;16:919–932. doi: 10.1101/gad.973302.
    1. Frietze S, O’Geen H, Blahnik KR, Jin VX, Farnham PJ. ZNF274 recruits the histone methyltransferase SETDB1 to the 3′ ends of ZNF genes. PloS one. 2010;5:e15082. doi: 10.1371/journal.pone.0015082.
    1. Boyle AP, et al. High-resolution genome-wide in vivo footprinting of diverse transcription factors in human cells. Genome research. 2011;21:456–464. doi: 10.1101/gr.112656.110.
    1. Hesselberth JR, et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nature methods. 2009;6:283–289. doi: 10.1038/nmeth.1313.
    1. Zhang Y, et al. Model-based analysis of ChIP-Seq (MACS). Genome biology. 2008;9:R137. doi: 10.1186/gb-2008-9-9-r137.
    1. Kouzarides T. Chromatin modifications and their function. Cell. 2007;128:693–705. doi: 10.1016/j.cell.2007.02.005.
    1. Li B, Carey M, Workman JL. The role of chromatin during transcription. Cell. 2007;128:707–719. doi: 10.1016/j.cell.2007.01.015.
    1. Hon GC, Hawkins RD, Ren B. Predictive chromatin signatures in the mammalian genome. Human molecular genetics. 2009;18:R195–201. doi: 10.1093/hmg/ddp409.
    1. Zhou VW, Goren A, Bernstein BE. Charting histone modifications and the functional organization of mammalian genomes. Nature reviews Genetics. 2011;12:7–18. doi: 10.1038/nrg2905.
    1. Ernst J, et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature. 2011;473:43–49. doi: 10.1038/nature09906.
    1. Hon G, Wang W, Ren B. Discovery and annotation of functional chromatin signatures in the human genome. PLoS computational biology. 2009;5:e1000566. doi: 10.1371/journal.pcbi.1000566.
    1. Ball MP, et al. Targeted and genome-scale strategies reveal gene-body methylation signatures in human cells. Nature biotechnology. 2009;27:361–368. doi: 10.1038/nbt.1533.
    1. Meissner A, et al. Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature. 2008;454:766–770. doi: 10.1038/nature07107.
    1. Ogryzko VV, Schiltz RL, Russanova V, Howard BH, Nakatani Y. The transcriptional coactivators p300 and CBP are histone acetyltransferases. Cell. 1996;87:953–959.
    1. Lister R, et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature. 2009;462:315–322. doi: 10.1038/nature08514.
    1. Dekker J. Gene regulation in the third dimension. Science. 2008;319:1793–1794. doi: 10.1126/science.1152850.
    1. Dostie J, et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome research. 2006;16:1299–1309. doi: 10.1101/gr.5571506.
    1. Lajoie BR, van Berkum NL, Sanyal A, Dekker J. My5C: web tools for chromosome conformation capture studies. Nature methods. 2009;6:690–691. doi: 10.1038/nmeth1009-690.
    1. Sanyal A, Lajoie B, Jain G, Dekker J. The long-range interaction landscape of gene promoters. Nature. 2012 in press.
    1. Fullwood MJ, et al. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature. 2009;462:58–64. doi: 10.1038/nature08497.
    1. Li G, et al. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation. Cell. 2012;148:84–98. doi: 10.1016/j.cell.2011.12.014.
    1. Borneman AR, et al. Divergence of transcription factor binding sites across related yeast species. Science. 2007;317:815–819. doi: 10.1126/science.1140748.
    1. Odom DT, et al. Tissue-specific transcriptional regulation has diverged significantly between human and mouse. Nature genetics. 2007;39:730–732. doi: 10.1038/ng2047.
    1. Schmidt D, et al. Five-vertebrate ChIP-seq reveals the evolutionary dynamics of transcription factor binding. Science. 2010;328:1036–1040. doi: 10.1126/science.1186176.
    1. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534.
    1. King MC, Wilson AC. Evolution at two levels in humans and chimpanzees. Science. 1975;188:107–116.
    1. Spivakov M, et al. Analysis of variation at transcription factor binding sites in Drosophila and humans. Genome biology. 2012 in press.
    1. Sandelin A, et al. Mammalian RNA polymerase II core promoters: insights from genome-wide studies. Nature reviews Genetics. 2007;8:424–436. doi: 10.1038/nrg2026.
    1. Dong X, et al. Modeling gene expression using chromatin features in various cellular contexts. Genome biology. 2012 manuscript submitted.
    1. Huff JT, Plocik AM, Guthrie C, Yamamoto KR. Reciprocal intronic and exonic histone modification regions in humans. Nature structural & molecular biology. 2010;17:1495–1499. doi: 10.1038/nsmb.1924.
    1. Tilgner H, et al. Genomic analysis of ENCODE data: a weak but very widespread role of chromatin organization in alternative splicing. Genome research. 2012 manuscript submitted.
    1. Tilgner H, et al. Deep sequencing of subcellular RNA fractions shows splicing to be predominantly co-transcriptional in the human genome but inefficient for lncRNAs. Genome research. 2012;22(9) doi: 10.1101/gr.134445.111.
    1. Fu Y, Sinha M, Peterson CL, Weng Z. The insulator binding protein CTCF positions 20 nucleosomes around its binding sites across the human genome. PLoS genetics. 2008;4:e1000138. doi: 10.1371/journal.pgen.1000138.
    1. Kornberg RD, Stryer L. Statistical distributions of nucleosomes: nonrandom locations by a stochastic mechanism. Nucleic acids research. 1988;16:6677–6690.
    1. Schones DE, et al. Dynamic regulation of nucleosome positioning in the human genome. Cell. 2008;132:887–898. doi: 10.1016/j.cell.2008.02.022.
    1. Valouev A, et al. Determinants of nucleosome organization in primary human cells. Nature. 2011;474:516–520. doi: 10.1038/nature10002.
    1. Frietze S, et al. Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the genome by association with GATA3. Genome biology. 2012 under review.
    1. Yip KY, et al. Classification of human genomic regions based on experimentally-determined binding sites of more than 100 transcription-related factors. Genome biology. 2012 in Press.
    1. Hoffman MM, et al. Unsupervised pattern discovery in human chromatin structure through genomic segmentation. Nature methods. 2012 doi: 10.1038/nmeth.1937.
    1. Kapranov P, et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science. 2007;316:1484–1488. doi: 10.1126/science.1138341.
    1. Koch F, et al. Transcription initiation platforms and GTF recruitment at tissue-specific enhancers and promoters. Nature structural & molecular biology. 2011;18:956–963. doi: 10.1038/nsmb.2085.
    1. McLean CY, et al. GREAT improves functional interpretation of cis-regulatory regions. Nature biotechnology. 2010;28:495–501. doi: 10.1038/nbt.1630.
    1. Rozowsky J, et al. AlleleSeq: analysis of allele-specific expression and binding in a network framework. Molecular systems biology. 2011;7:522. doi: 10.1038/msb.2011.54.
    1. Boyle AP, et al. Annotation of Functional Variation in Personal Genomes Using RegulomeDB. Genome research. 2012;22(9) doi: 10.1101/gr.137323.112.
    1. Hindorff LA, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences of the United States of America. 2009;106:9362–9367. doi: 10.1073/pnas.0903103106.
    1. Schaub MA, Boyle AP, Kundaje A, Batzoglou S, Snyder M. Linking Disease Associations with Regulatory Information in the Human Genome. Genome research. 2012;22(9) doi: 10.1101/gr.136127.111.
    1. Libioulle C, et al. Novel Crohn disease locus identified by genome-wide association maps to a gene desert on 5p13.1 and modulates expression of PTGER4. PLoS genetics. 2007;3:e58. doi: 10.1371/journal.pgen.0030058.
    1. Harismendy O, et al. 9p21 DNA variants associated with coronary artery disease impair interferon-gamma signalling response. Nature. 2011;470:264–268. doi: 10.1038/nature09753.
    1. Cheng C, et al. Understanding transcriptional regulation by integrative analysis of transcription factor binding data. Genome research. 2012;22(9) doi: 10.1101/gr.136838.111.
    1. Schuster SC, et al. Complete Khoisan and Bantu genomes from southern Africa. Nature. 2010;463:943–947. doi: 10.1038/nature08795.

Source: PubMed
