An integrated encyclopedia of DNA elements in the human genome
ENCODE Project Consortium
Abstract
The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions to 80% of the genome, in particular outside the well-studied protein-coding regions. Many of the candidate regulatory elements discovered are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project sheds light on the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.
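A genome-wide percentage such as the 80% figure can be read as a coverage statistic: the share of bases that fall inside at least one annotated element, with overlapping elements from different assays counted only once. The sketch below illustrates that reading, plus a naive enrichment of disease-associated variants inside the elements relative to a uniform background. It is a minimal illustration under stated assumptions, not the consortium's pipeline: BED-style 0-based, half-open intervals, a toy genome length, and the hypothetical helper names `merge_intervals`, `covered_fraction` and `variant_enrichment` are all introduced here for illustration.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

# Hypothetical types for illustration: intervals are (chrom, start, end) with
# 0-based, half-open coordinates (BED-style); variants are (chrom, position).
Interval = Tuple[str, int, int]
Variant = Tuple[str, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals so each base is counted once."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_fraction(intervals: Iterable[Interval], genome_length: int) -> float:
    """Fraction of the genome lying in at least one annotated element."""
    covered = sum(end - start for _, start, end in merge_intervals(intervals))
    return covered / genome_length


def variant_enrichment(variants: List[Variant], intervals: Iterable[Interval],
                       genome_length: int) -> float:
    """Observed fraction of variants inside elements divided by the fraction
    expected if variants fell uniformly across the genome (naive fold enrichment,
    an assumption of this sketch, not a method stated in the source)."""
    if not variants:
        raise ValueError("need at least one variant")
    merged = merge_intervals(intervals)
    # Per-chromosome sorted starts/ends of the non-overlapping merged intervals.
    starts: Dict[str, List[int]] = {}
    ends: Dict[str, List[int]] = {}
    for chrom, start, end in merged:
        starts.setdefault(chrom, []).append(start)
        ends.setdefault(chrom, []).append(end)

    def inside(chrom: str, pos: int) -> bool:
        # Find the last merged interval starting at or before pos, then check its end.
        i = bisect_right(starts.get(chrom, []), pos) - 1
        return i >= 0 and pos < ends[chrom][i]

    hits = sum(inside(chrom, pos) for chrom, pos in variants)
    observed = hits / len(variants)
    expected = covered_fraction(merged, genome_length)
    return observed / expected


# Toy data (invented for illustration): a 1,000-bp "genome" with three elements.
elements = [("chr1", 0, 100), ("chr1", 50, 150), ("chr1", 400, 500)]
snps = [("chr1", 10), ("chr1", 120), ("chr1", 450), ("chr1", 800)]
print(covered_fraction(elements, 1000))          # 0.25
print(variant_enrichment(snps, elements, 1000))  # 3.0
```

Merging before summing is the key design choice: without it, a base marked by both a histone modification and a transcription factor peak would be counted twice and the covered fraction could exceed 100%. A real analysis would also need a background model that accounts for linkage disequilibrium and local genomic context, which this uniform baseline ignores.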
Source: PubMed