The landscape of histone modifications across 1% of the human genome in five human cell lines

Christoph M Koch, Robert M Andrews, Paul Flicek, Shane C Dillon, Ulaş Karaöz, Gayle K Clelland, Sarah Wilcox, David M Beare, Joanna C Fowler, Phillippe Couttet, Keith D James, Gregory C Lefebvre, Alexander W Bruce, Oliver M Dovey, Peter D Ellis, Pawandeep Dhami, Cordelia F Langford, Zhiping Weng, Ewan Birney, Nigel P Carter, David Vetrie, Ian Dunham

Abstract

We generated high-resolution maps of histone H3 lysine 9/14 acetylation (H3ac), histone H4 lysine 5/8/12/16 acetylation (H4ac), and histone H3 lysine 4 mono-, di-, and trimethylation (H3K4me1, H3K4me2, H3K4me3, respectively) across the ENCODE regions. Studying each modification in five human cell lines including the ENCODE Consortium common cell lines GM06990 (lymphoblastoid) and HeLa-S3, as well as K562, HFL-1, and MOLT4, we identified clear patterns of histone modification profiles with respect to genomic features. H3K4me3, H3K4me2, and H3ac modifications are tightly associated with the transcriptional start sites (TSSs) of genes, while H3K4me1 and H4ac have more widespread distributions. TSSs show characteristic patterns in both the types of modification present and their position relative to the TSS. These patterns differ between active and inactive genes, and in particular the state of H3K4me3 and H3ac modifications is highly predictive of gene activity. Away from TSSs, modification sites are enriched in H3K4me1 and relatively depleted in H3K4me3 and H3ac. Comparison between cell lines identified differences in the histone modification profiles associated with transcriptional differences between the cell lines. These results provide an overview of the functional relationship among histone modifications and gene expression in human cells.

Figures

Figure 1.
Example of histone modification profiles across ENCODE regions. (A) Screenshot from the UCSC genome browser (Hinrichs et al. 2006) of ENCODE region ENr333 (human chromosome 20: 33,304,929–33,804,928 bp, NCBI 35) showing ChIP-chip data for the lymphoblastoid cell line, GM06990, using five antibodies for the histone modifications H3K4me1, H3K4me2, H3K4me3, H3ac, and H4ac. The scale in base pairs is indicated by the vertical ticks at the top. The top track shows the UCSC known genes (Hsu et al. 2006) with transcriptional orientation and exons indicated by arrows and vertical ticks, respectively. Below is a track indicating the extent of the ENCODE region ENr333. ChIP-chip data are displayed in the five subsequent tracks as the median value of the ratio of normalized ChIP-chip sample fluorescence to input DNA fluorescence. Each black vertical bar is the enrichment measured at a single amplicon on the ENCODE PCR product microarray with the enrichment represented by the height of the bar. Five tracks represent the data for the five antibodies used in ChIP-chip as indicated by the label at the left of each track. Note that each track is dynamically scaled according to the data displayed, and hence comparison between tracks must take into account the enrichment scale at the left of each data track. (B) Screenshot as in A of ENCODE region ENr333 for ChIP-chip data using the antibody for H3K4me3 with the five cell lines as indicated at the left of the data tracks. The screenshot is aligned to the scale in panel A. Note that the GM06990 data are the same as are displayed in the third data track in panel A. (C) Screenshot of ChIP-chip data for ENCODE region ENm010 (the HOXA cluster on human chromosome 7: 26,730,761–27,230,760 bp, NCBI 35). At the top is the GENCODE reference gene annotation (Harrow et al. 2006). Data tracks are shown as in panels A and B for all ChIP-chip data on cell lines GM06990, K562, HeLa-S3, and HFL-1. Note the browser is zoomed in to show the HOXA cluster and does not show the full extent of ENm010.
Figure 2.
Distribution of histone modification sites in the lymphoblastoid cell line, GM06990. (A) The plot shows the number of histone modification sites identified by the HMM which overlap with an exon (orange), intron (mauve), gene 5′ end (yellow), gene 3′ exon (blue), or intergenic sequence (green) for the lymphoblastoid cell line, GM06990, in the ChIP-chip data (left panel) or in random simulated data (right panel). Random data were simulated by generating sites of the same size distribution as the experimental data and placing them at random among the ENCODE regions represented by the PCR product microarray. This was repeated 100 times, and the mean frequencies of overlap were plotted. (B) Distribution of histone modification sites with respect to gene starts. The distance to the nearest gene start for histone modification sites identified by the HMM relative to GENCODE annotation (gencode_start), UCSC known gene starts (known_start), CpG islands (CpG), FirstEF gene start predictions (FirstEF), and Eponine predictions (Eponine) was determined. The plot shows the frequency of distances to the nearest gene start for H3K4me1 (red), H3K4me2 (blue), H3K4me3 (green), H3ac (mauve), and H4ac (orange) in 1-kb windows over ±30 kb from a start.
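The randomization control described for panel A can be illustrated with a minimal sketch. It assumes that sites and annotated features are half-open intervals on a single coordinate axis and that one contiguous region stands in for the arrayed ENCODE regions; the function names (`overlaps`, `simulate_overlap`) and the fixed seed are hypothetical and are not taken from the paper.

```python
import random

def overlaps(site, features):
    """Return True if the half-open interval `site` overlaps any feature."""
    s, e = site
    return any(s < f_end and f_start < e for f_start, f_end in features)

def simulate_overlap(site_lengths, region, features, n_iter=100, seed=0):
    """Mean fraction of randomly placed sites that overlap a feature.

    Each iteration places one site per observed length at a uniformly
    random position inside `region` (an assumption of this sketch), so the
    simulated sites keep the size distribution of the experimental data.
    """
    rng = random.Random(seed)
    r_start, r_end = region
    fractions = []
    for _ in range(n_iter):
        hits = 0
        for length in site_lengths:
            start = rng.randint(r_start, r_end - length)
            if overlaps((start, start + length), features):
                hits += 1
        fractions.append(hits / len(site_lengths))
    # Average over iterations, mirroring the mean of 100 repeats in the caption.
    return sum(fractions) / n_iter
```

Calling `simulate_overlap` once per feature class (exon, intron, and so on) would give the expected overlap frequencies against which the observed counts could be compared.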
Figure 3.
The coincidences of histone modification sites in the lymphoblastoid cell line, GM06990. (A) The frequency of occurrences of overlapping histone modification sites. The histogram shows the distribution of each type of overlapping histone modification site expressed as a percentage. Histone sites were defined as overlapping if they occurred within a 5-kb window centered on the site. Combinations are indicated as a five-digit binary code where 1 and 0 represent the presence or absence of each modification at a site, respectively, in the order H3K4me1:H3K4me2:H3K4me3:H3ac:H4ac. Combinations for which no sites were found are not shown. (B) Pie chart of occurrences of overlapping histone H3K4 methylation sites. (C) Box plot showing the distribution of distances to the nearest transcriptional start site (TSS) for the main combinations of histone H3K4 methylation sites. The box and horizontal line show the interquartile range and median of the data, while the whiskers extend to 1.5× the interquartile range of the distribution with open circles being the outliers of the distribution of distances from the TSS for each modification pattern.
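The five-digit binary code used in panel A can be made concrete with a short sketch. It assumes each site is represented as the set of modifications detected there; the helper names (`encode`, `decode`, `combination_percentages`) are hypothetical and only illustrate the encoding, not the authors' analysis code.

```python
from collections import Counter

# Bit order stated in the caption: H3K4me1:H3K4me2:H3K4me3:H3ac:H4ac
MARKS = ("H3K4me1", "H3K4me2", "H3K4me3", "H3ac", "H4ac")

def encode(present):
    """Turn a set of modifications present at a site into a code such as '00110'."""
    return "".join("1" if mark in present else "0" for mark in MARKS)

def decode(code):
    """Turn a five-digit code back into the list of modifications present."""
    return [mark for mark, bit in zip(MARKS, code) if bit == "1"]

def combination_percentages(sites):
    """Percentage of sites showing each observed combination of modifications."""
    counts = Counter(encode(site) for site in sites)
    total = sum(counts.values())
    return {code: 100.0 * n / total for code, n in counts.items()}
```

For example, `encode({"H3K4me3", "H3ac"})` gives `"00110"`, that is, H3K4me3 and H3ac present and the other three modifications absent. Combinations that never occur are simply missing from the result, matching the caption's note that empty combinations are not shown.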
Figure 4.
The histone modification profile at TSSs and other sites. (A) The average Z-scored histone modification profile for ±10 kb surrounding each HMM-identified histone modification site split into sites at TSSs (within 5 kb) and sites >5 kb from a TSS. Histone modification signals are plotted as lines for the average Z-score over all HMM sites in each cluster with H3K4me1 (red), H3K4me2 (green), H3K4me3 (blue), H3ac (mauve), and H4ac (orange). (B) Heatmaps representing the histone modification enrichment signal over ±10 kb surrounding all TSSs across the ENCODE regions. Histone modification signal is scored on a red (not enriched) to yellow (highest enrichment) scale. Each horizontal line of the heatmap is the histone modification profile in 1-kb windows for a single gene TSS, and TSSs are ordered according to the level of gene expression determined by analysis of the Affymetrix U133 plus 2.0 data from the GM06990 cell line using the gcRMA package of Bioconductor. The scale is the distance from the TSS in bp using 1000-bp windows. A heatmap is presented for each of the five antibodies used as indicated above the panels.
Figure 5.
The histone modification profile at TSSs for active and inactive genes. (A) The average Z-scored histone modification profile for ±10 kb surrounding each TSS of a gene split according to the expression level of the gene as determined by the GeneSpring MAS5 present (Expressed) and absent (Non-expressed) analysis of Affymetrix U133 plus 2.0 gene expression data from the GM06990 cell line. Histone modification signals are plotted as lines for the average Z-score over all HMM sites in each class with lines color-coded as in Figure 4A. (B) For comparison this plot shows the mean log2 value of the enrichment for ChIP-chip using antibodies to histone H3 and H2B over all ENCODE TSSs, split according to the expression level of the gene as determined by the GeneSpring MAS5 present (Expressed) and absent (Non-expressed) analysis of Affymetrix U133 plus 2.0 gene expression data for the K562 cell line.
Figure 6.
Identifying cell line specificity within histone modification profiles. (A) Four-dimensional plot showing raw data for H3ac for 13,407 PCR products on the microarray with data common to all experiments. Each sphere denotes a PCR product used in the analysis (13,407 spheres in this plot). The three axes denote the mean Z-score of H3ac modification in GM06990, HeLa-S3, and HFL-1 and the sphere size is proportional to the mean Z-score in the fourth cell line (MOLT4). Red spheres are PCR products that are called cell line-specific (FDR = 0.0001); green spheres are PCR products that are not cell line-specific. Green spheres tend to line up along the main diagonal while the red spheres are biased toward one or more axes. (B) Number of cell line-specific regions at different stringencies. For each FDR, all the PCR products that were cell line-specific were further filtered so that the mean Z-score is >1.5 in at least one of the four cell lines. Products that were closer than 200 bp were merged to define cell line-specific regions. (C) Cell line specificity profiles for 1890 PCR products that are cell line-specific for at least one histone modification. The FDR level was set to 0.01% and the same filtering was applied. The five main columns show the specificity for each of the histone modifications (green: not cell line-specific, red: cell line-specific). Four additional columns next to each of these five columns indicate the contribution of each cell line to the cell line specificity. For each cell line, a cell is colored only if the mean of the replicates for that cell line is significantly higher than the mean of all the replicates (blue: GM06990, gray: HeLa-S3, black: HFL-1, pink: MOLT4). (D) The distributions of the distances from the nearest GENCODE TSS of cell line-specific PCR products for H3K4me2. From left to right, box plots representing: cell line-specific tiles (FDR = 0.01%, filtered as in A), significantly modified (Z-score above the 95th percentile) but not cell line-specific PCR products in GM06990, HFL-1, HeLa-S3, and MOLT4. The width of each box is proportional to the square root of the number of PCR products in each group. Cell line-specific PCR products are significantly farther from TSSs than highly modified but not cell line-specific tiles (see P-values in Table 3).
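The filtering and merging steps described for panel B can be sketched as follows. This is a minimal illustration: it assumes each PCR product is a `(start, end)` pair with a dictionary of mean Z-scores per cell line, and that "closer than 200 bp" means a gap of fewer than 200 bp between consecutive products. The function names (`passes_filter`, `merge_products`) are hypothetical.

```python
def passes_filter(mean_z_by_cell_line, threshold=1.5):
    """Keep a product if its mean Z-score exceeds the threshold in at least one cell line."""
    return any(z > threshold for z in mean_z_by_cell_line.values())

def merge_products(products, max_gap=200):
    """Merge (start, end) products separated by fewer than `max_gap` bp into regions."""
    merged = []
    for start, end in sorted(products):
        if merged and start - merged[-1][1] < max_gap:
            # Close enough to the previous region: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(region) for region in merged]
```

With these assumptions, `merge_products([(0, 100), (250, 400), (700, 800)])` yields two regions, `(0, 400)` and `(700, 800)`, because only the first gap (150 bp) is below the 200-bp limit.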

Source: PubMed
