RNA editing in the human ENCODE RNA-seq data

Eddie Park, Brian Williams, Barbara J Wold, Ali Mortazavi

Abstract

RNA-seq data can be mined for sequence differences relative to the reference genome to identify both genomic SNPs and RNA editing events. We analyzed the long, polyA-selected, unstranded, deeply sequenced RNA-seq data from the ENCODE Project across 14 human cell lines for candidate RNA editing events. On average, 43% of the RNA sequencing variants that are not in dbSNP and are within gene boundaries are A-to-G(I) RNA editing candidates. The vast majority of A-to-G(I) edits are located in introns and 3' UTRs, with only 123 located in protein-coding sequence. In contrast, the majority of non-A-to-G variants (60%-80%) map near exon boundaries and have the characteristics of splice-mapping artifacts. After filtering out all candidates with evidence of private genomic variation using genome resequencing or ChIP-seq data, we find that up to 85% of the high-confidence RNA variants are A-to-G(I) editing candidates. Genes with A-to-G(I) edits are enriched in Gene Ontology terms involving cell division, viral defense, and translation. The distribution and character of the remaining non-A-to-G variants closely resemble known SNPs. We find no reproducible A-to-G(I) edits that result in nonsynonymous substitutions in all three lymphoblastoid cell lines in our study, unlike RNA editing in the brain. Because only a fraction of sites are reproducibly edited in multiple cell lines, and because editing is more strongly associated with specific genes than with specific sites, our results suggest that the editing of the transcript is more important than the editing of any individual site.

Figures

**Figure 1.**
RNA SNV calling strategy. (A) Flowchart of analysis: 75-bp paired-end RNA-seq reads were mapped onto an extended genome (genome + known splice junctions + spikes) using Bowtie. Reads mapping onto splice sites and spikes were set aside, and reads mapping onto hg19 were used to call single nucleotide variants (SNVs). A parallel set of analyses was done using a collapsed set of reads with unique coordinates, and the intersections of SNVs from the uncollapsed and collapsed treatments were obtained. Known SNPs annotated in dbSNP132, sites outside gene boundaries, and intronic sites within 5 bp of splice junctions were removed. For the GM trio, any candidate with evidence of a private genomic variation was also removed. (B) Example of a candidate editing site. Purple arrows pointing to the *left* represent reads on the (−) strand, while blue arrows pointing to the *right* represent reads on the (+) strand. The blocks represent variants between the reference DNA and the RNA-seq. An SNV is kept when at least three nonidentical reads support the SNV, with a minimum SNV frequency of 10%, and at least one edit per strand. (C) Intersection strategy for two replicates. For cell types with two replicates, the SNVs remaining after collapsing were intersected between the replicates. (D) The number of SNVs remaining after collapsing for the prefiltered sites. SNVs found only in the uncollapsed set are in blue; the intersection, purple; and the collapsed set, red. (E) Collapsing increases the relative amount of A-to-G SNVs and also increases the relative number of transitions. SNVs found only in the uncollapsed set are in blue; the intersection, purple; and the collapsed set, red. (F) The fraction of dbSNP is highest in the intersection of the full and collapsed sets. The relative amounts of calls found in dbSNP132, novel genic SNVs, and other SNVs in the uncollapsed set are at the *left*; the collapsed set, *right*; and the intersection of the two, *middle*.
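The thresholds in panel B and the replicate intersection in panel C can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Read` record, the function names, and the choice to compute SNV frequency over collapsed (nonidentical) reads are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical record for one read covering a candidate site."""
    start: int   # leftmost mapped coordinate
    end: int     # rightmost mapped coordinate
    strand: str  # "+" or "-"
    base: str    # base observed at the candidate site


def passes_snv_filter(reads, alt_base, min_reads=3, min_freq=0.10):
    """Apply the caption's thresholds to the reads covering one site.

    Assumption: reads with identical coordinates, strand, and base are
    collapsed to one, and frequency is computed over the collapsed reads.
    """
    unique = set(reads)  # identical reads collapse into one entry
    if not unique:
        return False
    support = [r for r in unique if r.base == alt_base]
    strands = {r.strand for r in support}
    freq = len(support) / len(unique)
    return (
        len(support) >= min_reads      # at least three nonidentical reads
        and freq >= min_freq           # minimum SNV frequency of 10%
        and strands == {"+", "-"}      # at least one edit per strand
    )


def intersect_calls(rep1_calls, rep2_calls):
    """Keep only SNVs called in both replicates.

    Each call is assumed to be a (chrom, pos, ref, alt) tuple.
    """
    return set(rep1_calls) & set(rep2_calls)
```

For example, three identical reads supporting a variant collapse to a single nonidentical read under this sketch, so the site fails the three-read requirement even though the raw count is three.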

**Figure 2.**
RNA editing calls in GM12878. (A) Most non–A-to-G SNVs are near splicing boundaries. The distribution relative to gene boundaries of A-to-G SNVs (*left*) versus non–A-to-G SNVs (*right*). (B) Example of reads mapped incorrectly across a known splice junction. Overhanging RNA-seq reads are mapped incorrectly into the intron when the correct position is in the adjacent exon, even though the splice junction was provided to the read mapper. (C) Distribution of SNVs at different steps in the pipeline. Prefiltered SNVs, defined by having at least three nonidentical reads supporting the SNV, a minimum SNV frequency of 10%, at least one edit per strand, and no more than one type of SNV for the same position, are in blue. SNVs annotated in dbSNP132 are red, SNVs that are not in dbSNP132 and within gene boundaries are green, SNVs that are not in dbSNP132 and within gene boundaries without splicing sites are purple, SNVs that had no matching 1000 Genomes sequencing reads are in light blue, and SNVs passing ChIP filtering are in orange. (D) Frequency distribution of SNVs primarily reflects expression of homozygous and heterozygous SNPs. The SNVs that were found in dbSNP132 are in blue; the novel genic SNVs, red. (E) Most nonsplice adjoining SNPs are A-to-G. The nonsplicing novel genic A-to-G calls in filtered calls are in blue; nonsplicing novel genic A-to-G calls, red; nonsplicing novel genic non–A-to-G, brown; nonsplicing novel genic non–A-to-G in filtered calls, purple; and splicing-only novel genic, light blue. (F) Distribution of gene expression versus coverage for genic SNVs; exonic sites are in red and intronic sites are in blue. SNVs in more lowly expressed genes are primarily on exons, due to our minimum depth of coverage requirements.

**Figure 3.**
Survey of SNV calls across ENCODE cell lines. (A) Distribution of nonsplicing novel genic SNVs for all data sets. (B) In every cell type, the percentage of A-to-G SNVs increases and the number of candidate sites decreases (red) after filtering for private SNVs using ChIP-seq. GM12878 calls filtered with 1000 Genomes or ChIP-seq reads are labeled with G or C, respectively. (C) Relatively few non–A-to-G synonymous SNVs (purple), non–A-to-G nonsynonymous SNVs (green), A-to-G synonymous SNVs (red), and A-to-G nonsynonymous SNVs (blue) are found in ORFs.

**Figure 4.**
Gene-level analysis of RNA editing after private SNV filtering. (A) Hierarchical clustering of the editing frequency of the 33.5% (1905 out of 5695 possible) individual A-to-G candidate editing sites occurring in at least two distinct cell types. (B) Hierarchical clustering of the number of edits in the 47.4% (662 out of 1395 possible) of genes edited in at least two distinct cell types. (C) RNA edits within genes cluster in the UTR or in the introns, with few genes having edits in both UTR and introns. Percentage of genes with only UTR edits are in green; intronic edits, blue; and edits in both introns and UTR, red. (D) Reproducibility of calling RNA edits for human H1 ES cells. Scatter plot of RNA edit calls for rep 1,2 versus rep 3,4 is on a log2-log2 scale with a pseudocount of 1. Gaussian noise was added to points to visualize density. (E) Venn diagrams of A-to-G candidate edits in lymphoblastoid cells from a HapMap trio. The Venn diagram of the individual sites (*left*) and edited genes (*right*); 35.8% of the union of edited sites are found in two or more cell types, while 54.2% of the union of edited genes are found in two or more cell types.
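The transformation described for panel D (log2 scale with a pseudocount of 1, plus Gaussian noise to reveal overlapping points) can be sketched as follows. This is an illustration only: the function names, the noise standard deviation, and the fixed seed are assumptions, since the caption does not state how much noise was added.

```python
import math
import random


def log2_pseudo(count, pseudocount=1):
    """Log2-transform an edit count; the pseudocount keeps zero counts finite."""
    return math.log2(count + pseudocount)


def jitter(points, sd=0.05, seed=0):
    """Add independent Gaussian noise to each coordinate of (x, y) points.

    The standard deviation `sd` is a hypothetical value chosen for display.
    """
    rng = random.Random(seed)
    return [(x + rng.gauss(0, sd), y + rng.gauss(0, sd)) for x, y in points]


def scatter_coordinates(pairs, sd=0.05, seed=0):
    """Map (count in reps 1,2; count in reps 3,4) pairs to plot coordinates."""
    logged = [(log2_pseudo(a), log2_pseudo(b)) for a, b in pairs]
    return jitter(logged, sd=sd, seed=seed)
```

With a pseudocount of 1, a gene with no edits in one replicate pair maps to 0 on that axis rather than being dropped, which keeps replicate-specific calls visible along the axes of the scatter plot.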

Source: PubMed
