Circular RNAs are abundant, conserved, and associated with ALU repeats

William R Jeck, Jessica A Sorrentino, Kai Wang, Michael K Slevin, Christin E Burd, Jinze Liu, William F Marzluff, Norman E Sharpless

Abstract

Circular RNAs composed of exonic sequence have been described in a small number of genes. Thought to result from splicing errors, circular RNA species possess no known function. To delineate the universe of endogenous circular RNAs, we performed high-throughput sequencing (RNA-seq) of libraries prepared from ribosome-depleted RNA with or without digestion by the RNA exonuclease RNase R. We identified >25,000 distinct RNA species in human fibroblasts that contained non-colinear exons (a "backsplice") and were reproducibly enriched by exonuclease degradation of linear RNA. These RNAs were validated as circular RNA (ecircRNA), rather than linear RNA, and were more stable than associated linear mRNAs in vivo. In some cases, the abundance of circular molecules exceeded that of associated linear mRNA by >10-fold. By a conservative estimate, we identified ecircRNAs from 14.4% of actively transcribed genes in human fibroblasts. Application of this method to murine testis RNA identified 69 ecircRNAs in precisely orthologous locations to human circular RNAs. Of note, the paralogous kinases HIPK2 and HIPK3 produce abundant ecircRNA from their second exon in both humans and mice. Although HIPK3 circular RNAs contain an AUG translation start, these and other ecircRNAs were not bound to ribosomes. Circular RNAs could be degraded by siRNAs and, therefore, may act as competing endogenous RNAs. Bioinformatic analysis revealed shared features of circularized exons, including long bordering introns that contained complementary ALU repeats. These data show that ecircRNAs are abundant, stable, conserved and nonrandom products of RNA splicing that could be involved in control of gene expression.
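The backsplice criterion above (exons joined in non-colinear order) can be illustrated with a minimal sketch. It assumes a plus-strand transcript and a read already split into two aligned pieces; the Segment type and classify_split_read function are hypothetical and do not reproduce MapSplice output or the authors' pipeline. Minus-strand genes would need the comparison reversed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical model of one aligned piece of a split read (plus strand)."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def classify_split_read(first: Segment, second: Segment) -> str:
    """Classify a two-piece read; `first` is the piece nearer the read's 5' end.

    Colinear splice: the second piece maps downstream of the first.
    Backsplice: the second piece maps upstream of the first, i.e. a
    downstream donor is joined to an upstream acceptor.
    """
    if first.chrom != second.chrom:
        return "other"  # interchromosomal; out of scope for this sketch
    if second.start >= first.end:
        return "linear_splice"
    if second.end <= first.start:
        return "backsplice"
    return "other"  # overlapping pieces are ambiguous


# Hypothetical circularized exon at chr1:1000-1200.
read_across_circle = (Segment("chr1", 1150, 1200), Segment("chr1", 1000, 1050))
read_across_intron = (Segment("chr1", 1150, 1200), Segment("chr1", 2000, 2050))
print(classify_split_read(*read_across_circle))  # backsplice
print(classify_split_read(*read_across_intron))  # linear_splice
```

For a read crossing the junction of a circularized exon, the 3′ end of the exon is read first and its 5′ start second, so the two pieces map out of genomic order.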

Figures

FIGURE 1.
CircleSeq experimental approach. Experimental scheme for identification of circular RNAs in cultured human fibroblasts. (A) Experimental procedure, in which aliquots of ribosome-depleted RNA are split between mock treatment and RNase R treatment and then subjected to RNA-seq. (B) Reads are mapped with MapSplice: each read is segmented, the segments are mapped separately, and the possible resulting mappings, including spliced and backspliced reads, are shown in order of preference. (C) Diagram of normalized, aggregated sequencing data producing a normalized coverage value over individual nucleotides (reads per kilobase per million mapped reads [RPKM]; see Materials and Methods), as well as the locations of backsplice reads enriched in RNase R-treated samples and a normalized count of those backspliced reads (blue horizontal bracket; spliced reads per billion mapped reads [SRPBM]; see Materials and Methods).
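The two normalizations in panel C can be written as simple ratios. The sketch below uses the standard per-feature RPKM definition and reads SRPBM literally from its name (backsplice reads per 10⁹ mapped reads); the exact per-nucleotide variant used here is given in Materials and Methods, and the function names are illustrative only.

```python
def rpkm(feature_reads: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads (standard RPKM)."""
    # reads / (length / 1e3) / (total / 1e6) simplifies to reads * 1e9 / (length * total)
    return feature_reads * 1e9 / (feature_length_bp * total_mapped_reads)


def srpbm(backsplice_reads: int, total_mapped_reads: int) -> float:
    """Backsplice reads per billion mapped reads (definition assumed from the name)."""
    return backsplice_reads * 1e9 / total_mapped_reads


# Hypothetical library of 20 million mapped reads.
print(rpkm(500, 2000, 20_000_000))  # 12.5
print(srpbm(40, 20_000_000))        # 2000.0
```

Normalizing backsplice counts per billion rather than per million keeps the values of these comparatively rare junction reads in a readable range.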
FIGURE 2.
CircleSeq enriches for backsplice junctions. Two-dimensional histograms comparing normalized backspliced read counts (SRPBM) or normalized exon coverage (RPKM) between two samples or replicates. (A) Coverage of backsplice reads in RNase R-treated replicates over all distinct backsplice species (R² = 0.579). (B) Coverage of exons in mock-treated replicates (R² = 0.91). (C) Average backsplice coverage in RNase R-treated vs. mock-treated RNA-seq, showing enrichment of most backsplice species by RNase R. (D) Mean normalized exon coverage over annotated exon sequences in RNase R-treated vs. mock-treated RNA-seq, showing depletion of most species by RNase R.
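Panels A to D compare replicates and treatments. As a hedged sketch, replicate agreement can be summarized as the squared Pearson correlation, and RNase R enrichment as a log2 ratio of treated to mock values. Whether the reported R² values were computed on raw or log-scaled data is not stated in this caption, and the pseudocount of 1 is an assumption used only to avoid division by zero.

```python
import math


def r_squared(xs: list[float], ys: list[float]) -> float:
    """Squared Pearson correlation between paired replicate measurements."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def log2_enrichment(treated: float, mock: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of RNase R-treated to mock signal; > 0 means enriched."""
    return math.log2((treated + pseudocount) / (mock + pseudocount))


print(log2_enrichment(7.0, 0.0))  # 3.0: resistant, as expected for a circle
print(log2_enrichment(0.0, 3.0))  # -2.0: depleted, as expected for linear RNA
```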
FIGURE 3.
CircleSeq detects previously described species of circular RNA. Mapped read depth for RNase R-untreated (green) and RNase R-treated (orange) samples is shown at differing scales, along with bars marking the end points of backsplice reads and their number in RNase R-untreated samples (blue). (A) CircleSeq identification of cANRIL species in the 9p21.3 locus. (B) The imputed cANRIL circular products. (C) CircleSeq identification of ETS-1 circular RNAs and (D) the imputed circular RNA species. (E) Sequencing identifies a highly expressed circular RNA in KIAA0182 that consists of a single exon (F).
FIGURE 4.
Validation of circular RNA species and “virtual Northern” analysis. (A) Design of TaqMan assays using outward-facing primers. (B) Validation of RNase R enrichment for seven novel backsplice junctions and one control circular RNA (ANRIL 14-5). Noncircular RNAs (TBP, GAPDH, and 18S) are depleted by RNase R treatment. (C) Ratio of expression of the same ecircRNAs and control RNAs when cDNA is synthesized with oligo(dT) primers vs. random hexamers. Note the markedly decreased cDNA synthesis with oligo(dT) for ecircRNAs and 18S. (D) “Virtual Northern” analysis of four backsplice species and their linear counterparts, using agarose gel size fractionation followed by qPCR quantification of the products in each fraction. The x-axis shows size fractions in order of decreasing size; the y-axis indicates the calculated fraction of the total for each species contained in that size range.
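Panels A and D rest on two simple operations. Outward-facing primers yield a product only when the template is joined end to start, so the amplicon necessarily spans the backsplice junction; the sketch builds that junction sequence for a hypothetical single-exon circle. The virtual Northern y-axis is each fraction's qPCR quantity divided by the total across fractions. Both functions are illustrative and are not the assay design used here.

```python
def backsplice_junction(exon_seq: str, flank: int) -> str:
    """Sequence across the backsplice of a single-exon circle: the exon's last
    `flank` nt joined to its first `flank` nt. Outward-facing primers flanking
    this junction amplify only a template in which the exon's end meets its start."""
    if not 0 < flank <= len(exon_seq):
        raise ValueError("flank must be between 1 and the exon length")
    return exon_seq[-flank:] + exon_seq[:flank]


def fraction_per_size_range(quantities: list[float]) -> list[float]:
    """Share of the total qPCR quantity recovered in each gel size fraction."""
    total = sum(quantities)
    return [q / total for q in quantities]


print(backsplice_junction("AAACCCGGGTTT", 3))   # TTTAAA
print(fraction_per_size_range([1.0, 6.0, 1.0]))  # [0.125, 0.75, 0.125]
```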
FIGURE 5.
Lariat species also identified by CircleSeq. (A) Nine read sequences that include at least 30 nt from the splice donor of intron 2 of GAPDH also include sequence 5′ of the proposed branch point. (B) Intron 2 of GAPDH shows enriched coverage after RNase R treatment, but no backsplice reads are detected. Plots of mapped read depth in untreated (green) and RNase R-treated (orange) samples are shown at differing scales. (C) Expected effects of RNase R treatment on lariat structures.
FIGURE 6.
Circularization is conserved between paralogs and in mice. CircleSeq identified high exonuclease enrichment and backsplice prevalence in paralogous kinases (A) HIPK2 and (B) HIPK3. (C) Coverage of each gene by exon in the absence of RNase R treatment, showing a marked excess of the circularized exon of HIPK3. (D) Absolute quantification by quantitative real-time PCR of circular and linear HIPK3 species. (E) Genomic structures of murine Hipk2 and Hipk3, showing conservation of long introns around a relatively long circularized second exon (arrows).
FIGURE 7.
Circular RNAs are not associated with ribosomes and are susceptible to siRNA knockdown. Jurkat cell lysates were separated by sucrose gradient centrifugation. (A) Agarose gel verifying separation of 40S, 60S, 80S, monosome, and polysome fractions. Linear and circular RNAs were quantified by qRT-PCR and plotted as the relative quantity in each fraction. (B) The linear forms assayed were associated with monosome and polysome fractions. (C) The circular forms were predominantly unassociated with these complexes. Knockdown by three siRNAs targeting HIPK3 or ZFY was quantified by qRT-PCR and plotted to show the effect of siRNAs directed against the linear, circular, or both (lin/circ) forms of (D) HIPK3 and (E) ZFY. Quantities are given as ΔΔCt, normalized first to TBP and then to a nonspecific (NS) siRNA.
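The ΔΔCt normalization described in this caption follows the usual comparative Ct scheme. A minimal sketch, assuming roughly equal amplification efficiencies for the target and reference (TBP) assays; the conversion to fold change via 2^(−ΔΔCt) is the standard convention and is shown only for illustration, and the Ct values are hypothetical.

```python
def delta_delta_ct(ct_target_si: float, ct_ref_si: float,
                   ct_target_ns: float, ct_ref_ns: float) -> float:
    """ΔΔCt: target normalized to the reference gene (here TBP), then to NS siRNA."""
    delta_ct_si = ct_target_si - ct_ref_si  # targeted siRNA sample
    delta_ct_ns = ct_target_ns - ct_ref_ns  # nonspecific siRNA control
    return delta_ct_si - delta_ct_ns


def fold_change(ddct: float) -> float:
    """Relative expression 2^(-ΔΔCt), assuming ~100% amplification efficiency."""
    return 2.0 ** (-ddct)


# Hypothetical Ct values: the target rises 2 cycles relative to TBP after knockdown.
ddct = delta_delta_ct(26.0, 20.0, 24.0, 20.0)
print(ddct, fold_change(ddct))  # 2.0 0.25
```

A positive ΔΔCt means the target needed more cycles to cross threshold than in the NS control, i.e. it was knocked down.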
FIGURE 8.
Novel ecircRNAs are highly stable and cytoplasmic. RNA stability assay using actinomycin D and qRT-PCR quantification demonstrates (A) expected stability of control transcripts and (B) less stable linear gene products compared to (C) highly stable circular RNAs from the same genes. (D) RNA fluorescence in situ hybridization using a probe specific to circular HIPK3 demonstrates cytoplasmic localization (top panel). Knockdown with an siRNA (from Fig. 7D) to the linear form does not affect localization (middle panel), whereas treatment with an siRNA targeting both the linear and circular forms extinguishes detection of cytoplasmic species (bottom panel).
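Stability in panels A to C is read from the decline of each RNA after transcription is blocked with actinomycin D. One common way to summarize such a time course, assumed here rather than taken from this paper, is to fit a single exponential decay to the fraction remaining and report the half-life ln 2 / k. The time points and fractions below are hypothetical.

```python
import math


def half_life_hours(times_h: list[float], remaining: list[float]) -> float:
    """Half-life from a single-exponential fit of ln(fraction remaining) vs time.

    Fits ln(remaining) = a - k*t by ordinary least squares and returns ln(2)/k;
    returns infinity when no decay is detected (k <= 0).
    """
    logs = [math.log(r) for r in remaining]
    n = len(times_h)
    mean_t, mean_y = sum(times_h) / n, sum(logs) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
             / sum((t - mean_t) ** 2 for t in times_h))
    k = -slope  # decay rate constant per hour
    return math.inf if k <= 0 else math.log(2) / k


# Hypothetical time courses (fraction of the time-zero level).
print(half_life_hours([0, 4, 8], [1.0, 0.5, 0.25]))  # ~4.0 (decays, like linear mRNA)
print(half_life_hours([0, 4, 8], [1.0, 1.0, 1.0]))   # inf (stable, like a circle)
```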
FIGURE 9.
Backsplices are flanked by paired ALU elements and long introns. (A) The highest information-bearing motif discovered within 200 bp upstream of and downstream from backsplice locations shows high homology to ALU elements. Frequency of RepeatMasker-annotated ALU elements in flanking sequences 50, 100, 200, and 500 bp (B) upstream of or (C) downstream from these expression categories of backsplice events, as compared to control splice sites of expressed genes. (D) The frequency of complementing and noncomplementing ALU pairs located on opposite sides of a backsplice within a flank. (E) Annotated length of introns flanking backsplices as compared to introns flanking control exons generally. (*) P < 10⁻⁵, (**) P < 10⁻¹⁰, (***) P < 10⁻²⁰.
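Panel D distinguishes complementing from noncomplementing ALU pairs. A simplified rule, assumed here: an upstream and a downstream ALU are complementary when they lie on opposite strands, since such inverted repeats can base-pair and bring the flanking splice sites together (Model 2 in Figure 10). The RepeatHit type and the test that a RepeatMasker name begins with "Alu" are hypothetical conveniences.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RepeatHit:
    """Hypothetical RepeatMasker-style annotation within one flank."""
    name: str    # e.g. "AluY", "AluSx", "L1PA2"
    strand: str  # "+" or "-"


def count_alu_pairs(upstream: list[RepeatHit],
                    downstream: list[RepeatHit]) -> tuple[int, int]:
    """Count (complementing, noncomplementing) ALU pairs across a backsplice.

    Each ALU in the upstream flank is paired with each ALU in the downstream
    flank. Opposite strands (inverted repeats) are counted as complementing.
    """
    complementing = noncomplementing = 0
    for up, down in product(upstream, downstream):
        if not (up.name.startswith("Alu") and down.name.startswith("Alu")):
            continue  # ignore non-ALU repeats such as L1 elements
        if up.strand != down.strand:
            complementing += 1
        else:
            noncomplementing += 1
    return complementing, noncomplementing


up_flank = [RepeatHit("AluY", "+"), RepeatHit("AluSx", "-"), RepeatHit("L1PA2", "+")]
down_flank = [RepeatHit("AluJb", "-")]
print(count_alu_pairs(up_flank, down_flank))  # (1, 1)
```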
FIGURE 10.
Backsplicing and ecircRNA formation mechanisms. Models of backsplice formation. In Model 1, exon-skipping leads to a lariat whose restricted structure promotes circularization. In Model 2, exon-skipping is not required, with ALU complementarity or other RNA secondary structures bringing nonsequential donor-acceptor pairs into apposition, allowing for circularization. See Discussion for further explanation.

Source: PubMed
