Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation
Cole Trapnell, Brian A Williams, Geo Pertea, Ali Mortazavi, Gordon Kwan, Marijke J van Baren, Steven L Salzberg, Barbara J Wold, Lior Pachter
Abstract
High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
Source: PubMed