Differential expression analysis for sequence count data
Simon Anders, Wolfgang Huber
Abstract
High-throughput sequencing assays such as RNA-Seq, ChIP-Seq or barcode counting provide quantitative readouts in the form of count data. To infer differential signal in such data correctly and with good statistical power, estimation of data variability throughout the dynamic range and a suitable error model are required. We propose a method based on the negative binomial distribution, with variance and mean linked by local regression, and present an implementation, DESeq, as an R/Bioconductor package.
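To illustrate why a negative binomial error model is a natural choice for count data, the following is a minimal Python sketch, not the DESeq implementation. It assumes a simple quadratic mean-variance relation, variance = mean + dispersion × mean², whereas the abstract describes linking variance and mean by local regression, which this sketch does not attempt. The function names and the `dispersion` parameter are hypothetical and chosen only for illustration.

```python
import math


def nb_variance(mean: float, dispersion: float) -> float:
    """Variance under the ASSUMED quadratic relation: mean + dispersion * mean^2.

    With dispersion -> 0 this approaches the Poisson case (variance == mean).
    """
    return mean + dispersion * mean ** 2


def nb_size_prob(mean: float, dispersion: float) -> tuple[float, float]:
    """Convert (mean, dispersion) to the common (size, prob) parameterization.

    Requires dispersion > 0. Here size = 1 / dispersion and
    prob = size / (size + mean).
    """
    size = 1.0 / dispersion
    prob = size / (size + mean)
    return size, prob


def nb_pmf(k: int, mean: float, dispersion: float) -> float:
    """Negative binomial probability of observing count k.

    Computed on the log scale with lgamma to avoid overflow for large counts.
    """
    size, prob = nb_size_prob(mean, dispersion)
    log_pmf = (
        math.lgamma(k + size)
        - math.lgamma(size)
        - math.lgamma(k + 1)
        + size * math.log(prob)
        + k * math.log1p(-prob)
    )
    return math.exp(log_pmf)


if __name__ == "__main__":
    # Hypothetical gene with an expected count of 100 and dispersion 0.1.
    mu, alpha = 100.0, 0.1
    print("Poisson variance would be:", mu)
    print("Negative binomial variance:", nb_variance(mu, alpha))
    print("P(K = 100):", nb_pmf(100, mu, alpha))
```

The point of the sketch is the extra term in `nb_variance`: at higher counts the variance grows faster than the mean, which a Poisson model cannot capture.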
Source: PubMed