edgeR: a Bioconductor package for differential expression analysis of digital gene expression data
Mark D Robinson, Davis J McCarthy, Gordon K Smyth
Abstract
Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, is determining whether there is evidence that counts for a transcript or exon differ significantly across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model, the negative binomial, is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data.
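The two modelling ideas in the summary are an overdispersed Poisson (negative binomial) variance and empirical Bayes moderation of per-transcript dispersion. Both can be illustrated with a minimal Python sketch. The sketch assumes equal library sizes and uses a simple method-of-moments dispersion estimate. In place of edgeR's own empirical Bayes procedure, it uses a plain convex-combination shrinkage toward the common dispersion. The function names and the `prior_weight` constant are hypothetical and are not part of the edgeR API.

```python
from statistics import mean, variance


def nb_variance(mu: float, phi: float) -> float:
    """Negative binomial variance: Poisson part (mu) plus overdispersion (phi * mu^2).
    phi = 0 recovers the Poisson case."""
    return mu + phi * mu * mu


def moment_dispersion(counts: list[float]) -> float:
    """Method-of-moments dispersion for one transcript: solve var = mu + phi*mu^2 for phi.
    Assumes equal library sizes across replicates; negative estimates are clipped to 0."""
    mu = mean(counts)
    if mu == 0:
        return 0.0
    return max((variance(counts) - mu) / (mu * mu), 0.0)


def moderate(dispersions: list[float], prior_weight: float) -> list[float]:
    """Hypothetical shrinkage: pull each dispersion toward the common (average) value.
    prior_weight = 0 gives no shrinkage; a large prior_weight approaches the common value.
    This is an illustration only, not edgeR's weighted-likelihood empirical Bayes."""
    common = mean(dispersions)
    w = prior_weight / (1.0 + prior_weight)
    return [w * common + (1.0 - w) * d for d in dispersions]


if __name__ == "__main__":
    # Made-up counts for three transcripts, each with three replicates of one condition.
    genes = {"geneA": [10, 12, 8], "geneB": [100, 150, 60], "geneC": [5, 5, 5]}
    raw = [moment_dispersion(c) for c in genes.values()]
    shrunk = moderate(raw, prior_weight=1.0)
    for name, r, s in zip(genes, raw, shrunk):
        print(f"{name}: raw phi = {r:.4f}, moderated phi = {s:.4f}")
```

With few replicates, the raw per-transcript estimates are noisy. In this made-up example, two of them clip to zero. Borrowing strength across transcripts pulls each estimate toward a shared value, which is the intuition behind the moderation described above.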
Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).