edgeR: a Bioconductor package for differential expression analysis of digital gene expression data

Mark D Robinson, Davis J McCarthy, Gordon K Smyth

Abstract

Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data.

Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).
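To illustrate what "overdispersed Poisson" means in the Summary, the sketch below assumes the negative binomial parameterization in which a count with mean mu and dispersion phi has variance mu + phi * mu^2. This is a common way to express overdispersion for count data; the function names are hypothetical and are not part of the edgeR API.

```python
def poisson_variance(mu):
    """Variance of a Poisson count: equal to its mean (technical variability only)."""
    return mu


def nb_variance(mu, phi):
    """Variance of a negative binomial count with mean mu and dispersion phi.

    Assumed parameterization: var = mu + phi * mu**2.
    With phi = 0 this reduces to the Poisson variance.
    """
    return mu + phi * mu ** 2


def squared_cv(mu, phi):
    """Squared coefficient of variation: 1/mu (Poisson part) plus phi (extra part)."""
    return nb_variance(mu, phi) / mu ** 2


if __name__ == "__main__":
    mu = 100.0
    for phi in (0.0, 0.01, 0.1):
        print(f"phi={phi}: variance={nb_variance(mu, phi):.1f}, CV^2={squared_cv(mu, phi):.4f}")
```

Under this assumption the dispersion phi is the part of the squared coefficient of variation that does not shrink as counts grow, which is why it is the quantity moderated across transcripts.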

Figures

Fig. 1.
DGE data can be visualized as ‘MA’ plots (log ratio versus abundance), just as with microarray data; each dot represents a gene. This plot shows RNA-seq gene expression for DHT-stimulated versus Control LNCaP cells, as described in Li et al. (2008). The smear of points on the left side marks genes that were observed in only one group of replicate samples, and the points marked ‘×’ denote the top 500 differentially expressed genes.
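As a rough illustration of the two axes of an ‘MA’ plot, the sketch below computes a log ratio (M) and an average log abundance (A) for one gene from its counts in two groups. It assumes counts are first scaled by library size and uses base-2 logarithms; the function name and arguments are hypothetical and do not reproduce edgeR's own plotting code. Genes with a zero count in one group have no finite log ratio, so the sketch returns None for them; such genes are the ones drawn as the smear on the left of the figure.

```python
import math


def ma_values(count_a, count_b, lib_size_a, lib_size_b):
    """Return (M, A) for one gene, or None if either count is zero.

    M is the log2 ratio of relative abundances (group B versus group A).
    A is the mean of the two log2 relative abundances.
    """
    if count_a == 0 or count_b == 0:
        return None  # no finite log ratio; needs separate handling on the plot
    log_a = math.log2(count_a / lib_size_a)
    log_b = math.log2(count_b / lib_size_b)
    m = log_b - log_a
    a = 0.5 * (log_a + log_b)
    return m, a


if __name__ == "__main__":
    # Hypothetical counts: 10 tags in Control, 40 in DHT, equal library sizes.
    print(ma_values(10, 40, 1_000_000, 1_000_000))
    # Gene seen in only one group.
    print(ma_values(0, 25, 1_000_000, 1_000_000))
```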

References

    1. Andersson AF, et al. Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS ONE. 2008;3:e2836.
    2. Gentleman RC, et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004;5:R80.
    3. Li H, et al. Determination of tag density required for digital transcriptome analysis: application to an androgen-sensitive prostate cancer model. Proc. Natl Acad. Sci. USA. 2008;105:20179–20184.
    4. Marioni JC, et al. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008;18:1509–1517.
    5. Robinson MD, Smyth GK. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics. 2007;23:2881–2887.
    6. Robinson MD, Smyth GK. Small sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics. 2008;9:321–332.
    7. Smyth GK. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Stat. Appl. Genet. Mol. Biol. 2004;1 Art 3.
    8. Smyth GK, Verbyla AP. A conditional approach to residual maximum likelihood estimation in generalized linear models. J. R. Stat. Soc. B. 1996;58:565–572.
    9. Wong JWH, et al. Computational methods for the comparative quantification of proteins in label-free LCn-MS experiments. Brief. Bioinform. 2008;9:156–165.

Source: PubMed
