Enrichr: a comprehensive gene set enrichment analysis web server 2016 update

Maxim V Kuleshov, Matthew R Jones, Andrew D Rouillard, Nicolas F Fernandez, Qiaonan Duan, Zichen Wang, Simon Koplev, Sherry L Jenkins, Kathleen M Jagodnik, Alexander Lachmann, Michael G McDermott, Caroline D Monteiro, Gregory W Gundersen, Avi Ma'ayan

Abstract

Enrichment analysis is a popular method for analyzing gene sets generated by genome-wide experiments. Here we present a significant update to Enrichr, one of the tools in this domain. Enrichr currently contains a large collection of diverse gene set libraries available for analysis and download: in total, 180,184 annotated gene sets from 102 gene set libraries. New features have been added to Enrichr, including the ability to submit fuzzy sets and upload BED files, an improved application programming interface, and visualization of the results as clustergrams. Overall, Enrichr is a comprehensive resource for curated gene sets and a search engine that accumulates biological knowledge for further biological discoveries. Enrichr is freely available at: http://amp.pharm.mssm.edu/Enrichr.

© The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

Figures

Figure 1.
Benchmarking different enrichment analysis methods. (A) Deviation of the cumulative distribution from uniform of the scaled ranks of TFs derived from different enrichment analysis methods; (B) Comparison between crisp and fuzzy version of the proportion test. The ranking distribution of randomly ordered ChEA terms is plotted in gray dashed line. The area under the curve (AUC) is indicated in the legend as a measure of the degree of deviation from uniform.
Figure 2.
Statistics of Enrichr. (A) Histogram of gene lists submitted per user. (B) Histogram of uploaded list lengths. (C) Histogram of appearance of genes in uploaded list. (D) Histogram of annotated gene set sizes within Enrichr.
Figure 3.
Comparing Enrichr resources with MSigDB and GO-Elite. (A) Venn diagram summarizing the various resources processed and served by Enrichr, MSigDB and GO-Elite. (B) Venn diagram to compare the number of processed gene sets of genetic and chemical perturbations curated from publications in Enrichr and MSigDB.
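
The benchmark measure named in the Figure 1 caption (deviation of the cumulative distribution of scaled ranks from uniform, summarized as an AUC) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the toy ranks, and the convention that a scaled rank is the rank of the expected term divided by the number of terms in the library are all assumptions made for illustration. Under that convention, randomly ordered terms give roughly uniform scaled ranks and an AUC near 0.5, while a method that places expected terms near the top gives an AUC closer to 1.

```python
from bisect import bisect_right


def scaled_rank(rank, n_terms):
    """Scale a 1-based rank into (0, 1] by the library size (assumed convention)."""
    return rank / n_terms


def cdf_auc(scaled_ranks):
    """Area under the empirical CDF of scaled ranks over [0, 1].

    For values in [0, 1] the integral of the empirical CDF equals
    1 - mean(values), so no numerical integration is needed.
    """
    return 1.0 - sum(scaled_ranks) / len(scaled_ranks)


def deviation_from_uniform(scaled_ranks, grid_size=100):
    """Return (x, CDF(x) - x) pairs on an evenly spaced grid over [0, 1]."""
    ordered = sorted(scaled_ranks)
    n = len(ordered)
    points = []
    for k in range(grid_size + 1):
        x = k / grid_size
        cdf = bisect_right(ordered, x) / n  # fraction of scaled ranks <= x
        points.append((x, cdf - x))
    return points


if __name__ == "__main__":
    # Hypothetical ranks of the expected TF term in a library of 100 terms.
    toy_ranks = [1, 2, 5, 50]
    scaled = [scaled_rank(r, 100) for r in toy_ranks]
    print("AUC:", round(cdf_auc(scaled), 3))
    print("max deviation:", max(d for _, d in deviation_from_uniform(scaled)))
```

The closed form for the AUC avoids choosing a grid resolution; the grid is only needed to draw the deviation curve itself.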

Source: PubMed
