SUPER-FOCUS: a tool for agile functional analysis of shotgun metagenomic data
Genivaldo Gueiros Z Silva, Kevin T Green, Bas E Dutilh, Robert A Edwards
Abstract
Summary: Analyzing the functional profile of a microbial community from unannotated shotgun sequencing reads is an important goal in metagenomics. Functional profiling has valuable applications in biological research because it identifies the abundances of the functional genes of the organisms present in the original sample, answering the question of what those organisms can do. Currently available tools do not scale well with increasing data volumes, which is a problem because both the number and the length of the reads produced by sequencing platforms keep increasing. Here, we introduce SUPER-FOCUS (SUbsystems Profile by databasE Reduction using FOCUS), an agile homology-based approach that uses a reduced reference database to report the subsystems present in metagenomic datasets and profile their abundances. We tested SUPER-FOCUS on over 70 real metagenomes; the results show that it accurately predicts the subsystems present in the profiled microbial communities and is up to 1000 times faster than other tools.
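The profiling idea summarized above can be illustrated with a minimal Python sketch. It assumes that the database reduction keeps only reference proteins from organisms predicted to be present in the sample, that each read is assigned to the subsystem of its single best-scoring hit, and that abundances are reported as percentages of assigned reads. The data layout, the function names (`reduce_database`, `best_hits`, `subsystem_profile`) and the flat, one-level subsystem labels are hypothetical illustrations, not the SUPER-FOCUS code or its output format.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical reference entry: (protein_id, organism, subsystem).
Reference = Tuple[str, str, str]


def reduce_database(reference: Iterable[Reference],
                    present_organisms: Set[str]) -> List[Reference]:
    """Keep only reference proteins from organisms predicted to be present.

    Assumed stand-in for the FOCUS-guided reduction step.
    """
    return [entry for entry in reference if entry[1] in present_organisms]


def best_hits(alignments: Iterable[Tuple[str, str, float]]) -> Dict[str, str]:
    """Map each read to the protein with its highest alignment score.

    alignments: (read_id, protein_id, bit_score) tuples, e.g. parsed from a
    tabular similarity-search output (hypothetical input format).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for read_id, protein_id, score in alignments:
        if read_id not in best or score > best[read_id][1]:
            best[read_id] = (protein_id, score)
    return {read_id: hit[0] for read_id, hit in best.items()}


def subsystem_profile(read_to_protein: Dict[str, str],
                      reduced: List[Reference]) -> Dict[str, float]:
    """Count reads per subsystem and normalize to relative abundance (%)."""
    protein_to_subsystem = {pid: subsystem for pid, _, subsystem in reduced}
    counts = Counter(
        protein_to_subsystem[pid]
        for pid in read_to_protein.values()
        if pid in protein_to_subsystem  # ignore hits outside the reduced DB
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {subsystem: 100.0 * n / total for subsystem, n in counts.items()}


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    reference = [
        ("p1", "Escherichia", "Glycolysis"),
        ("p2", "Escherichia", "Flagellar motility"),
        ("p3", "Bacillus", "Sporulation"),
    ]
    reduced = reduce_database(reference, {"Escherichia"})
    hits = best_hits([("r1", "p1", 50.0), ("r2", "p1", 42.0),
                      ("r2", "p2", 60.0), ("r3", "p1", 30.0)])
    # Glycolysis is about 66.7 %, Flagellar motility about 33.3 %
    print(subsystem_profile(hits, reduced))
```

Searching only the reduced set is what makes this kind of approach cheap: each read is compared against proteins from organisms that are plausibly in the sample rather than against the full reference collection. The real tool's hierarchy of subsystem levels and its choice of aligner are not modeled here.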
Availability and implementation: SUPER-FOCUS was implemented in Python, and its source code and the tool website are freely available at https://edwards.sdsu.edu/SUPERFOCUS.
Contact: redwards@mail.sdsu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press.
Source: PubMed