Use of direct gradient analysis to uncover biological hypotheses in 16S survey data and beyond
John R Erb-Downward, Amir A Sadighi Akha, Juan Wang, Ning Shen, Bei He, Fernando J Martinez, Margaret R Gyetko, Jeffrey L Curtis, Gary B Huffnagle
Abstract
This study investigated the use of direct gradient analysis of bacterial 16S pyrosequencing surveys to identify relevant bacterial community signals against a "noisy" background, and to facilitate hypothesis testing both within and beyond the realm of ecological surveys. The results, drawn from three different real-world data sets, demonstrate the utility of adding direct gradient analysis to any analysis that draws conclusions from indirect methods such as Principal Component Analysis (PCA) and Principal Coordinates Analysis (PCoA). Direct gradient analysis produces testable models and can identify significant patterns in noisy data. Additionally, we demonstrate that direct gradient analysis can be used with other kinds of multivariate data sets, such as flow cytometric data, to identify differentially expressed populations. Together, these results demonstrate the utility of direct gradient analysis in microbial ecology and in other areas of research where large multivariate data sets are involved.
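To make the contrast with indirect methods concrete, the sketch below shows one common form of direct gradient analysis, redundancy analysis (RDA), together with a permutation test of the fitted model. It is a minimal illustration in Python with NumPy, not the authors' workflow. The function name `rda_permutation_test`, the synthetic data, and the use of the explained-variance fraction as the test statistic are all assumptions made for this example. Real 16S count data would normally be transformed before this step.

```python
import numpy as np


def rda_permutation_test(Y, X, n_perm=999, seed=0):
    """Hypothetical helper: RDA-style constrained fit plus a permutation test.

    Y : (samples x taxa) community matrix, assumed already transformed.
    X : (samples x variables) explanatory matrix (the measured "gradient").
    Returns the fraction of community variance explained by X and a p-value.
    """
    rng = np.random.default_rng(seed)
    Yc = Y - Y.mean(axis=0)  # center each taxon
    Xc = X - X.mean(axis=0)  # center each explanatory variable
    total = (Yc ** 2).sum()

    def explained_fraction(Xmat):
        # Multivariate least-squares regression of the community on X;
        # the fitted values carry the constrained (explained) variation.
        coef, *_ = np.linalg.lstsq(Xmat, Yc, rcond=None)
        fitted = Xmat @ coef
        return (fitted ** 2).sum() / total

    observed = explained_fraction(Xc)
    # Null distribution: shuffle the sample labels of X so that any link
    # between the gradient and the community is broken.
    permuted = np.array([
        explained_fraction(Xc[rng.permutation(len(Xc))])
        for _ in range(n_perm)
    ])
    p_value = (1 + (permuted >= observed).sum()) / (n_perm + 1)
    return observed, p_value


# Synthetic, purely illustrative data: 40 samples, 30 "taxa", and one
# two-level explanatory variable that shifts only the first three taxa.
data_rng = np.random.default_rng(1)
n_samples = 40
group = np.repeat([0.0, 1.0], n_samples // 2)[:, None]
Y = data_rng.normal(size=(n_samples, 30))
Y[:, :3] += 2.0 * group

r2, p = rda_permutation_test(Y, group)
print(f"fraction of variance explained: {r2:.3f}, permutation p-value: {p:.3f}")
```

Because the explanatory variable enters the model explicitly, the result is a testable statement about that variable. An unconstrained ordination such as PCA would instead summarize the largest sources of variation, whatever their cause.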
Source: PubMed