Large-scale benchmarking reveals false discoveries and count transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in microbiome studies

Jonathan Thorsen, Asker Brejnrod, Martin Mortensen, Morten A Rasmussen, Jakob Stokholm, Waleed Abu Al-Soud, Søren Sørensen, Hans Bisgaard, Johannes Waage, Jonathan Thorsen, Asker Brejnrod, Martin Mortensen, Morten A Rasmussen, Jakob Stokholm, Waleed Abu Al-Soud, Søren Sørensen, Hans Bisgaard, Johannes Waage

Abstract

Background: There is an immense scientific interest in the human microbiome and its effects on human physiology, health, and disease. A common approach for examining bacterial communities is high-throughput sequencing of 16S rRNA gene hypervariable regions, aggregating sequence-similar amplicons into operational taxonomic units (OTUs). Strategies for detecting differential relative abundance of OTUs between sample conditions include classical statistical approaches as well as a plethora of newer methods, many borrowing from the related field of RNA-seq analysis. This effort is complicated by unique data characteristics, including sparsity, sequencing depth variation, and nonconformity of read counts to theoretical distributions, which is often exacerbated by exploratory and/or unbalanced study designs. Here, we assess the robustness of available methods for (1) inference in differential relative abundance analysis and (2) beta-diversity-based sample separation, using a rigorous benchmarking framework based on large clinical 16S microbiome datasets from different sources.

Results: Running more than 380,000 full differential relative abundance tests on real datasets with permuted case/control assignments and in silico-spiked OTUs, we identify large differences in method performance on a range of parameters, including false positive rates, sensitivity to sparsity and case/control balances, and spike-in retrieval rate. In large datasets, methods with the highest false positive rates also tend to have the best detection power. For beta-diversity-based sample separation, we show that library size normalization has very little effect and that the distance metric is the most important factor in terms of separation power.

Conclusions: Our results, generalizable to datasets from different sequencing platforms, demonstrate how the choice of method considerably affects analysis outcome. Here, we give recommendations for tools that exhibit low false positive rates, have good retrieval power across effect sizes and case/control proportions, and have low sparsity bias. Result output from some commonly used methods should be interpreted with caution. We provide an easily extensible framework for benchmarking of new methods and future microbiome datasets.

Keywords: 16S sequencing; Benchmark; Beta-diversity; Differential relative abundance; Microbiome.

Figures

Fig. 1
Fig. 1
Spike-in approach and analysis flowchart. Left, a theoretical sample where the count data for OTU A is multiplied by 3 before rescaling to original sequencing depth. Right, flowchart of the simulation study, yielding FPR and AUC for each method, dataset, and set of variables, as well as R2 values for all combinations of normalization, transformation, and distance in the beta-diversity optimization
Fig. 2
Fig. 2
OTU sparsity vs. p value. Scatterplots of OTU sparsity vs p value with panels representing each differential relative abundance test method in feces dataset A1, with 50% cases. Colored line represents the LOESS regression on data. False positive rate (FPR) is defined as the fraction of OTUs with p < 0.05. Each differential relative abundance test represents the median FPR for that method, out of all 150 permutations. Contour lines indicate point density and can be compared to a hypothetical null distribution of p values demonstrated in the final panel (“Random uniform”)
Fig. 3
Fig. 3
Median test AUC vs median test FPR. Scatterplots of median test area under the curve (AUC) vs median test false positive rate (FPR), for the datasets A1–A3; three different compartments in the COPSAC2010 cohort. FPR is defined as the fraction of OTUs with p < 0.05. Dot color represents differential relative abundance test method, while dot shape represents experiment case/control balance
Fig. 4
Fig. 4
Median test AUC vs median test FPR, small and medium subsets. Scatterplots of median test area under the curve (AUC) vs median test false positive rate (FPR), for the datasets A1s–A3s and A1m–A3m; three different compartments in the COPSAC2010 cohort, with 50% case/control proportion. FPR is defined as the fraction of OTUs with p < 0.05. Dot color represents differential relative abundance test method
Fig. 5
Fig. 5
The effect of normalization, transformation, and distance metric on beta-diversity separation. a Datasets B1–B3 (vertical panels) with all combinations of library size normalizations (horizontal panels) and count transformations (color) applied prior to the calculation of distances and use of Adonis permanova model. Effect of design variable in question (B1—age; B2—tongue versus palate; B3—age group) measured as model R2 value. The highest and lowest R2 values (yielding best and worst separation, respectively) are demonstrated in subplots bg for each dataset as principal coordinate analysis plots, colored by design variable, with overlaid prediction ellipses (B1—subplot (b, c); B2—subplot (d, e); B3—subplot (f, g)). CSS cumulative sum scaling, TMM trimmed mean of M values, TSS total sum scaling

References

    1. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010;464:59–65. doi: 10.1038/nature08821.
    1. Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, Mende DR, et al. Enterotypes of the human gut microbiome. Nature. 2011;473:174–80. doi: 10.1038/nature09944.
    1. Stackebrandt E, Goebel BM. Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. Int J Syst Bacteriol. 1994;44:846–849. doi: 10.1099/00207713-44-4-846.
    1. Huse SM, Welch DM, Morrison HG, Sogin ML. Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol. 2010;12:1889–98. doi: 10.1111/j.1462-2920.2010.02193.x.
    1. Schloss PD, Westcott SL. Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis. Appl Environ Microbiol. 2011;77:3219–26. doi: 10.1128/AEM.02810-10.
    1. McDonald D, Clemente JC, Kuczynski J, Rideout JR, Stombaugh J, Wendel D, et al. The biological observation matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome. GigaScience. 2012;1:7. doi: 10.1186/2047-217X-1-7.
    1. Paulson JN, Stine OC, Bravo HC, Pop M. Differential abundance analysis for microbial marker-gene surveys. Nat Methods. 2013;10:1200–2. doi: 10.1038/nmeth.2658.
    1. Zuur A, Leno EN, Walker N, Saveliev AA, Smith GM. Mixed effects models and extensions in ecology with R. Springer Science & Business Media. Berlin, Heidelberg; 2009.
    1. Richards SA. Dealing with overdispersed count data in applied ecology: overdispersed count data. J Appl Ecol. 2007;45:218–27. doi: 10.1111/j.1365-2664.2007.01377.x.
    1. Robinson MD, Smyth GK. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics. 2007;23:2881–7. doi: 10.1093/bioinformatics/btm453.
    1. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106. doi: 10.1186/gb-2010-11-10-r106.
    1. McMurdie PJ, Holmes S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput Biol. 2014;10:e1003531. doi: 10.1371/journal.pcbi.1003531.
    1. Weiss SJ, Xu Z, Amir A, Peddada S, Bittinger K, Gonzalez A, et al. Effects of library size variance, sparsity, and compositionality on the analysis of microbiome data [Internet]. PeerJ PrePrints; 2015. Report No.: e1408. Available from: .
    1. Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, et al. Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol Evol. 2009;24:127–35. doi: 10.1016/j.tree.2008.10.008.
    1. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550. doi: 10.1186/s13059-014-0550-8.
    1. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinforma Oxf Engl. 2010;26:139–40. doi: 10.1093/bioinformatics/btp616.
    1. Hardcastle TJ, Kelly KA. baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics. 2010;11:422. doi: 10.1186/1471-2105-11-422.
    1. Paulson J, Pop M, Bravo H. metagenomeSeq: statistical analysis for sparse high-throughput sequncing. Bioconductor package: 1.10.0. [Internet]. Available from: .
    1. Fernandes AD, Reid JN, Macklaim JM, McMurrough TA, Edgell DR, Gloor GB. Unifying the analysis of high-throughput sequencing datasets: characterizing RNA-seq, 16S rRNA gene sequencing and selective growth experiments by compositional data analysis. Microbiome. 2014;2:15. doi: 10.1186/2049-2618-2-15.
    1. Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SSK, McCulle SL, et al. Vaginal microbiome of reproductive-age women. Proc Natl Acad Sci U S A. 2011;108(Suppl 1):4680–7. doi: 10.1073/pnas.1002611107.
    1. Koren O, Knights D, Gonzalez A, Waldron L, Segata N, Knight R, et al. A guide to enterotypes across the human body: meta-analysis of microbial community structures in human microbiome datasets. PLoS Comput Biol. 2013;9:e1002863. doi: 10.1371/journal.pcbi.1002863.
    1. Knights D, Ward TL, McKinlay CE, Miller H, Gonzalez A, McDonald D, et al. Rethinking “enterotypes.”. Cell Host Microbe. 2014;16:433–7. doi: 10.1016/j.chom.2014.09.013.
    1. Angly FE, Dennis PG, Skarshewski A, Vanwonterghem I, Hugenholtz P, Tyson GW. CopyRighter: a rapid tool for improving the accuracy of microbial community profiles through lineage-specific gene copy number correction. Microbiome. 2014;2:11. doi: 10.1186/2049-2618-2-11.
    1. Bray JR, Curtis JT. An ordination of the upland forest communities of southern Wisconsin. Ecol Monogr. 1957;27:325–49. doi: 10.2307/1942268.
    1. Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol. 2005;71:8228–35. doi: 10.1128/AEM.71.12.8228-8235.2005.
    1. Lozupone CA, Hamady M, Kelley ST, Knight R. Quantitative and qualitative β diversity measures lead to different insights into factors that structure microbial communities. Appl Environ Microbiol. 2007;73:1576–85. doi: 10.1128/AEM.01996-06.
    1. Legendre P, Legendre LFJ. Numerical Ecology. Amsterdam: Elsevier; 1998.
    1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995;57:289–300.
    1. Bonferroni CE. Teoria statistica delle classi e calcolo delle probabilità. Libreria internazionale Seeber: Florence; 1936.
    1. Costea PI, Zeller G, Sunagawa S, Bork P. A fair comparison. Nat Methods. 2014;11:359. doi: 10.1038/nmeth.2897.
    1. Paulson JN, Bravo HC, Pop M. Reply to: a fair comparison. Nat Methods. 2014;11:359–60. doi: 10.1038/nmeth.2898.
    1. Fukuyama J, McMurdie PJ, Dethlefsen L, Relman DA, Holmes S. Comparisons of distance methods for combining covariates and abundances in microbiome studies. Biocomput. 2012 [Internet]. WORLD SCIENTIFIC; 2011. p. 213–24. Available from: . [cited 19 May 2015].
    1. Bisgaard H, Vissing NH, Carson CG, Bischoff AL, Følsgaard NV, Kreiner-Møller E, et al. Deep phenotyping of the unselected COPSAC2010 birth cohort study. Clin Exp Allergy J Br Soc Allergy Clin Immunol. 2013;43:1384–94. doi: 10.1111/cea.12213.
    1. Human Microbiome Project Consortium Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207–14. doi: 10.1038/nature11234.
    1. Pop M, Walker AW, Paulson J, Lindsay B, Antonio M, Hossain MA, et al. Diarrhea in young children from low-income countries leads to large-scale alterations in intestinal microbiota composition. Genome Biol. 2014;15:R76. doi: 10.1186/gb-2014-15-6-r76.
    1. Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinforma Oxf Engl. 2010;26:2460–1. doi: 10.1093/bioinformatics/btq461.
    1. Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, Giannoukos G, et al. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res. 2011;21:494–504. doi: 10.1101/gr.112730.110.
    1. McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS One. 2013;8:e61217. doi: 10.1371/journal.pone.0061217.
    1. R Core Team . R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing, Vienna, Austria. [Internet]; 2015.
    1. Tange O. GNU parallel—the command-line power tool. Login USENIX Mag. 2011;36:42–7.
    1. Venables WN, Ripley BD. Modern applied statistics with S [Internet]. Fourth. New York: Springer; 2002.
    1. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez J-C, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12:77. doi: 10.1186/1471-2105-12-77.
    1. Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara RB, et al. vegan: community ecology package [Internet] 2015.
    1. Wickham H. ggplot2: elegant graphics for data analysis [Internet]. New York: Springer; 2009. Available from: .

Source: PubMed

3
Tilaa