Evaluating the Information Content of Shallow Shotgun Metagenomics

Benjamin Hillmann, Gabriel A Al-Ghalith, Robin R Shields-Cutler, Qiyun Zhu, Daryl M Gohl, Kenneth B Beckman, Rob Knight, Dan Knights

Abstract

Although microbial communities are associated with human, environmental, plant, and animal health, there exists no cost-effective method for precisely characterizing species and genes in such communities. While deep whole-metagenome shotgun (WMS) sequencing provides high taxonomic and functional resolution, it is often prohibitively expensive for large-scale studies. The prevailing alternative, 16S rRNA gene amplicon (16S) sequencing, often does not resolve taxonomy past the genus level and provides only moderately accurate predictions of the functional profile; thus, there is currently no widely accepted approach to affordable, high-resolution taxonomic and functional microbiome analysis. To address this technology gap, we evaluated the information content of shallow shotgun sequencing with as few as 0.5 million sequences per sample as an alternative to 16S sequencing for large human microbiome studies. We describe a library preparation protocol enabling shallow shotgun sequencing at approximately the same per-sample cost as 16S sequencing. We analyzed multiple real and simulated biological data sets, including two novel human stool samples with ultradeep sequencing of 2.5 billion sequences per sample, and found that shallow shotgun sequencing recovers more-accurate species-level taxonomic and functional profiles of the human microbiome than 16S sequencing. We discuss the inherent limitations of shallow shotgun sequencing and note that 16S sequencing remains a valuable and important method for taxonomic profiling of novel environments. Although deep WMS sequencing remains the gold standard for high-resolution microbiome analysis, we recommend that researchers consider shallow shotgun sequencing as a useful alternative to 16S sequencing for large-scale human microbiome research studies where WMS sequencing may be cost-prohibitive.
IMPORTANCE A common refrain in recent microbiome-related academic meetings is that the field needs to move away from broad taxonomic surveys using 16S sequencing and toward more powerful longitudinal studies using shotgun sequencing. However, performing deep shotgun sequencing in large longitudinal studies remains prohibitively expensive for all but the most well-funded research labs and consortia, which leads many researchers to choose 16S sequencing for large studies, followed by deep shotgun sequencing on a subset of targeted samples. Here, we show that shallow- or moderate-depth shotgun sequencing may be used by researchers to obtain species-level taxonomic and functional data at approximately the same cost as amplicon sequencing. While shallow shotgun sequencing is not intended to replace deep shotgun sequencing for strain-level characterization, we recommend that microbiome scientists consider using shallow shotgun sequencing instead of 16S sequencing for large-scale human microbiome studies.

Keywords: human microbiome; metagenomics; microbiome; shotgun metagenomics.

Figures

FIG 1
Information content of deep and shallow shotgun sequencing. (A) Percentages of raw shotgun DNA sequences that are unique to one bacterial species across different human body habitats (7 distinct plaque samples, 30 distinct samples from other body sites). (B, C) Principal-coordinate analysis of Bray-Curtis beta diversity using deep (B) and shallow (C) sequencing (sample sizes were as described for panel A). (D, E) Shannon diversity estimates at varied sequencing depths for human stool (D) and subgingival plaque (E) microbiomes (sample sizes were as described for panel B). Boxplots show minimums, first quartiles, medians, third quartiles, and maximums, with outliers beyond 1.5 times the interquartile range plotted individually.
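The diversity measures named in this caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the species counts are hypothetical, the helper names (shannon, subsample, relative, bray_curtis) are invented for illustration, and subsampling is done by drawing reads without replacement from a single count vector.

```python
import math
import random
from collections import Counter


def shannon(counts):
    """Shannon diversity (natural log) of a vector of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def subsample(counts, depth, rng):
    """Draw `depth` reads without replacement from a species count vector."""
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    drawn = Counter(rng.sample(pool, depth))
    return [drawn.get(i, 0) for i in range(len(counts))]


def relative(counts):
    """Convert counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    denom = sum(u) + sum(v)
    return sum(abs(a - b) for a, b in zip(u, v)) / denom if denom else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    deep = [5000, 3000, 1500, 400, 90, 10]  # hypothetical species counts
    for depth in (100, 1000, 10000):
        shallow = subsample(deep, depth, rng)
        dissimilarity = bray_curtis(relative(shallow), relative(deep))
        print(depth, round(shannon(shallow), 3), round(dissimilarity, 3))
```

In this toy setting, rare species drop out at low depth, so the Shannon estimate and the Bray-Curtis distance to the full-depth profile can both be tracked as depth increases.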
FIG 2
Comparison of species and function profiles with ultradeep sequencing data. (A, B) Correlation with ground truth species (A) and KEGG Orthology group (KO) (B) profiles for known genes present in the reference database at different sequencing depths, showing that as few as 0.5 million sequences recover nearly the full species and function profiles (ground truth based on 2.5 billion reads per sample; 4,394 genes and 694 species were used at each subsampling level from the subject 1 ultradeep sequencing sample; comparable results from subject 2 are not shown). Gene and species profiles recovered from the ultradeep data include only direct matches to genes and genomes present in the database; de novo assembly of novel genes and contigs from deep data is expected to yield additional uncharacterized gene content and is not possible with shallow shotgun data. (C, D) Scatterplots of species (C) and KOs (D) at 0.5 million versus 2.5 billion reads per sample (we used the same sample size as used for panels A and B above).
FIG 3
Biomarker discovery using shallow shotgun sequencing. (A, B) Precision (A) and recall (B) for per-read species binning of different metagenomics analysis tools (“95” and “98” refer to the minimum alignment identity threshold used; 5 distinct replicates [rep] were performed per subsampling depth, and error bars show standard deviations). (C) Stacked bar plot of species abundances recovered from HMP mock community shotgun sequencing data. (D) Negative log10 false-discovery rate (FDR)-corrected P values using Mann-Whitney U tests for species associated with type 2 diabetes (17), compared between deep and shallow shotgun sequencing (43 healthy patients, 53 patients with type 2 diabetes).
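The statistic plotted in panel D can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the Mann-Whitney U P value uses the normal approximation with tie correction and no continuity correction, the FDR correction is assumed to be Benjamini-Hochberg (the caption does not name the procedure), and the abundance values and function names are hypothetical.

```python
import math


def rank_with_ties(values):
    """Return average 1-based ranks and the tie term sum(t^3 - t)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    return ranks, tie_term


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U P value (normal approximation, assumed)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, tie_term = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:
        return 1.0  # all values tied: no evidence of a shift
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical relative abundances per species: (healthy, type 2 diabetes).
    species = {
        "species_a": ([0.10, 0.12, 0.09, 0.11, 0.13], [0.02, 0.03, 0.01, 0.04, 0.02]),
        "species_b": ([0.05, 0.06, 0.04, 0.05, 0.07], [0.05, 0.04, 0.06, 0.07, 0.05]),
    }
    raw = [mann_whitney_p(h, d) for h, d in species.values()]
    for name, q in zip(species, benjamini_hochberg(raw)):
        print(name, round(-math.log10(q), 3))
```

Running the same procedure on deep and on shallow profiles of the same samples gives one pair of negative log10 values per species, which is the kind of comparison the panel describes.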
FIG 4
Comparison of 16S sequencing and shallow shotgun recovery of species-level taxa. (A) Histogram of average Pearson correlation (R-squared) of species profiles between 16S sequencing and shallow shotgun sequencing from the same HMP sample (R-squared = 0.918), compared to the permutation-based null distribution of R-squared values for random pairings (P < 0.001). (B) Scatterplot of relative abundances of species in shallow shotgun sequencing versus 16S sequencing from the same HMP samples. Species found in only one data type are shown in a different color. (C) Fractions of all observed species, with relative abundance accounted for by species found by 16S sequencing only, shallow shotgun sequencing only, or both.
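The permutation-based null distribution in panel A can be sketched as below. This is a hedged illustration rather than the authors' procedure: it averages Pearson r (not R-squared) over matched sample pairs, builds the null by shuffling which shotgun profile is paired with which 16S profile, and uses hypothetical profiles and function names.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant vector
    return sxy / math.sqrt(sxx * syy)


def mean_paired_r(profiles_a, profiles_b):
    """Average correlation over sample pairs (same index = same sample)."""
    pairs = list(zip(profiles_a, profiles_b))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def permutation_p(profiles_a, profiles_b, n_perm, rng):
    """Observed mean correlation and one-sided permutation P value."""
    observed = mean_paired_r(profiles_a, profiles_b)
    shuffled = list(profiles_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random re-pairing of samples
        if mean_paired_r(profiles_a, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical species profiles for 5 samples (rows) and 4 species.
    amplicon = [[rng.random() for _ in range(4)] for _ in range(5)]
    # Shotgun profiles simulated as noisy copies of the same samples.
    shotgun = [[v + rng.gauss(0, 0.05) for v in row] for row in amplicon]
    observed, p_value = permutation_p(amplicon, shotgun, 999, rng)
    print(round(observed, 3), round(p_value, 3))
```

With only a handful of samples, some shuffles reproduce the true pairing, so the smallest attainable P value is bounded by the number of distinct pairings as well as by n_perm.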

References

    1. Thompson LR, Sanders JG, McDonald D, Amir A, Ladau J, Locey KJ, Prill RJ, Tripathi A, Gibbons SM, Ackermann G, Navas-Molina JA, Janssen S, Kopylova E, Vázquez-Baeza Y, González A, Morton JT, Mirarab S, Zech Xu Z, Jiang L, Haroon MF, Kanbar J, Zhu Q, Jin Song S, Kosciolek T, Bokulich NA, Lefler J, Brislawn CJ, Humphrey G, Owens SM, Hampton-Marcell J, Berg-Lyons D, McKenzie V, Fierer N, Fuhrman JA, Clauset A, Stevens RL, Shade A, Pollard KS, Goodwin KD, Jansson JK, Gilbert JA, Knight R, Earth Microbiome Project Consortium. 2017. A communal catalogue reveals Earth’s multiscale microbial diversity. Nature 551:457–463. doi:10.1038/nature24621.
    2. Singh BK, Bardgett RD, Smith P, Reay DS. 2010. Microorganisms and climate change: terrestrial feedbacks and mitigation options. Nat Rev Microbiol 8:779–790. doi:10.1038/nrmicro2439.
    3. Fitzpatrick CR, Copeland J, Wang PW, Guttman DS, Kotanen PM, Johnson MTJ. 2018. Assembly and ecological function of the root microbiome across angiosperm plant species. Proc Natl Acad Sci U S A 115:E1157–E1165. doi:10.1073/pnas.1717617115.
    4. Huttenhower C, Gevers D, Knight R, Abubucker S, Badger J, Chinwalla A, Creasy H, Earl A, FitzGerald M, Fulton R, Giglio M, Hallsworth-Pepin K, Lobos E, Madupu R, Magrini V, Martin J, Mitreva M, Muzny D, Sodergren E, Versalovic J, Wollam A, Worley K, Wortman J, Young S, Zeng Q, Aagaard K, Abolude O, Allen-Vercoe E, Alm E, Alvarado L, Andersen G, Anderson S, Appelbaum E, Arachchi H, Armitage G, Arze C, Ayvaz T, Baker C, Begg L, Belachew T, Bhonagiri V, Bihan M, Blaser M, Bloom T, Bonazzi V, Brooks J, Buck G, Buhay C, Busam D, Campbell J, et al. 2012. Structure, function and diversity of the healthy human microbiome. Nature 486:207–214. doi:10.1038/nature11234.
    5. Knights D, Costello EK, Knight R. 2011. Supervised classification of human microbiota. FEMS Microbiol Rev 35:343–359. doi:10.1111/j.1574-6976.2010.00251.x.
    6. Yarza P, Yilmaz P, Pruesse E, Glöckner FO, Ludwig W, Schleifer K-H, Whitman WB, Euzéby J, Amann R, Rosselló-Móra R. 2014. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Microbiol 12:635–645. doi:10.1038/nrmicro3330.
    7. Langille MGI, Zaneveld J, Caporaso JG, McDonald D, Knights D, Reyes JA, Clemente JC, Burkepile DE, Vega Thurber RL, Knight R, Beiko RG, Huttenhower C. 2013. Predictive functional profiling of microbial communities using 16S rRNA marker gene sequences. Nat Biotechnol 31:814–821. doi:10.1038/nbt.2676.
    8. Iwai S, Weinmaier T, Schmidt BL, Albertson DG, Poloso NJ, Dabbagh K, DeSantis TZ. 2016. Piphillin: improved prediction of metagenomic content by direct inference from human microbiomes. PLoS One 11:e0166104. doi:10.1371/journal.pone.0166104.
    9. Luo C, Knight R, Siljander H, Knip M, Xavier RJ, Gevers D. 2015. ConStrains identifies microbial strains in metagenomic datasets. Nat Biotechnol 33:1045–1052. doi:10.1038/nbt.3319.
    10. Alneberg J, Bjarnason BS, de Bruijn I, Schirmer M, Quick J, Ijaz UZ, Lahti L, Loman NJ, Andersson AF, Quince C. 2014. Binning metagenomic contigs by coverage and composition. Nat Methods 11:1144–1146. doi:10.1038/nmeth.3103.
    11. Nielsen HB, Almeida M, Juncker AS, Rasmussen S, Li J, Sunagawa S, Plichta DR, Gautier L, Pedersen AG, Le Chatelier E, Pelletier E, Bonde I, Nielsen T, Manichanh C, Arumugam M, Batto J-M, Quintanilha Dos Santos MB, Blom N, Borruel N, Burgdorf KS, Boumezbeur F, Casellas F, Doré J, Dworzynski P, Guarner F, Hansen T, Hildebrand F, Kaas RS, Kennedy S, Kristiansen K, Kultima JR, Léonard P, Levenez F, Lund O, Moumen B, Le Paslier D, Pons N, Pedersen O, Prifti E, Qin J, Raes J, Sørensen S, Tap J, Tims S, Ussery DW, Yamada T, MetaHIT Consortium P, Renault P, Sicheritz-Ponten T, Bork P, Wang J, Brunak S, Ehrlich SD. 2014. Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes. Nat Biotechnol 32:822–828. doi:10.1038/nbt.2939.
    12. Jones MB, Highlander SK, Anderson EL, Li W, Dayrit M, Klitgord N, Fabani MM, Seguritan V, Green J, Pride DT, Yooseph S, Biggs W, Nelson KE, Venter JC. 2015. Library preparation methodology can influence genomic and functional predictions in human microbiome research. Proc Natl Acad Sci U S A 112:14024–14029. doi:10.1073/pnas.1519288112.
    13. Ni J, Yan Q, Yu Y. 2013. How much metagenomic sequencing is enough to achieve a given goal? Sci Rep 3:1968. doi:10.1038/srep01968.
    14. Jovel J, Patterson J, Wang W, Hotte N, O’Keefe S, Mitchel T, Perry T, Kao D, Mason AL, Madsen KL, Wong GK-S. 2016. Characterization of the gut microbiome using 16S or shotgun metagenomics. Front Microbiol 7:459. doi:10.3389/fmicb.2016.00459.
    15. Rodriguez-R LM, Konstantinidis KT. 2014. Estimating coverage in metagenomic data sets and why it matters. ISME J 8:2349–2351. doi:10.1038/ismej.2014.76.
    16. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M. 2012. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40:D109–D114. doi:10.1093/nar/gkr988.
    17. Karlsson F, Tremaroli V, Nielsen J, Backhed F. 2013. Assessing the human gut microbiota in metabolic diseases. Diabetes 62:3341–3349. doi:10.2337/db13-0844.
    18. Al-Ghalith GA, Knights D. 2017. BURST enables optimal exhaustive DNA alignment for big data.
    19. Needleman SB, Wunsch CD. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48:443–453. doi:10.1016/0022-2836(70)90057-4.
    20. Tatusova T, Ciufo S, Fedorov B, O’Neill K, Tolstoy I. 2014. RefSeq microbial genomes database: new representation and annotation strategy. Nucleic Acids Res 42:D553–D559. doi:10.1093/nar/gkt1274.
    21. Wood DE, Salzberg SL. 2014. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol 15:R46. doi:10.1186/gb-2014-15-3-r46.
    22. Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens MHH, Wagner H. 2013. Package ‘vegan.’ Community Ecol Packag version 2.
    23. Langmead B, Salzberg SL. 2012. Fast gapped-read alignment with Bowtie 2. Nat Methods 9:357–359. doi:10.1038/nmeth.1923.
    24. Kim D, Song L, Breitwieser FP, Salzberg SL. 2016. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res 26:1721–1729. doi:10.1101/gr.210641.116.
    25. Al-Ghalith GA, Knights D. 2017. Faster and lower-memory metagenomic profiling with UTree.
    26. Radnedge L, Agron PG, Hill KK, Jackson PJ, Ticknor LO, Keim P, Andersen GL. 2003. Genome differences that distinguish Bacillus anthracis from Bacillus cereus and Bacillus thuringiensis. Appl Environ Microbiol 69:2755–2764. doi:10.1128/AEM.69.5.2755-2764.2003.
    27. Tessler M, Neumann JS, Afshinnekoo E, Pineda M, Hersch R, Velho LFM, Segovia BT, Lansac-Toha FA, Lemke M, DeSalle R, Mason CE, Brugler MR. 2017. Large-scale differences in microbial biodiversity discovery between 16S sequencing amplicon and shotgun sequencing. Sci Rep 7:6589. doi:10.1038/s41598-017-06665-3.
    28. Burton JN, Liachko I, Dunham MJ, Shendure J. 2014. Species-level deconvolution of metagenome assemblies with Hi-C-based contact probability maps. G3 (Bethesda) 4:1339–1346. doi:10.1534/g3.114.011825.
    29. McDonald D, Price MN, Goodrich J, Nawrocki EP, DeSantis TZ, Probst A, Andersen GL, Knight R, Hugenholtz P. 2012. An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J 6:610–618. doi:10.1038/ismej.2011.139.
    30. Homer N. 2010. DWGSIM.

Source: PubMed
