Microbiome profiling by illumina sequencing of combinatorial sequence-tagged PCR products
Gregory B Gloor, Ruben Hummelen, Jean M Macklaim, Russell J Dickson, Andrew D Fernandes, Roderick MacPhee, Gregor Reid, Gregory B Gloor, Ruben Hummelen, Jean M Macklaim, Russell J Dickson, Andrew D Fernandes, Roderick MacPhee, Gregor Reid
Abstract
We developed a low-cost, high-throughput microbiome profiling method that uses combinatorial sequence tags attached to PCR primers that amplify the rRNA V6 region. Amplified PCR products are sequenced using an Illumina paired-end protocol to generate millions of overlapping reads. Combinatorial sequence tagging can be used to examine hundreds of samples with far fewer primers than is required when sequence tags are incorporated at only a single end. The number of reads generated permitted saturating or near-saturating analysis of samples of the vaginal microbiome. The large number of reads allowed an in-depth analysis of errors, and we found that PCR-induced errors composed the vast majority of non-organism derived species variants, an observation that has significant implications for sequence clustering of similar high-throughput data. We show that the short reads are sufficient to assign organisms to the genus or species level in most cases. We suggest that this method will be useful for the deep sequencing of any short nucleotide region that is taxonomically informative; these include the V3, V5 regions of the bacterial 16S rRNA genes and the eukaryotic V9 region that is gaining popularity for sampling protist diversity.
Conflict of interest statement
Competing Interests: The authors have declared that no competing interests exist.
Figures
References
- Andersson AF, Lindberg M, Jakobsson H, Bäckhed F, Nyrén P, et al. Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS One. 2008;3:e2836.
- Lauber CL, Hamady M, Knight R, Fierer N. Pyrosequencing-based assessment of soil ph as a predictor of soil bacterial community structure at the continental scale. Appl Environ Microbiol. 2009;75:5111–20.
- Polymenakou PN, Lampadariou N, Mandalakis M, Tselepides A. Phylogenetic diversity of sediment bacteria from the southern cretan margin, eastern mediterranean sea. Syst Appl Microbiol. 2009 Feb;32:17–26.
- Hamady M, Knight R. Microbial community profiling for human microbiome projects: Tools, techniques, and challenges. Genome Res. 2009;19:1141–52.
- Amaral-Zettler LA, McCliment EA, Ducklow HW, Huse SM. A method for studying protistan diversity using massively parallel sequencing of v9 hypervariable regions of small-subunit ribosomal rna genes. PLoS One. 2009;4:e6372.
- Schellenberg J, Links MG, Hill JE, Dumonceaux TJ, Peters GA, et al. Pyrosequencing of the chaperonin-60 universal target as a tool for determining microbial community composition. Appl Environ Microbiol. 2009;75:2889–98.
- Nocker A, Burr M, Camper AK. Genotypic microbial community profiling: a critical technical review. Microb Ecol. 2007;54:276–89.
- Petrosino JF, Highlander S, Luna RA, Gibbs RA, Versalovic J. Metagenomic pyrosequencing and microbial identification. Clin Chem. 2009;55:856–66.
- Reeder J, Knight R. The ‘rare biosphere’: a reality check. Nat Methods. 2009;6:636–7.
- Quince C, Lanzén A, Curtis TP, Davenport RJ, Hall N, et al. Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods. 2009;6:639–41.
- Pawlowski J, Lecroq B. Short rdna barcodes for species identification in foraminifera. J Eukaryot Microbiol. 2010;57:197–205.
- Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Lozupone CA, et al. Global patterns of 16s rrna diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci U S A 2010
- Hummelen R, Fernandes AD, Macklaim JM, Dickson RJ, Changalucha J, et al. Deep sequencing of the vaginal microbiota of women with hiv. PLoS ONE. 2010;5:e12078.
- Huse SM, Dethlefsen L, Huber JA, Mark Welch D, Welch DM, et al. Exploring microbial diversity and taxonomy using ssu rrna hypervariable tag sequencing. PLoS Genet. 2008;4:e1000255.
- Cole JR, Wang Q, Cardenas E, Fish J, Chai B, et al. The ribosomal database project: improved alignments and new tools for rrna analysis. Nucleic Acids Res. 2009;37:D141–5.
- Wang Y, Qian PY. Conservative fragments in bacterial 16s rrna genes and primer design for 16s ribosomal dna amplicons in metagenomic studies. PLoS One. 2009;4:e7401.
- Laurikari V. Tre the free and portable approximate regex matching library. 2010. Available: . Accessed 2010, May 28.
- Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456:53–9.
- Smith DR, Quinlan AR, Peckham HE, Makowsky K, Tao W, et al. Rapid whole-genome mutational profiling using next-generation sequencing technologies. Genome Res. 2008;18:1638–42.
- Whiteford N, Skelly T, Curtis C, Ritchie ME, Löhr A, et al. Swift: primary data analysis for the illumina solexa sequencing platform. Bioinformatics. 2009;25:2194–9.
- Frank DN. Barcrawl and bartab: software tools for the design and implementation of barcoded primers for highly multiplexed dna sequencing. BMC Bioinformatics. 2009;10:362.
- Engels WR. Contributing software to the internet: the amplify program. Trends Biochem Sci. 1993;18:448–50.
- Lahr DJG, Katz LA. Reducing the impact of pcr-mediated recombination in molecular evolution and environmental studies using a new-generation high-fidelity dna polymerase. Biotechniques. 2009;47:857–66.
- Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, et al. Blast+: architecture and applications. BMC Bioinformatics. 2009;10:421.
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. Genbank. Nucleic Acids Res. 2010;38:D46–51.
- Altschul SF, Gish W. Local alignment statistics. Methods Enzymol. 1996;266:460–80.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by rna-seq. Nat Methods. 2008;5:621–8.
- Hurlbert S. The nonconcept of species diversity: a critique and alternative parameters. Ecology. 1971;52:577–586.
- Smith EP, Van Belle G. Nonparametric estimation of species richness. Biometrics. 1984;40:119–129.
- Efron B. Nonparametric estimates of standard error: The jackknife, the bootstrap and other methods. Biometrika. 1981;68:589.
- Colwell R. EstimateS: Statistical estimation of species richness and shared species from samples. 2009. Available: . Accessed 2010 April 6.
- Chao A. Nonparametric estimation of the number of classes in a population. Scandinavian Journal of Statistics. 1984;11:265–270.
- Chao A, Lee SM. Estimating the number of classes via sample coverage. J American Stat Soc. 1992;87:210–217.
- Oksanen J, Kindt R, Legendre P, B O. vegan: Community ecology package version 18-3. r package. 2010. Available: . Accessed 2010 May 1.
- Srinivasan S, Fredricks DN. The human vaginal bacterial biota and bacterial vaginosis. Interdisciplinary perspectives on infectious diseases. 2008;2008:750479.
- Shi Y, Chen L, Tong J, Xu C. Preliminary characterization of vaginal microbiota in healthy chinese women using cultivation-independent methods. J Obstet Gynaecol Res. 2009;35:525–32.
- Rodrigue S, Materna AC, Timberlake SC, C BM, Malmstrom RR, et al. Unlocking short read sequencing for metagenomics. PLoS ONE. 2010;5:e11840.
- Ravel J, Gajer P, Abdo Z, Schneider GM, Koenig SSK, et al. Vaginal microbiome of reproductive-age women. Proc Natl Acad Sci U S A. 2010 doi/10.1073/pnas.100611107.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, et al. Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402.
- Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM. The sanger fastq file format for sequences with quality scores, and the solexa/illumina fastq variants. Nucleic Acids Res. 2010;38:1767–71.
Source: PubMed