Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform

James J Kozich, Sarah L Westcott, Nielson T Baxter, Sarah K Highlander, Patrick D Schloss

Abstract

Rapid advances in sequencing technology have changed the experimental landscape of microbial ecology. In the last 10 years, the field has moved from sequencing hundreds of 16S rRNA gene fragments per study using clone libraries to sequencing millions of fragments per study using next-generation sequencing technologies from 454 and Illumina. As these technologies advance, it is critical to assess the strengths, weaknesses, and overall suitability of these platforms for the interrogation of microbial communities. Here, we present an improved method for sequencing variable regions within the 16S rRNA gene using Illumina's MiSeq platform, which is currently capable of producing paired 250-nucleotide reads. We evaluated three overlapping regions of the 16S rRNA gene that vary in length (i.e., V34, V4, and V45) by resequencing a mock community and natural samples from human feces, mouse feces, and soil. By titrating the concentration of 16S rRNA gene amplicons applied to the flow cell and using a quality score-based approach to correct discrepancies between reads used to construct contigs, we were able to reduce error rates by as much as two orders of magnitude. Finally, we reprocessed samples from a previous study to demonstrate that large numbers of samples could be multiplexed and sequenced in parallel with shotgun metagenomes. These analyses demonstrate that our approach can provide data that are at least as good as those generated by the 454 platform while providing considerably higher sequencing coverage for a fraction of the cost.
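The quality score-based correction of discrepancies between paired reads can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that the two reads have already been aligned over their overlap, that the second read has been reverse-complemented, and that a mismatch is resolved in favor of the higher-quality base only when the quality difference reaches a threshold (the ΔQ value). The function name `resolve_overlap`, the default threshold of 6, the use of "greater than or equal to", and the choice to emit `N` for unresolved positions are all assumptions made for illustration.

```python
from typing import List


def resolve_overlap(
    fwd: str,
    fwd_q: List[int],
    rev: str,
    rev_q: List[int],
    delta_q: int = 6,  # hypothetical default, chosen only for illustration
) -> str:
    """Build a consensus over two pre-aligned, equal-length overlapping reads.

    fwd / rev are the overlapping bases of read 1 and (reverse-complemented)
    read 2; fwd_q / rev_q are their per-base quality scores.
    """
    if not (len(fwd) == len(fwd_q) == len(rev) == len(rev_q)):
        raise ValueError("reads and quality lists must all have the same length")

    consensus = []
    for b1, q1, b2, q2 in zip(fwd, fwd_q, rev, rev_q):
        if b1 == b2:
            # The reads agree, so keep the shared base.
            consensus.append(b1)
        elif abs(q1 - q2) >= delta_q:
            # The reads disagree and one base is clearly better supported.
            consensus.append(b1 if q1 > q2 else b2)
        else:
            # The reads disagree without a clear winner: mark as ambiguous.
            consensus.append("N")
    return "".join(consensus)


if __name__ == "__main__":
    # Mismatch at position 3: G (Q30) versus C (Q20), a difference of 10.
    print(resolve_overlap("ACGT", [30, 30, 30, 30], "ACCT", [30, 30, 20, 30]))
    # Same mismatch, but the quality difference is only 2.
    print(resolve_overlap("ACGT", [30, 30, 30, 30], "ACCT", [30, 30, 28, 30]))
```

Under these assumptions, raising `delta_q` leaves more mismatches unresolved. If contigs containing ambiguous bases are then discarded, fewer sequences are kept but those that remain carry fewer errors.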

Figures

Fig 1
Design of dual-index sequencing strategy and schematic describing the four sequencing reads. The primers specific to the 16S rRNA gene are shown in boldface black text, linkers are in blue, pads are in green, the index region is in red, and the adapters are underlined. This schematic is demonstrated using the V4-specific primer sequences and linkers. The PCR and sequencing primers for each of the three regions are provided in the supplemental material.
Fig 2
Profile of sequencing errors in the first and second read (A and C) and the quality scores associated with different types of errors in the first and second read (B and D) using data from run 130403.
Fig 3
Relationship between the error rate and the fraction of sequences kept as a function of the ΔQ value for the V34, V4, and V45 regions using data from run 130403.
Fig 4
Principal coordinate ordination of θYC values (28) relating the community structures of the fecal microbiota from 12 mice collected on days 0 through 9 (Early) and days 141 through 150 (Late) after weaning.

Source: PubMed
