Experimental and analytical tools for studying the human microbiome

Justin Kuczynski, Christian L Lauber, William A Walters, Laura Wegener Parfrey, José C Clemente, Dirk Gevers, Rob Knight

Abstract

The human microbiome substantially affects many aspects of human physiology, including metabolism, drug interactions and numerous diseases. This realization, coupled with ever-improving nucleotide sequencing technology, has precipitated the collection of diverse data sets that profile the microbiome. In the past 2 years, studies have begun to include sufficient numbers of subjects to provide the power to associate these microbiome features with clinical states using advanced algorithms, increasing the use of microbiome studies both individually and collectively. Here we discuss tools and strategies for microbiome studies, from primer selection to bioinformatics analysis.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. Bioinformatics analysis of microbiome sequence data
Although variations exist, we show typical analysis paths for both targeted amplicon analysis (for example, analysis of 16S rDNA) and metagenomic analysis (for example, shotgun sequencing). In targeted amplicon studies (left branch), raw sequences are usually passed through quality filtering and denoising algorithms to minimize the effects of sequencing artefacts. The filtered reads are then clustered into operational taxonomic units (OTUs), each grouping sequences from closely related organisms, and the phylogenetic placement of each OTU is inferred, along with its taxonomic identity when it closely resembles a named taxonomic group. At this stage, sequence data from other relevant studies can be incorporated, or the data can be analysed on their own. The OTU abundances are then subjected to a variety of multivariate analyses and visualization procedures to elucidate the structure and patterns of the microbial communities. In metagenomic studies (right branch), the raw sequence fragments are sometimes assembled into contiguous sequences (contigs). The functional potential of those sequences is then typically assessed using a functional annotation database. The results are used to identify important metabolic pathways and are compared with the results of other metagenomic studies. The processed data are then subjected to multivariate analyses and visualizations, and are often combined with the results of microbial profiling. Note that metagenomic studies offer several opportunities for obtaining targeted gene data (similar to those produced in 16S rDNA gene surveys), as indicated by the step labelled ‘identify target gene sequences’. KEGG, Kyoto Encyclopedia of Genes and Genomes; MG-RAST, Metagenomics Rapid Annotation using Subsystems Technology.
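The left (amplicon) branch, from filtered reads to OTUs to an abundance table to a community comparison, can be illustrated with a small sketch. It assumes reads are already quality-filtered, aligned and trimmed to equal length, so identity is a simple position-by-position comparison. It also uses greedy, abundance-sorted centroid clustering and Bray–Curtis dissimilarity as one example of the multivariate distances mentioned above. Real pipelines use proper alignment and dedicated tools; the function names here are hypothetical and the read data are invented.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two reads.

    Simplification: assumes reads are already aligned and trimmed to the
    same length, so no gaps need to be considered.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length reads")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(reads, threshold=0.97):
    """Greedy centroid clustering of reads into OTUs.

    Unique sequences are visited from most to least abundant; each joins the
    first existing centroid it matches at >= threshold identity, otherwise it
    becomes a new centroid.
    """
    centroids = []
    assignment = {}  # unique sequence -> OTU index
    for seq, _count in Counter(reads).most_common():
        for otu_index, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                assignment[seq] = otu_index
                break
        else:
            centroids.append(seq)
            assignment[seq] = len(centroids) - 1
    return centroids, assignment


def otu_table(samples, assignment, n_otus):
    """Per-sample read counts for each OTU (the abundance table)."""
    table = {}
    for name, reads in samples.items():
        row = [0] * n_otus
        for read in reads:
            row[assignment[read]] += 1
        table[name] = row
    return table


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    total = sum(u) + sum(v)
    return sum(abs(a - b) for a, b in zip(u, v)) / total if total else 0.0


# Invented toy data: two samples of pre-aligned 10-bp reads.
samples = {
    "s1": ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA", "TTTTGGGGCC"],
    "s2": ["TTTTGGGGCC", "TTTTGGGGCC", "TTTTGGGGCC", "ACGTACGTAC"],
}
pooled = [read for reads in samples.values() for read in reads]
# A 0.9 cutoff tolerates one mismatch in 10 bp; 0.97 is the conventional OTU cutoff.
centroids, assignment = greedy_otus(pooled, threshold=0.9)
table = otu_table(samples, assignment, len(centroids))
print(table)  # {'s1': [1, 3], 's2': [3, 1]}
print(bray_curtis(table["s1"], table["s2"]))  # 0.5
```

Clustering pooled reads from all samples, rather than sample by sample, gives every sample the same OTU definitions, so the rows of the table are directly comparable.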
Figure 2. Effects of primer choice in targeted amplicon sequencing
In choosing primers, there is often a trade-off between being broadly inclusive and avoiding biases for or against specific groups, which can arise from variation in sequence conservation between lineages. Consequently, changes that increase the representation of one group may cause another group to drop out. This figure shows predicted taxonomic coverage for two 16S rDNA primer pairs: carefully selected universal primers (a) and commonly used primers with high bias (b). Differences in primer sensitivity change the observed taxonomic composition of communities. This is shown in the pie charts for the communities in the mouth, the head and the skin, and in the histograms on the right, which represent each primer pair’s sensitivity to various microbial phyla (the effect extends to finer-level taxa, such as genera (not shown)). Because no primer set is entirely free of bias, a primer set should be chosen using a priori knowledge of the target microbiota together with knowledge of the primer’s performance. That said, primer set 1 (which amplifies the F515–R806 fragment) is an especially good choice for avoiding bias and for representing known bacterial and archaeal groups well. It has been adopted by the Earth Microbiome Project, among other projects. Data taken from REF. .
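Predicted taxonomic coverage of the kind summarized above comes from checking, in silico, which reference sequences a primer would anneal to. The following is a minimal sketch of that idea. It checks a single degenerate primer against the forward strand only, using standard IUPAC ambiguity codes and a mismatch allowance. Real coverage estimates also test the reverse primer against the reverse complement and use curated reference databases. The 515F sequence is given as it is commonly published and should be checked against the primer's original reference. The reference fragments and group names are invented, and the function names are hypothetical.

```python
from collections import defaultdict

# IUPAC nucleotide codes -> the bases each code accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# 515F as commonly published; verify against the primer reference before use.
PRIMER_515F = "GTGCCAGCMGCCGCGGTAA"


def mismatches(primer, window):
    """Number of template bases not allowed by the (degenerate) primer base."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, window))


def primer_matches(primer, template, max_mismatches=0):
    """True if the primer matches any window of the template (forward strand only)."""
    k = len(primer)
    return any(
        mismatches(primer, template[i:i + k]) <= max_mismatches
        for i in range(len(template) - k + 1)
    )


def coverage_by_group(primer, references, max_mismatches=0):
    """Fraction of reference sequences in each taxonomic group matched by the primer."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, seq in references:
        totals[group] += 1
        if primer_matches(primer, seq, max_mismatches):
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


# Invented reference fragments for two hypothetical groups.
references = [
    ("GroupA", "TTTTGTGCCAGCAGCCGCGGTAATTTT"),  # M read as A
    ("GroupA", "AAGTGCCAGCCGCCGCGGTAACC"),     # M read as C
    ("GroupB", "GGGTGCCAGCAGCTGCGGTAAAA"),     # one mismatch (C -> T)
    ("GroupB", "ACGT" * 8),                    # no primer site
]
print(coverage_by_group(PRIMER_515F, references))
# {'GroupA': 1.0, 'GroupB': 0.0}
print(coverage_by_group(PRIMER_515F, references, max_mismatches=1))
# {'GroupA': 1.0, 'GroupB': 0.5}
```

The mismatch allowance shows the trade-off described in the caption in miniature. A stricter match drops GroupB entirely, while a looser one recovers part of it at the cost of specificity.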
Figure 3. How to get the most taxonomic information out of each sequencing technology
Taxonomic profiling consists of generating an amplicon (in red) of the (partial) 16S ribosomal RNA (rRNA) gene (top) with selected PCR primers, followed by sequencing that amplicon with a preferred technology (grey arrows); the figure compares Sanger ABI 3730xl, 454 (FLX and FLX Titanium) and Illumina 101 paired-end (PE) sequencing. Arrows emanating from the schematic 16S gene represent common forward (F) and reverse (R) primers, and the orange boxes denote the hypervariable regions (V1–9), which are known to be far less conserved than the surrounding sequence. Technologies differ in the maximum allowable amplicon size and read length (TABLE 1) and therefore give different views of the community. To increase overall length and/or quality, Sanger- and Illumina-based strategies sequence amplicons in both directions; Sanger sequencing also offers the option of a third read. By contrast, a preferred 454-based strategy sequences amplicons in a single direction owing to the lack of standard paired-end sequencing and the loss of pairing information. Getting the most taxonomic information requires careful selection of primers, 16S rDNA windows and sequencing technologies. The long length of Sanger reads diminishes the need for careful selection of amplicon primer pairs, whereas with 454 data the choice of primer pair has been shown to have a large role in taxonomic assignment and community comparison results. A variety of options exist, and studies such as the one described in REF. provide suggestions on which sets to choose. Short-read effects are exacerbated further in Illumina sequencing, so choosing an amplicon size that allows paired-end reads to overlap is an important consideration.
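Why amplicon size matters for paired-end reads comes down to simple arithmetic. If both reads start at opposite ends of an amplicon, they share 2 × read length − amplicon length bases. When that number is negative, a gap is left that no merging step can fill. The sketch below computes this. It also joins an overlapping pair by reverse-complementing the second read and finding the longest exact overlap. This is an assumed simplification: practical read mergers tolerate mismatches and weigh base qualities. The function names and the 30-bp amplicon are invented for illustration.

```python
# Complement table for unambiguous bases (plus N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def expected_overlap(read_len, amplicon_len):
    """Bases shared by the forward and reverse reads of one amplicon.

    Negative values mean the two reads leave a gap in the middle.
    """
    return 2 * read_len - amplicon_len


def merge_pair(r1, r2, min_overlap=10):
    """Join R1 with reverse-complemented R2 using the longest exact overlap.

    Simplification: real read mergers tolerate mismatches and use base
    qualities; this sketch requires an exact match.
    """
    min_overlap = max(min_overlap, 1)  # avoid r1[-0:], which is the whole read
    r2_rc = revcomp(r2)
    for olen in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-olen:] == r2_rc[:olen]:
            return r1 + r2_rc[olen:]
    return None  # reads do not overlap enough to be joined


# With 101-bp reads, amplicons shorter than 202 bp yield overlapping pairs.
print(expected_overlap(101, 150))  # 52
print(expected_overlap(101, 250))  # -48 (gap: reads cannot be merged)

# Invented 30-bp amplicon sequenced with 20-bp paired-end reads.
amplicon = "ACGTTGCAAGGCTTACCGATGGCATTCAGC"
r1 = amplicon[:20]            # forward read
r2 = revcomp(amplicon[-20:])  # reverse read, as it comes off the sequencer
print(merge_pair(r1, r2) == amplicon)  # True
```

Scanning from the longest possible overlap downwards favours the most conservative join when a short repeat could produce a spurious shorter match.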

Source: PubMed
