PyNAST: a flexible tool for aligning sequences to a template alignment

J Gregory Caporaso, Kyle Bittinger, Frederic D Bushman, Todd Z DeSantis, Gary L Andersen, Rob Knight, J Gregory Caporaso, Kyle Bittinger, Frederic D Bushman, Todd Z DeSantis, Gary L Andersen, Rob Knight

Abstract

Motivation: The Nearest Alignment Space Termination (NAST) tool is commonly used in sequence-based microbial ecology community analysis, but due to the limited portability of the original implementation, it has not been as widely adopted as possible. Python Nearest Alignment Space Termination (PyNAST) is a complete reimplementation of NAST, which includes three convenient interfaces: a Mac OS X GUI, a command-line interface and a simple application programming interface (API).

Results: The availability of PyNAST will make the popular NAST algorithm more portable and thereby applicable to datasets orders of magnitude larger by allowing users to install PyNAST on their own hardware. Additionally because users can align to arbitrary template alignments, a feature not available via the original NAST web interface, the NAST algorithm will be readily applicable to novel tasks outside of microbial community analysis.

Availability: PyNAST is available at http://pynast.sourceforge.net.

Figures

Fig. 1.
Fig. 1.
(A) Screenshot of the PyNAST graphical user interface for Mac OS X. (B) Runtime of PyNAST is compared with that of NAST, each running on a single processor. PyNAST has a slightly shorter per sequence runtime (slope). The candidate sequences used in this evaluation ranged from 917 to 1343 bases, with a median length of 1294. The template alignment was a Greengenes core set (dated November 8, 2007) with 7682 positions and 4938 sequences.

References

    1. Altschul SF, et al. Basic local alignment search tool. J. Mol. Biol. 1990;215:403–410.
    1. DeSantis TZ, et al. Greengenes, a chimera-checked 16s rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 2006a;72:5069–5072.
    1. DeSantis TZ, et al. NAST: a multiple sequence alignment server for comparative analysis of 16s rRNA genes. Nucleic Acids Res. 2006b;34:W394–W399.
    1. Edgar RC. Muscle: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004;5:113.
    1. Katoh K, et al. Mafft version 5: improvement in accuracy of multiple sequence alignment. Nucleic Acids Res. 2005;33:511–518.
    1. Knight R, et al. PyCogent: a toolkit for making sense from sequence. Genome Biol. 2007;8:R171.
    1. Thompson JD, et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22:4673–4680.

Source: PubMed

3
Subscribe