Trimmomatic: a flexible trimmer for Illumina sequence data

Anthony M Bolger, Marc Lohse, Bjoern Usadel

Abstract

Motivation: Although many next-generation sequencing (NGS) read preprocessing tools already existed, we could not find any tool or combination of tools that met our requirements in terms of flexibility, correct handling of paired-end data and high performance. We developed Trimmomatic as a more flexible and efficient preprocessing tool that correctly handles paired-end data.

Results: The value of NGS read preprocessing is demonstrated for both reference-based and reference-free tasks. Trimmomatic is shown to produce output that is at least competitive with, and in many cases superior to, that produced by other tools, in all scenarios tested.

Availability and implementation: Trimmomatic is licensed under GPL V3. It is cross-platform (Java 1.5+ required) and available at http://www.usadellab.org/cms/index.php?page=trimmomatic

Contact: usadel@bio1.rwth-aachen.de

Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author 2014. Published by Oxford University Press.

Figures

Fig. 1.
Putative sequence alignments as tested in simple mode. The alignment process begins with a partial overlap at the 5′ end of the read (A), increasing to a full-length 5′ overlap (B), followed by full overlaps at all positions (C) and finishes with a partial overlap at the 3′ end of the read (D). Note that the upstream ‘adapter’ sequence is for illustration only and is not part of the read or the aligned region.
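The sequence of alignments described in the Fig. 1 caption can be sketched as a sliding offset of the adapter along the read. The following is a minimal illustration only, not Trimmomatic's implementation: the function names, the `min_overlap` and `max_mismatches` parameters, the mismatch-count criterion and the example sequences are all assumptions made for this sketch, and both sequences are assumed to be at least `min_overlap` bases long.

```python
def simple_mode_overlaps(read, adapter, min_overlap=1):
    """Yield (offset, read_part, adapter_part) for every candidate alignment.

    offset is the position of the adapter's first base relative to the
    read's first base. Negative offsets mean the adapter starts upstream
    of the read, so only its tail overlaps the read's 5' end.
    """
    first = min_overlap - len(adapter)  # (A) partial overlap at the 5' end
    last = len(read) - min_overlap      # (D) partial overlap at the 3' end
    for offset in range(first, last + 1):
        read_start = max(offset, 0)
        read_end = min(offset + len(adapter), len(read))
        yield (
            offset,
            read[read_start:read_end],
            adapter[read_start - offset:read_end - offset],
        )


def find_adapter(read, adapter, min_overlap=4, max_mismatches=0):
    """Return the first offset whose overlap is within the mismatch limit.

    Hypothetical acceptance rule for illustration: a plain mismatch count,
    not the scoring used by the real tool.
    """
    for offset, read_part, adapter_part in simple_mode_overlaps(
        read, adapter, min_overlap
    ):
        mismatches = sum(a != b for a, b in zip(read_part, adapter_part))
        if mismatches <= max_mismatches:
            return offset
    return None


# Made-up example: the read ends with the first 7 bases of the adapter.
read = "ACGTACGTAGATCGG"
adapter = "AGATCGGAAGAGC"
offset = find_adapter(read, adapter)
trimmed = read if offset is None else read[:max(offset, 0)]
print(offset, trimmed)  # prints: 8 ACGTACGT
```

Scanning from the most negative offset upward follows the order (A) to (D) in the caption. An adapter found at or before the start of the read leaves nothing to keep, which is why the slice is clamped at zero.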
Fig. 2.
Putative sequence alignments as tested in palindrome mode. The alignment process begins with the adapters completely overlapping the reads (A) testing for immediate ‘read-through’, then proceeds by checking for later overlap (B), including partial adapter read-through (C), finishing when the overlap indicates no read-through into the adapters (D).
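The read-through test described in the Fig. 2 caption can be illustrated with a toy check on a read pair. This sketch rests on stated assumptions and is not the tool's algorithm: it swaps in exact string comparison for whatever alignment scoring Trimmomatic applies, and the function names, parameters, adapter sequences and reads are all hypothetical. The idea it shows is that when the sequenced fragment is shorter than the read length, the start of the forward read is the reverse complement of the start of the reverse read, and whatever follows in each read is adapter.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def matches_adapter_start(tail, adapter):
    """True if the tail agrees with the adapter over their shared length."""
    shared = min(len(tail), len(adapter))
    return tail[:shared] == adapter[:shared]


def detect_read_through(forward, reverse, adapter_fwd, adapter_rev):
    """Return the fragment length if both reads run into adapter, else None.

    Candidate fragment lengths are tried in increasing order: length 0
    corresponds to immediate read-through (A), longer candidates to later
    overlap (B, C). Exhausting the candidates corresponds to no
    read-through (D).
    """
    for fragment_len in range(min(len(forward), len(reverse))):
        fragment = forward[:fragment_len]
        if fragment != reverse_complement(reverse[:fragment_len]):
            continue
        if matches_adapter_start(forward[fragment_len:], adapter_fwd) and \
                matches_adapter_start(reverse[fragment_len:], adapter_rev):
            return fragment_len
    return None


# Made-up adapters and a 10-base fragment sequenced with 16-base reads.
ADAPTER_FWD = "AGATCGGAAG"
ADAPTER_REV = "CTGTCTCTTA"
fragment = "ACGTTGCAAT"
forward = fragment + ADAPTER_FWD[:6]
reverse = reverse_complement(fragment) + ADAPTER_REV[:6]

fragment_len = detect_read_through(forward, reverse, ADAPTER_FWD, ADAPTER_REV)
print(fragment_len)  # prints: 10
```

If a fragment length is returned, both reads could be cut to that length to drop the adapter bases. Because the paired overlap supplies most of the evidence, even a short adapter remnant can be recognised in this scheme, whereas a short remnant on its own would be ambiguous.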
Fig. 3.
How Maximum Information mode combines uniqueness, coverage and error rate to determine the optimal trimming point.

Source: PubMed
