Systematic benchmarking of tools for CpG methylation detection from nanopore sequencing
Zaka Wing-Sze Yuen, Akanksha Srivastava, Runa Daniel, Dennis McNevin, Cameron Jack, Eduardo Eyras
Abstract
DNA methylation plays a fundamental role in the control of gene expression and genome integrity. Although multiple tools enable its detection from Nanopore sequencing, their accuracy remains largely unknown. Here, we present a systematic benchmarking of tools for the detection of CpG methylation from Nanopore sequencing using individual reads, control mixtures of methylated and unmethylated reads, and bisulfite sequencing. We found that the tools show a trade-off between false positives and false negatives, as well as a high dispersion with respect to the expected methylation frequency values. We describe various strategies to improve the accuracy of these tools, including a consensus approach, METEORE (https://github.com/comprna/METEORE), which combines the predictions from two or more tools and shows improved accuracy over individual tools. Snakemake pipelines are also provided for reproducibility and to enable the systematic application of our analyses to other datasets.
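The evaluation ideas in the abstract (per-read false positives versus false negatives, site-level methylation frequency compared with the value expected from a control mixture, and a consensus of two tools) can be illustrated with a minimal sketch. It assumes each tool reports, for every read and CpG site, a probability that the CpG is methylated, and that reads are called methylated at a hypothetical 0.5 threshold. The consensus here is a plain average of the two tools' probabilities; this is a stand-in for illustration, not necessarily how METEORE combines predictions. All read IDs, positions, probabilities and helper names (`consensus`, `site_frequencies`, `rmse`, `error_counts`) are made up.

```python
"""Hypothetical sketch: consensus of per-read CpG methylation calls from two
tools, scored against a control mixture. Data, names and the 0.5 threshold
are illustrative assumptions, not METEORE's implementation."""
from collections import defaultdict
from math import sqrt

# (read_id, CpG position) -> probability that the CpG is methylated.
tool_a = {
    ("r1", 100): 0.95, ("r2", 100): 0.10, ("r3", 100): 0.80, ("r4", 100): 0.30,
    ("r1", 200): 0.40, ("r2", 200): 0.05, ("r3", 200): 0.60, ("r4", 200): 0.15,
}
tool_b = {
    ("r1", 100): 0.90, ("r2", 100): 0.30, ("r3", 100): 0.40, ("r4", 100): 0.20,
    ("r1", 200): 0.70, ("r2", 200): 0.10, ("r3", 200): 0.70, ("r4", 200): 0.40,
}
# Assumed 50:50 control mixture: r1 and r3 come from the fully methylated
# sample, r2 and r4 from the unmethylated one, so every site expects 0.5.
truth = {"r1": True, "r2": False, "r3": True, "r4": False}
expected = {100: 0.5, 200: 0.5}


def consensus(calls_a, calls_b):
    """Average the two tools' probabilities on read-site pairs both tools scored."""
    shared = calls_a.keys() & calls_b.keys()
    return {key: (calls_a[key] + calls_b[key]) / 2 for key in shared}


def site_frequencies(calls, threshold=0.5):
    """Fraction of reads called methylated (probability >= threshold) per site."""
    counts = defaultdict(lambda: [0, 0])  # site -> [methylated reads, total reads]
    for (_read, site), prob in calls.items():
        counts[site][0] += int(prob >= threshold)
        counts[site][1] += 1
    return {site: meth / total for site, (meth, total) in counts.items()}


def rmse(predicted, expected):
    """Root-mean-square error between predicted and expected site frequencies."""
    sites = predicted.keys() & expected.keys()
    return sqrt(sum((predicted[s] - expected[s]) ** 2 for s in sites) / len(sites))


def error_counts(calls, truth, threshold=0.5):
    """Count per-read false positives and false negatives at a given threshold."""
    fp = fn = 0
    for (read, _site), prob in calls.items():
        called = prob >= threshold
        if called and not truth[read]:
            fp += 1
        elif not called and truth[read]:
            fn += 1
    return fp, fn


if __name__ == "__main__":
    for name, calls in [("tool A", tool_a), ("tool B", tool_b),
                        ("consensus", consensus(tool_a, tool_b))]:
        freqs = site_frequencies(calls)
        print(name, freqs, "RMSE =", round(rmse(freqs, expected), 3))
    # Lowering the threshold trades false negatives for false positives.
    for t in (0.5, 0.3):
        print("tool A, threshold", t, "-> (FP, FN) =", error_counts(tool_a, truth, t))
```

With these invented numbers, each tool misses a different methylated read, so each underestimates one site (0.25 instead of 0.5), while the averaged calls recover 0.5 at both sites. Lowering tool A's threshold from 0.5 to 0.3 turns its single false negative into a false positive, a toy version of the trade-off described above. Real tools need not disagree this conveniently, so a simple average is only one of many possible combination schemes.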
Conflict of interest statement
The authors declare no competing interests.
Source: PubMed