Nondestructive enzymatic deamination enables single-molecule long-read amplicon sequencing for the determination of 5-methylcytosine and 5-hydroxymethylcytosine at single-base resolution

Zhiyi Sun, Romualdas Vaisvila, Laura-Madison Hussong, Bo Yan, Chloé Baum, Lana Saleh, Mala Samaranayake, Shengxi Guan, Nan Dai, Ivan R Corrêa Jr, Sriharsa Pradhan, Theodore B Davis, Thomas C Evans Jr, Laurence M Ettwiller

Abstract

The predominant methodology for DNA methylation analysis relies on the chemical deamination of unmodified cytosine to uracil by sodium bisulfite, which permits the differential readout of methylated cytosines. Bisulfite treatment damages the DNA, leading to fragmentation and loss of long-range methylation information. To overcome this limitation of bisulfite-treated DNA, we applied a new enzymatic deamination approach, termed enzymatic methyl-seq (EM-seq), to long-range sequencing technologies. Our methodology, named long-read enzymatic modification sequencing (LR-EM-seq), preserves the integrity of DNA, allowing long-range methylation profiling of 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) over multikilobase lengths of genomic DNA. When applied to known differentially methylated regions (DMRs), LR-EM-seq achieves phasing of >5 kb, resulting in broader and better-defined DMRs compared with those previously reported. This result shows the importance of phasing methylation for biologically relevant questions and the applicability of LR-EM-seq for long-range epigenetic analysis at single-molecule and single-nucleotide resolution.

© 2021 Sun et al.; Published by Cold Spring Harbor Laboratory Press.

Figures

Figure 1.
5mC and 5hmC detection by the enzymatic deamination method. (A) Principle of the methodology: genomic DNA can be treated either with TET2 and BGT (left) to protect both 5mC and 5hmC or with BGT alone (right) to protect 5hmC. Subsequent deamination by APOBEC3A followed by PCR amplification distinguishes the unprotected substrate (read as T) from the protected cytosine derivatives (read as C). The TET2 and BGT treatment distinguishes 5mC and 5hmC from C, whereas BGT treatment distinguishes 5hmC from C and 5mC. (B) Deaminated cytosines from the unmethylated lambda genome display no observable sequence preference with the APOBEC(5mC) deamination method. (C) False-positive methylation calling rate (nonconversion error rate) of each cytosine dinucleotide sequence context (CpA [CA], CpC [CC], CpG [CG], and CpT [CT]) estimated from the unmethylated lambda genome for the enzymatic deamination method (APOBEC(5mC)), two WGBS experiments performed in this study (BS kit 1 and BS kit 2), and six published WGBS experiments sampled from the ENCODE Project (Supplemental Table S3). (D) Deamination of 5mC in the fully methylated XP12 genome shows no observable sequence preference with the APOBEC(5hmC) enzymatic deamination method. (E) Distribution patterns of 5mCpG (blue) and 5hmCpG (red: 50 ng library; pink: 1 ng library) at various protein/DNA interaction sites. The absolute (smooth lines) and normalized (dotted lines) 5hmC and 5mC levels in the CpG context are depicted around TET1, RNA polymerase II, and CTCF binding sites, as well as at active transcription chromatin mark (H3K4me3), repressive chromatin mark (H3K27me3), active enhancer mark (H3K27ac), and general enhancer (H3K4me1 in the absence of H3K4me3) regions. Unbound sites randomly sampled from the reference genome serve as a control. (F) Pearson's correlation between 5hmC measured using sequencing of enzymatically deaminated DNA (x-axis) versus LC-MS (y-axis) for various genomic DNA. There are two technical replicates of the APOBEC(5hmC) sequencing method for each sample. 5hmC levels are presented as 1000 percentage, and both axes use the log scale.
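The readout described in panel A can be illustrated with a minimal sketch. It assumes a gap-free alignment of each read to the same-strand reference and only looks at CpG sites; the function names `call_cpg_states` and `site_levels` are hypothetical and are not taken from the paper's pipeline. A reference C read as C is counted as protected, and a reference C read as T is counted as deaminated. In a TET2 + BGT library "protected" means 5mC or 5hmC, whereas in a BGT-only library it means 5hmC.

```python
from typing import Dict, List


def call_cpg_states(reference: str, read: str) -> Dict[int, bool]:
    """Map each CpG position to True (read as C, protected) or False (read as T, deaminated)."""
    if len(reference) != len(read):
        raise ValueError("this sketch assumes a gap-free, same-length alignment")
    states: Dict[int, bool] = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            base = read[i]
            if base == "C":
                states[i] = True
            elif base == "T":
                states[i] = False
            # Any other base is treated as a sequencing error or variant and skipped.
    return states


def site_levels(reference: str, reads: List[str]) -> Dict[int, float]:
    """Fraction of reads in which each CpG cytosine was protected."""
    counts: Dict[int, List[int]] = {}
    for read in reads:
        for pos, protected in call_cpg_states(reference, read).items():
            tally = counts.setdefault(pos, [0, 0])
            tally[0] += int(protected)
            tally[1] += 1
    return {pos: prot / total for pos, (prot, total) in counts.items()}


# Hypothetical toy data: CpG cytosines at positions 1 and 5 of the reference.
reference = "ACGTTCGA"
reads = ["ACGTTTGA", "ATGTTTGA"]
levels = site_levels(reference, reads)
```

Running the same function on reads from the two library types gives, per site, the combined 5mC + 5hmC level and the 5hmC level, respectively.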
Figure 2.
Enzymatic deamination preserves the integrity of the DNA. (A) qPCR results show the quantities of undamaged amplifiable DNA templates of different sizes after the enzymatic deamination (green) and bisulfite treatments (orange and blue). All quantifications are normalized to the values obtained for the enzymatic deamination experiments. (B) Agilent 2100 Bioanalyzer trace on RNA 6000 pico chip comparing equal amounts of mouse E14 genomic DNA sheared to an average of 15 kb and treated with sodium bisulfite (green), APOBEC(5hmC) (red), or APOBEC(5mC) (blue) over the control ssDNA (cyan). Bisulfite treatment fragmented the DNA to an average of 800 bp, whereas enzymatically treated DNA shows no notable size differences compared with control DNA. (C) Agarose gel images of end-point PCR of six amplicons ranging from 388 to 4229 bp, illustrating the upper amplicon size limit for sodium-bisulfite-, APOBEC(5mC)-, or APOBEC(5hmC)-treated E14 genomic DNA. (D) The 731-bp amplicons from the E14 genomic DNA shown in C were cloned and sequenced, and the methylation status was determined by bisulfite treatment (left), the enzymatic deamination method for 5mC (center), and the enzymatic deamination method for 5hmC (right) (Supplemental Data S1). Open and closed circles indicate unmethylated and methylated CpG sites, respectively.
Figure 3.
5mC and 5hmC phasing using long-read sequencing. (A) Scatter plots and Pearson's correlations of calculated methylation (top) and hydroxymethylation (bottom) levels of all CpG sites within the 5378-bp region from the mouse E14 genome between the three sequencing platforms: PacBio, Nanopore, and Illumina. (B) Dot plots showing methylation (left) and hydroxymethylation (right) levels of individual CpG sites within the 5378-bp region calculated by the LR-EM-seq method using three major sequencing platforms: Illumina (red), Nanopore (green), and PacBio (blue). The fitted lines are drawn using the LOESS method. (C) Single-base single-molecule cytosine modification maps of the 5378-bp region generated by the LR-EM-seq method coupled with PacBio SMRT sequencing (top) and Nanopore sequencing (bottom). Methylated (left) and hydroxymethylated (right) CpG sites are depicted by red dots, and unmodified CpG sites are depicted by beige dots.
Figure 4.
Phasing of 5mC and 5hmC by LR-EM-seq. (A) Single-base, single-molecule CpG methylation (middle) and hydroxymethylation (bottom) profile of a 4.6-kb region of the imprinted Inpp5f_v2 gene locus (top) in the mouse brain. Red dots represent modified sites, and beige dots represent unmodified sites. This region overlaps with the promoter for the Inpp5f_v2 gene and contains a previously reported DMR (orange box). The shaded area in the dot plots corresponds to the known DMR. (B) Correlation matrix of CpG modification state: (top) 5mC; (bottom) 5hmC. Each location on the matrix represents the correlation of any two CpG sites across the amplicon and the correlation strength is depicted by color: red indicates correlation=1; blue, correlation=−1; white, no correlation. The known DMR is indicated by a black outline.
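The correlation matrix in panel B can be sketched as a pairwise Pearson correlation between CpG sites, computed across single molecules. This is a minimal illustration, assuming each molecule is represented as a list of 0/1 modification states with no missing calls; the helper names `pearson` and `cpg_correlation_matrix` are hypothetical, and the paper's own analysis code may differ. A site that is constant across all molecules has no defined correlation, so the sketch returns `None` for it.

```python
from math import sqrt
from typing import List, Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation of two equal-length vectors, or None if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def cpg_correlation_matrix(molecules: List[List[int]]) -> List[List[Optional[float]]]:
    """Rows of `molecules` are single reads; columns are CpG sites (1 = modified, 0 = unmodified)."""
    n_sites = len(molecules[0])
    columns = [[m[j] for m in molecules] for j in range(n_sites)]
    return [[pearson(columns[i], columns[j]) for j in range(n_sites)]
            for i in range(n_sites)]


# Hypothetical toy data: two fully modified and two unmodified molecules at sites 0 and 1,
# with site 2 showing the opposite pattern.
molecules = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]]
matrix = cpg_correlation_matrix(molecules)
```

In this toy case sites 0 and 1 are perfectly correlated and site 2 is perfectly anticorrelated with both, which corresponds to the red and blue extremes of the color scale described in the caption.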
Figure 5.
Phasing of 5mC with heterozygous variants using LR-EM-seq. (A) Phasing of 5mC with a SNP in a 3.1-kb region of the imprinted Inpp5f_v2 gene promoter in the mouse brain cortex of an F1 offspring of a cross between two inbred mouse strains (129X1/SvJ male and Cast/EiJ female). The methylation state of individual CpG sites at the single-molecule level is denoted by either a beige dot (unmodified) or a red dot (methylated). The heterozygous SNP near the 5′ end of the region is highlighted in red for the paternal allele (A) or blue for the maternal allele (G). The orange boxes denote previously identified DMRs. Our result not only confirmed the existence of the imprinted DMR but also revealed much more extended boundaries of the imprinted DMR. (B) Phasing of 5mC and SNP in the imprinted promoter of the Gnas gene in the mouse cortex from a cross between the inbred mouse strains 129X1/SvJ (male) and Cast/EiJ (female). The methylation state of individual CpG sites at the single-molecule level is denoted by either a beige dot (unmodified) or a red dot (methylated). The heterozygous SNP is highlighted in red for the paternal allele (A) and blue for the maternal allele (G). The orange box denotes a previously identified DMR. Our result confirmed the existence of the imprinted DMR and further extended this DMR in both directions, particularly into the CpG island.
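The allele-specific phasing shown in this figure can be illustrated by grouping single molecules on the base observed at the heterozygous SNP and then averaging CpG states within each group. The sketch below assumes that each molecule has already been reduced to a SNP base plus a list of 0/1 CpG calls, and that the SNP base is read on a strand where deamination does not alter it; the function name `phase_by_snp` and the toy data are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple


def phase_by_snp(
    molecules: List[Tuple[str, List[int]]],
    allele_labels: Dict[str, str],
) -> Dict[str, List[float]]:
    """Return per-allele methylation fractions for each CpG site.

    `molecules` holds (SNP base, CpG states) per read; reads whose SNP base
    is not listed in `allele_labels` are ignored.
    """
    grouped: Dict[str, List[List[int]]] = {}
    for snp_base, states in molecules:
        label = allele_labels.get(snp_base)
        if label is None:
            continue  # unexpected base at the SNP position, cannot be phased
        grouped.setdefault(label, []).append(states)
    levels: Dict[str, List[float]] = {}
    for label, rows in grouped.items():
        # zip(*rows) walks the reads column by column, i.e. one CpG site at a time.
        levels[label] = [sum(column) / len(rows) for column in zip(*rows)]
    return levels


# Hypothetical toy molecules; allele labels follow the caption (A = paternal, G = maternal).
molecules = [
    ("A", [1, 1, 1]),
    ("A", [1, 1, 0]),
    ("G", [0, 0, 0]),
    ("G", [0, 1, 0]),
    ("N", [1, 1, 1]),
]
levels = phase_by_snp(molecules, {"A": "paternal", "G": "maternal"})
```

Sites whose fractions differ strongly between the two alleles are the candidates for an allele-specific DMR; the threshold for calling such a difference is not specified here.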

Source: PubMed
