Targeted sequencing of large genomic regions with CATCH-Seq

Kenneth Day, Jun Song, Devin Absher

Abstract

Current target enrichment systems for large-scale next-generation sequencing typically require synthetic oligonucleotides as capture reagents to isolate sequences of interest. Most target enrichment reagents focus on gene coding regions or promoters en masse. Here we introduce a customizable targeted capture system that uses biotinylated RNA probe baits transcribed from sheared bacterial artificial chromosome (BAC) clone templates, enabling capture of large, contiguous blocks of the genome for sequencing applications. This clone adapted template capture hybridization sequencing (CATCH-Seq) procedure can be used to capture both coding and non-coding regions of a gene and to resolve the boundaries of copy number variations within a genomic target site. Furthermore, libraries constructed with methylated adapters prior to solution hybridization also enable targeted bisulfite sequencing. We applied CATCH-Seq to diverse targets ranging in size from 125 kb to 3.5 Mb. Our approach provides a simple and cost-effective alternative to other capture platforms because probe synthesis is template-based and enzymatic and incurs no oligonucleotide design costs. Given its similarity in procedure, CATCH-Seq can also be performed in parallel with commercial systems.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Overview of the clone adapted template capture hybridization sequencing procedure.
BAC clone templates are selected to span genomic coordinates of interest, and pooled by percent mass of the composite target. BACs are sheared, ligated with T7 adapters to transcribe biotinylated RNA probes, and then solution hybridized with prepared libraries. Following capture, libraries are amplified by PCR, or bisulfite converted prior to amplification for analysis of DNA methylation. Target enriched libraries are pooled and sequenced.
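The pooling step can be illustrated with a small calculation. This is only a sketch under one reading of "pooled by percent mass of the composite target": it assumes each BAC contributes DNA mass in proportion to its share of the combined insert length. The function name `pool_masses`, the clone names, and the nanogram unit are hypothetical and not taken from the procedure itself.

```python
def pool_masses(bac_lengths_bp, total_mass_ng):
    """Split total_mass_ng across BACs in proportion to insert length.

    Assumption (not stated in the source): a BAC's percent mass in the
    pool equals its percent of the combined target length.
    """
    total_bp = sum(bac_lengths_bp.values())
    if total_bp <= 0:
        raise ValueError("combined BAC length must be positive")
    return {
        name: total_mass_ng * length / total_bp
        for name, length in bac_lengths_bp.items()
    }


# Hypothetical clones: a 100 kb and a 300 kb insert sharing 1000 ng.
example = pool_masses({"BAC_A": 100_000, "BAC_B": 300_000}, 1000.0)
print(example)
```

Under this assumption, the longer clone receives three quarters of the pooled mass, so probe density per base stays roughly even across the composite target.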
Figure 2. Read depth plot of a chromosome 11 target for a sample showing median coverage among all samples used for capture.
Vertical bars indicate read depth with scale depicted on the left side of the panel. Red lines show percent GC content across non-overlapping 400 bp intervals spanning the target region with scale shown on the right side of the panel. Horizontal dotted line indicates 50% GC content. A repeat structure track (RepMask), derived from the UCSC genome browser, is shown below the plot in gray for all repeats larger than 200 bp with a Smith-Waterman score of at least 600. Genes are shown below the repeat track in dark blue and arrows depict gene orientation.
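The GC content track described in this caption can be approximated as follows. This is a minimal sketch, not the authors' code: the function name `gc_percent_windows` is hypothetical, and the handling of ambiguous bases (excluded from the denominator) and of a short final window (reported as is) are assumptions.

```python
def gc_percent_windows(seq, window=400):
    """Return (start, percent_GC) for non-overlapping windows of `seq`.

    Assumptions: bases other than A/C/G/T (for example N) are ignored,
    and a final window shorter than `window` is still reported.
    A window with no called bases yields None.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        called = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        results.append((start, 100.0 * gc / called if called else None))
    return results


# Toy sequence with a 4 bp window so the output is easy to check by eye.
print(gc_percent_windows("GGCCAATT", window=4))
```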
Figure 3. Capture efficiency in a sample representing the median coverage among all sequenced samples shown by the percent of total targeted bases covered at particular coverage depths in a chromosome 11 target.
(A–B) Percent of targeted bases covered using various thresholds of repeat masking by (A) size or (B) Smith-Waterman (SW) score. (C) Percent of targeted bases covered based on masking of percent GC content extremes. (A–C) Upper panels show coverage by CATCH-Seq within a sample that showed median coverage among all samples used in the capture. (D–F) Lower panels show coverage of the corresponding captured region, under the same repeat masking or percent GC content thresholds, for merged whole-genome sequencing reads (merged WGS) from 15 individuals sequenced for the 1000 Genomes Project, using the same number of reads as analyzed for CATCH-Seq.
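The quantity plotted in these panels, the percent of targeted bases covered at or above a given depth, can be sketched as below. The function name `percent_covered` and the `masked` argument are hypothetical. Dropping masked positions from both the numerator and the denominator is an assumption about how masking was applied.

```python
def percent_covered(depths, thresholds, masked=None):
    """Percent of unmasked target bases with depth >= each threshold.

    `depths` holds one read depth per target base. `masked` is an optional
    list of booleans of the same length; True marks a base to exclude
    (for example a repeat or a GC-extreme interval). Excluding masked
    bases from the denominator is an assumption of this sketch.
    """
    if masked is None:
        masked = [False] * len(depths)
    if len(masked) != len(depths):
        raise ValueError("depths and masked must have the same length")
    kept = [d for d, m in zip(depths, masked) if not m]
    if not kept:
        raise ValueError("no unmasked target bases")
    return {t: 100.0 * sum(d >= t for d in kept) / len(kept) for t in thresholds}


# Four target bases, evaluated at 1x and 10x.
print(percent_covered([0, 5, 10, 20], [1, 10]))
```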
Figure 4. Read depth plot of a chromosome 11 target for a sample showing median coverage among all samples used for capture and bisulfite sequencing.
Vertical bars indicate read depth with scale depicted on the left side of the panel. Red lines show percent GC content across non-overlapping 400 bp intervals spanning the target region with scale shown on the right side of the panel. Horizontal dotted line indicates 50% GC content. A repeat structure track (RepMask), derived from the UCSC genome browser, is shown below the plot in gray for all repeats larger than 200 bp with a Smith-Waterman score of at least 600. Genes are shown below the repeat track in dark blue and arrows depict gene orientation.
Figure 5. The effect of repeat blocking with increased concentrations of Cot-1 DNA within the CATCH-Seq hybridization step of a chromosome 11 target.
Total numbers of on target and off target read yields in millions within non-repetitive sequences (A) or repetitive sequences (B). (C–H) On and off target read yields within repeat structures based on different thresholds of size (C,E,G) or divergence (D,F,H). Green and gray lines show on target and off target reads, respectively.
Figure 6. Determination of copy number variation across a CATCH-Seq target using read depth.
(A) Read depths are partitioned into 100 bp segments across the length of the target genomic coordinates, and the fraction of total aligned bases per segment is calculated. In this target, read depth drops noticeably in the two individuals shown in the bottom panels compared to wild type (+/+), indicating that they carry heterozygous (+/−) and homozygous (−/−) deletions in this region. (B) Log-ratio values (logR), normalized for read depth variance caused by capture and sequencer biases, are calculated across the target site to resolve clear copy number variation boundaries. The deleted region contains a repeat sequence, shown by the underlying RepeatMasker track (RepMask), that is not well covered. Coverage of this repeat structure is reflected in the logR plot as a slight fluctuation from zero as indicated by the horizontal green lines. For targets in which a copy number variation represents a large proportion of the total target sequence, such as the one depicted here, normalizing individual base fractions by the median of control samples often results in slightly elevated logR values outside the variable region; this is most noticeable in the individual with the homozygous deletion in the bottom panel. The extent of the BAC template used for CATCH-Seq is depicted just below the RepMask track.
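The read depth normalization in this caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names `segment_fractions` and `log_ratios` are hypothetical, the logarithm base (2) is assumed because the caption does not give one, and the small pseudocount that avoids taking the log of zero is also an assumption.

```python
import math
from statistics import median


def segment_fractions(depths, segment=100):
    """Fraction of total aligned bases falling in each `segment`-bp bin."""
    totals = [sum(depths[i:i + segment]) for i in range(0, len(depths), segment)]
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("no aligned bases in target")
    return [t / grand_total for t in totals]


def log_ratios(sample_fracs, control_fracs, pseudocount=1e-9):
    """log2 of each sample fraction over the per-segment control median.

    Assumptions: base-2 logarithm and a tiny pseudocount on both terms.
    `control_fracs` is a list of per-control fraction lists.
    """
    ratios = []
    for i, frac in enumerate(sample_fracs):
        reference = median(control[i] for control in control_fracs)
        ratios.append(math.log2((frac + pseudocount) / (reference + pseudocount)))
    return ratios


# Toy data: three identical controls and a sample with half the usual
# fraction in the first segment.
controls = [[0.25, 0.75]] * 3
print(log_ratios([0.125, 0.875], controls))
```

Because the per-segment fractions of a sample sum to one, a deletion that removes a large share of the target raises the fractions everywhere else, which is consistent with the slightly elevated logR values outside the deleted region noted in the caption.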
Figure 7. High density methylation data derived from bisulfite sequencing of a CATCH-Seq target.
Scale of the captured region is indicated in the topmost track in kilobases (kb), followed by repeat structure in gray and black (RepMask), genes shown in blue (RefSeq), and CpG islands in green. Four CATCH-Seq tracks from the same cell type show DNA methylation levels across ∼2,700 target CpGs with hypomethylation depicted in green and hypermethylation in red. Six reduced representation bisulfite sequencing (RRBS) tracks for different cell and tissue types correspond to the same captured region and show CpGs that the RRBS method does not cover compared to CATCH-Seq. The four CATCH-Seq tracks are from the same cell type as the topmost RRBS track. RRBS tracks are derived from previously reported data. CpGs shown within CpG islands were typically hypomethylated across all cell and tissue types depicted.

References

    1. Borate U, Absher D, Erba HP, Pasche B (2012) Potential of whole-genome sequencing for determining risk and personalizing therapy: focus on AML. Expert Rev Anticancer Ther 12: 1289–1297.
    2. Whitcomb DC (2012) What is personalized medicine and what should it replace? Nat Rev Gastroenterol Hepatol 9: 418–424.
    3. Mertes F, Elsharawy A, Sauer S, van Helvoort JM, van der Zaag PJ, et al. (2011) Targeted enrichment of genomic DNA regions for next-generation sequencing. Brief Funct Genomics 10: 374–386.
    4. Goh G, Choi M (2012) Application of whole exome sequencing to identify disease-causing variants in inherited human diseases. Genomics Inform 10: 214–219.
    5. Gnirke A, Melnikov A, Maguire J, Rogov P, LeProust EM, et al. (2009) Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nat Biotechnol 27: 182–189.
    6. Clark MJ, Chen R, Lam HY, Karczewski KJ, Euskirchen G, et al. (2011) Performance comparison of exome DNA sequencing technologies. Nat Biotechnol 29: 908–914.
    7. Irizarry RA, Ladd-Acosta C, Wen B, Wu Z, Montano C, et al. (2009) The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 41: 178–186.
    8. Heyn H, Esteller M (2012) DNA methylation profiling in the clinic: applications and challenges. Nat Rev Genet 13: 679–692.
    9. Kaper F, Swamy S, Klotzle B, Munchel S, Cottrell J, et al. (2013) Whole-genome haplotyping by dilution, amplification, and sequencing. Proc Natl Acad Sci U S A 110: 5552–5557.
    10. Porreca GJ, Zhang K, Li JB, Xie B, Austin D, et al. (2007) Multiplex amplification of large sets of human exons. Nat Methods 4: 931–936.
    11. Johansson H, Isaksson M, Sorqvist EF, Roos F, Stenberg J, et al. (2011) Targeted resequencing of candidate genes using selector probes. Nucleic Acids Res 39: e8.
    12. Diep D, Plongthongkum N, Gore A, Fung HL, Shoemaker R, et al. (2012) Library-free methylation sequencing with bisulfite padlock probes. Nat Methods 9: 270–272.
    13. Varley KE, Mitra RD (2008) Nested Patch PCR enables highly multiplexed mutation discovery in candidate genes. Genome Res 18: 1844–1850.
    14. DeAngelis MM, Wang DG, Hawkins TL (1995) Solid-phase reversible immobilization for the isolation of PCR products. Nucleic Acids Res 23: 4742–4743.
    15. Lindgreen S (2012) AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC Res Notes 5: 337.
    16. Bashiardes S, Veile R, Helms C, Mardis ER, Bowcock AM, et al. (2005) Direct genomic selection. Nat Methods 2: 63–69.
    17. Yigit E, Zhang Q, Xi L, Grilley D, Widom J, et al. (2013) High-resolution nucleosome mapping of targeted regions using BAC-based enrichment. Nucleic Acids Res 41: e87.
    18. Carpenter ML, Buenrostro JD, Valdiosera C, Schroeder H, Allentoft ME, et al. (2013) Pulling out the 1%: Whole-Genome Capture for the Targeted Enrichment of Ancient DNA Sequencing Libraries. The American Journal of Human Genetics 93: 852–864.
    19. Craddock N, Hurles ME, Cardin N, Pearson RD, Plagnol V, et al. (2010) Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464: 713–720.
    20. Henrichsen CN, Chaignat E, Reymond A (2009) Copy number variants, diseases and gene expression. Hum Mol Genet 18: R1–8.
    21. Winchester L, Yau C, Ragoussis J (2009) Comparing CNV detection methods for SNP arrays. Brief Funct Genomic Proteomic 8: 353–366.
    22. Schneider A, David VA, Johnson WE, O'Brien SJ, Barsh GS, et al. (2012) How the leopard hides its spots: ASIP mutations and melanism in wild cats. PLoS One 7: e50386.
    23. Meissner A, Mikkelsen TS, Gu H, Wernig M, Hanna J, et al. (2008) Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454: 766–770.

Source: PubMed
