Sequencing the CYP2D6 gene: from variant allele discovery to clinical pharmacogenetic testing

Yao Yang, Mariana R Botton, Erick R Scott, Stuart A Scott

Abstract

CYP2D6 is one of the most studied enzymes in the field of pharmacogenetics. The CYP2D6 gene is highly polymorphic with over 100 catalogued star (*) alleles, and clinical CYP2D6 testing is increasingly accessible and supported by practice guidelines. However, the degree of variation at the CYP2D6 locus and homology with its pseudogenes make interrogating CYP2D6 by short-read sequencing challenging. Moreover, accurate prediction of CYP2D6 metabolizer status necessitates analysis of duplicated alleles when an increased copy number is detected. These challenges have recently been overcome by long-read CYP2D6 sequencing; however, such platforms are not widely available. This review highlights the genomic complexities of CYP2D6, current sequencing methods and the evolution of CYP2D6 from allele discovery to clinical pharmacogenetic testing.
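To see why a duplicated allele must be identified before metabolizer status can be predicted, consider a sample that carries *1 and *4 with three gene copies. Depending on which allele is duplicated, the predicted phenotype differs. The Python sketch below illustrates this with an activity-score style calculation. The per-allele activity values, the phenotype cut-offs and the helper names (`activity_score`, `phenotype`, `duplication_candidates`) are assumptions made for illustration, not values or methods taken from this review; current guideline tables should be consulted for real use.

```python
# ASSUMPTION: illustrative per-allele activity values, modeled on an
# activity-score approach; not taken from this review.
ACTIVITY = {"*1": 1.0, "*2": 1.0, "*4": 0.0, "*41": 0.5}


def activity_score(alleles: list[tuple[str, int]]) -> float:
    """Sum activity over (star allele, copies of that allele) pairs."""
    return sum(ACTIVITY[star] * copies for star, copies in alleles)


def phenotype(score: float) -> str:
    """Map an activity score to a metabolizer category (assumed cut-offs)."""
    if score == 0:
        return "poor"
    if score < 1.25:
        return "intermediate"
    if score <= 2.25:
        return "normal"
    return "ultrarapid"


def duplication_candidates(a: str, b: str, copy_number: int) -> dict[str, str]:
    """Phenotype for each placement of the extra copies on allele a or b."""
    if copy_number < 3:
        raise ValueError("expects an increased copy number (>= 3)")
    dup = copy_number - 1  # copies on the chromosome carrying the duplication
    return {
        f"{a}x{dup}/{b}": phenotype(activity_score([(a, dup), (b, 1)])),
        f"{a}/{b}x{dup}": phenotype(activity_score([(a, 1), (b, dup)])),
    }


print(duplication_candidates("*1", "*4", 3))
# prints {'*1x2/*4': 'normal', '*1/*4x2': 'intermediate'}
```

The same copy number and the same two alleles therefore yield two different predictions, which is why copy-number detection alone is insufficient without allele-specific analysis of the duplication.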

Keywords: CYP2D6; CYP450-2D6; Sanger sequencing; genotyping; long-read sequencing; pharmacogenetics; pharmacogenomics; short-read sequencing.

Conflict of interest statement

Financial and competing interests disclosure

This work was supported in part by the National Institute of General Medical Sciences (NIGMS) of the NIH, through grant K23GM104401 (SA Scott). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed.

No writing assistance was utilized in the production of this manuscript.

Figures

Figure 1. Hive panel displaying multiple sequence alignment of CYP2D6, CYP2D7 and CYP2D8.
The hive plot edges display sequence similarity between CYP2D6-CYP2D7, CYP2D7-CYP2D8 and CYP2D8-CYP2D6 (clockwise from top). The three principal axes (0, 120 and 240°) of the hive plots represent the nucleotide composition of the multiple sequence alignment for the indicated gene: (A) exonic sequences (intronic sequences shown in black), (B) intronic sequences (exonic sequences shown in black) and (C) exonic and intronic sequences. ClustalW was used to align the three genes (plus 300 bp of flanking sequence for each gene). Blue: aligned sequence is identical across the three genes; orange: aligned sequences are identical between the labeled genes; white: sequence gaps created by inserted nucleotides unique to the principal axis colored white.
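The color categories in Figure 1 amount to a per-column classification of the three-way alignment for each gene pair. The Python sketch below shows one way to tally those categories. It assumes the aligned sequences are already available as equal-length strings with `-` for gaps (for example, parsed from ClustalW output); the category names, the helper functions and the toy alignment are hypothetical, and the mapping of categories to the figure's colors is an interpretation of the caption rather than the authors' code.

```python
from collections import Counter

GENES = ("CYP2D6", "CYP2D7", "CYP2D8")  # order of sequences in the alignment


def classify_column(column: str, i: int, j: int) -> str:
    """Classify one alignment column for the gene pair (i, j).

    column holds one aligned character per gene, e.g. "AAG".
    """
    a, b = column[i], column[j]
    if "-" in (a, b):
        return "gap"  # one gene of the pair has an insertion/deletion here
    if a == b:
        if all(c == a for c in column):
            return "identical_all"  # interpreted as blue in Figure 1
        return "identical_pair"  # interpreted as orange in Figure 1
    return "mismatch"


def summarize_pair(aligned: list[str], i: int, j: int) -> Counter:
    """Count column categories for gene pair (i, j) across the alignment."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return Counter(classify_column("".join(col), i, j) for col in zip(*aligned))


# Hypothetical toy alignment, not real CYP2D sequence.
toy = ["ACGT-A", "ACGTTA", "ACCTTG"]
print(f"{GENES[0]}-{GENES[1]}:", summarize_pair(toy, 0, 1))
```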
Figure 2. Illustration of the CYP2D6 genomic region highlighting GC content across 10-nucleotide sliding windows of three functional CYP2D6 transcript isoforms (CYP2D6 locus: chr22:42522500–42526907, GENCODE v19, hg19).
Tracks show median read coverage of ExAC exomes, base-pair resolution of the three functional CYP2D6 transcript isoforms, and base-pair resolution of the intersection with 'reliable genome' intervals. ExAC: Exome Aggregation Consortium. Data taken from [29].
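The GC-content track in Figure 2 is a sliding-window calculation. A minimal Python sketch follows, assuming a window of 10 nucleotides moved one nucleotide at a time; the caption does not state the step size, so the one-nucleotide step and the function name `gc_windows` are assumptions. The running count avoids recounting each window from scratch.

```python
def gc_windows(seq: str, window: int = 10) -> list[float]:
    """GC fraction of every `window`-nt window, stepping one nucleotide.

    ASSUMPTION: a step of one nucleotide; the figure does not state the step.
    """
    seq = seq.upper()
    if not 0 < window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    gc = sum(base in "GC" for base in seq[:window])  # GC count of first window
    fractions = [gc / window]
    for k in range(window, len(seq)):
        # Slide by one: add the entering base, drop the leaving base.
        gc += (seq[k] in "GC") - (seq[k - window] in "GC")
        fractions.append(gc / window)
    return fractions


print(gc_windows("GCGCGCGCGCAT"))  # prints [1.0, 0.9, 0.8]
```

Runs of high GC content like the first window are typically where short-read capture and alignment perform worst, which is the pattern the coverage track is set against.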
Figure 3. Gene diagram of CYP2D6 (and chromosome cytoband location) highlighting the location of variant star (*) alleles that are commonly included in targeted genotyping assays, including the deletion allele (*5).
Note that variants are denoted by their common nucleotide nomenclature from M33388.1 GenBank reference sequence.
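Targeted genotyping assays of the kind summarized in Figure 3 report zygosity for a panel of defining variants plus a copy-number estimate, and these calls must then be translated into star alleles. The Python sketch below shows that translation in its simplest form. The variant-to-allele table is an assumed, illustrative subset in M33388.1 nomenclature that should be verified against PharmVar before any real use, and `call_diplotype` is a hypothetical helper. It also assumes that each allele is defined by a single variant and that two different heterozygous variants lie in trans; real calling algorithms do not make these simplifications.

```python
# ASSUMPTION: illustrative key variants (M33388.1 nomenclature); real
# star-allele definitions are full haplotypes curated by PharmVar.
KEY_VARIANTS = {
    "1707delT": "*6",
    "1846G>A": "*4",
    "2549delA": "*3",
    "2988G>A": "*41",
}


def call_diplotype(zygosity: dict[str, str], copy_number: int = 2) -> str:
    """Hypothetical helper: translate unphased key-variant calls to a diplotype.

    zygosity maps a variant to "het" or "hom"; unlisted variants are assumed
    reference (*1). copy_number is the total CYP2D6 gene copy number.
    """
    if copy_number == 0:
        return "*5/*5"  # homozygous gene deletion
    if copy_number > 2:
        raise ValueError("duplications require allele-specific analysis")
    alleles: list[str] = []
    for variant, state in zygosity.items():
        star = KEY_VARIANTS[variant]
        alleles.extend([star] * (2 if state == "hom" else 1))
    if copy_number == 1:
        # One chromosome carries *5, so an apparently homozygous call is
        # actually hemizygous.
        return f"{alleles[0] if alleles else '*1'}/*5"
    if len(alleles) > 2:
        raise ValueError("more than two defining variants; phasing needed")
    # ASSUMPTION: two different heterozygous variants are in trans.
    alleles += ["*1"] * (2 - len(alleles))
    alleles.sort(key=lambda star: int(star[1:]))
    return "/".join(alleles)


print(call_diplotype({"2549delA": "het", "1846G>A": "het"}))  # prints *3/*4
```

A sample heterozygous for 2549delA and 1846G>A gives the pattern expected for a *3/*4 genotype such as NA12878. The same 1846G>A call read as homozygous with a copy number of one becomes *4/*5 instead, which is why the *5 deletion has to be assayed alongside the point variants.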
Figure 4. Paired-end short-read sequencing (Illumina) and long-read sequencing (Pacific Biosciences) of the CYP2D6 gene region visualized with the Integrative Genomics Viewer.
Results for NA12878 (CYP2D6*3/*4) are displayed, from top to bottom, for WGS from the 1000 Genomes Project, in-house WGS, WES, targeted capture with the PGRNseq platform, targeted PacBio CYP2D6 sequencing, and ALEC-corrected targeted PacBio CYP2D6 sequencing. Of note, discrepant and skewed allele frequencies at several loci in the WGS data indicate potential read misalignment errors. Moreover, the common CYP2D6 capture strategies (e.g., WES, PGRNseq) coupled with short-read Illumina sequencing result in substantial numbers of reads being assigned to the CYP2D7 and CYP2D8 pseudogenes. These reads indicate a lack of CYP2D6 specificity in these target enrichment approaches and/or informatic errors related to read misalignment. Targeted PacBio sequencing yields CYP2D6-specific reads with no misalignment to CYP2D7 or CYP2D8, but random errors throughout the sequencing reads are characteristic of this technology. These random errors can be minimized by circular consensus sequencing read analysis; further correction prior to variant calling can also be performed with available informatics tools (e.g., Amplicon Long-read Error Correction [ALEC]). 1KG: 1000 Genomes Project; PacBio: Pacific Biosciences; WES: Whole-exome sequencing; WGS: Whole-genome sequencing. Data taken from [26,45,54].
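The skewed allele frequencies noted for the WGS data can be screened for with a simple check. At a truly heterozygous site roughly half of the reads should carry each allele, so a strongly skewed fraction suggests that reads from CYP2D7 or CYP2D8, or misplaced CYP2D6 reads, are being counted. A minimal Python sketch follows, assuming per-site reference and alternate read counts are already available; the function `flag_skewed_hets`, the site labels, the read counts and the 0.3 to 0.7 acceptance band are all hypothetical.

```python
def flag_skewed_hets(
    calls: dict[str, tuple[int, int]], low: float = 0.3, high: float = 0.7
) -> list[tuple[str, float]]:
    """Flag heterozygous calls whose alt-allele fraction falls outside [low, high].

    calls maps a site label to (reference reads, alternate reads).
    ASSUMPTION: the 0.3-0.7 band is an illustrative cut-off, not one used
    in the review.
    """
    flagged = []
    for site, (ref_reads, alt_reads) in calls.items():
        depth = ref_reads + alt_reads
        if depth == 0:
            continue  # no coverage; nothing to assess
        fraction = alt_reads / depth
        if not low <= fraction <= high:
            flagged.append((site, round(fraction, 3)))
    return flagged


# Hypothetical read counts at three heterozygous sites.
print(flag_skewed_hets({"site_a": (48, 52), "site_b": (85, 15), "site_c": (20, 80)}))
# prints [('site_b', 0.15), ('site_c', 0.8)]
```

A flag here does not prove misalignment; it only marks sites where the read support is inconsistent with a heterozygous call and merits inspection, for example in the Integrative Genomics Viewer as in Figure 4.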

Source: PubMed
