Analyzing and Reanalyzing the Genome: Findings from the MedSeq Project

Kalotina Machini, Ozge Ceyhan-Birsoy, Danielle R Azzariti, Himanshu Sharma, Peter Rossetti, Lisa Mahanta, Laura Hutchinson, Heather McLaughlin, MedSeq Project, Robert C Green, Matthew Lebo, Heidi L Rehm

Abstract

Although genome sequencing is increasingly available in clinical and research settings, many questions remain about the interpretation of sequencing data. In the MedSeq Project, we explored how much effort is required to evaluate and report on more than 4,500 genes reportedly associated with monogenic conditions, as well as pharmacogenomic (PGx) markers, blood antigen serotyping, and polygenic risk scores in 100 individuals (50 with cardiomyopathy and 50 healthy) randomized to the sequencing arm. We defined the quality thresholds for determining the need for Sanger confirmation. Finally, we examined the effort needed and new findings revealed by reanalyzing each genome (6-23 months after initial analysis; mean 13 months). Monogenic disease risk and carrier status were reported in 21% and 94% of participants, respectively. Only two participants had no monogenic disease risk or carrier status identified. For the PGx results (18 genotypes in six genes for five drugs), the identified diplotypes prompted recommendation for non-standard dosing of at least one of the analyzed drugs in 95% of participants. For blood antigen studies, we found that 31% of participants had a rare blood antigen genotype. In the cardiomyopathy cohort, an explanation for disease was identified in 48% of individuals. Over the course of the study, 14 variants were reclassified and, upon reanalysis, 18 new variants met criteria for reporting. These findings highlight the quantity of medically relevant findings from a broad analysis of genomic sequencing data as well as the need for periodic reinterpretation and reanalysis of data for both diagnostic indications and secondary findings.

Keywords: MedSeq; clinical genomes; genome; genomic interpretation; reanalysis; secondary findings; sequencing.

Copyright © 2019 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

Figures

Figure 1
General Filtration Approach and Overall Outcome. (A) Schematic representation of the filters used for rare-variant interpretation of the MedSeq genomes. HGMD = Human Gene Mutation Database; LoF = loss of function. (B) Pie chart showing the results of gene curation performed for all genes that had at least one variant that came through the MedSeq genome filter. (C) Bar graph showing the breakdown of unique variants queued for review and their sources (HGMD versus LoF). Variants were excluded either for insufficient gene-disease validity (gene disease validity) or for high MedSeq allele frequency (platform-specific frequency). Please note that several variants were excluded on the basis of both gene disease validity and platform-specific frequency. (D) Graph showing the reduction of variants that required review as a function of the number of genomes reviewed.
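The exclusion logic described for panel (C) can be illustrated with a minimal sketch. It is not the MedSeq pipeline: the `Variant` fields, the function names, and the `MAX_COHORT_AF` cutoff are hypothetical, and the source does not state the actual frequency threshold. The sketch only shows how one variant can be counted under both exclusion reasons.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical cutoff; the source does not give the platform-specific threshold.
MAX_COHORT_AF = 0.05


@dataclass
class Variant:
    name: str
    source: str        # filter that queued the variant: "HGMD" or "LoF"
    gene_valid: bool    # True if gene-disease validity was judged sufficient
    cohort_af: float    # allele frequency within the sequenced cohort


def exclusion_reasons(variant):
    """Return every reason a variant would be excluded (possibly both)."""
    reasons = []
    if not variant.gene_valid:
        reasons.append("gene-disease validity")
    if variant.cohort_af > MAX_COHORT_AF:
        reasons.append("platform-specific frequency")
    return reasons


def triage(variants):
    """Split variants into those kept for review and a tally of exclusions."""
    kept = []
    tally = Counter()
    for variant in variants:
        reasons = exclusion_reasons(variant)
        if not reasons:
            kept.append(variant)
            continue
        # A variant failing both checks is counted once under each reason.
        for reason in reasons:
            tally[(variant.source, reason)] += 1
    return kept, tally
```

Because a variant failing both checks is tallied under each reason, the per-reason counts can sum to more than the number of excluded variants, which matches the caption's note about overlapping exclusions.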
Figure 2
Framework for Predicted LoF Variant Assessment and Results of Manual Variant Curation. (A) Final variant classifications with filter source noted. Blue bars represent variants from the HGMD filter, whereas orange bars illustrate novel predicted LoF variants. (B) Schematic depiction of the LoF checklist designed to assist in the rapid and accurate classification of predicted LoF variants. (C) Application of the predicted LoF checklist on novel as well as previously reported predicted LoF variants.
Figure 3
Summary of Genome Reanalysis Findings and Variant Reclassification. (A) The middle pie chart depicts the overall findings, including the number of variants added onto the report upon genome reanalysis (shown in blue), and variants reclassified over the course of the study (shown in green). The pie chart on the left represents reportable variants newly discovered as a result of pipeline updates (light blue) or existing variants newly reported based on new evidence during variant reanalysis (blue). The pie chart on the right shows the consequences of reclassification of previously reported variants; variants removed from reports are shown in dark green, and variants with a category change only (but still reportable) are shown in light green. (B) Schematic illustration of variant reclassification categories. The middle pie chart shows the number of variants reclassified (relevant to indication in green, monogenic disease risk in red, and carrier status in blue). The pie chart on the left illustrates the various classification changes in carrier status variants, whereas the pie chart on the right focuses on variant reclassifications relative to cardiomyopathy indication. Please note that one monogenic disease variant was reclassified (from VUS-FP to N/A) and was removed from the report because new evidence disputed the gene’s association with disease.
Figure 4
Results of Sanger Confirmation with Scatterplot of Variants That Underwent Sanger Testing. (A and B) Blue circles represent true positive (TP) indels and gray circles represent TP SNPs. Orange crosses represent false positive (FP) indels and yellow crosses represent FP SNPs. (A) The y axis corresponds to the quality by depth (QD) and the x axis to the mapping quality (MQ). The right plot is a magnification showing variants with QD 30). The six FP variants that also had QD > 4 are circled in black.

Source: PubMed
