An informatics approach to analyzing the incidentalome

Jonathan S Berg, Michael Adams, Nassib Nassar, Chris Bizon, Kristy Lee, Charles P Schmitt, Kirk C Wilhelmsen, James P Evans

Abstract

Purpose: Next-generation sequencing has transformed genetic research and is poised to revolutionize clinical diagnosis. However, the vast amount of data and inevitable discovery of incidental findings require novel analytic approaches. We therefore implemented for the first time a strategy that utilizes an a priori structured framework and a conservative threshold for selecting clinically relevant incidental findings.

Methods: We categorized 2,016 genes linked with Mendelian diseases into "bins" based on clinical utility and validity, and used a computational algorithm to analyze 80 whole-genome sequences in order to explore the use of such an approach in a simulated real-world setting.
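The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the gene names, the effect labels, the `Variant` fields, and the `max_af` threshold are all hypothetical placeholders, since the abstract does not state the actual bin assignments or frequency cutoff. It keeps only variants in binned genes that are rare and either predicted to truncate the protein or annotated "DM" in HGMD, and groups them by bin for human review.

```python
from dataclasses import dataclass

# Hypothetical gene-to-bin assignments (the study binned 2,016 genes;
# these three names are placeholders, not real assignments).
GENE_BINS = {"GENE_A": "1", "GENE_B": "2b", "GENE_C": "2c"}

# Assumed labels for predicted protein-truncating effects.
TRUNCATING = {"nonsense", "frameshift", "splice_site"}


@dataclass(frozen=True)
class Variant:
    gene: str
    effect: str              # predicted effect on the translated protein
    max_allele_freq: float   # e.g. maximum population allele frequency
    hgmd_dm: bool = False    # annotated "DM" in HGMD


def select_for_review(variants, max_af=0.05):
    """Group rare truncating or HGMD "DM" variants by gene bin.

    max_af is an illustrative threshold, not the value used in the paper.
    """
    selected = {}
    for v in variants:
        bin_label = GENE_BINS.get(v.gene)
        if bin_label is None:
            continue  # gene is not in any bin
        if v.max_allele_freq >= max_af:
            continue  # too common to be flagged as rare
        if v.effect in TRUNCATING or v.hgmd_dm:
            selected.setdefault(bin_label, []).append(v)
    return selected
```

A caller would pass the annotated variants for one genome and hand the returned per-bin lists to a reviewer; changing the bin structure only requires editing the lookup table.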

Results: The algorithm effectively reduced the number of variants requiring human review and identified incidental variants with likely clinical relevance. Incorporation of the Human Gene Mutation Database improved the yield for missense mutations but also revealed that a substantial proportion of purported disease-causing mutations were misleading.

Conclusion: This approach is adaptable to any clinically relevant bin structure, scalable to the demands of a clinical laboratory workflow, and flexible with respect to advances in genomics. We anticipate that application of this strategy will facilitate pretest informed consent, laboratory analysis, and posttest return of results in a clinical context.

Figures

FIGURE 1. Selection of variants based on allele frequency and predicted effect on the translated protein
(A) The initial informatics analysis resulted in an average of ~13,000 variants per person in Bin 1 genes, ~175,000 variants per person in Bin 2b genes, and ~9,000 variants per person in Bin 2c genes. (B) Limiting these variants to

FIGURE 2. Analysis of mutations annotated as “DM” in HGMD
(A) All variants were queried against the HGMD to identify variants classified as “DM”. The numbers of rare (th and 95th percentiles and outliers shown as filled circles. Homozygous variants are counted twice. (B) The overlap between the rare missense variants and “DM” variants is depicted as a Venn diagram. (C) The maximum 1000 Genomes allele frequencies were determined for each variant identified in the 80 whole genomes and histograms of allele frequencies were generated for each person. These histograms were then combined to depict the average number of variants per person within each range of allele frequencies (depicted as a bar plot with standard deviations). (D) The maximum 1000 Genomes allele frequencies were determined for all “DM” variants in HGMD and graphed as a histogram. The inset shows the distribution for “DM” variants with >1% allele frequencies.
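The per-person averaging described for panel (C) could be computed along these lines. This is a sketch under stated assumptions: the bin edges are hypothetical, and the use of the population standard deviation is a choice made here, since the caption does not specify either.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Hypothetical allele-frequency bin edges; the figure's actual ranges
# are not given in the caption.
BIN_EDGES = [0.0, 0.01, 0.05, 0.5, 1.0]


def histogram(freqs, edges=BIN_EDGES):
    """Count frequencies falling in each [edges[i], edges[i+1]) range."""
    counts = [0] * (len(edges) - 1)
    for f in freqs:
        # Values equal to the last edge are placed in the final range.
        i = min(bisect_right(edges, f) - 1, len(counts) - 1)
        if i >= 0:  # skip values below the first edge
            counts[i] += 1
    return counts


def average_histogram(per_person_freqs, edges=BIN_EDGES):
    """Return (mean, standard deviation) of counts per range across people."""
    hists = [histogram(f, edges) for f in per_person_freqs]
    return [(mean(col), pstdev(col)) for col in zip(*hists)]
```

Each element of `per_person_freqs` is one person's list of maximum allele frequencies; the result has one (mean, standard deviation) pair per frequency range.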

FIGURE 3. Results of the manual review of variants selected by the informatics algorithm
After individual review of the 906 unique variants returned by the final informatics algorithm, 45% were reassigned or removed from consideration. The graphs depict the variants initially selected within a given “bin”, and the stacked segments represent the proportions of those variants that were confirmed, reassigned, or removed after review (see legend). Panel (A) shows all 906 unique variants, (B) shows the 392 rare truncating variants identified by the algorithm, and (C) shows the 514 rare “DM” variants from HGMD. In each bin category, a higher proportion of “DM” variants than of novel truncating variants was removed from consideration.

Source: PubMed
