Genetic diagnosis by whole exome capture and massively parallel DNA sequencing

Murim Choi, Ute I Scholl, Weizhen Ji, Tiewen Liu, Irina R Tikhonova, Paul Zumbo, Ahmet Nayir, Ayşin Bakkaloğlu, Seza Ozen, Sami Sanjad, Carol Nelson-Williams, Anita Farhi, Shrikant Mane, Richard P Lifton

Abstract

Protein-coding genes constitute only approximately 1% of the human genome but harbor 85% of the mutations with large effects on disease-related traits. Therefore, efficient strategies for selectively sequencing complete coding regions (i.e., the "whole exome") have the potential to contribute to the understanding of rare and common human diseases. Here we report a method for whole-exome sequencing that couples Roche/NimbleGen whole-exome arrays to the Illumina DNA sequencing platform. We demonstrate the ability to capture approximately 95% of the targeted coding sequences, with high sensitivity and specificity for detection of homozygous and heterozygous variants. We illustrate the utility of this approach by making an unanticipated genetic diagnosis of congenital chloride diarrhea in a patient referred with a suspected diagnosis of Bartter syndrome, a renal salt-wasting disease. The molecular diagnosis was based on the finding of a homozygous missense D652N mutation in SLC26A3 (the known congenital chloride diarrhea locus) at a position that is virtually completely conserved in orthologues and paralogues from invertebrates to humans, and clinical follow-up confirmed the diagnosis. To our knowledge, whole-exome (or whole-genome) sequencing has not previously been used to make a genetic diagnosis. Five additional patients who were suspected to have Bartter syndrome but had no mutations in known Bartter syndrome genes had homozygous deleterious mutations in SLC26A3. These results demonstrate the clinical utility of whole-exome sequencing and have implications for disease gene discovery and clinical diagnosis.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Coverage of targeted bases, error rate, and sensitivity to detect variants in whole-exome capture data. (A) Distribution of per-base read coverage among 5 capture experiments. A small fraction of targeted bases is poorly captured in all experiments. (B) Per-base error rate in this data set as a function of read position. (C) Subject GIT 264–1 was sequenced to a mean depth of 99×. Shown is the sensitivity to detect homozygous (solid line) or heterozygous (dashed line) variants as the mean depth of whole-exome sequence coverage increases from 0 to 100×. Sensitivity to detect heterozygous variants rises from 81% at 20× mean coverage to 90% at 30× and 95% at 40×, and plateaus at 98%. (D) Sensitivity to detect heterozygous variants as a function of exact per-base coverage. Sensitivity is approximately 80% at 10× coverage and approaches 100% at per-base coverage of 20× or greater.
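Panels C and D relate detection sensitivity to sequencing depth. As rough intuition only, the sketch below uses an idealized model that is not the authors' variant-calling procedure: each read at a heterozygous site is assumed to carry the alternative allele with probability 0.5, a call is assumed to need at least `min_alt_reads` supporting reads (the default of 3 is an arbitrary assumption), and per-base depth is assumed to be Poisson-distributed around the mean. The function names are hypothetical. Real capture data are unevenly covered and affected by sequencing error and allelic bias, so this model is optimistic and is not expected to reproduce the percentages above.

```python
from math import comb, exp


def het_sensitivity_at_depth(depth: int, min_alt_reads: int = 3) -> float:
    """Probability of calling a heterozygous site covered by exactly `depth` reads.

    Idealized assumption: each read independently carries the alternative
    allele with probability 0.5, and a call requires >= min_alt_reads
    supporting reads (no sequencing error, mapping bias, or quality filters).
    """
    if depth < min_alt_reads:
        return 0.0
    # P(fewer than min_alt_reads alt reads) under Binomial(depth, 0.5)
    p_miss = sum(comb(depth, k) for k in range(min_alt_reads)) / 2 ** depth
    return 1.0 - p_miss


def het_sensitivity_at_mean(mean_depth: float, min_alt_reads: int = 3,
                            max_depth: int = 500) -> float:
    """Average per-base sensitivity when depth ~ Poisson(mean_depth).

    The Poisson assumption ignores the uneven capture seen in real exome
    data, so it overstates sensitivity at a given mean depth.
    """
    total = 0.0
    p = exp(-mean_depth)  # Poisson probability of depth 0
    for d in range(max_depth + 1):
        total += p * het_sensitivity_at_depth(d, min_alt_reads)
        p *= mean_depth / (d + 1)  # advance to Poisson probability of depth d + 1
    return total
```

Under these assumptions `het_sensitivity_at_depth(10)` is about 0.945, well above the approximately 80% observed at 10× in panel D; the gap illustrates how much real error profiles, coverage unevenness, and calling criteria matter beyond simple read-sampling statistics.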
Fig. 2.
Kindred GIT 264. The affected subject is indicated by the arrow, and the pedigree structure demonstrates parental consanguinity. A presumably affected sister died 4 d after premature birth (gray symbol), and there were 2 other spontaneous abortions (small triangles; disease status unknown). The parents and other relatives were free of any clinical syndrome like that seen in the index case. For simplicity, the mother's 10 siblings and the father's 8 siblings are not depicted in the pedigree.
Fig. 3.
Homozygous missense mutation at a highly conserved position in SLC26A3 in GIT 264–1. (A) Top: Reference sequence of aa 636–668 of SLC26A3 and the corresponding DNA sequence. Below: Independent Illumina DNA sequence reads from GIT 264–1; forward and reverse reads are shown in uppercase and lowercase letters, respectively. The reads demonstrate a homozygous missense mutation, D652N. (B) Sanger sequences of codons 651–653 of SLC26A3 in a WT subject and in GIT 264–1 confirm the D652N mutation. The mutated residue is indicated by an asterisk, and the encoded amino acids are shown in single-letter code. (C) Conservation of D652 across species. The amino acid sequence of segment 648–656 of human SLC26A3 is compared with the corresponding sequences of 6 (of 39) vertebrates and with those of the paralogue identified in D. melanogaster and the orthologue identified in C. elegans. Positions identical to Homo sapiens (H.s.) are highlighted in yellow. D652 is completely conserved among all species examined. M.m., Mus musculus; O.c., Oryctolagus cuniculus; B.t., Bos taurus; G.g., Gallus gallus; X.l., Xenopus laevis; D.r., Danio rerio; D.m., D. melanogaster; C.e., C. elegans. (D) The same segment of human SLC26A3 is compared with 9 paralogues of the human SLC26A gene family (SLC26A1–A11; SLC26A10 is a pseudogene). (E) Structure of SLC26A3. The protein has 12 transmembrane domains and a C-terminal STAS domain (highlighted in blue). The D652N mutation (red circle) lies in the STAS domain.
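The designation D652N means that aspartate (D) at residue 652 is replaced by asparagine (N). As a small illustration that relies only on the standard genetic code, and not on the actual SLC26A3 sequence (the reference codon is not given here), the sketch below parses such a one-letter designation, reports the coding-sequence positions spanned by the affected codon, and enumerates the single-nucleotide substitutions that could produce the amino acid change. The helper names are hypothetical, chosen for this example.

```python
import re
from itertools import product

# Standard genetic code: codons enumerated with bases in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def parse_protein_change(change: str) -> tuple[str, int, str]:
    """Split a one-letter protein change such as 'D652N' into (ref, residue, alt)."""
    m = re.fullmatch(r"([A-Z])(\d+)([A-Z*])", change)
    if m is None:
        raise ValueError(f"unrecognized protein change: {change!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def codon_cds_span(residue: int) -> tuple[int, int]:
    """1-based coding-sequence positions of the codon for `residue` (3n-2 to 3n)."""
    return 3 * residue - 2, 3 * residue


def single_base_paths(ref_aa: str, alt_aa: str) -> list[tuple[str, str, int, str]]:
    """All (ref_codon, alt_codon, codon_position, base_change) differing at one base."""
    paths = []
    for ref_codon, aa in CODON_TABLE.items():
        if aa != ref_aa:
            continue
        for alt_codon, other in CODON_TABLE.items():
            if other != alt_aa:
                continue
            diffs = [i for i in range(3) if ref_codon[i] != alt_codon[i]]
            if len(diffs) == 1:
                i = diffs[0]
                paths.append((ref_codon, alt_codon, i + 1, f"{ref_codon[i]}>{alt_codon[i]}"))
    return paths
```

For D652N, `single_base_paths("D", "N")` returns only GAT>AAT and GAC>AAC, so under the standard code this change requires a G>A substitution at the first base of codon 652 (coding positions 1954 to 1956). The same analysis for the Y520C mutation gives an A>G change at the second codon position. Which of the two possible reference codons is present in SLC26A3 is not stated in the text.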
Fig. 4.
Additional patients with SLC26A3 mutations. Sanger sequence traces of WT and 5 subjects with homozygous mutations in SLC26A3 are shown. These include two subjects with premature termination at codon 187 (A and B); a frameshift at codon 454 resulting from a single base deletion that leads to termination at codon 458 (C); a second patient with the D652N mutation (D); and a patient with a Y520C mutation (E). (F) Conservation of Y520 across species: Y520 is conserved among all orthologues and paralogues examined.
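Panel C describes a single-base deletion at codon 454 that shifts the reading frame and leads to termination at codon 458. The mechanism can be illustrated on a short, entirely hypothetical coding sequence (`TOY_CDS` below is invented for illustration and is not SLC26A3 sequence; the helper names are also hypothetical): deleting one base changes how every downstream base is grouped into codons, so a stop codon that was out of frame can come into frame a few codons later.

```python
from itertools import product
from typing import Optional

# Standard genetic code, codons enumerated in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical 7-codon coding sequence (not SLC26A3): M A D L T K stop.
TOY_CDS = "ATGGCAGATCTGACCAAATAA"


def codon_of_position(position: int) -> int:
    """1-based codon number containing 1-based coding position `position`."""
    return (position - 1) // 3 + 1


def delete_base(cds: str, position: int) -> str:
    """Return `cds` with the base at 1-based `position` removed."""
    return cds[:position - 1] + cds[position:]


def first_stop_codon(cds: str) -> Optional[int]:
    """1-based number of the first in-frame stop codon, or None if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if CODON_TABLE[cds[i:i + 3]] == "*":
            return i // 3 + 1
    return None


def translate(cds: str) -> str:
    """Translate from the first base until the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)
```

Deleting position 8 (inside codon 3) with `delete_base(TOY_CDS, 8)` yields `ATGGCAGTCTGACCAAATAA`, which translates as M A V before an in-frame TGA at codon 4, instead of reaching the original stop at codon 7. In the patient the new stop lies four codons past the deletion (codon 454 to codon 458); how far downstream it falls depends entirely on the local sequence.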

Source: PubMed
