Genetic diagnosis of Mendelian disorders via RNA sequencing

Laura S Kremer, Daniel M Bader, Christian Mertes, Robert Kopajtich, Garwin Pichler, Arcangela Iuso, Tobias B Haack, Elisabeth Graf, Thomas Schwarzmayr, Caterina Terrile, Eliška Koňaříková, Birgit Repp, Gabi Kastenmüller, Jerzy Adamski, Peter Lichtner, Christoph Leonhardt, Benoit Funalot, Alice Donati, Valeria Tiranti, Anne Lombes, Claude Jardel, Dieter Gläser, Robert W Taylor, Daniele Ghezzi, Johannes A Mayr, Agnes Rötig, Peter Freisinger, Felix Distelmaier, Tim M Strom, Thomas Meitinger, Julien Gagneur, Holger Prokisch

Abstract

Across a variety of Mendelian disorders, ∼50-75% of patients do not receive a genetic diagnosis by exome sequencing, indicating disease-causing variants in non-coding regions. Although genome sequencing in principle reveals all genetic variants, their sizeable number and poorer annotation make prioritization challenging. Here, we demonstrate the power of transcriptome sequencing to molecularly diagnose 10% (5 of 48) of mitochondriopathy patients and identify candidate genes for the remainder. We find a median of one aberrantly expressed gene, five aberrant splicing events and six mono-allelically expressed rare variants in patient-derived fibroblasts and establish disease-causing roles for each kind. Private exons often arise from cryptic splice sites, providing an important clue for variant prioritization. One such event is found in the complex I assembly factor TIMMDC1, establishing a novel disease-associated gene. In conclusion, our study expands the diagnostic tools for detecting non-exonic variants and provides examples of intronic loss-of-function variants with pathological relevance.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

**Figure 1. Strategy for genetic diagnosis using RNA-seq.**
The approach we followed started with RNA-seq of fibroblasts from patients unsolved by whole-exome sequencing (WES). Three strategies to facilitate diagnosis were pursued: detection of aberrant expression (for example, depletion), aberrant splicing (for example, exon creation) and mono-allelic expression (MAE) of the alternative allele (for example, A as alternative allele). Candidates were validated by proteomic measurements, lentiviral transduction of the wild-type (wt) allele or, in particular cases, by specific metabolic supplementation.

**Figure 2. RNA aberrant expression detection and validation.**
(a) Aberrantly expressed genes (Hochberg corrected P value<0.05 and |Z-score|>3) for each patient fibroblast cell line. (b) Gene-wise RNA expression volcano plot of nominal P values (−log10 P value) against Z-scores of the patient #35791 compared against all other fibroblasts. Z-scores with absolute value >5 are plotted at ±5, respectively. (c) Same as (b) for patient #73804. (d) Sample-wise RNA expression is ranked for the genes *TIMMDC1* (top) and *MGST1* (bottom). Samples with aberrant expression for the corresponding gene are highlighted in red (#35791, #66744, and #73804). (e) Gene-wise comparison of RNA and protein fold changes of patient #35791 compared to the average across the fibroblast cell lines of all other patients. Subunits of the mitochondrial respiratory chain complex I are highlighted (red squares). Reliably detected proteins that were not detected in this sample are shown separately with their corresponding RNA fold changes (points below solid horizontal line). (f) Western blot of TIMMDC1, NDUFA13, NDUFB3 and NDUFB8 protein in three fibroblast cell lines without (#62346, #91324, NHDF) and three with a variant in *TIMMDC1* (#35791, #66744 and #96687), and fibroblasts re-expressing *TIMMDC1* (‘-T’) (#35791-T, #66744-T and #96687-T). UQCRC2 was used as loading control. CI, complex I subunit; CIII, complex III subunit; MW, molecular weight. (g) Blue native PAGE blot of the control fibroblasts re-expressing *TIMMDC1* (NHDF-T), the control fibroblasts (NHDF), patient fibroblasts (#96687) and patient fibroblasts re-expressing *TIMMDC1* (#96687-T). Immunodecoration for complex I and complex III was performed using NDUFB8 and UQCRC2 antibodies, respectively. CI, complex I subunit; CIII, complex III subunit.
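The filter quoted in panel (a) can be sketched as follows. This is a minimal illustration, assuming that nominal P values and Z-scores for one sample's genes have already been computed elsewhere and that the correction is applied across those genes; the function names and gene labels are hypothetical and do not come from the study's pipeline.

```python
def hochberg_adjust(pvalues):
    """Hochberg step-up adjusted P values, returned in the input order."""
    # Walk from the largest P value to the smallest: the k-th largest is
    # multiplied by k, and a running minimum keeps the result monotone.
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i], reverse=True)
    adjusted = [1.0] * len(pvalues)
    running_min = 1.0
    for k, idx in enumerate(order, start=1):
        running_min = min(running_min, k * pvalues[idx])
        adjusted[idx] = running_min
    return adjusted


def aberrant_genes(genes, z_scores, pvalues, alpha=0.05, z_cutoff=3.0):
    """Genes with Hochberg-adjusted P < alpha and |Z-score| > z_cutoff."""
    adjusted = hochberg_adjust(pvalues)
    return [
        gene
        for gene, z, p_adj in zip(genes, z_scores, adjusted)
        if p_adj < alpha and abs(z) > z_cutoff
    ]


# Hypothetical toy input: four genes from a single sample.
calls = aberrant_genes(
    ["geneA", "geneB", "geneC", "geneD"],
    [-6.0, 1.0, 3.5, -2.0],
    [1e-6, 0.5, 0.04, 0.001],
)
```

Requiring both criteria means that a gene with a small P value but a modest effect, or a large Z-score that does not survive correction, is not reported.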

**Figure 3. Aberrant splicing detection and quantification.**
(a) Aberrant splicing events (Hochberg corrected P value<0.05) for all fibroblasts. (b) Aberrant splicing events (n=175) in undiagnosed patients (n=48) grouped by their splicing category after manual inspection. (c) *CLPP* Sashimi plot of exon skipping and truncation events in *CLPP*-affected and *CLPP*-unaffected fibroblasts (red and orange, respectively). The RNA coverage is given as the log10 RPKM-value and the number of split reads spanning the given intron is indicated on the exon-connecting lines. At the bottom the gene model of the RefSeq annotation is depicted and the aberrantly spliced exon is coloured in red. (d) Same as in c for *TIMMDC1*. At the bottom the newly created exon is depicted in red within the RefSeq annotation track. (e) Coverage tracks (light red) for patients #35791, #66744, and #91324 based on RNA and WGS. For patient #91324 only WGS is available. The homozygous SNV c.596+2146A>G is present in all coverage tracks (vertical orange bar). The top tracks show the genomic annotation: genomic position on chromosome 3, DNA sequence, amino acid translation (grey, stop codon in red), the RefSeq gene model (blue line), the predominant additional exon of *TIMMDC1* (blue rectangle) and the SNV annotation of the 1000 Genomes Project (each black bar represents one variant). (f) Per cent spliced in (Ψ) distribution for different splicing classes and genes. Top: histogram of the genome-wide distribution of the 3′ and 5′ Ψ-values based on all reads over all samples. Middle: The shaded horizontal bars represent the densities (black for high density) of the background, weak and strong splicing class, respectively (Methods section). Bottom: Ψ-values of the predominant donor and acceptor splice sites of genes with private splice sites (that is, found predominant in at most two samples) computed over all other samples.
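As an illustration of the Ψ-values in panel (f), the sketch below assumes the common intron-centric definition: the 5′ Ψ of an intron is its split-read count divided by the count of all split reads sharing its donor site, and the 3′ Ψ is the same ratio over all split reads sharing its acceptor site. The input layout and the function name are hypothetical.

```python
from collections import defaultdict


def psi_values(split_reads):
    """Compute 5' and 3' percent-spliced-in values per intron.

    split_reads maps (donor_position, acceptor_position) to the number of
    split reads that span that intron.
    """
    donor_total = defaultdict(int)
    acceptor_total = defaultdict(int)
    for (donor, acceptor), count in split_reads.items():
        donor_total[donor] += count
        acceptor_total[acceptor] += count

    psi5 = {}
    psi3 = {}
    for (donor, acceptor), count in split_reads.items():
        # Skip introns whose splice site has no supporting reads at all.
        if donor_total[donor] > 0:
            psi5[(donor, acceptor)] = count / donor_total[donor]
        if acceptor_total[acceptor] > 0:
            psi3[(donor, acceptor)] = count / acceptor_total[acceptor]
    return psi5, psi3


# Hypothetical counts: one donor used with two acceptors, one acceptor shared.
psi5, psi3 = psi_values({(100, 200): 30, (100, 300): 10, (150, 300): 10})
```

Under this definition, a splice site that is rarely used in the other samples has a low Ψ there, which is the kind of weak site the bottom panel refers to.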

**Figure 4. Detection and validation of MAE of rare variants.**
(a) Distribution of heterozygous single nucleotide variants (SNVs) across samples for different consecutive filtering steps. Heterozygous SNVs detected by exome sequencing (black), SNVs with RNA-seq coverage of at least 10 reads (grey), SNVs where the alternative allele is mono-allelically expressed (alternative allele frequency >0.8 and Benjamini-Hochberg corrected P value <0.05, blue), and the rare subset of those (ExAC minor allele frequency <0.001, red). (b) Fold change between alternative (ALT+1) and reference (REF+1) allele read counts for the patient #80256 compared to total read counts per SNV within the sample. Points are coloured according to the groups defined in a. (c) Gene-wise comparison of RNA and protein fold changes of the patient #80256 compared to the average across the fibroblast cell lines of all other patients. The position of the gene *ALDH18A1* is highlighted. Reliably detected proteins that were not detected in this sample are shown separately with their corresponding RNA fold changes (points below solid horizontal line). (d) Relative intensity for metabolites of the proline biosynthesis pathway (inset) for the patient #80256 and 16 healthy controls of matching age. Equi-tailed 95% interval (whiskers), 25th, 75th percentile (boxes) and median (bold horizontal line) are indicated. Data points belonging to the patient are highlighted (red circles, P values were computed using the Student’s t-test). (e) Cell counts under different growth conditions for the NHDF and patient #80256. Both fibroblasts were grown in fetal bovine serum (FBS), dialysed FBS (without proline) and dialysed FBS with proline added. Boxplot as in d. P values are based on a two-sided Wilcoxon test. (f) Intron retention for *MCOLN1* in patient #62346. Tracks from top to bottom: genomic position on chromosome 19, amino acid translation (red for stop codons), RefSeq gene model, coverage of WES of patient #62346, RNA-seq based coverage for patients #62346 and #85153 (red and orange shading, respectively). SNVs are indicated by non-reference coloured bars with respect to the corresponding reference and alternative nucleotide.
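The consecutive filters in panel (a) can be sketched as below. The thresholds are the ones stated in the caption. The statistical test is an assumption: the caption does not name it, so an exact one-sided binomial test against an allelic ratio of 0.5 is swapped in for illustration. The record layout, the function names and the convention that a variant absent from ExAC has a frequency of 0 are likewise hypothetical.

```python
from math import comb


def binomial_upper_tail(alt, total):
    """Exact P(X >= alt) for X ~ Binomial(total, 0.5)."""
    return sum(comb(total, k) for k in range(alt, total + 1)) / 2 ** total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [1.0] * m
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = m - k  # rank of this P value when sorted ascending
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def rare_mae_variants(snvs, min_cov=10, min_alt_freq=0.8, alpha=0.05, max_maf=0.001):
    """Return ids of rare heterozygous SNVs whose alternative allele shows MAE."""
    # Step 1: keep SNVs with sufficient RNA-seq coverage.
    covered = [s for s in snvs if s["ref"] + s["alt"] >= min_cov]
    # Step 2: test allelic imbalance (assumed binomial test) and correct.
    pvals = [binomial_upper_tail(s["alt"], s["ref"] + s["alt"]) for s in covered]
    padj = benjamini_hochberg(pvals)
    hits = []
    for snv, p in zip(covered, padj):
        alt_freq = snv["alt"] / (snv["ref"] + snv["alt"])
        # Steps 3 and 4: MAE of the alternative allele, then rarity.
        if alt_freq > min_alt_freq and p < alpha and snv["exac_maf"] < max_maf:
            hits.append(snv["id"])
    return hits


# Hypothetical heterozygous SNVs with reference and alternative read counts.
example_snvs = [
    {"id": "s1", "ref": 1, "alt": 39, "exac_maf": 0.0},
    {"id": "s2", "ref": 20, "alt": 22, "exac_maf": 0.0005},
    {"id": "s3", "ref": 2, "alt": 5, "exac_maf": 0.0},
    {"id": "s4", "ref": 2, "alt": 48, "exac_maf": 0.2},
]
rare_hits = rare_mae_variants(example_snvs)
```

Applying the coverage filter before multiple-testing correction keeps the number of tests limited to SNVs that could reach significance at all.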

**Figure 5. Characterization of diagnoses and variants causing aberrant splicing.**
(a) Detection strategy and validation of genes with RNA defects in newly diagnosed patients, that is, *TIMMDC1* (n=2 patients), *CLPP*, *ALDH18A1* and *MCOLN1*, and one patient with a strong candidate, that is, *MGST1*. The median number (±median absolute deviation) of candidate genes is given per detection strategy. Dotted check: identified by manual inspection (not statistically significant). (b) Schematic representation of variants causing splicing defects for *TIMMDC1* (top, new exon red box), *CLPP* (middle, exon skipping and truncation) and *MCOLN1* (bottom, intron retention). Variants are depicted by a red star.

Source: PubMed
