Application of full-genome analysis to diagnose rare monogenic disorders

Joseph T Shieh, Monica Penon-Portmann, Karen H Y Wong, Michal Levy-Sakin, Michelle Verghese, Anne Slavotinek, Renata C Gallagher, Bryce A Mendelsohn, Jessica Tenney, Daniah Beleford, Hazel Perry, Stephen K Chow, Andrew G Sharo, Steven E Brenner, Zhongxia Qi, Jingwei Yu, Ophir D Klein, David Martin, Pui-Yan Kwok, Dario Boffelli

Abstract

Current genetic tests for rare diseases provide a diagnosis in only a modest proportion of cases. The Full-Genome Analysis method, FGA, combines long-range assembly and whole-genome sequencing to detect small variants, structural variants with breakpoint resolution, and phasing. We built a variant prioritization pipeline and tested FGA's utility for diagnosis of rare diseases in a clinical setting. FGA identified structural variants and small variants with an overall diagnostic yield of 40% (20 of 50 cases) and 35% in exome-negative cases (8 of 23 cases); 4 of these were structural variants. FGA detected and mapped structural variants that are missed by short reads, including a non-coding duplication, and phased variants across long distances of more than 180 kb. With the prioritization algorithm, longer DNA technologies could replace multiple tests for monogenic disorders and expand the range of variants detected. Our study suggests that genomes produced from technologies like FGA can improve variant detection and provide higher resolution genome maps for future application.
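
The yield figures quoted above follow directly from the case counts. As a minimal sketch (the function name `diagnostic_yield` is hypothetical and not part of the study's pipeline), the percentages can be checked like this:

```python
def diagnostic_yield(diagnosed: int, total: int) -> float:
    """Return the diagnostic yield as a percentage of cases diagnosed."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= diagnosed <= total:
        raise ValueError("diagnosed must be between 0 and total")
    return 100.0 * diagnosed / total


# Case counts as reported in the abstract.
overall = diagnostic_yield(20, 50)         # all cases
exome_negative = diagnostic_yield(8, 23)   # cases with a prior negative exome

print(f"Overall yield: {overall:.0f}%")
print(f"Exome-negative yield: {exome_negative:.1f}%")
```

The exome-negative figure is a rounded value: 8 of 23 is slightly under 35%.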

Conflict of interest statement

The authors declare no competing interests.

© 2021. The Author(s).

Figures

Fig. 1. Heterozygous, intronic tandem duplication (32 kb) in NHEJ1.
The region affected (chr2: 219,102,933 - 219,134,970, 2q35, genome version GRCh38) corresponds to an IHH upstream enhancer and narrows the diagnostic interval for this condition. a Depicts a de novo assembly (light blue) and its alignment to reference (green). The labeled motifs in the reference genome (vertical maroon lines) are duplicated in Haplotype B and their orientation demonstrates the duplication occurred adjacent to the original sequence, in tandem. b Shows a matrix view of linked reads. The dark orange square in the left panel (proband) illustrates a higher density of barcode overlap in the read matrix compared to parents, indicating the variant likely occurred de novo. c Contains phased haplotypes generated using linked-read data. Haplotype B, in purple, contains the intronic region with a higher number of linked reads due to sequence duplication. Accompanying supplemental data show overlap with the enhancer.
Fig. 2. Structural rearrangement detection with de novo assembly and linked reads; t(1:9)(p33,p21).
a Contains de novo assemblies of chromosomes 9 and 1. Genomic coordinates are shown in gray at the top and the reference assembly in green (reference GRCh38). The proband assembly map is shown in blue with vertical maroon lines that show matching label patterns. The first and second panels show two de novo assembly maps that align to reference chromosomes 1 and 9 and the translocation breakpoint where the alignment switches. The third panel depicts two assembly maps in chromosome 1 with segments that align and segments that do not align to the reference due to the translocation. b Shows the matrix view of linked reads that contains unexpected barcode overlap (in orange) between chromosomes 1 and 9, corresponding to the intronic point of fusion between the two. This overlap is absent in the parents.
Fig. 3. Deletion disrupting TANGO2 (chr22: 20,039,637–20,075,714 and chr22: 20,041,469–20,075,432, genome version GRCh38).
a De novo assembly (light blue) demonstrates missing sequence labels with respect to reference (green). The orange bracket and gray triangle show the deleted region. b Matrix view with absent signal from intervening region demonstrates proband with biallelic deletion. c Deletion is also seen by drop in coverage in both haplotypes in linked-read data.
Fig. 4. Variant haplotype distinction.
a Shows compound heterozygous TSPEAR variants (NM_144991). Phasing was successful for etiologic variants 184,756 bp apart given a phasing block of 15.1 Mb, which is not possible with short-read sequencing. Maroon and yellow arrows point to each variant. Gray arrowheads point to single-nucleotide polymorphisms that confirm trans orientation in relation to parental haplotypes. b Shows a de novo SMAD4 pathogenic variant (NM_005359) identified by linked-read sequencing, also detectable by short-read sequencing. Haplotype analysis showed the variant occurred on the paternally inherited allele, Haplotype A. Variant position is indicated with a maroon arrow. Gray arrowheads point to.
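
The interval sizes quoted in the figure legends can be checked from the reported coordinates. The sketch below assumes the coordinates are 1-based and inclusive, which the legends do not state; the helper name `span_bp` is hypothetical. It also confirms that the distance between the two TSPEAR variants fits inside the reported phasing block, which is what makes phasing them possible.

```python
def span_bp(start: int, end: int) -> int:
    """Length in bp of an interval, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Fig. 1: tandem duplication on chr2 (GRCh38), described as 32 kb.
duplication = span_bp(219_102_933, 219_134_970)

# Fig. 3: the two deletion intervals reported on chr22 (GRCh38).
deletion_a = span_bp(20_039_637, 20_075_714)
deletion_b = span_bp(20_041_469, 20_075_432)

# Fig. 4: distance between the TSPEAR variants versus the phasing block size.
variant_distance = 184_756
phase_block = 15_100_000  # 15.1 Mb
phasable = variant_distance <= phase_block

print(f"Duplication: {duplication:,} bp")
print(f"Deletions: {deletion_a:,} bp and {deletion_b:,} bp")
print(f"Variants fall within one phase block: {phasable}")
```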

Source: PubMed
