A robust model for read count data in exome sequencing experiments and implications for copy number variant calling
Vincent Plagnol, James Curtis, Michael Epstein, Kin Y Mok, Emma Stebbings, Sofia Grigoriadou, Nicholas W Wood, Sophie Hambleton, Siobhan O Burns, Adrian J Thrasher, Dinakantha Kumararatne, Rainer Doffinger, Sergey Nejentsev
Abstract
Motivation: Exome sequencing has proven to be an effective tool for discovering the genetic basis of Mendelian disorders. It is well established that copy number variants (CNVs) contribute to the etiology of these disorders, but calling CNVs from exome sequence data is challenging. A typical read depth strategy uses another sample (or a combination of samples) as a reference to control for variability at the capture and sequencing steps. However, technical variability between samples complicates the analysis and can create spurious CNV calls.
Results: Here, we introduce ExomeDepth, a new CNV calling algorithm designed to control for this technical variability. ExomeDepth fits a robust model to the read count data and uses this model to build an optimized reference set that maximizes the power to detect CNVs. As a result, ExomeDepth is effective across a wider range of exome datasets than previously existing tools, even for small (e.g. one to two exons) and heterozygous deletions. We used this new approach to analyse exome data from 24 patients with primary immunodeficiencies. Depending on data quality and the exact target region, we found between 170 and 250 exonic CNV calls per sample. Our analysis identified two novel causative deletions in the genes GATA2 and DOCK8.
Availability: The code used in this analysis is implemented in an R package called ExomeDepth, available from the Comprehensive R Archive Network (CRAN).
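Illustration: The read depth strategy described above can be sketched for a single exon as a comparison of likelihoods under three copy number states. The sketch below is in Python rather than R and is not the ExomeDepth implementation. It assumes a beta-binomial form for the robust read count model (the abstract does not name the distribution), a fixed overdispersion `phi`, and copy ratios of 0.5 and 1.5 for heterozygous deletions and duplications. All function and parameter names are hypothetical.

```python
from math import lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_logpmf(k, n, p, phi):
    """Log-probability of k test reads out of n total reads under a
    beta-binomial with mean proportion p and overdispersion phi in (0, 1)."""
    a = p * (1.0 - phi) / phi
    b = (1.0 - p) * (1.0 - phi) / phi
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def shifted_proportion(p_normal, copy_ratio):
    """Expected test-read proportion when the test sample's depth is
    scaled by copy_ratio relative to the normal copy number state."""
    scaled = copy_ratio * p_normal
    return scaled / (scaled + (1.0 - p_normal))


def call_exon(test_reads, ref_reads, p_normal, phi=0.01):
    """Return the most likely state for one exon and all log-likelihoods.

    p_normal is the proportion of reads expected in the test sample when
    both test and reference carry two copies (assumed known here).
    """
    n = test_reads + ref_reads
    # Assumed copy ratios for heterozygous events.
    states = {"deletion": 0.5, "normal": 1.0, "duplication": 1.5}
    loglik = {
        name: betabinom_logpmf(
            test_reads, n, shifted_proportion(p_normal, ratio), phi
        )
        for name, ratio in states.items()
    }
    best = max(loglik, key=loglik.get)
    return best, loglik


if __name__ == "__main__":
    # Hypothetical counts: the test sample has half its expected depth.
    state, scores = call_exon(test_reads=50, ref_reads=400, p_normal=0.2)
    print(state, scores)
```

In this sketch the overdispersion parameter absorbs sample-to-sample technical variability, so a modest deviation in depth does not produce a call on its own. A full caller would also estimate `p_normal` and `phi` from the data and combine evidence across adjacent exons. Neither step is shown here.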
Source: PubMed