CLAMMS: a scalable algorithm for calling common and rare copy number variants from exome sequencing data

Jonathan S Packer, Evan K Maxwell, Colm O'Dushlaine, Alexander E Lopez, Frederick E Dewey, Rostislav Chernomorsky, Aris Baras, John D Overton, Lukas Habegger, Jeffrey G Reid

Abstract

Motivation: Several algorithms exist for detecting copy number variants (CNVs) from human exome sequencing read depth, but previous tools have not been well suited for large population studies on the order of tens or hundreds of thousands of exomes. Their limitations include being difficult to integrate into automated variant-calling pipelines and being ill-suited for detecting common variants. To address these issues, we developed a new algorithm, Copy number estimation using Lattice-Aligned Mixture Models (CLAMMS), which is highly scalable and suitable for detecting CNVs across the whole allele frequency spectrum.

Results: In this note, we summarize the methods and intended use-case of CLAMMS, compare it to previous algorithms and briefly describe results of validation experiments. We evaluate the adherence of CNV calls from CLAMMS and four other algorithms to Mendelian inheritance patterns on a pedigree; we compare calls from CLAMMS and other algorithms to calls from SNP genotyping arrays for a set of 3164 samples; and we use TaqMan quantitative polymerase chain reaction to validate CNVs predicted by CLAMMS at 39 loci (95% of rare variants validate; across 19 common variant loci, the mean precision and recall are 99% and 94%, respectively). In the Supplementary Materials (available at the CLAMMS GitHub repository), we present our methods and validation results in greater detail.

Availability and implementation: https://github.com/rgcgithub/clamms (implemented in C).

Contact: jeffrey.reid@regeneron.com

Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author 2015. Published by Oxford University Press.

Figures

Fig. 1.
Overview of the CLAMMS CNV-calling pipeline. A reference panel is selected for each sample based on seven sequencing QC metrics using an efficient k-d tree data structure. After selecting reference panels, each sample and its corresponding reference panel may be processed in parallel across processes and/or servers, requiring only ∼50 MB of RAM per process.
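
The reference-panel step described in the caption can be illustrated with a minimal sketch. This is not the CLAMMS implementation (which is written in C); it only shows the general idea of nearest-neighbour selection in QC-metric space using a k-d tree. The function names, the panel size, the exclusion of the sample from its own panel, the use of unscaled Euclidean distance, and the example metric values are all assumptions made for illustration. The demo uses two made-up metrics per sample for brevity, but the code works for vectors of any length, including seven.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple

Item = Tuple[Tuple[float, ...], int]  # (QC-metric vector, sample index)


class Node:
    """One k-d tree node: a point, its sample index, and the split axis."""

    def __init__(self, point, index, axis, left, right):
        self.point, self.index, self.axis = point, index, axis
        self.left, self.right = left, right


def build_kd_tree(items: List[Item], depth: int = 0) -> Optional[Node]:
    """Recursively split on the median, cycling through the axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, index = items[mid]
    return Node(point, index, axis,
                build_kd_tree(items[:mid], depth + 1),
                build_kd_tree(items[mid + 1:], depth + 1))


def k_nearest(root: Optional[Node], query: Sequence[float], k: int) -> List[int]:
    """Return indices of the k points closest to `query`, nearest first."""
    heap: List[Tuple[float, int]] = []  # max-heap via negated squared distance

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d2 = sum((a - b) ** 2 for a, b in zip(node.point, query))
        if len(heap) < k:
            heapq.heappush(heap, (-d2, node.index))
        elif d2 < -heap[0][0]:
            heapq.heapreplace(heap, (-d2, node.index))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Search the far side only if the splitting plane is closer
        # than the current k-th best distance.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [index for _, index in sorted(heap, reverse=True)]


def select_reference_panels(qc: Dict[str, Sequence[float]],
                            panel_size: int) -> Dict[str, List[str]]:
    """For each sample, pick the `panel_size` other samples nearest in QC space.

    Assumption: a sample is excluded from its own panel.
    """
    names = list(qc)
    root = build_kd_tree([(tuple(qc[n]), i) for i, n in enumerate(names)])
    panels = {}
    for i, name in enumerate(names):
        hits = k_nearest(root, tuple(qc[name]), panel_size + 1)
        panels[name] = [names[j] for j in hits if j != i][:panel_size]
    return panels


# Hypothetical QC metrics (two per sample here; values are made up).
example_qc = {"s1": (0.0, 0.0), "s2": (0.1, 0.0), "s3": (5.0, 5.0),
              "s4": (5.1, 5.0), "s5": (0.0, 0.2)}
print(select_reference_panels(example_qc, 2)["s1"])  # ['s2', 's5']
```

Because the tree is built once and then queried per sample, each panel lookup avoids a full scan of the cohort, which is the property that makes this kind of structure attractive at the scale of many thousands of exomes. In practice the metrics would likely need to be put on comparable scales before distances are computed; that step is omitted here.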

Source: PubMed
