Sensitive quantification of mosaicism using high density SNP arrays and the cumulative distribution function

Thomas C Markello, Hannah Carlson-Donohoe, Murat Sincan, David Adams, David M Bodine, Jason E Farrar, Adrianna Vlachos, Jeffrey M Lipton, Arleen D Auerbach, Elaine A Ostrander, Settara C Chandrasekharappa, Cornelius F Boerkoel, William A Gahl

Abstract

Medicine is rapidly applying exome and genome sequencing to the diagnosis and management of human disease. Somatic mosaicism, however, is not readily detectable by these means, and yet it accounts for a significant portion of undiagnosed disease. We present a rapid and sensitive method, the cumulative distribution function (CDF) as applied to single nucleotide polymorphism (SNP) array data, to quantify somatic mosaicism throughout the genome. We also demonstrate application of the method to novel diseases and mechanisms.

Published by Elsevier Inc.

Figures

Fig. 1
B allele plots for a normal dizygous chromosome (a), an interstitial duplication (b), a trisomy/monosomy mosaic (c) and a monosomy/disomy chromosome (d). Binned counts of SNPs by B allele frequency, using 200 bins from 0 to 1.0 and summed along the entire chromosome length: (e) corresponds to (a), (f) to (c), and (g) to (d). Note that the non-homozygous regions of the binned counts form a quasi-normal distribution with a single mode for the normal chromosome, and overlapping or non-overlapping bimodal peaks for the mosaic examples. Cumulative distribution functions are shown for the normal chromosome (h) and for the respective mosaic examples (i, j).
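The binning and cumulative distribution described in this caption can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names `binned_counts` and `empirical_cdf` are hypothetical, and the 0.15 to 0.85 heterozygous window is borrowed from the Fig. 4 caption and applied here directly to the sample values as a simplification.

```python
def binned_counts(baf_values, n_bins=200):
    """Count SNPs per B allele frequency bin over the range 0 to 1.0."""
    counts = [0] * n_bins
    for v in baf_values:
        # Clamp so that a value of exactly 1.0 falls into the last bin.
        idx = max(0, min(int(v * n_bins), n_bins - 1))
        counts[idx] += 1
    return counts


def empirical_cdf(baf_values, lo=0.15, hi=0.85):
    """Return (value, cumulative fraction) pairs for non-homozygous SNPs.

    The lo/hi window is an assumption taken from the Fig. 4 caption.
    """
    kept = sorted(v for v in baf_values if lo <= v <= hi)
    n = len(kept)
    return [(v, (i + 1) / n) for i, v in enumerate(kept)]
```

A single-mode histogram gives a CDF with one steep rise near 0.5, while a bimodal mosaic histogram gives two rises separated by a plateau, which is what makes the CDF convenient for curve fitting.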
Fig. 2
Block algorithm for implementing the DANFIP analysis of the non-homozygous B allele frequency data from a series of 5 controls and an experimental region of mosaic data. CDF refers to the cumulative distribution function; delta is described in the Methods.
Fig. 3
Punnett square data for models of mosaicism. (a) Graphical representation of the possible number of alleles at a single locus for a disomy/trisomy mosaicism. In this representation, A is the more frequent allele and B the less frequent allele at every polymorphic locus. All possible combinations of the disomy cell line and the derived trisomy cell line are shown. There are only four possible states for the trisomy cell line, and only four possible states can arise from the three states in the disomy cell line that gave rise to the trisomy line. The net B allele frequency at a heterozygous locus is the fraction contributed by the trisomy line (f) times its B allele count (A, A, B for one case and A, B, B for the other) plus the remaining fraction of the disomy line (1-f) times its B allele count (A, B), divided by the total number of alleles in both cell lines. If the signal intensity is presumed to be equal for the A and B alleles, these equations reduce to 1/(2+f) and (1+f)/(2+f), respectively. (b) Graphical representation of the possible number of alleles at a single locus for a monosomy/disomy mosaic mixture. There are only four possible outcomes at any one locus for a monosomy line fraction of f. The equations for the B allele frequency are constructed as described above, and for equal intensities of the A and B alleles (the non-homozygous possibilities) they reduce to (1-f)/(2-f) and 1/(2-f), respectively. It is important to note that only one of these possibilities exists at each locus in the actual experimental data, but that all four possible states are present many times across the many loci sampled by a high density SNP array.
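The reduced equations in this caption are easy to check numerically. The Python sketch below evaluates the two expected heterozygous B allele frequencies for a given mosaic fraction f under each model. The function names are hypothetical, and the inverse function is our own algebraic rearrangement of the caption's equations (the gap between the two bands is f/(2+f) for trisomy and f/(2-f) for monosomy), not a formula quoted from the paper.

```python
def trisomy_baf(f):
    """Expected heterozygous BAF bands for a disomy/trisomy mosaic.

    f is the fraction of cells in the trisomy line; equal A and B
    signal intensity is assumed, as in the caption.
    """
    return 1 / (2 + f), (1 + f) / (2 + f)


def monosomy_baf(f):
    """Expected heterozygous BAF bands for a monosomy/disomy mosaic."""
    return (1 - f) / (2 - f), 1 / (2 - f)


def fraction_from_split(lower, upper, model):
    """Recover f from the two band positions (derived by rearrangement)."""
    d = upper - lower
    if model == "trisomy":
        return 2 * d / (1 - d)   # from d = f / (2 + f)
    if model == "monosomy":
        return 2 * d / (1 + d)   # from d = f / (2 - f)
    raise ValueError("model must be 'trisomy' or 'monosomy'")
```

At f = 0 both models collapse to a single band at 0.5. At f = 1 the trisomy model gives bands at 1/3 and 2/3, and the monosomy model gives 0 and 1, that is, apparent homozygosity.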
Fig. 4
B allele plots derived from SNP array data. (a) Mixing experiment showing B allele frequency plots and the normalized cumulative distribution function (CDF) of data points in the heterozygous region. The CDF is formed from the data values between 0.15 and 0.85 on the 0% monosomy data set and all the corresponding loci on the other plots. For the CDF plots, the modelled fitted data and experimental data are superimposed on the same scale. A series of iterated fits to the real data, with resulting residuals, is graphed next to each mixture. The minimum of the residuals converges on the experimental data's true mosaic percent, within pipetting error. (b) B allele plots for 10 selected chromosomes from 10 separate individuals. [i] Monosomy/disomy X (58%, varies with cell type). [ii] Disomy/trisomy 12 (15.6%). [iii] Mosaic Y monosomy (18.9%) in a male with pseudoautosomal region of X. [iv] Partial mosaicism for 20q13.32 to qter with distal region of homozygosity. The log R ratio excludes a deletion of the region of homozygosity at 20q13.33. [v] Partial disomy/monosomy for two separate regions of chromosome 5 (63%, varies with cell type). [vi] Mosaicism for 1pter to 1p35. [vii and viii] Two siblings with variable mosaicism for 3q. Approximate mosaicism for iv: 1.1% cen-q21.1, 6.2% q21.2-q25, 10.3% q25.1-q26.31, 14.9% q26.31-q27.3 and 15.8% from q28 to qter. [ix] Variable mosaic 3p. [x] Variable mosaicism for chromosome 15q from 0% to 31.7% at qter.
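The iterated fit in panel (a), in which a modelled CDF is compared with the experimental CDF and the mosaic percent is taken where the residuals are smallest, can be illustrated with a simple grid search. This is a sketch under stated assumptions rather than the published procedure: it models the heterozygous B allele frequencies of a monosomy/disomy mosaic as an equal mixture of two normal distributions centred on (1-f)/(2-f) and 1/(2-f), it treats the noise standard deviation `sd` as known, and it uses simulated data. All function names are hypothetical.

```python
import math
import random


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def model_cdf(x, f, sd):
    """Assumed model: equal mixture of two normals at the expected bands."""
    lower = (1.0 - f) / (2.0 - f)
    upper = 1.0 / (2.0 - f)
    return 0.5 * normal_cdf(x, lower, sd) + 0.5 * normal_cdf(x, upper, sd)


def fit_monosomy_fraction(het_baf, sd, step=0.01):
    """Grid-search f in [0, 1] minimising squared CDF residuals."""
    xs = sorted(het_baf)
    n = len(xs)
    best_f, best_sse = 0.0, float("inf")
    for k in range(int(round(1.0 / step)) + 1):
        f = k * step
        # Residual between the empirical CDF and the model CDF at each point.
        sse = sum(((i + 0.5) / n - model_cdf(x, f, sd)) ** 2
                  for i, x in enumerate(xs))
        if sse < best_sse:
            best_f, best_sse = f, sse
    return best_f, best_sse


def simulate_het_baf(f, sd, n, seed=0):
    """Simulated heterozygous BAF values for testing (not real array data)."""
    rng = random.Random(seed)
    bands = ((1.0 - f) / (2.0 - f), 1.0 / (2.0 - f))
    return [rng.gauss(rng.choice(bands), sd) for _ in range(n)]
```

For example, `fit_monosomy_fraction(simulate_het_baf(0.30, 0.03, 2000), sd=0.03)` should return an estimate close to 0.30. With real array data, the noise width would more plausibly be estimated from control samples than fixed in advance.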
Fig. 5
Examples of complex mosaicism detected by SNP/CDF analysis. (a) B allele plots of X monosomy/disomy mosaic after cell fractionation. The CDFs for each fraction are to the right of each B allele frequency plot; black CDF dots are actual data and red dots are the fitted regression. Also shown are the B allele plots of the partial 5p monosomy/disomy mosaicism after cell type fractionation. The log R ratios for these samples are shown to the right of each plot, and the minimal change in the T cell intensity is apparent. (b) Variable degree of monosomy/disomy for the distal portion of 3p. Subregions of the B allele plot were analyzed by CDF for percent mosaicism, as shown in the corresponding CDF plots in the bottom half of the figure. Blue dots are actual CDF data; red dots show the regression fit.

Source: PubMed
