Recombinational and mutational hotspots within the human lipoprotein lipase gene

A R Templeton, A G Clark, K M Weiss, D A Nickerson, E Boerwinkle, C F Sing

Abstract

Here an analysis is presented of the roles of recombination and mutation in shaping previously determined haplotype variation in 9.7 kb of genomic DNA sequence from the human lipoprotein lipase gene (LPL), scored in 71 individuals from three populations: 24 African Americans, 24 Finns, and 23 non-Hispanic whites. Recombination and gene-conversion events inferred from data on 88 haplotypes that were defined by 69 variable sites were tested. The analysis revealed 29 statistically significant recombination events and one gene-conversion event. The recombination events were concentrated in a 1.9-kb region, near the middle of the segment, that contains a microsatellite and a pair of tandem and complementary mononucleotide runs; both the microsatellite and the runs show length variation. An analysis of site variation revealed that 9.6% of the nucleotides at CpG sites were variable, as were 3% of the nucleotides found in mononucleotide runs of ≥5 nucleotides, 3% of the nucleotides found ≤3 bp from certain putative polymerase alpha-arrest sites, and 0.5% of the remaining nucleotides. This nonhomogeneous distribution of variation suggests that multiple mutational hits at certain sites are common, an observation that challenges the fundamental assumption of the infinite-sites-mutation model. The nonrandom patterns of recombination and mutation suggest that randomly chosen single-nucleotide polymorphisms may not be optimal for disequilibrium mapping of this gene. Overall, these results indicate that both recombinational and mutational hotspots have played significant roles in shaping the haplotype variation at the LPL locus.

Figures

Figure 1
Evolutionary “trees” of LPL, as estimated by three procedures. In each case, only the horizontal length of branches indicates mutational change, with the length proportional to the number of mutations on that branch. Haplotypes are indicated by a number, followed by one or more letters indicating the populations in which that particular haplotype was present (J = Jackson, N = North Karelia, and R = Rochester). A, Statistical parsimony “tree.” Nodes that define four major clades are indicated by an oval containing “T-i,” where i can be 1, 2, 3, or 4. There are actually 8 “trees” in the statistically parsimonious confidence set, but the differences from the one portrayed here are minor. The full set of SP trees is available at the MDECODE Web site. B, NJ “tree.” C, Corrected NJ “tree.”
Figure 2
Number of false positives under the CT algorithm for inferring recombination events in 10 random permutations of variable-site position across phylogenetic position in the “tree” given in figure 1A.
Figure 3
Run length of matched homoplasies for 31 false positives under the CT algorithm for inferring recombination events in 10 random permutations of variable-site position across phylogenetic position in the “tree” given in figure 1A.
Figure 4
Tail probabilities for 31 false positives of the CT hypergeometric test of recombination in 10 random permutations of variable-site position across phylogenetic position in the “tree” given in figure 1A.
Figure 5
Physical distribution of 31 false positives for recombination in 10 random permutations of variable-site position across phylogenetic position in the “tree” given in figure 1A, and physical distribution of 29 recombination events inferred from the observed data. For each variable site in table 1, the number of recombination events inferred from the observed and permuted data whose potential crossover intervals overlapped that site is indicated on the Y-axis.
Figure 6
Plot showing pairwise linkage disequilibrium, indicated by a blackened square, for site pairs with a significant Fisher's exact test (P<.001) and no correction for multiple comparisons. The labels on the X- and Y-axes indicate the site numbers, as given in table 1. However, the numbers for sites 12 and 54 in table 1 are not included, because their phasing was not determined. In comparisons of site pairs in which both sites have rare nucleotides, there can be complete disequilibrium (one of the four possible gametes having a count of 0), yet the Fisher's exact test may not be significant. Site pairs that lack the power to test a significant association are indicated by a dot in the center of the square. The diagonal line, with exons 4–9 labeled, indicates the location of each varying site along the gene. The thick solid lines outline the maximal boundaries of the recombinational hotspot.
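
The point about rare nucleotides in the figure 6 legend can be illustrated with a minimal sketch of a two-sided Fisher's exact test on a 2×2 table of gamete counts. This is not the authors' code: the function name `fisher_exact_two_sided` and both example tables are hypothetical, chosen only so that the counts sum to 88 haplotypes, and the P<.001 threshold is the one stated in the legend.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two alleles at one site and columns the two alleles at the
    other, so each cell is a gamete (two-site haplotype) count.
    Hypothetical helper for illustration only.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables that are no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Hypothetical counts: each rare nucleotide occurs once, on different haplotypes.
# One gamete class has a count of 0, yet the test has no power.
p_rare = fisher_exact_two_sided(86, 1, 1, 0)

# Hypothetical counts: two rare nucleotides that always occur together.
p_linked = fisher_exact_two_sided(86, 0, 0, 2)

print(f"singletons on different haplotypes: P = {p_rare:.3f}")
print(f"two co-occurring rare nucleotides:  P = {p_linked:.6f}")
```

In the first table one gamete class is absent, but the test cannot reach P<.001 for any arrangement of those margins, which is the situation the legend marks with a dot. In the second table the same sample size does reach P<.001 because the rare nucleotides co-occur on two haplotypes.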

Source: PubMed
