Adaptive evolution of UGT2B17 copy-number variation

Yali Xue, Donglin Sun, Allan Daly, Fengtang Yang, Xue Zhou, Mengyao Zhao, Ni Huang, Tatiana Zerjal, Charles Lee, Nigel P Carter, Matthew E Hurles, Chris Tyler-Smith

Abstract

The human UGT2B17 gene varies in copy number from zero to two per individual and also differs in mean number between populations from Africa, Europe, and East Asia. We show that such a high degree of geographical variation is unusual and investigate its evolutionary history. This required first reinterpreting the reference sequence in this region of the genome, which is misassembled from the two different alleles separated by an artifactual gap. A corrected assembly identifies the polymorphism as a 117 kb deletion arising by nonallelic homologous recombination between approximately 4.9 kb segmental duplications and allows the deletion breakpoint to be identified. We resequenced approximately 12 kb of DNA spanning the breakpoint in 91 humans from three HapMap and one extended HapMap populations and one chimpanzee. Diversity was unusually high and the time to the most recent common ancestor was estimated at approximately 2.4 or approximately 3.0 million years by two different methods, with evidence of balancing selection in Europe. In contrast, diversity was low in East Asia where a single haplotype predominated, suggesting positive selection for the deletion in this part of the world.
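The diversity comparison in the abstract rests on summary statistics computed from aligned resequencing data. As a minimal sketch only, the following computes nucleotide diversity (average pairwise differences per site) for a set of aligned haplotypes. The function name and the toy sequences are hypothetical and are not taken from the study; the abstract does not state which diversity statistics or software the authors used.

```python
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average number of pairwise differences per site across aligned sequences."""
    if len(haplotypes) < 2:
        raise ValueError("need at least two sequences")
    length = len(haplotypes[0])
    if any(len(h) != length for h in haplotypes):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(haplotypes, 2))
    # Count mismatching positions for every pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total_diffs / (len(pairs) * length)


# Toy alignment (hypothetical data, for illustration only).
toy_haplotypes = ["ACGTACGT", "ACGTACGA", "ACCTACGA", "ACGTACGT"]
pi = nucleotide_diversity(toy_haplotypes)
print(f"nucleotide diversity = {pi:.4f}")
```

A sample dominated by a single haplotype, as described for East Asia, would give a value near zero under this measure, whereas a sample with several divergent haplotypes would give a higher value.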

Figures

Figure 1
The BAC Clone Containing UGT2B17 Shows Unusually High Population Differentiation in CGH Experiments. The mean log2 ratios of the copy-number-variable BAC clones in the YRI and CHB were compared; the UGT2B17-containing BAC stood out (filled symbol, arrow, top) and was an outlier when residuals from a regression analysis were examined (arrow, bottom).
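The residual-based outlier check described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it fits an ordinary least-squares line of one population's mean log2 ratio on the other's and flags clones whose residual exceeds a chosen number of standard deviations. The clone names, the values, and the `z_cutoff` threshold are all hypothetical.

```python
from statistics import mean, pstdev


def regression_residuals(x, y):
    """Residuals of y after an ordinary least-squares fit of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def flag_outliers(names, x, y, z_cutoff=2.0):
    """Names of points whose residual exceeds z_cutoff residual standard deviations."""
    residuals = regression_residuals(x, y)
    sd = pstdev(residuals)
    if sd == 0:
        return []
    return [n for n, r in zip(names, residuals) if abs(r) / sd > z_cutoff]


# Hypothetical mean log2 ratios for eight clones in two populations.
clones = ["clone_A", "clone_B", "clone_C", "clone_D",
          "clone_E", "clone_F", "clone_G", "clone_H"]
pop1 = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.0]
pop2 = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.9]

outliers = flag_outliers(clones, pop1, pop2)
print(outliers)
```

In this toy data only the last clone departs from the shared trend, so it is the only one flagged.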
Figure 2
Reinterpretation of the Reference Sequence Surrounding UGT2B17. The reference sequence in this region contains a large gap (white bar) flanked by large segmental duplications (red and gray bars) carrying copies of the TMPRSS11E and UGT2B15 genes. The UGT2B17 gene lies in a 117 kb insertion into the left segmental duplication. This structure predicts that a BAC clone within the red segmental duplication (Chr4tp-1B5) should show two signals in a fiber-FISH experiment, but only one was ever observed. Consequently, we suggest that the large segmental duplication does not exist and the reference structure represents a misassembly of an allele containing (left) and lacking (right) UGT2B17.
Figure 3
Distribution of the UGT2B17 CNV Frequency in the HGDP-CEPH Population Samples. Note the high frequency of the gene in most African populations, intermediate frequency in Europe/West Asia, and low frequency in East Asia.
Figure 4
Sequence Analysis of the UGT2B17 Deletion Breakpoint. The UGT2B17 gene is flanked by two ∼4.9 kb segmental duplications (SDs, orange) ∼117 kb apart, and nonallelic homologous recombination between them led to deletion of the gene. Long PCR products were designed for amplification of the adjacent sequences, with one primer in the SD and one in the flanking unique sequence. The pattern of LD found after resequencing the amplified region in 91 individuals is shown at the bottom. Note the high proximal LD (left, predominant red color) and low distal LD (right).
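As an illustration of what a pairwise LD value measures, the sketch below computes r² between two biallelic sites from phased haplotypes. It is a minimal example under stated assumptions: the caption does not say which LD statistic or software was used, and the function name and toy haplotypes are hypothetical.

```python
def r_squared(haplotypes, i, j):
    """LD r^2 between biallelic sites i and j of phased haplotype strings."""
    n = len(haplotypes)
    # Use the alleles on the first haplotype as the reference alleles.
    ref_i, ref_j = haplotypes[0][i], haplotypes[0][j]
    p_i = sum(h[i] == ref_i for h in haplotypes) / n
    p_j = sum(h[j] == ref_j for h in haplotypes) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haplotypes) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ij - p_i * p_j  # coefficient of linkage disequilibrium, D
    return d * d / denom


# Hypothetical two-site haplotypes.
linked = ["AG", "AG", "CT", "CT"]    # alleles always travel together
unlinked = ["AG", "AT", "CG", "CT"]  # all four combinations equally common

r2_linked = r_squared(linked, 0, 1)
r2_unlinked = r_squared(unlinked, 0, 1)
print(r2_linked, r2_unlinked)
```

Values near 1 correspond to the strongly associated site pairs seen in a high-LD block, and values near 0 to pairs with little or no association.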
Figure 5
Network Analysis of UGT2B17 Inferred Haplotypes. (A and B) A 6.4 kb region, all SNPs. (C and D) A 6.4 kb region omitting gene conversion SNPs. (E and F) A 3.0 kb unique region, all SNPs. Circles represent haplotypes with an area proportional to frequency. In each pair of networks, the shortest line represents one mutational step. “Ancestral” (white circle) is a reconstructed haplotype carrying the ancestral (chimpanzee) allele at each position.

Source: PubMed
