Coding single-nucleotide polymorphisms associated with complex vs. Mendelian disease: evolutionary evidence for differences in molecular effects

Paul D Thomas, Anish Kejariwal

Abstract

Most Mendelian diseases studied to date arise from mutations that lead to a single amino acid change in an encoded protein. An increasing number of complex diseases have also been associated with amino acid-changing single-nucleotide polymorphisms (coding SNPs, cSNPs), suggesting potential similarities between Mendelian and complex diseases at the molecular level. Here, we use two different evolutionary analyses to compare Mendelian and complex disease-associated cSNPs. In the first, we estimate the likelihood that a specific amino acid substitution in a protein will affect the protein's function, by using amino acid substitution scores derived from an alignment of related protein sequences and statistics from hidden Markov models. In the second, we use standard Ka/Ks ratios to make comparisons at the gene, rather than the individual amino acid, level. We find that Mendelian disease cSNPs have a very strong tendency to occur at highly conserved amino acid positions in proteins, suggesting that they generally have a severe impact on the function of the protein. Perhaps surprisingly, the distribution of amino acid substitution scores for complex disease cSNPs is dramatically different from the distribution for Mendelian disease cSNPs, and is indistinguishable from the distribution for "normal" human variation. Further, the distributions of Ka/Ks ratios for human and mouse orthologs indicate greater positive selection (or less negative selection) pressure on complex disease-associated genes, on average. These findings suggest that caution should be exercised when using Mendelian disease as a model for complex disease, at least with respect to molecular effects on protein function.
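The comparison of score distributions described above can be illustrated with a small sketch. It builds empirical cumulative distributions for two sets of substitution scores and measures the largest gap between them (the two-sample Kolmogorov-Smirnov statistic). The abstract does not say which statistical test the authors used, so the choice of this statistic is an assumption. The function names and score values are also hypothetical, invented only for illustration.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return a function F(x) giving the fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical cumulative distributions.

    Assumption: a two-sample Kolmogorov-Smirnov statistic stands in for
    whatever test the paper actually applied.
    """
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # The gap can only change at observed values, so checking those suffices.
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


# Hypothetical scores, not data from the paper.
# More negative values stand for more radical substitutions.
mendelian_like = [-8.1, -6.4, -5.9, -4.2, -3.7]
complex_like = [-3.1, -2.2, -1.8, -0.9, -0.4]

d = ks_statistic(mendelian_like, complex_like)
```

A statistic near 0 means the two cumulative curves nearly coincide. A value near 1 means one set of scores lies almost entirely to the left of the other.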

Figures

Fig. 1.
Cumulative distributions of position-specific amino acid substitution scores for different sets of cSNPs. Distributions are shown for Mendelian disease (red), neutral variation (yellow), and “normal” human variation (green). The score distribution for complex diseases is in black squares. Shifts toward the left of the graph (smaller scores) indicate increasingly radical substitutions.
Fig. 2.
Cumulative random (neutral) distributions of subPSEC scores over the genes associated with Mendelian diseases vs. complex diseases are nearly identical. This is a control for the comparison shown in Fig. 1, demonstrating that there is no bias in these gene sets with respect to the subPSEC scores and that the differences between these sets in Fig. 1 are due to position-specific conservation.
Fig. 3.
Effect of unreliable complex disease associations. Even in our conservative set of complex disease-associated genes (see text), there may be some incorrect associations, which would affect the P value comparing subPSEC score distributions for Mendelian and complex disease-associated cSNPs. The effect of removing potentially unreliable data points in the two extreme cases is shown: removing the least deleterious cSNP (•), and removing the most deleterious cSNP (○). The dashed line shows P = 0.05.
Fig. 4.
Cumulative distributions of mouse–human ortholog Ka/Ks ratios for different sets of genes. Distributions are shown for genes having at least one Mendelian (red), or complex (black), disease-associated cSNP, compared with two background sets: all genes (green) with ortholog data in the HomoloGene database (16), and the subset of all genes having at least one cSNP in dbSNP (yellow).

Source: PubMed
