A covering method for detecting genetic associations between rare variants and common phenotypes

Gaurav Bhatia, Vikas Bansal, Olivier Harismendy, Nicholas J Schork, Eric J Topol, Kelly Frazer, Vineet Bafna

Abstract

Genome-wide association (GWA) studies, which test for association between common genetic markers and a disease phenotype, have shown varying degrees of success. While many factors could potentially confound GWA studies, we focus on the possibility that multiple rare variants (RVs) may act in concert to influence disease etiology. Here, we describe an algorithm for RV analysis, RareCover. The algorithm combines a disparate collection of RVs with low effect and modest penetrance. Further, it does not require that the rare variants be adjacent in location. Extensive simulations over a range of assumed penetrance and population attributable risk (PAR) values illustrate the power of our approach over other published methods, including the collapsing and weighted-collapsing strategies. To showcase the method, we apply RareCover to re-sequencing data from a cohort of 289 individuals at the extremes of the Body Mass Index distribution (NCT00263042). Individual samples were re-sequenced at two genes, FAAH and MGLL, known to be involved in endocannabinoid metabolism (187 Kbp for 148 obese and 150 controls). The RareCover analysis identifies exactly one significantly associated region in each gene, each about 5 Kbp in the upstream regulatory regions. The data suggest that the RVs help disrupt the expression of the two genes, leading to lowered metabolism of the corresponding cannabinoids. Overall, our results point to the power of including RVs in measuring genetic associations.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Permutation p-values versus the statistic value on the union-variant.
The mean of the empirical p-values (obtained by permuting cases and controls) was plotted against each value of the statistic obtained over many tests over the entire range of simulation parameters, by varying sample size, locus PAR, and penetrance. As the selected subset is the most significant among many possible subsets, the theoretical p-value suggested by the distribution cannot be used directly. However, the plot shows that the locus value correlates tightly with the p-value, implying that the union statistic can be used to filter the significant windows with no loss of power. The saturation at the ends is due to the limited number of permutation trials.
Figure 2. Power of RV analyses, tested over different values of penetrance, PAR, and number of individuals (cases+controls).
For each choice of parameters, test cases were simulated. Each test case was analyzed using each method, and the p-value was computed using permutations of cases and controls. The score is considered significant only if it is higher than all permuted values. The power of the test is the fraction of test cases that had a significant score. RareCover dominates the other methods, implying greater power over all choices of parameters. For all methods, power increases with an increase in , or sample size.
Figure 3. Comparisons between causal RVs, and RVs recovered by RareCover.
The -axis describes the raw number of causal RVs (), RVs recovered (), their intersection, and the fraction recovered (, scaled for exposition). Close to of the causal RVs are recovered over a wide range of sample populations.
Figure 4. Power calculations on populations with bottleneck, and recent expansion.
Simulated population data with quantitative trait (QT) values were provided by Kryukov et al. The QT values are normally distributed. Individuals carrying any causal mutation have QT values drawn from a Normal distribution with a shifted mean. The shift is characterized as Low (), Medium (), and High (). As the locus PAR values are low, power is computed as the fraction of simulations that showed significance at p-value . Individuals were chosen from the lower (Control) and upper (Case) tails of the QT distribution. The power of all methods is compared using the % extremes ( cases, controls), and the % ( cases, and controls). RareCover is shown to have the highest power, comparable to the power of the causal mutations.
Figure 5. Allele frequency spectra in various demographic models.
BRE refers to the simulation of population under bottleneck followed by recent expansion from Kryukov et al.; CP refers to the simulation under a constant population size. The allele frequencies in CP are biased toward rare variants in cases, while there is little bias in BRE. The performance of RareCover is robust to data sets with different allele frequency spectra.
Figure 6. Running time of RareCover as a function of sample size, and number of SNPs.
As RareCover is a greedy approach, the running time increases linearly with the number of SNPs and the number of individuals. The running time shown here does not include the time for disk input and output of the data, which incurs a fixed additional cost of ms to each run. The total running time is about twice that of single marker tests.
Figure 7. FAAH locus association.
RareCover was used to analyze overlapping windows of Kbp in the re-sequenced region around FAAH. A p-value was computed for each window using permutations of cases and controls. Each point corresponds to the p-value of a single window starting at that location. The most significant window (described by the box) is Kbp upstream of the FAAH transcription start site. The region is part of an LTR element and is enriched in transcription factor binding sites; LTR elements are known to carry regulatory signals, suggesting a regulatory role for the rare variants.
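
The captions above describe scoring a subset of RVs through a statistic on their union-variant and assessing significance by permuting cases and controls. The following Python sketch illustrates that general idea only and is not the authors' implementation. Several things in it are assumptions made for illustration: the use of a Pearson chi-square statistic on a 2x2 carrier-by-status table, the greedy rule (add the variant that most increases the statistic, stop when none does), and all function names (`chi2_2x2`, `union_stat`, `greedy_cover`, `permutation_p`).

```python
import random

def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]]; 0.0 if any margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def union_stat(genotypes, labels, subset):
    """Statistic of the union-variant: carrying ANY variant in subset vs. case status."""
    a = b = c = d = 0
    for row, is_case in zip(genotypes, labels):
        carrier = any(row[j] for j in subset)
        if is_case and carrier:
            a += 1
        elif is_case:
            b += 1
        elif carrier:
            c += 1
        else:
            d += 1
    return chi2_2x2(a, b, c, d)

def greedy_cover(genotypes, labels):
    """Assumed greedy rule: repeatedly add the variant that most increases the statistic."""
    n_variants = len(genotypes[0])
    chosen, best = [], 0.0
    while True:
        pick, pick_score = None, best
        for j in range(n_variants):
            if j in chosen:
                continue
            score = union_stat(genotypes, labels, chosen + [j])
            if score > pick_score:
                pick, pick_score = j, score
        if pick is None:  # no remaining variant improves the statistic
            return chosen, best
        chosen.append(pick)
        best = pick_score

def permutation_p(genotypes, labels, n_perm=200, seed=0):
    """Empirical p-value: rerun the whole greedy search on label-permuted data."""
    rng = random.Random(seed)
    _, observed = greedy_cover(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, score = greedy_cover(genotypes, shuffled)
        if score >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Here `genotypes` is a list of per-individual 0/1 carrier rows and `labels` is a list of 1 (case) or 0 (control). Rerunning the greedy search inside every permutation matters because the selected subset is the best of many candidates, so its statistic cannot be compared against the nominal chi-square distribution. Note that the chi-square statistic is two-sided, so this sketch would also pick up variants enriched in controls.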

References

    1. Lander ES. The new genomics: global views of biology. Science. 1996;274:536–539.
    2. Pritchard JK, Cox NJ. The allelic architecture of human disease genes: common disease-common variant…or not? Hum Mol Genet. 2002;11:2417–2423.
    3. Reich DE, Lander ES. On the allelic spectrum of human disease. Trends Genet. 2001;17:502–510.
    4. The Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678.
    5. Frayling TM, Timpson NJ, Weedon MN, Zeggini E, Freathy RM, et al. A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science. 2007;316:889–894.
    6. Helgadottir A, Thorleifsson G, Manolescu A, Gretarsdottir S, Blondal T, et al. A common variant on chromosome 9p21 affects the risk of myocardial infarction. Science. 2007;316:1491–1493.
    7. McPherson R, Pertsemlidis A, Kavaslar N, Stewart A, Roberts R, et al. A common allele on chromosome 9 associated with coronary heart disease. Science. 2007;316:1488–1491.
    8. Schork NJ, Murray SS, Frazer KA, Topol EJ. Common vs. rare allele hypotheses for complex diseases. Curr Opin Genet Dev. 2009;19:212–219.
    9. Cohen JC, Pertsemlidis A, Fahmi S, Esmail S, Vega GL, et al. Multiple rare variants in NPC1L1 associated with reduced sterol absorption and plasma low-density lipoprotein levels. Proc Natl Acad Sci USA. 2006;103:1810–1815.
    10. Cohen JC, Kiss RS, Pertsemlidis A, Marcel YL, McPherson R, et al. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science. 2004;305:869–872.
    11. Fearnhead NS, Wilding JL, Winney B, Tonks S, Bartlett S, et al. Multiple rare variants in different genes account for multifactorial inherited susceptibility to colorectal adenomas. Proc Natl Acad Sci USA. 2004;101:15992–15997.
    12. Ji W, Foo JN, O'Roak BJ, Zhao H, Larson MG, et al. Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nat Genet. 2008;40:592–599.
    13. Nejentsev S, Walker N, Riches D, Egholm M, Todd JA. Rare variants of IFIH1, a gene implicated in antiviral responses, protect against type 1 diabetes. Science. 2009;324:387–389.
    14. Bodmer W, Bonilla C. Common and rare variants in multifactorial susceptibility to common diseases. Nat Genet. 2008;40:695–701.
    15. Li B, Leal SM. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008;83:311–321.
    16. Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009;5:e1000384.
    17. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. Finding the missing heritability of complex diseases. Nature. 2009;461:747–753.
    18. Di Marzo V, Bifulco M, De Petrocellis L. The endocannabinoid system and its therapeutic exploitation. Nat Rev Drug Discov. 2004;3:771–784.
    19. Di Marzo V. The endocannabinoid system: its general strategy of action, tools for its pharmacological manipulation and potential therapeutic exploitation. Pharmacol Res. 2009;60:77–84.
    20. Di Marzo V, Goparaju SK, Wang L, Liu J, Bátkai S, et al. Leptin-regulated endocannabinoids are involved in maintaining food intake. Nature. 2001;410:822–825.
    21. Cravatt BF, Giang DK, Mayfield SP, Boger DL, Lerner RA, et al. Molecular characterization of an enzyme that degrades neuromodulatory fatty-acid amides. Nature. 1996;384:83–87.
    22. Engeli S, Böhnke J, Feldpausch M, Gorzelniak K, Janke J, et al. Activation of the peripheral endocannabinoid system in human obesity. Diabetes. 2005;54:2838–2843.
    23. Jensen DP, Andreasen CH, Andersen MK, Hansen L, Eiberg H, et al. The functional Pro129Thr variant of the FAAH gene is not associated with various fat accumulation phenotypes in a population-based cohort of 5,801 whites. J Mol Med. 2007;85:445–449.
    24. Durand E, Lecoeur C, Delplanque J, Benzinou M, Degraeve F, et al. Evaluating the Association of FAAH Common Gene Variation with Childhood, Adult Severe Obesity and Type 2 Diabetes in the French Population. Obesity Facts. 2008;1:305–309.
    25. Müller TD, Reichwald K, Brönner G, Kirschner J, Nguyen TT, et al. Lack of association of genetic variants in genes of the endocannabinoid system with anorexia nervosa. Child Adolesc Psychiatry Ment Health. 2008;2:33.
    26. Lieb W, Manning AK, Florez JC, Dupuis J, Cupples LA, et al. Variants in the CNR1 and the FAAH genes and adiposity traits in the community. Obesity (Silver Spring). 2009;17:755–760.
    27. Sipe JC, Waalen J, Gerber A, Beutler E. Overweight and obesity associated with a missense polymorphism in fatty acid amide hydrolase (FAAH). Int J Obes (Lond). 2005;29:755–759.
    28. Garey MR, Johnson DS. Computers and Intractability: A Guide to the Theory of NP-completeness. W.H. Freeman and Company; 1979.
    29. Pritchard JK. Are rare variants responsible for susceptibility to complex diseases? Am J Hum Genet. 2001;69:124–137.
    30. Fu YX. Statistical properties of segregating sites. Theor Popul Biol. 1995;48:172–197.
    31. Kryukov GV, Shpunt A, Stamatoyannopoulos JA, Sunyaev SR. Power of deep, all-exon resequencing for discovery of human trait genes. Proc Natl Acad Sci USA. 2009;106:3871–3876.
    32. Yeager M, Deng Z, Boland J, Matthews C, Bacior J, et al. Comprehensive resequence analysis of a 97 kb region of chromosome 10q11.2 containing the MSMB gene associated with prostate cancer. Hum Genet. 2009.
    33. Harismendy O, Ng PC, Strausberg RL, Wang X, Stockwell TB, et al. Evaluation of next generation sequencing platforms for population targeted sequencing studies. Genome Biol. 2009;10:R32.
    34. Craig DW, Pearson JV, Szelinger S, Sekar A, Redman M, et al. Identification of genetic variants using bar-coded multiplexed sequencing. Nat Methods. 2008;5:887–893.
    35. Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 2008;18:1851–1858.
    36. Maccarrone M, Di Rienzo M, Finazzi-Agrò A, Rossi A. Leptin activates the anandamide hydrolase promoter in human T lymphocytes through STAT3. J Biol Chem. 2003;278:13318–13324.
    37. Maccarrone M, Gasperi V, Fezza F, Finazzi-Agrò A, Rossi A. Differential regulation of fatty acid amide hydrolase promoter in human immune cells and neuronal cells by leptin and progesterone. Eur J Biochem. 2004;271:4666–4676.
    38. Lovász L. On the ratio of optimal integral and fractional covers. Discrete Mathematics. 1975;13:383–390.
    39. Sipe JC, Scott TM, Murray S, Harismendy O, Simon GM, et al. Biomarkers of endocannabinoid system activation in severe obesity. PLoS ONE. 2010;5:e8792.

Source: PubMed
