Interpretation of genetic association studies: markers with replicated highly significant odds ratios may be poor classifiers

Johanna Jakobsdottir, Michael B Gorin, Yvette P Conley, Robert E Ferrell, Daniel E Weeks

Abstract

Recent successful discoveries of potentially causal single nucleotide polymorphisms (SNPs) for complex diseases hold great promise, and commercialization of genomics in personalized medicine has already begun. The hope is that genetic testing will benefit patients and their families, encourage positive lifestyle changes, and guide clinical decisions. However, for many complex diseases, it is arguable whether the era of genomics in personalized medicine has arrived. We focus on the clinical validity of genetic testing, with an emphasis on two popular statistical methods for evaluating markers. The two methods, logistic regression and receiver operating characteristic (ROC) curve analysis, are applied to our age-related macular degeneration (AMD) dataset. Under an additive model of the CFH, LOC387715, and C2 variants, the odds ratios are 2.9, 3.4, and 0.4, with p-values of 10^-13, 10^-13, and 10^-3, respectively. The area under the ROC curve (AUC) is 0.79, but assuming prevalences of 15%, 5.5%, and 1.5% (which are realistic for age groups 80, 65, and 40 years and older, respectively), only 30%, 12%, and 3% of the group classified as high risk are cases. Additionally, we present examples for four other diseases for which strongly associated variants have been discovered. In type 2 diabetes, our classification model of 12 SNPs has an AUC of only 0.64, and in prostate cancer, two SNPs achieve an AUC of only 0.56. Nine SNPs were not sufficient to improve the discrimination power over that of nongenetic predictors for risk of cardiovascular events. Finally, in Crohn's disease, a model of five SNPs, one with a quite low odds ratio of 0.26, has an AUC of only 0.66. Our analyses and examples show that strong association, although very valuable for establishing etiological hypotheses, does not guarantee effective discrimination between cases and controls. The scientific community should be careful not to overstate the value of association findings for personalized medicine prematurely.

Conflict of interest statement

The authors are listed as the inventors in a patent filed by the University of Pittsburgh for the LOC387715/ARMS2 locus.

Figures

Figure 1. Accuracy curves for binary markers.
The curves of accuracy points (FPF, TPF pair) for binary markers with ORs 1.5, 10, and 50 are plotted. The black diamonds and horizontal dotted line highlight the points (FPF, TPF) = (FPF, 80%) on the accuracy curves. The ORs are marked on the curves.
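The relationship behind such accuracy curves can be sketched numerically. For a binary marker, the odds ratio is the odds of a positive result among cases divided by the odds among controls, OR = [TPF / (1 - TPF)] / [FPF / (1 - FPF)], so fixing the OR and the TPF determines the FPF. The Python sketch below applies this identity at TPF = 80% for the three ORs named in the caption; the function name is illustrative and not taken from the paper.

```python
def fpf_for_binary_marker(odds_ratio: float, tpf: float) -> float:
    """Return the FPF implied by a binary marker's odds ratio at a given TPF.

    Uses OR = [TPF / (1 - TPF)] / [FPF / (1 - FPF)], solved for FPF.
    """
    if not 0.0 < tpf < 1.0:
        raise ValueError("tpf must lie strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    case_odds = tpf / (1.0 - tpf)            # odds of a positive marker in cases
    control_odds = case_odds / odds_ratio    # odds of a positive marker in controls
    return control_odds / (1.0 + control_odds)


# ORs from the Figure 1 caption, evaluated at TPF = 80%.
for odds_ratio in (1.5, 10, 50):
    fpf = fpf_for_binary_marker(odds_ratio, tpf=0.80)
    print(f"OR = {odds_ratio}: TPF = 80%, FPF = {fpf:.1%}")
```

Under this identity, a marker with an OR of 10 that detects 80% of cases would still flag roughly 29% of controls, which illustrates why a large OR alone does not imply good classification.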
Figure 2. AUC for additive risk models of SNP markers as a function of risk allele frequency in cases.
The AUC is estimated for all risk allele frequencies in controls assuming additive ORs 1.5, 3, 5, 10, and 50 (the ORs are marked on the curves). The numbers in gray are the risk allele frequencies in controls corresponding to the maximum AUC for each OR. The dotted horizontal lines in gray mark AUCs of 0.7 and 0.8. The black diamonds highlight the points (pca, AUC) = (pca, 0.80) for markers with additive ORs 10 and 50 (see Table 1).
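One way to see how the AUC of a single SNP depends on allele frequency and OR is the rough Python sketch below. It rests on assumptions that are not stated in the caption and may differ from the authors' calculation: genotypes follow Hardy-Weinberg proportions in both cases and controls, the per-allele OR links the allele odds in cases to those in controls, and the OR exceeds 1 so that more risk alleles mean higher risk. The AUC is then the probability that a random case carries more risk alleles than a random control, with ties counted as one half. All function names are illustrative.

```python
def hwe_genotype_probs(allele_freq):
    """Probabilities of carrying 0, 1, or 2 risk alleles under Hardy-Weinberg."""
    other = 1.0 - allele_freq
    return [other * other, 2.0 * allele_freq * other, allele_freq * allele_freq]


def additive_snp_auc(control_freq, allelic_or):
    """Return (case allele frequency, AUC) for one SNP.

    Assumptions (not from the paper): Hardy-Weinberg proportions in both
    groups, and allele odds in cases = allelic_or * allele odds in controls.
    """
    control_odds = control_freq / (1.0 - control_freq)
    case_odds = allelic_or * control_odds
    case_freq = case_odds / (1.0 + case_odds)

    cases = hwe_genotype_probs(case_freq)
    controls = hwe_genotype_probs(control_freq)

    auc = 0.0
    for case_count, p_case in enumerate(cases):
        for control_count, p_control in enumerate(controls):
            if case_count > control_count:
                auc += p_case * p_control          # case ranked above control
            elif case_count == control_count:
                auc += 0.5 * p_case * p_control    # tie counts one half
    return case_freq, auc


# Illustrative control allele frequencies for an allelic OR of 10.
for control_freq in (0.05, 0.2, 0.5):
    case_freq, auc = additive_snp_auc(control_freq, allelic_or=10)
    print(f"controls {control_freq:.2f}, cases {case_freq:.2f}, AUC {auc:.3f}")
```

With an OR of 1 the two genotype distributions coincide and the sketch returns an AUC of 0.5, the chance level.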
Figure 3. ROC curves for AMD classification models.
The black diamond highlights the point (FPF, TPF) = (31%, 74%) on the ROC curve of the three-factor model of CFH, LOC387715, and C2. The gray line for reference gives the “chance” classification rule: the farther the ROC curve is from the chance line, the better the classification rule.
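The link between this operating point and the share of true cases in the high-risk group follows from Bayes' rule: the fraction of cases among those classified as high risk is TPF × prevalence / [TPF × prevalence + FPF × (1 - prevalence)]. The Python sketch below applies it to the rounded point (FPF, TPF) = (31%, 74%) from the caption and to the prevalences of 15%, 5.5%, and 1.5% given in the abstract; the function name is illustrative.

```python
def positive_predictive_value(tpf, fpf, prevalence):
    """Fraction of true cases among individuals classified as high risk."""
    true_positives = tpf * prevalence             # cases flagged as high risk
    false_positives = fpf * (1.0 - prevalence)    # controls flagged as high risk
    return true_positives / (true_positives + false_positives)


# Operating point from the Figure 3 caption; prevalences from the abstract.
for prevalence in (0.15, 0.055, 0.015):
    ppv = positive_predictive_value(tpf=0.74, fpf=0.31, prevalence=prevalence)
    print(f"prevalence {prevalence:.1%}: {ppv:.1%} of the high-risk group are cases")
```

With these rounded inputs the results are about 30%, 12%, and 3.5%, close to the 30%, 12%, and 3% reported in the abstract; the small gap at the lowest prevalence presumably reflects rounding of the operating point.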
Figure 4. Integrated predictiveness and classification plot for the three-factor model.
The light-gray lines show how the plots are used in the examples given in the text: the dashed lines are for the first example with TPF = 74%, FPF = 31%, risk percentile = 35%, and AMD risk threshold = 4%; and the dotted lines are for the second example with AMD risk threshold = 25%, risk percentile = 85%, TPF = 17%, and FPF = 5%. On the top panel, the risks for cases are marked with a dot in black while the risks for controls are marked with a vertical line segment in dark-gray.

Source: PubMed
