Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations

Amit V Khera, Mark Chaffin, Krishna G Aragam, Mary E Haas, Carolina Roselli, Seung Hoan Choi, Pradeep Natarajan, Eric S Lander, Steven A Lubitz, Patrick T Ellinor, Sekar Kathiresan

Abstract

A key public health need is to identify individuals at high risk for a given disease to enable enhanced screening or preventive therapies. Because most common diseases have a genetic component, one important approach is to stratify individuals based on inherited DNA variation¹. Proposed clinical applications have largely focused on finding carriers of rare monogenic mutations at several-fold increased risk. Although most disease risk is polygenic in nature²⁻⁵, it has not yet been possible to use polygenic predictors to identify individuals at risk comparable to monogenic mutations. Here, we develop and validate genome-wide polygenic scores for five common diseases. The approach identifies 8.0, 6.1, 3.5, 3.2, and 1.5% of the population at greater than threefold increased risk for coronary artery disease, atrial fibrillation, type 2 diabetes, inflammatory bowel disease, and breast cancer, respectively. For coronary artery disease, this prevalence is 20-fold higher than the carrier frequency of rare monogenic mutations conferring comparable risk⁶. We propose that it is time to contemplate the inclusion of polygenic risk prediction in clinical care, and discuss relevant issues.

Figures

Figure 1. Study design and workflow
A genome-wide polygenic score (GPS) for each disease was derived by combining summary association statistics from a recent large GWAS with a linkage disequilibrium reference panel of 503 Europeans. Thirty-one candidate GPSs were derived using two strategies: (1) ‘pruning and thresholding’, the aggregation of independent polymorphisms that exceed a specified level of significance in the discovery GWAS, and (2) the LDPred computational algorithm, a Bayesian approach that calculates a posterior mean effect for all variants based on a prior (effect size in the prior GWAS) and subsequent shrinkage based on linkage disequilibrium. The seven candidate LDPred scores vary with respect to the tuning parameter ρ, the proportion of variants assumed to be causal, as previously recommended. The optimal GPS for each disease was chosen based on the area under the receiver operating characteristic curve (AUC) in the UK Biobank Phase I validation dataset (N=120,280 Europeans) and subsequently calculated in an independent UK Biobank Phase II testing dataset (N=288,978 Europeans).
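The selection step in this workflow can be illustrated with a minimal Python sketch. It assumes that each candidate score is a plain weighted sum of risk-allele dosages and that candidates are ranked by an unadjusted, rank-based AUC. The function names, candidate labels, and toy data are hypothetical, and the sketch does not reproduce LDPred or the pruning-and-thresholding weight derivation itself.

```python
from typing import Dict, List, Sequence


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0 to 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: probability that a random case outscores a random
    control, counting ties as one half. Quadratic time, fine for a toy."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def select_best(candidates: Dict[str, List[float]],
                genotypes: List[List[float]],
                labels: List[int]) -> str:
    """Return the name of the candidate weight set with the highest AUC."""
    aucs = {
        name: auc([polygenic_score(g, w) for g in genotypes], labels)
        for name, w in candidates.items()
    }
    return max(aucs, key=aucs.get)


# Hypothetical toy validation data: 4 individuals, 3 variants.
toy_genotypes = [[2, 1, 0], [1, 1, 1], [0, 0, 2], [0, 1, 0]]
toy_labels = [1, 1, 0, 0]  # 1 = case, 0 = control
toy_candidates = {
    "candidate_a": [0.5, 0.2, 0.0],  # hypothetical per-variant weights
    "candidate_b": [0.0, 0.0, 1.0],
}
print(select_best(toy_candidates, toy_genotypes, toy_labels))
```

In practice the chosen candidate would then be computed once in a held-out testing dataset, which keeps the reported performance separate from the data used for selection.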
Figure 2. Risk for coronary artery disease according to genome-wide polygenic score
(a) Distribution of the genome-wide polygenic score for CAD (GPSCAD) in the UK Biobank testing dataset (N=288,978). The x-axis represents GPSCAD, with values scaled to a mean of 0 and a standard deviation of 1 to facilitate interpretation. Shading reflects the proportion of the population with 3-, 4-, and 5-fold increased risk versus the remainder of the population. Odds ratios were assessed in a logistic regression model adjusted for age, sex, genotyping array, and the first four principal components of ancestry; (b) GPSCAD percentile among CAD cases versus controls in the UK Biobank validation cohort. Within each boxplot, the horizontal line reflects the median, the top and bottom of the box reflect the interquartile range, and the whiskers reflect the maximum and minimum values within each grouping; (c) prevalence of CAD according to 100 groups of the validation cohort binned according to percentile of the GPSCAD.
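The comparison of a high-score tail against the remainder of the population in panel (a) can be sketched as follows. This is a deliberately simplified, unadjusted odds ratio from a 2x2 table, not the covariate-adjusted logistic regression described in the caption. The function name and the toy data are hypothetical.

```python
from typing import Sequence


def top_tail_odds_ratio(scores: Sequence[float],
                        labels: Sequence[int],
                        top_fraction: float) -> float:
    """Unadjusted odds ratio for disease in the top `top_fraction` of the
    score distribution versus everyone else."""
    n = len(scores)
    k = max(1, round(n * top_fraction))  # number of individuals in the tail
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top = set(order[:k])

    a = sum(1 for i in top if labels[i] == 1)  # cases in the tail
    b = len(top) - a                           # controls in the tail
    c = sum(1 for i in range(n) if i not in top and labels[i] == 1)
    d = (n - len(top)) - c                     # controls in the remainder
    if b == 0 or c == 0:
        raise ValueError("odds ratio undefined: empty cell in 2x2 table")
    return (a * d) / (b * c)


# Hypothetical toy data: 10 individuals ordered by increasing score.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]
print(top_tail_odds_ratio(toy_scores, toy_labels, top_fraction=0.3))
```

Scanning `top_fraction` over a grid and recording where the odds ratio first exceeds 3, 4, or 5 would give tail sizes analogous to the shaded regions, again under the stated simplification of ignoring covariates.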
Figure 3. Risk gradient for disease according to genome-wide polygenic score percentile
One hundred groups of the validation cohort were derived according to percentile of the disease-specific GPS. Prevalence of disease is displayed for (a) atrial fibrillation, (b) type 2 diabetes, (c) inflammatory bowel disease, and (d) breast cancer according to GPS percentile.
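The scaling and percentile binning behind the risk-gradient panels can be sketched in a few lines of Python. This is an illustration under simple assumptions: scores are standardized with the population standard deviation, individuals are split into equal-sized rank bins, and prevalence is the raw case fraction per bin. The function names and toy data are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def standardize(scores: Sequence[float]) -> List[float]:
    """Scale scores to mean 0 and standard deviation 1."""
    m = mean(scores)
    sd = pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - m) / sd for s in scores]


def percentile_bins(scores: Sequence[float], n_bins: int = 100) -> List[int]:
    """Assign each individual a 1-based bin from the rank of their score,
    with bins as close to equal size as possible."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n + 1
    return bins


def prevalence_by_bin(scores: Sequence[float],
                      labels: Sequence[int],
                      n_bins: int = 100) -> Dict[int, float]:
    """Fraction of cases (label 1) within each score bin."""
    totals: Dict[int, int] = {}
    cases: Dict[int, int] = {}
    for b, y in zip(percentile_bins(scores, n_bins), labels):
        totals[b] = totals.get(b, 0) + 1
        cases[b] = cases.get(b, 0) + y
    return {b: cases[b] / totals[b] for b in sorted(totals)}


# Hypothetical toy data: 4 individuals split into 2 bins.
toy_scores = [0.1, 0.9, 0.4, 0.7]
toy_labels = [0, 1, 0, 1]
print(prevalence_by_bin(standardize(toy_scores), toy_labels, n_bins=2))
```

Because standardization preserves rank order, binning the raw or the standardized scores yields the same groups; the scaling only affects how the distribution is displayed.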

References

    1. Green ED, Guyer MS; National Human Genome Research Institute. Charting a course for genomic medicine from base pairs to bedside. Nature. 470, 204–213 (2011).
    2. Fisher RA. The correlation between relatives on the supposition of Mendelian inheritance. Trans. Roy. Soc. Edinburgh 52, 399–433 (1918).
    3. Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet. 13, 135–45 (2012).
    4. Golan D, Lander ES, Rosset S. Measuring missing heritability: inferring the contribution of common variants. Proc Natl Acad Sci U S A. 111, E5272–81 (2014).
    5. Fuchsberger C, et al. The genetic architecture of type 2 diabetes. Nature. 536, 41–47 (2016).
    6. Abul-Husn NS, et al. Genetic identification of familial hypercholesterolemia within a single U.S. health care system. Science. 354 (2016).
    7. Nordestgaard BG, et al. Familial hypercholesterolaemia is underdiagnosed and undertreated in the general population: guidance for clinicians to prevent coronary heart disease: consensus statement of the European Atherosclerosis Society. Eur Heart J. 34, 3478–90a (2013).
    8. Lek M, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature. 536, 285–91 (2016).
    9. Estrada K, et al. Association of a low-frequency variant in HNF1A with type 2 diabetes in a Latino population. JAMA. 311, 2305–14 (2014).
    10. Chatterjee N, et al. Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies. Nat Genet. 45, 400–405 (2013).
    11. Zhang Y, et al. Estimation of complex effect-size distributions using summary-level statistics from genome-wide association studies across 32 complex traits and implications for the future. Preprint at: (2017).
    12. Ripatti S, et al. A multilocus genetic risk score for coronary heart disease: case-control and prospective cohort analyses. Lancet. 376, 1393–400 (2010).
    13. Vilhjálmsson BJ, et al. Modeling linkage disequilibrium increases accuracy of polygenic scores. Am J Hum Genet. 97, 576–592 (2015).
    14. Sudlow C, et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
    15. Bycroft C, et al. Genome-wide genetic data on ~500,000 UK Biobank participants. Preprint at: (2017).
    16. Nikpay M, et al. A comprehensive 1,000 Genomes-based genome-wide association meta-analysis of coronary artery disease. Nat Genet. 47, 1121–1130 (2015).
    17. Tada H, et al. Risk prediction by genetic risk scores for coronary heart disease is independent of self-reported family history. Eur Heart J. 37, 561–7 (2016).
    18. Abraham G, et al. Genomic prediction of coronary heart disease. Eur Heart J. 37, 3267–3278 (2016).
    19. Khera AV, et al. Genetic risk, adherence to a healthy lifestyle, and coronary disease. N Engl J Med. 375, 2349–2358 (2016).
    20. Mega JL, et al. Genetic risk, coronary heart disease events, and the clinical benefit of statin therapy: an analysis of primary and secondary prevention trials. Lancet. 385, 2264–2271 (2015).
    21. Natarajan P, et al. Polygenic risk score identifies subgroup with higher burden of atherosclerosis and greater relative benefit from statin therapy in the primary prevention setting. Circulation. 135, 2091–2101 (2017).
    22. January CT, et al. 2014 AHA/ACC/HRS guideline for the management of patients with atrial fibrillation: a report of the American College of Cardiology/American Heart Association Task Force on practice guidelines and the Heart Rhythm Society. Circulation. 130, e199–267 (2014).
    23. GBD 2015 Disease and Injury Incidence and Prevalence Collaborators. Global, regional, and national incidence, prevalence, and years lived with disability for 310 diseases and injuries, 1990–2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet. 388, 1545–1602 (2016).
    24. Knowler WC, et al. Reduction in the incidence of type 2 diabetes with lifestyle intervention or metformin. N Engl J Med. 346, 393–403 (2002).
    25. Abraham C & Cho JH. Inflammatory bowel disease. N Engl J Med. 361, 2066–78 (2009).
    26. Pharoah PD, Antoniou AC, Easton DF, Ponder BA. Polygenes, risk prediction, and targeted prevention of breast cancer. N Engl J Med. 358, 2796–803 (2008).
    27. Fry A, et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population. Am J Epidemiol. 186, 1026–34 (2017).
    28. Khera AV & Kathiresan S. Is coronary atherosclerosis one disease or many? Setting realistic expectations for precision medicine. Circulation. 135, 1005–07 (2017).
    29. Martin AR, et al. Human demographic history impacts genetic risk prediction across diverse populations. Am J Hum Genet. 100, 635–649 (2017).
    30. Christophersen IE, et al. Large-scale analyses of common and rare variants identify 12 new loci associated with atrial fibrillation. Nat Genet. 49, 946–952 (2017).
    31. Scott RA, et al. An Expanded Genome-Wide Association Study of Type 2 Diabetes in Europeans. Diabetes. 66, 2888–2902 (2017).
    32. Liu JZ, et al. Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations. Nat Genet. 47, 979–986 (2015).
    33. Michailidou K, et al. Association analysis identifies 65 new breast cancer risk loci. Nature. 551, 92–94 (2017).
    34. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature. 526, 68–74 (2015).
    35. Chang CC, et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience. 4, 7 (2015).
    36. Ganna A, et al. Ultra-rare disruptive and damaging mutations influence educational attainment in the general population. Nat Neurosci. 19, 1563–65 (2016).

Source: PubMed
