Mapping the human genetic architecture of COVID-19

COVID-19 Host Genetics Initiative

Abstract

The genetic make-up of an individual contributes to the susceptibility and response to viral infection. Although environmental, clinical and social factors have a role in the chance of exposure to SARS-CoV-2 and the severity of COVID-19 [1,2], host genetics may also be important. Identifying host-specific genetic factors may reveal biological mechanisms of therapeutic relevance and clarify causal relationships of modifiable environmental risk factors for SARS-CoV-2 infection and outcomes. We formed a global network of researchers to investigate the role of human genetics in SARS-CoV-2 infection and COVID-19 severity. Here we describe the results of three genome-wide association meta-analyses that consist of up to 49,562 patients with COVID-19 from 46 studies across 19 countries. We report 13 genome-wide significant loci that are associated with SARS-CoV-2 infection or severe manifestations of COVID-19. Several of these loci correspond to previously documented associations to lung or autoimmune and inflammatory diseases [3–7]. They also represent potentially actionable mechanisms in response to infection. Mendelian randomization analyses support a causal role for smoking and body-mass index for severe COVID-19 although not for type II diabetes. The identification of novel host genetic factors associated with COVID-19 was made possible by the community of human genetics researchers coming together to prioritize the sharing of data, results, resources and analytical frameworks. This working model of international collaboration underscores what is possible for future genetic discoveries in emerging pandemics, or indeed for any complex human disease.

Conflict of interest statement

A full list of competing interests is supplied as Supplementary Table 13.

© 2021. The Author(s).

Figures

Fig. 1. Geographical overview of the contributing studies to the COVID-19 HGI and composition by major ancestry groups.
Populations are defined as African (AFR), admixed American (AMR), East Asian (EAS), European (EUR), Middle Eastern (MID) and South Asian (SAS).
Fig. 2. Genome-wide association results for COVID-19.
a, Top, results of a genome-wide association study of hospitalized cases of COVID-19 (n = 13,641 cases and n = 2,070,709 controls). Bottom, the results of reported SARS-CoV-2 infections (n = 49,562 cases and n = 1,770,206 controls). Loci highlighted in yellow (top) represent regions associated with the severity of the COVID-19 manifestation, that is, increased odds of more severe COVID-19 phenotypes. Loci highlighted in green (bottom) are regions associated with susceptibility to a SARS-CoV-2 infection, that is, the effect is the same across mild and severe COVID-19 phenotypes. We highlight in red genome-wide significant variants that had high heterogeneity across contributing studies and that were therefore excluded from the list of loci found. b, Results of gene prioritization using different evidence measures of gene annotation. Genes in the LD region, genes with coding variants and eGenes (fine-mapped cis-eQTL variant PIP > 0.1 in GTEx Lung) are annotated if in LD with a COVID-19 lead variant (r² > 0.6). V2G, highest gene prioritized by the V2G score of Open Targets Genetics.
Fig. 3. Genetic correlations and Mendelian randomization causal estimates between 38 traits and COVID-19 critical illness, hospitalization and reported SARS-CoV-2 infection.
Larger squares correspond to P values with higher significance, with genetic correlations (rg) or Mendelian randomization (MR) causal estimates significantly different from zero. The size of each coloured square indicates the magnitude of the P value, with P < 0.05 shown as a full-sized square, P = 0.05–0.1 as a large square, P = 0.1–0.5 as a medium square and P > 0.5 as a small square. Genetic correlations or causal estimates that are significantly different from zero at an FDR of 5% are marked with an asterisk. Two-sided P values were calculated using LDSC for genetic correlations and inverse-variance-weighted analysis for Mendelian randomization. ADHD, attention-deficit hyperactivity disorder; BMI, body mass index; CRP, C-reactive protein; eGFR, estimated glomerular filtration rate.
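As an illustration of the inverse-variance-weighted (IVW) analysis named in this caption, the following is a minimal Python sketch rather than the Initiative's actual pipeline. It assumes that per-variant effect estimates on the exposure and on the outcome have already been harmonized to the same effect allele. The function name `ivw_mr`, its arguments, the fixed-effect form of the estimator and the normal approximation for the two-sided P value are all assumptions made for illustration.

```python
import math

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate (illustrative sketch).

    Each list holds one value per genetic instrument:
    beta_exposure: variant effect on the exposure trait
    beta_outcome:  variant effect on the outcome (e.g. log odds of COVID-19)
    se_outcome:    standard error of beta_outcome
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("input lists must have the same length")
    if not beta_exposure:
        raise ValueError("at least one instrument is required")

    # Weight of each instrument: beta_exposure^2 / se_outcome^2
    weights = [bx ** 2 / so ** 2 for bx, so in zip(beta_exposure, se_outcome)]
    total = sum(weights)

    # Weighted regression of outcome effects on exposure effects through the origin
    numerator = sum(
        bx * by / so ** 2
        for bx, by, so in zip(beta_exposure, beta_outcome, se_outcome)
    )
    estimate = numerator / total
    se = math.sqrt(1.0 / total)

    # Two-sided P value under a normal approximation
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return estimate, se, p_value
```

Calling `ivw_mr([0.1, 0.2], [0.05, 0.10], [0.01, 0.01])` pools two hypothetical instruments into a single causal estimate with its standard error and P value. Sensitivity methods that relax the no-pleiotropy assumption are not covered by this sketch.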
Extended Data Fig. 1. Analytical summary of the COVID-19 HGI meta-analysis.
Using the analytical plan set by the COVID-19 HGI, each individual study runs its analyses and uploads the results to the Initiative, which then runs the meta-analysis. There are three main analyses that each study can contribute summary statistics to: critically ill COVID-19, hospitalized COVID-19 and reported SARS-CoV-2 infection. The phenotypic criteria used to define cases are listed in the dark grey boxes, along with the numbers of cases (N) included in the final all-ancestries meta-analysis. Controls were defined in the same way across all three analyses as everybody who is not a case, for example, population controls (light grey box). Sensitivity analyses (not reported in this extended data figure) also included mild and/or asymptomatic cases of COVID-19 as control individuals. The sample number (N) of control individuals differed between the analyses because different numbers of studies contributed data to each.
Extended Data Fig. 2. Projection of contributing studies samples into the same PC space.
We asked participating studies to perform a PC projection using the 1000 Genomes Project and Human Genome Diversity Project as a reference, with a common set of variants. For each panel (except for the reference), coloured points correspond to contributed samples from each cohort, whereas grey points correspond to the reference samples from the 1000 Genomes Project. Colour represents the genetic population that each cohort specified. As 23andMe, Genomics England 100,000 Genomes Project (GenomicsEngland100kgp) and Million Veterans Program (MVP) only submitted PCA images, we overlaid their submitted transparent images using the same coordinates, instead of directly plotting them. Populations are defined as African (AFR), admixed American (AMR), East Asian (EAS), European (EUR), Middle Eastern (MID), South Asian (SAS) and Oceanian (OCE).
Extended Data Fig. 3. Locus-zoom plots of the 3p21.31 region for reported SARS-CoV-2 infection.
a, A standard plot without exclusion. Here, the severity lead variant rs10490770 (chr. 3: 45823240T:C) is shown as a lead variant. b, Additional independent susceptibility signal(s) after excluding variants with r² > 0.05 with rs10490770. The susceptibility lead variant rs2271616 (chr. 3: 45796521G:T) is highlighted.
Extended Data Fig. 4. Genome-wide meta-analysis association results for critical illness due to COVID-19.
The locus on chromosome 6 is the HLA locus, which was removed from the list of reported loci in Supplementary Table 2 due to the high heterogeneity in effect size estimated between studies included in the analysis. The locus on chromosome 7 was also not reported in Supplementary Table 2 due to missingness across studies—that is, the high number of studies in the meta-analysis that did not report summary statistics for this region. There are two association peaks on chromosome 19.
Extended Data Fig. 5. Sensitivity analyses for overlapping controls in genomiCC and UK Biobank.
Comparison of the beta effect sizes (top) and unadjusted P values (bottom) of the 13 lead variants between the COVID-19 critical illness meta-analysis in all the cohorts and the same meta-analysis leaving out genomiCC (cases, n = 4,354; controls, n = 1,474,655; total, n = 1,479,009), leaving out the UK Biobank (UKBB; cases, n = 5,870; controls, n = 1,155,203; total, n = 1,161,073) and leaving out both genomiCC and UK Biobank (cases, n = 4,045; controls, n = 1,146,078; total, n = 1,150,123) (from left to right, respectively). Top, dots and grey bars represent the beta effect size estimates ± standard error from the corresponding GWAS meta-analysis. Bottom, dots represent two-sided P values from the corresponding GWAS meta-analysis. Filled dots indicate variants that showed genome-wide significance in the full meta-analysis of critical illness due to COVID-19, and empty dots represent variants that were not significant for critical illness but were significant for either hospitalization due to COVID-19 or reported SARS-CoV-2 infection. Red dots represent variants that showed genome-wide significance in the leave-one-out analysis for genomiCC, UK Biobank or genomiCC and UK Biobank.
Extended Data Fig. 6. Comparison of χ² statistics and r² values to the lead variant in the 3p21.31 region.
a–c, Data are shown for critical illness (a), hospitalization (b) and reported SARS-CoV-2 infection (c). The left blue peak in c, which is uncorrelated with the lead variants in the region, indicates that there are independent signals.
Extended Data Fig. 7. Comparison of the effect sizes of lead variants between pairs of COVID-19 meta-analyses.
Comparison of effect sizes for the nine variants associated with severity of COVID-19 disease. a, Comparing hospitalized cases of COVID-19 versus population controls (n = 10,428 cases and n = 1,483,270 controls) and critically ill cases of COVID-19 versus population controls (n = 6,179 cases and n = 1,483,780 controls). b, Hospitalized cases of COVID-19 versus population controls (n = 5,806 cases and n = 1,144,263 controls) and hospitalized cases of COVID-19 versus non-hospitalized cases of COVID-19 (n = 5,773 cases and n = 15,497 controls). Sample sizes for hospitalized cases of COVID-19 versus population controls differ between a and b due to differences in the sampling of studies selected for the analysis. This selection included all studies that were able to contribute data to the respective analyses that the data were compared to (shown on the y axis) in each panel. Dots represent the effect size beta estimates, bars represent the 95% confidence interval of the estimates. Effect size estimates and P values for heterogeneity tests (Cochran’s Q, two-tailed test) are reported in Supplementary Table 3.
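The heterogeneity test mentioned in this caption can be sketched in a few lines of Python. This is a minimal illustration, assuming a fixed-effect inverse-variance-weighted model for pooling per-study effect sizes, which the caption itself does not specify. The function name `ivw_meta` and its inputs are hypothetical. The sketch returns Cochran's Q and its degrees of freedom; obtaining a P value would require comparing Q against a chi-squared distribution with a statistics library, which is left out here.

```python
import math

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted pooling with Cochran's Q (sketch).

    betas: per-study effect size estimates (e.g. log odds ratios)
    ses:   per-study standard errors of those estimates
    Returns (pooled_beta, pooled_se, cochran_q, degrees_of_freedom).
    """
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need at least two studies with matching beta/SE lists")

    # Each study is weighted by the inverse of its sampling variance
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)

    # Cochran's Q: weighted squared deviations of study effects from the pooled effect
    cochran_q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    return pooled_beta, pooled_se, cochran_q, len(betas) - 1
```

A large Q relative to its degrees of freedom suggests that effect sizes differ between studies more than sampling error alone would explain.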
Extended Data Fig. 8. PheWAS for genome-wide significant lead variants.
Selected phenotypes associated with genome-wide significant COVID-19 variants (see Supplementary Table 6 for a complete list). We report those associations for which a lead variant from a previous GWAS result was in high LD (r² > 0.8) with the index COVID-19 variants. The colour represents the z-scores of correlated risk increasing alleles for the trait. The total number of associations for each COVID-19 variant is highlighted in the grey box.
Extended Data Fig. 9. Genetic correlation with COVID-19 phenotypes.
Each column shows the genetic correlation results for the three COVID-19 phenotypes (European-ancestry analyses only): critical illness, hospitalization and reported SARS-CoV-2 infection. The traits that the genetic correlation is run against are listed on the left. Significant correlations (FDR P < 0.05) are in black and non-significant correlations are in grey. Two-sided P values were calculated using LDSC for genetic correlations and exact estimates, unadjusted standard errors and two-sided P values are available in Supplementary Table 11.
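To make the FDR threshold in this caption concrete, here is a minimal Python sketch of one common way to compute FDR-adjusted P values. It assumes the Benjamini-Hochberg procedure, which the caption does not name, so this should be read as an illustration and not as the authors' exact method. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order (sketch)."""
    m = len(pvalues)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonic adjusted values
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Under this assumption, a trait pair would be flagged as significant when its adjusted value falls below 0.05.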
Extended Data Fig. 10. Mendelian randomization sensitivity analyses.
Genetic correlations and forest plots displaying the causal estimates for each of the sensitivity analyses used in the Mendelian randomization analysis for trait pairs that were significant at an FDR of 5%. Two-sided P values were estimated using inverse-variance-weighted (IVW), weighted median estimator (WME), weighted mode-based estimator (WMBE) and MR-PRESSO analyses. RBC, red blood cell count.

References

    1. Docherty AB, et al. Features of 20 133 UK patients in hospital with COVID-19 using the ISARIC WHO Clinical Characterisation Protocol: prospective observational cohort study. Br. Med. J. 2020;369:m1985. doi: 10.1136/bmj.m1985.
    2. Zhou F, et al. Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study. Lancet. 2020;395:1054–1062. doi: 10.1016/S0140-6736(20)30566-3.
    3. Dendrou CA, et al. Resolving TYK2 locus genotype-to-phenotype differences in autoimmunity. Sci. Transl. Med. 2016;8:363ra149. doi: 10.1126/scitranslmed.aag1974.
    4. Astle WJ, et al. The allelic landscape of human blood cell trait variation and links to common complex disease. Cell. 2016;167:1415–1429.e19. doi: 10.1016/j.cell.2016.10.042.
    5. Fingerlin TE, et al. Genome-wide association study identifies multiple susceptibility loci for pulmonary fibrosis. Nat. Genet. 2013;45:613–620. doi: 10.1038/ng.2609.
    6. Wang Z, et al. Meta-analysis of genome-wide association studies identifies multiple lung cancer susceptibility loci in never-smoking Asian women. Hum. Mol. Genet. 2016;25:620–629. doi: 10.1093/hmg/ddv494.
    7. Shrine N, et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat. Genet. 2019;51:481–493. doi: 10.1038/s41588-018-0321-7.
    8. Buitrago-Garcia D, et al. Occurrence and transmission potential of asymptomatic and presymptomatic SARS-CoV-2 infections: a living systematic review and meta-analysis. PLoS Med. 2020;17:e1003346. doi: 10.1371/journal.pmed.1003346.
    9. van der Made CI, et al. Presence of genetic variants among young men with severe COVID-19. J. Am. Med. Assoc. 2020;324:663–673. doi: 10.1001/jama.2020.13719.
    10. Zhang Q, et al. Inborn errors of type I IFN immunity in patients with life-threatening COVID-19. Science. 2020;370:eabd4570. doi: 10.1126/science.abd4570.
    11. Povysil, G. et al. Rare loss-of-function variants in type I IFN immunity genes are not associated with severe COVID-19. J. Clin. Invest. 147834 (2021).
    12. Severe COVID-19 GWAS Group Genomewide association study of severe covid-19 with respiratory failure. N. Engl. J. Med. 2020;383:1522–1534. doi: 10.1056/NEJMoa2020283.
    13. Shelton JF, et al. Trans-ancestry analysis reveals genetic and nongenetic associations with COVID-19 susceptibility and severity. Nat. Genet. 2021;53:801–808. doi: 10.1038/s41588-021-00854-7.
    14. Pairo-Castineira E, et al. Genetic mechanisms of critical illness in COVID-19. Nature. 2021;591:92–98. doi: 10.1038/s41586-020-03065-y.
    15. Roberts, G. H. L. et al. AncestryDNA COVID-19 host genetic study identifies three novel loci. Preprint at 10.1101/2020.10.06.20205864 (2020).
    16. Kosmicki, J. A. et al. A catalog of associations between rare coding variants and COVID-19 outcomes. Preprint at 10.1101/2020.10.28.20221804 (2021).
    17. COVID-19 Host Genetics Initiative The COVID-19 Host Genetics Initiative, a global initiative to elucidate the role of host genetic factors in susceptibility and severity of the SARS-CoV-2 virus pandemic. Eur. J. Hum. Genet. 2020;28:715–718. doi: 10.1038/s41431-020-0636-6.
    18. Couturier N, et al. Tyrosine kinase 2 variant influences T lymphocyte polarization and multiple sclerosis susceptibility. Brain. 2011;134:693–703. doi: 10.1093/brain/awr010.
    19. Li Z, et al. Two rare disease-associated Tyk2 variants are catalytically impaired but signaling competent. J. Immunol. 2013;190:2335–2344. doi: 10.4049/jimmunol.1203118.
    20. GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020;369:1318–1330. doi: 10.1126/science.aaz1776.
    21. Hao K, et al. Lung eQTLs to help reveal the molecular underpinnings of asthma. PLoS Genet. 2012;8:e1003029. doi: 10.1371/journal.pgen.1003029.
    22. Dai J, et al. Identification of risk loci and a polygenic risk score for lung cancer: a large-scale prospective cohort study in Chinese populations. Lancet Respir. Med. 2019;7:881–891. doi: 10.1016/S2213-2600(19)30144-4.
    23. Manichaikul A, et al. Genome-wide association study of subclinical interstitial lung disease in MESA. Respir. Res. 2017;18:97. doi: 10.1186/s12931-017-0581-2.
    24. Stefansson H, et al. A common inversion under selection in Europeans. Nat. Genet. 2005;37:129–137. doi: 10.1038/ng1508.
    25. Boettger LM, Handsaker RE, Zody MC, McCarroll SA. Structural haplotypes and recent evolution of the human 17q21.31 region. Nat. Genet. 2012;44:881–885. doi: 10.1038/ng.2334.
    26. Ghoussaini M, et al. Open Targets Genetics: systematic identification of trait-associated genes using large-scale genetics and functional genomics. Nucleic Acids Res. 2021;49:D1311–D1320. doi: 10.1093/nar/gkaa840.
    27. Xiao G, et al. CXCL16/CXCR6 chemokine signaling mediates breast cancer progression by pERK1/2-dependent mechanisms. Oncotarget. 2015;6:14165–14178. doi: 10.18632/oncotarget.3690.
    28. Wei Q, et al. LZTFL1 suppresses lung tumorigenesis by maintaining differentiation of lung epithelial cells. Oncogene. 2016;35:2655–2663. doi: 10.1038/onc.2015.328.
    29. Vuille-dit-Bille RN, et al. Human intestine luminal ACE2 and amino acid transporter expression increased by ACE-inhibitors. Amino Acids. 2015;47:693–705. doi: 10.1007/s00726-014-1889-6.
    30. Finucane HK, et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 2018;50:621–629. doi: 10.1038/s41588-018-0081-4.
    31. GTEx Consortium The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015;348:648–660. doi: 10.1126/science.1262110.
    32. Eyre S, et al. High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis. Nat. Genet. 2012;44:1336–1340. doi: 10.1038/ng.2462.
    33. Tsoi LC, et al. Large scale meta-analysis characterizes genetic architecture for common psoriasis associated variants. Nat. Commun. 2017;8:15382. doi: 10.1038/ncomms15382.
    34. Langefeld CD, et al. Transancestral mapping and genetic load in systemic lupus erythematosus. Nat. Commun. 2017;8:16021. doi: 10.1038/ncomms16021.
    35. Kichaev G, et al. Leveraging polygenic functional enrichment to improve GWAS power. Am. J. Hum. Genet. 2019;104:65–75. doi: 10.1016/j.ajhg.2018.11.008.
    36. Lu MM, Li S, Yang H, Morrisey EE. Foxp4: a novel member of the Foxp subfamily of winged-helix genes co-expressed with Foxp1 and Foxp2 in pulmonary and gut tissues. Gene Expr. Patterns. 2002;2:223–228. doi: 10.1016/S1567-133X(02)00058-3.
    37. Li S, et al. Foxp1/4 control epithelial cell fate during lung development and regeneration through regulation of anterior gradient 2. Development. 2012;139:2500–2509. doi: 10.1242/dev.079699.
    38. Wei P-F. Diagnosis and treatment protocol for novel coronavirus pneumonia (trial version 7) Chin. Med. J. (Engl.) 2020;133:1087–1095.
    39. Zhou W, et al. Efficiently controlling for case–control imbalance and sample relatedness in large-scale genetic association studies. Nat. Genet. 2018;50:1335–1341. doi: 10.1038/s41588-018-0184-y.
    40. Purcell S, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 2007;81:559–575. doi: 10.1086/519795.
    41. Karczewski KJ, et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020;581:434–443. doi: 10.1038/s41586-020-2308-7.
    42. Evangelou E, Ioannidis JPA. Meta-analysis methods for genome-wide association studies and beyond. Nat. Rev. Genet. 2013;14:379–389. doi: 10.1038/nrg3472.
    43. Cochran WG. The combination of estimates from different experiments. Biometrics. 1954;10:101–129. doi: 10.2307/3001666.
    44. Chang CC, et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7. doi: 10.1186/s13742-015-0047-8.
    45. McLaren W, et al. The Ensembl Variant Effect Predictor. Genome Biol. 2016;17:122. doi: 10.1186/s13059-016-0974-4.
    46. Kerimov, N. et al. eQTL Catalogue: a compendium of uniformly processed human gene expression and splicing QTLs. Preprint at 10.1101/2020.01.29.924266 (2021).
    47. Sun BB, et al. Genomic atlas of the human plasma proteome. Nature. 2018;558:73–79. doi: 10.1038/s41586-018-0175-2.
    48. Buniello A, et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 2019;47:D1005–D1012. doi: 10.1093/nar/gky1120.
    49. Bulik-Sullivan BK, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat. Genet. 2015;47:291–295. doi: 10.1038/ng.3211.
    50. Finucane HK, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 2015;47:1228–1235. doi: 10.1038/ng.3404.
    51. CDC. People with certain medical conditions. (2021).
    52. Williamson EJ, et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature. 2020;584:430–436. doi: 10.1038/s41586-020-2521-4.
    53. Zhou, T. et al. Educational attainment and drinking behaviors: Mendelian randomization study in UK Biobank. Mol. Psychiatry doi: 10.1038/s41380-019-0596-9 (2019).
    54. 1000 Genomes Project Consortium A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534.
    55. Hemani G, et al. The MR-Base platform supports systematic causal inference across the human phenome. eLife. 2018;7:e34408. doi: 10.7554/eLife.34408.
    56. Verbanck M, Chen C-Y, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat. Genet. 2018;50:693–698. doi: 10.1038/s41588-018-0099-7.
    57. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 2015;44:512–525. doi: 10.1093/ije/dyv080.
    58. Bowden J, et al. Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I² statistic. Int. J. Epidemiol. 2016;45:1961–1974. doi: 10.1093/ije/dyw252.
    59. Slob EAW, Burgess S. A comparison of robust Mendelian randomization methods using summary data. Genet. Epidemiol. 2020;44:313–329. doi: 10.1002/gepi.22295.

Source: PubMed
