Genome-wide association study-based prediction of atrial fibrillation using artificial intelligence

Oh-Seok Kwon, Myunghee Hong, Tae-Hoon Kim, Inseok Hwang, Jaemin Shim, Eue-Keun Choi, Hong Euy Lim, Hee Tae Yu, Jae-Sun Uhm, Boyoung Joung, Seil Oh, Moon-Hyoung Lee, Young-Hoon Kim, Hui-Nam Pak

Abstract

Objective: We previously reported genetic loci associated with early-onset atrial fibrillation (AF) in a Korean population. We explored whether AF-associated single-nucleotide polymorphisms (SNPs), selected from the Genome-Wide Association Study (GWAS) of an external large cohort, have predictive power for AF in the Korean population when used as input to a convolutional neural network (CNN).

Methods: This study included 6358 subjects (872 cases and 5486 controls) from Korean population GWAS data. At each p value threshold of the association statistics, we extracted lists of SNPs from three previously reported ethnicity-specific GWASs. The Korean GWAS data were divided into training (64%), validation (16%) and test (20%) sets. Stratified K-fold cross-validation was performed and repeated five times after data shuffling.
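One way to obtain a 64/16/20 split that is consistent with stratified K-fold cross-validation is to use K=5. One fold (20%) is held out as the test set, and one fifth of the remaining 80% (16% of the total) is set aside for validation. The sketch below is written under that assumption. K=5 is inferred from the 20% test share and is not stated in the abstract, and the function names are hypothetical. It deals shuffled case and control indices round-robin into folds so that every fold keeps roughly the overall case:control ratio.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, keeping the case:control ratio similar in each fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)  # reshuffle within each class before dealing out
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)  # round-robin keeps class counts balanced
    return folds


def split_train_val_test(labels, test_fold, k=5, seed=0):
    """Hold out one of k folds as test; carve 1/k of the remainder off as validation."""
    folds = stratified_folds(labels, k, seed)
    test = folds[test_fold]
    rest = [i for f, fold in enumerate(folds) if f != test_fold for i in fold]
    inner = stratified_folds([labels[i] for i in rest], k, seed + 1)
    val = [rest[j] for j in inner[0]]
    val_set = set(val)
    train = [i for i in rest if i not in val_set]
    return train, val, test


# 872 cases (1) and 5486 controls (0), as in the abstract.
labels = [1] * 872 + [0] * 5486
for repeat in range(5):  # five repetitions, each with a different shuffle
    for fold in range(5):
        train, val, test = split_train_val_test(labels, fold, seed=repeat)
        # ... fit on train, monitor on val, compute AUC on test ...
```

With 6358 subjects, each split comes out at about 4067/1018/1273 subjects, which is close to 64/16/20.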

Results: The CNN-GWAS predicted AF with an area under the curve (AUC) of 0.78±0.01 when based on the Japanese GWAS, 0.79±0.01 when based on the European GWAS and 0.82±0.01 when based on the multiethnic GWAS. Gradient-weighted class activation mapping (Grad-CAM) assigned high saliency scores to AF-associated SNPs, and PITX2 obtained the highest saliency score. The CNN-GWAS showed no AF prediction power when given the subset of SNPs with non-significant p values (AUC 0.56±0.01), despite the larger number of SNPs. The CNN-GWAS also had no prediction power for odd-even registration numbers (AUC 0.51±0.01).
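An AUC can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen control. On this reading, 0.5 means no discrimination, which is the sense in which the 0.51 and 0.56 results indicate no prediction power. The sketch below computes the ROC AUC through the equivalent Mann-Whitney rank statistic. It assumes that "±" denotes the standard deviation across the five repeats; the abstract does not say so. The function names are hypothetical.

```python
import random
import statistics


def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic; tied scores receive averaged ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def summarize(aucs):
    """Mean and sample standard deviation of AUCs across repeated runs."""
    return statistics.mean(aucs), statistics.stdev(aucs)


# Scores carrying no information about the labels give an AUC near 0.5.
rng = random.Random(0)
labels = [1] * 872 + [0] * 5486
noise_scores = [rng.random() for _ in labels]
print(round(roc_auc(noise_scores, labels), 2))
```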

Conclusions: AF can be predicted with moderate accuracy from genetic information alone. The CNN-GWAS can be a robust and useful tool for detecting polygenic diseases by capturing the cumulative effects and genetic interactions of SNPs that are statistically significant but only moderately associated.

Trial registration number: NCT02138695.

Keywords: atrial fibrillation; genetics; genome-wide association study.

Conflict of interest statement

Competing interests: None declared.

© Author(s) (or their employer(s)) 2022. Re-use permitted under CC BY. Published by BMJ.

Figures

Figure 1
Study flow chart showing the CNN-GWAS process, including the AF data set. AF, atrial fibrillation; CNN, convolutional neural network; GWAS, Genome-Wide Association Study; HEXA, Health Examinee; KoGES, Korean Genome and Epidemiology Study; MAF, minor allele frequency; SNP, single-nucleotide polymorphism.
Figure 2
Overview of the CNN-GWAS framework. (A) SNP sets were extracted from previously reported GWASs according to the p value cut-off. (B) Digitised encoding of genotypes as heterozygous or homozygous for the minor allele. (C) Process of CNN-based model prediction and analysis. (D) Neural network architecture based on the CNN. (F) Saliency score analysis of the predicted AF patients. The same AF patients (n=872) were used for the quantitative evaluation of the models pretrained in five repeated runs on different samples. AF, atrial fibrillation; CNN, convolutional neural network; Grad-CAM, gradient-weighted class activation mapping; GWAS, Genome-Wide Association Study; SNP, single-nucleotide polymorphism.
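Panels (A) and (B) describe two preprocessing steps. First, SNPs whose p value in an external GWAS falls below a cut-off are kept. Second, each subject's genotype at those SNPs is digitised by minor-allele content. The sketch below assumes an additive 0/1/2 coding, counting copies of the minor allele. The caption only says that genotypes are encoded as heterozygous or homozygous for the minor allele, so the exact digitisation may differ. Ordering SNPs by chromosome and position is also an assumption. The tuple layout, the IDs rsA/rsB/rsC and all function names are hypothetical placeholders.

```python
def select_snps(summary, threshold):
    """Return SNP IDs with p < threshold, ordered by (chromosome, position).

    `summary` is a hypothetical list of (snp_id, chromosome, position, p_value)
    tuples taken from an external GWAS's summary statistics.
    """
    kept = [(chrom, pos, snp) for snp, chrom, pos, p in summary if p < threshold]
    return [snp for _, _, snp in sorted(kept)]


def encode_genotype(alleles, minor_allele):
    """Additive coding: 0 = homozygous major, 1 = heterozygous, 2 = homozygous minor."""
    return sum(1 for a in alleles if a == minor_allele)


def encode_subject(genotypes, minor_alleles, snp_ids):
    """One feature row per subject, one column per selected SNP."""
    return [encode_genotype(genotypes[s], minor_alleles[s]) for s in snp_ids]


summary = [("rsA", 4, 111, 1e-30), ("rsB", 1, 500, 0.2), ("rsC", 1, 100, 3e-9)]
snps = select_snps(summary, 5.0e-8)  # genome-wide significance cut-off
row = encode_subject({"rsA": ("G", "G"), "rsC": ("A", "G")},
                     {"rsA": "G", "rsC": "G"}, snps)
print(snps, row)  # ['rsC', 'rsA'] [1, 2]
```

Loosening the threshold (for example to 0.001) admits more SNPs, which is how the SNP sets at successive p value cut-offs grow.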
Figure 3
Performance evaluation results. (A–C) ROC curves for AF prediction in the Korean GWAS data at each p value cut-off, using SNP sets selected from the summary statistics of three GWAS cohorts (Japanese, European and multiethnic GWAS). P value cut-offs from p<0.001 to p<5.0×10−8 were used for the performance evaluation, and p≥0.99 was used to verify the non-significant SNP list. (D–F) Prediction results for odd-even registration numbers using the SNP lists for AF prediction (p value cut-off thresholds from p<0.001 to p<5.0×10−8). All analyses were repeated five times, and the shaded area shows the 95% CI. AF, atrial fibrillation; AUC, area under the curve; GWAS, Genome-Wide Association Study; ROC, receiver operating characteristic; SNP, single-nucleotide polymorphism.
Figure 4
Performance comparison between CNN-GWAS and other machine learning methods. AUC, area under the curve; BACNN, Bayesian approximation convolutional neural network; CNN, convolutional neural network; Eur, European; GWAS, Genome-Wide Association Study; JAP, Japanese; LASSO, least absolute shrinkage and selection operator; LR, logistic regression; MUL, multiethnic; ROC, receiver operating characteristic.
Figure 5
Explanation of the predictive power of the CNN-GWAS for AF. (A) Manhattan plot of the Korean population GWAS for the SNP set selected at a p−5 in the multiethnic GWAS. (B) The top plot shows the contribution of each SNP to AF prediction as the mean saliency score of each column of the two-dimensional (2D) saliency score map. The bottom panel is the 2D saliency score map, in which the saliency scores of each AF patient are stacked. SNPs in grey font have been reported as AF associated but were not among the 10 SNPs with the highest saliency scores. The blue horizontal line marks the top-10 saliency score level, and the red dotted horizontal line marks the top-5% saliency score level. AF, atrial fibrillation; CNN, convolutional neural network; Grad-CAM, gradient-weighted class activation mapping; GWAS, Genome-Wide Association Study; SNP, single-nucleotide polymorphism.
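The top plot in panel (B) condenses the 2D saliency map into one value per SNP: the mean over patients (rows) for each SNP column. The reference lines then mark the 10th-highest mean and the level reached by the top 5% of SNPs. The sketch below reproduces only that aggregation step. It assumes the Grad-CAM map has already been computed and does not implement Grad-CAM itself. The exact definition of the top-5% level (here, the lowest score among the highest-ranked 5% of SNPs, rounding up to at least one) is an assumption, and the function names are hypothetical.

```python
import math


def column_means(saliency_map):
    """Mean saliency of each SNP column over all patient rows of a 2D saliency map."""
    n_rows = len(saliency_map)
    return [sum(row[c] for row in saliency_map) / n_rows
            for c in range(len(saliency_map[0]))]


def top_k_snps(snp_ids, means, k=10):
    """SNP IDs with the k highest mean saliency scores, highest first."""
    ranked = sorted(zip(snp_ids, means), key=lambda t: t[1], reverse=True)
    return [snp for snp, _ in ranked[:k]]


def top_fraction_level(means, fraction=0.05):
    """Saliency level reached by the top `fraction` of SNPs (at least one SNP)."""
    ranked = sorted(means, reverse=True)
    n = max(1, math.ceil(fraction * len(ranked)))
    return ranked[n - 1]


# Rows are AF patients and columns are SNPs; in practice values come from Grad-CAM.
saliency_map = [[0.0, 1.0, 0.25],
                [0.5, 0.5, 0.75]]
means = column_means(saliency_map)
print(means)                                              # [0.25, 0.75, 0.5]
print(top_k_snps(["snp1", "snp2", "snp3"], means, k=2))   # ['snp2', 'snp3']
```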


Source: PubMed
