Unsupervised Learning for Automated Detection of Coronary Artery Disease Subgroups

Alyssa M Flores, Alejandro Schuler, Anne Verena Eberhard, Jeffrey W Olin, John P Cooke, Nicholas J Leeper, Nigam H Shah, Elsie G Ross

Abstract

Background The promise of precision population health includes the ability to use robust patient data to tailor prevention and care to specific groups. Advanced analytics may allow for automated detection of clinically informative subgroups that account for clinical, genetic, and environmental variability. This study sought to evaluate whether unsupervised machine learning approaches could interpret heterogeneous and missing clinical data to discover clinically important coronary artery disease subgroups.
Methods and Results The Genetic Determinants of Peripheral Arterial Disease study is a prospective cohort that includes individuals with newly diagnosed and/or symptomatic coronary artery disease. We applied generalized low rank modeling and K-means cluster analysis using 155 phenotypic and genetic variables from 1329 participants. Cox proportional hazard models were used to examine associations between clusters and major adverse cardiovascular and cerebrovascular events and all-cause mortality. We then compared performance of risk stratification based on clusters and the American College of Cardiology/American Heart Association pooled cohort equations. Unsupervised analysis identified 4 phenotypically and prognostically distinct clusters. All-cause mortality was highest in cluster 1 (oldest/most comorbid; 26%), whereas major adverse cardiovascular and cerebrovascular event rates were highest in cluster 2 (youngest/multiethnic; 41%). Cluster 4 (middle-aged/healthiest behaviors) experienced more incident major adverse cardiovascular and cerebrovascular events (30%) than cluster 3 (middle-aged/lowest medication adherence; 23%), despite apparently similar risk factor and lifestyle profiles. In comparison with the pooled cohort equations, cluster membership was more informative for risk assessment of myocardial infarction, stroke, and mortality.
Conclusions Unsupervised clustering identified 4 unique coronary artery disease subgroups with distinct clinical trajectories. Flexible unsupervised machine learning algorithms offer the ability to meaningfully process heterogeneous patient data and provide sharper insights into disease characterization and risk assessment.
Registration URL: https://www.clinicaltrials.gov; Unique identifier: NCT00380185.

Keywords: cluster analysis; coronary artery disease; machine learning; phenotype discovery.

Figures

Figure 1. Schematic for generalized low rank modeling.
A, Patient data are condensed to fewer dimensions to allow for analysis using unsupervised K‐means clustering. The “features” matrix is a high‐dimensional data set that includes patient information on demographics and clinical, lifestyle, angiographic, and cardiovascular genetic risk markers. This data set is transformed into a lower dimensional “latent feature” space by approximating the features matrix as the product of 2 matrices, shown as the X (containing each observation) and Y representations (containing the definition for each observation). L and r indicate the loss function, which measures the accuracy of the data approximation, and the regularizer, which constrains the latent feature representation to prevent overfitting. B, After cluster analysis, data are then transformed back to their original form and analyzed to discover subgroup characteristics and compare long‐term outcomes across clusters.
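The factorization and clustering steps in the Figure 1 schematic can be illustrated with a minimal sketch. It assumes a quadratic loss on observed entries and ridge regularization on both factors, fit by alternating least squares. Generalized low rank models allow a different loss per data type, and the study's actual losses, regularizers, rank, and software are not given in this abstract. The function names (`fit_low_rank`, `kmeans`), the rank `k=2`, the penalty `reg`, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_low_rank(A, k=2, reg=0.1, n_iters=50, seed=0):
    """Approximate A (patients x features, NaN = missing) by X @ Y.

    Assumption: quadratic loss on observed entries plus ridge penalties on
    X and Y, minimized by alternating least squares.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    observed = ~np.isnan(A)              # True where a value was recorded
    X = rng.normal(size=(n, k))          # latent features per patient
    Y = rng.normal(size=(k, d))          # definition of each latent feature
    ridge = reg * np.eye(k)
    for _ in range(n_iters):
        for i in range(n):               # refit each patient's latent row
            cols = observed[i]
            Yo = Y[:, cols]
            X[i] = np.linalg.solve(Yo @ Yo.T + ridge, Yo @ A[i, cols])
        for j in range(d):               # refit each feature's latent column
            rows = observed[:, j]
            Xo = X[rows]
            Y[:, j] = np.linalg.solve(Xo.T @ Xo + ridge, Xo.T @ A[rows, j])
    return X, Y


def kmeans(X, n_clusters, n_iters=100):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centers = [X[0]]
    while len(centers) < n_clusters:
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.array(centers)
    for _ in range(n_iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        updated = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels, centers


# Synthetic stand-in for the features matrix (not study data):
# two hidden subgroups, six numeric features, about 10% of entries missing.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 30)
A = np.where(group[:, None] == 0, -3.0, 3.0) + 0.2 * rng.normal(size=(60, 6))
A[rng.random(A.shape) < 0.1] = np.nan

X, Y = fit_low_rank(A, k=2)                 # condense to the latent feature space
labels, centers = kmeans(X, n_clusters=2)   # cluster patients in that space
A_hat = X @ Y                               # map back to the original features
```

Missing values never enter the loss, so no separate imputation step is needed before clustering, and `A_hat` supplies an estimate for every entry, including those that were missing.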
Figure 2. Distinct subgroups of patients with coronary artery disease identified by unsupervised clustering.
Plot showing 4 distinct groups of patients identified by K‐means clustering. Data are plotted based on the top 20 principal components across the first 2 discriminant functions to form a 2‐dimensional plot.
Figure 3. Schematic representation of the 4 CAD clusters and their major features.
ABI indicates ankle‐brachial index; BMI, body mass index; CAD, coronary artery disease; CHF, congestive heart failure; CVA, cerebrovascular accident; LDL, low‐density lipoprotein; MACCE, major adverse cardiovascular and cerebrovascular events; MI, myocardial infarction; and PAD, peripheral artery disease.
Figure 4. Long‐term outcomes of the 4 coronary artery disease clusters.
Kaplan–Meier curves showing (A) MACCE* and (B) all‐cause mortality. *Primary MACCE composite included myocardial infarction, stroke, coronary revascularization, and peripheral revascularization. MACCE indicates major adverse cardiovascular and cerebrovascular events.
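For readers unfamiliar with how such curves are built, the following is a minimal sketch of the Kaplan–Meier product-limit estimator for right-censored follow-up data. The function name `kaplan_meier` and the toy follow-up times are hypothetical and are not taken from the study. Applying the same function separately to each cluster's patients would give one curve per cluster.

```python
def kaplan_meier(times, events):
    """Return [(event_time, survival_probability), ...].

    times:  follow-up time for each patient
    events: 1 if the event was observed at that time, 0 if censored
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    curve = []
    surv = 1.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)   # still followed just before t
        d = sum(1 for u, e in zip(times, events) if u == t and e)  # events at t
        surv *= 1.0 - d / at_risk                   # product-limit update
        curve.append((t, surv))
    return curve


# Toy data, made up for illustration only.
times = [2, 3, 3, 5, 8, 8, 9]
events = [1, 1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Censored patients count toward the at-risk denominator up to their last follow-up time but never trigger a drop in the curve, which is what lets the estimator use incomplete follow-up.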
Figure 5. Comparison of clustering to PCE risk groups for prediction of MACCE* and all‐cause mortality.
*PCE‐consistent MACCE included myocardial infarction, stroke, and death. MACCE indicates major adverse cardiovascular and cerebrovascular events; and PCE, pooled cohort equations.

Source: PubMed
