Can machine-learning improve cardiovascular risk prediction using routine clinical data?

Stephen F Weng, Jenna Reps, Joe Kai, Jonathan M Garibaldi, Nadeem Qureshi

Abstract

Background: Current approaches to predicting cardiovascular risk fail to identify many people who would benefit from preventive treatment, while others receive unnecessary intervention. Machine-learning offers an opportunity to improve accuracy by exploiting complex interactions between risk factors. We assessed whether machine-learning can improve cardiovascular risk prediction.

Methods: Prospective cohort study using routine clinical data of 378,256 patients from UK family practices, free from cardiovascular disease at outset. Four machine-learning algorithms (random forest, logistic regression, gradient boosting machines, neural networks) were compared to an established algorithm (American College of Cardiology guidelines) to predict first cardiovascular event over 10 years. Predictive accuracy was assessed by the area under the receiver operating characteristic curve (AUC), and by sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for predicting 7.5% cardiovascular risk (the threshold for initiating statins).
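As a rough illustration of the two kinds of accuracy measure named in the Methods, the sketch below computes AUC as the probability that a randomly chosen case is assigned a higher predicted risk than a randomly chosen non-case (ties count as half), and then classifies patients at the 7.5% risk threshold. The function names, the toy risks and outcomes, and the use of "greater than or equal to" at the threshold are assumptions made for illustration; they are not the study's code or data.

```python
def auc(risks, outcomes):
    """Rank-based AUC: share of (case, non-case) pairs ranked correctly."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    # A pair scores 1 if the case has the higher risk, 0.5 on a tie, else 0.
    score = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in non_cases
    )
    return score / (len(cases) * len(non_cases))


def classify(risks, threshold=0.075):
    """Flag patients whose predicted 10-year risk meets the threshold."""
    # Assumption: risk exactly at the threshold counts as high risk.
    return [1 if r >= threshold else 0 for r in risks]


# Hypothetical predicted 10-year risks and observed outcomes (1 = event).
risks = [0.02, 0.05, 0.08, 0.10, 0.30, 0.04]
outcomes = [0, 0, 0, 1, 1, 1]

print("AUC:", round(auc(risks, outcomes), 3))
print("High-risk flags:", classify(risks))
```

AUC summarises ranking quality across all possible thresholds, whereas sensitivity, specificity, PPV, and NPV describe performance at the single 7.5% cut-off, which is why the Methods report both.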

Findings: 24,970 incident cardiovascular events (6.6%) occurred. Compared to the established risk prediction algorithm (AUC 0.728, 95% CI 0.723-0.735), machine-learning algorithms improved prediction: random forest +1.7% (AUC 0.745, 95% CI 0.739-0.750), logistic regression +3.2% (AUC 0.760, 95% CI 0.755-0.766), gradient boosting +3.3% (AUC 0.761, 95% CI 0.755-0.766), neural networks +3.6% (AUC 0.764, 95% CI 0.759-0.769). The highest-achieving algorithm (neural networks) correctly predicted 4,998/7,404 cases (sensitivity 67.5%, PPV 18.4%) and 53,458/75,585 non-cases (specificity 70.7%, NPV 95.7%), correctly predicting 355 (+7.6%) more patients who developed cardiovascular disease than the established algorithm.
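The reported sensitivity, specificity, PPV, and NPV for the neural network can be checked directly from the counts given in the Findings. The sketch below does that arithmetic. The variable names are illustrative, and the baseline count for the established algorithm is an inference (4,998 minus the 355 additional patients), not a figure stated in the abstract.

```python
# Counts reported in the Findings for the neural network at the 7.5% threshold.
true_pos, cases = 4998, 7404          # correctly predicted cases / all cases
true_neg, non_cases = 53458, 75585    # correctly predicted non-cases / all non-cases

# The remaining cells of the 2x2 table follow by subtraction.
false_neg = cases - true_pos          # cases the model missed
false_pos = non_cases - true_neg      # non-cases flagged as high risk

sensitivity = true_pos / cases
specificity = true_neg / non_cases
ppv = true_pos / (true_pos + false_pos)
npv = true_neg / (true_neg + false_neg)

# Assumption: the established algorithm identified 355 fewer cases,
# so its count is inferred here rather than taken from the abstract.
extra_cases = 355
baseline_true_pos = true_pos - extra_cases
relative_gain = extra_cases / baseline_true_pos

for name, value in [
    ("Sensitivity", sensitivity),
    ("Specificity", specificity),
    ("PPV", ppv),
    ("NPV", npv),
    ("Relative gain in cases identified", relative_gain),
]:
    print(f"{name}: {value * 100:.1f}%")
```

The low PPV alongside a high NPV reflects the low event rate: most patients flagged at the 7.5% threshold do not go on to have an event, while nearly all patients classified as low risk remain event-free.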

Conclusions: Machine-learning significantly improves accuracy of cardiovascular risk prediction, increasing the number of patients identified who could benefit from preventive treatment, while avoiding unnecessary treatment of others.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Fig 1. Patient cohort data extraction procedures.
Fig 2. Illuminating “black-box” understanding of machine-learning neural networks: visualization of the risk factors and their association with cardiovascular disease developed from CPRD primary care study population.
Green lines are positive predictors, red lines are negative predictors, and the thickness of the line represents the weight (importance) of the risk factor to the outcome.


Source: PubMed
