Analysis of Machine Learning Techniques for Heart Failure Readmissions

Bobak J Mortazavi, Nicholas S Downing, Emily M Bucholz, Kumar Dharmarajan, Ajay Manhapra, Shu-Xia Li, Sahand N Negahban, Harlan M Krumholz

Abstract

Background: The current ability to predict readmissions in patients with heart failure is modest at best. It is unclear whether machine learning techniques that address higher dimensional, nonlinear relationships among variables would enhance prediction. We sought to compare the effectiveness of several machine learning algorithms for predicting readmissions.

Methods and results: Using data from the Telemonitoring to Improve Heart Failure Outcomes trial, we compared the effectiveness of random forests, boosting, random forests combined hierarchically with support vector machines or logistic regression (LR), and Poisson regression against traditional LR to predict 30- and 180-day all-cause readmissions and readmissions because of heart failure. We randomly assigned 50% of patients to a derivation set and the remaining patients to a validation set, with validation performed using 100 bootstrapped iterations. We compared C statistics for discrimination and distributions of observed outcomes in risk deciles for predictive range. In 30-day all-cause readmission prediction, the best performing machine learning model, random forests, provided a 17.8% improvement over LR (mean C statistics, 0.628 and 0.533, respectively). For readmissions because of heart failure, boosting improved the C statistic by 24.9% over LR (mean C statistics, 0.678 and 0.543, respectively). For 30-day all-cause readmission, the observed readmission rates in the lowest and highest deciles of predicted risk with random forests (7.8% and 26.2%, respectively) showed a much wider separation than with LR (14.2% and 16.4%, respectively).
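As an illustration of the discrimination metric described above, the following is a minimal Python sketch, not the authors' code. It computes a C statistic by pairwise comparison, averages it over bootstrap resamples of a validation set, and expresses one model's C statistic as a relative improvement over another. The function names, the resampling scheme, and the fixed seed are assumptions made for this example; the abstract does not specify how the 100 bootstrapped iterations were carried out.

```python
import random


def c_statistic(y_true, y_score):
    """Fraction of (readmitted, not readmitted) patient pairs in which the
    readmitted patient has the higher predicted risk; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_c_statistic(y_true, y_score, n_iter=100, seed=0):
    """Mean C statistic over n_iter resamples drawn with replacement.
    Assumed scheme: resamples containing only one outcome class are redrawn."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:
            stats.append(c_statistic(ys, [y_score[i] for i in idx]))
    return sum(stats) / len(stats)


def relative_improvement(c_new, c_ref):
    """Percentage improvement of c_new over the reference c_ref."""
    return 100.0 * (c_new - c_ref) / c_ref


# Mean C statistics reported in the abstract.
print(round(relative_improvement(0.628, 0.533), 1))  # random forests vs LR
print(round(relative_improvement(0.678, 0.543), 1))  # boosting vs LR
```

Under this reading, the reported percentages are relative changes in the C statistic rather than differences in percentage points: (0.628 - 0.533) / 0.533 is about 17.8%, and (0.678 - 0.543) / 0.543 is about 24.9%.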

Conclusions: Machine learning methods improved the prediction of readmission after hospitalization for heart failure compared with LR and provided the greatest predictive range in observed readmission rates.

Trial registration: ClinicalTrials.gov NCT00303212.

Keywords: computers; heart failure; machine learning; meta-analysis; patient readmission.

Conflict of interest statement

Conflict of Interest Disclosures: Drs. Dharmarajan and Krumholz work under contract with the Centers for Medicare & Medicaid Services to develop and maintain performance measures. Dr. Dharmarajan serves on a scientific advisory board for Clover Health, Inc. Dr. Krumholz is chair of a cardiac scientific advisory board for UnitedHealth and is the recipient of research agreements from Medtronic and Johnson and Johnson (Janssen), to develop methods of clinical trial data sharing. The other authors report no disclosures.

© 2016 American Heart Association, Inc.

Figures

Figure 1. Statistical analysis flow
The data are split into derivation and validation sets. These sets are passed to each algorithm for comparison.
Figure 2. Forest plots for c-statistics and 95% confidence intervals for each method with respect to prediction of 30-day and 180-day all-cause readmission (2a and 2b)
Diamond represents c-statistic and the line represents the 95% confidence interval. SVM, Support Vector Machine. Color coding for model training: Red - trained in 30-day binary case only; Green - trained with 30-day and 180-day binary outcome; Blue - trained with 180-day counts case.
Figure 3. Deciles of algorithm risk versus readmission rates (%) for the best 30-day all-cause models for each method
The y-axis presents the observed readmission rates in each decile, the x-axis the ordered deciles of risk predicted. SVM, Support Vector Machine.
Figure 4. Deciles of algorithm risk versus readmission rates (%) for the best 180-day all-cause models for each method
The y-axis presents the observed readmission rates in each decile, the x-axis the ordered deciles of risk predicted. SVM, Support Vector Machine.
Figure 5. Forest plots for c-statistics and 95% confidence intervals for each method with respect to prediction of 30-day and 180-day heart failure readmission (5a and 5b)
Diamond represents c-statistic and the line represents the 95% confidence interval. SVM, Support Vector Machine. Color coding for model training: Red - trained in 30-day binary case only; Green - trained with 30-day and 180-day binary outcome; Blue - trained with 180-day counts case.
Figure 6. Deciles of algorithm risk versus readmission rates (%) for the best 30-day heart failure-only models for each method
The y-axis presents the observed readmission rates in each decile, the x-axis the ordered deciles of risk predicted. SVM, Support Vector Machine.
Figure 7. Deciles of algorithm risk versus readmission rates (%) for the best 180-day heart failure-only models for each method
The y-axis presents the observed readmission rates in each decile, the x-axis the ordered deciles of risk predicted. SVM, Support Vector Machine.
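The decile plots in Figures 3, 4, 6, and 7 pair ordered deciles of predicted risk with the observed readmission rate in each decile. A minimal Python sketch of that calculation follows; it is an illustration rather than the authors' implementation. The function name and the equal-count binning rule are assumptions, since the captions do not state how ties or uneven group sizes were handled.

```python
def observed_rate_by_decile(y_true, y_score, n_bins=10):
    """Sort patients by predicted risk, cut them into n_bins groups of
    (nearly) equal size, and return the observed event rate (%) per group,
    ordered from lowest to highest predicted risk."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    n = len(order)
    rates = []
    for b in range(n_bins):
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        if members:
            events = sum(y_true[i] for i in members)
            rates.append(100.0 * events / len(members))
        else:
            # Fewer patients than bins: no rate can be computed for this bin.
            rates.append(float("nan"))
    return rates


# Toy data (hypothetical): 20 patients, risk rises with index,
# and only the upper half are readmitted.
scores = [i / 20 for i in range(20)]
outcomes = [1 if i >= 10 else 0 for i in range(20)]
rates = observed_rate_by_decile(outcomes, scores)
print(rates)
print(rates[-1] - rates[0])  # spread between highest and lowest decile
```

Applied to the rates reported in the abstract for 30-day all-cause readmission, the spread between the highest and lowest deciles is 26.2% - 7.8% = 18.4 percentage points for random forests and 16.4% - 14.2% = 2.2 percentage points for LR.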

Source: PubMed
