Predictive utilities of lipid traits, lipoprotein subfractions and other risk factors for incident diabetes: a machine learning approach in the Diabetes Prevention Program

Tibor V Varga, Jinxi Liu, Ronald B Goldberg, Guannan Chen, Samuel Dagogo-Jack, Carlos Lorenzo, Kieren J Mather, Xavier Pi-Sunyer, Søren Brunak, Marinella Temprosa, Diabetes Prevention Program Research Group

Abstract

Introduction: Although various lipid and non-lipid analytes measured by nuclear magnetic resonance (NMR) spectroscopy have been associated with type 2 diabetes, a structured comparison of the ability of NMR-derived biomarkers and standard lipids to predict individual diabetes risk has not been undertaken in larger studies nor among individuals at high risk of diabetes.

Research design and methods: Cumulative discriminative utilities of various groups of biomarkers including NMR lipoproteins, related non-lipid biomarkers, standard lipids, and demographic and glycemic traits were compared for short-term (3.2 years) and long-term (15 years) diabetes development in the Diabetes Prevention Program, a multiethnic, placebo-controlled, randomized controlled trial of individuals with pre-diabetes in the USA (N=2590). Logistic regression, Cox proportional hazards model and six different hyperparameter-tuned machine learning algorithms were compared. The Matthews Correlation Coefficient (MCC) was used as the primary measure of discriminative utility.
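For readers unfamiliar with the MCC, the following is a minimal sketch of how it can be computed from a binary confusion matrix using the standard formula MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names and the convention of returning 0 when the denominator is zero are illustrative assumptions, not taken from the study's code.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels coded 1 = diabetes, 0 = no diabetes."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def mcc_from_counts(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction). Returning 0.0 when a row or column of the
    confusion matrix is empty is an assumed convention for this sketch.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so a model that labels every participant with the majority class scores 0 rather than appearing to perform well.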

Results: Models with baseline NMR analytes and their changes did not improve the discriminative utility of simpler models including standard lipids or demographic and glycemic traits. Across all algorithms, models with baseline 2-hour glucose performed the best (max MCC=0.36). Sophisticated machine learning algorithms performed similarly to logistic regression in this study.

Conclusions: NMR lipoproteins and related non-lipid biomarkers were associated with diabetes risk but did not augment its discrimination beyond traditional diabetes risk factors except for 2-hour glucose. Machine learning algorithms provided no meaningful improvement for discrimination compared with logistic regression, which suggests a lack of influential latent interactions among the analytes assessed in this study.

Trial registration number: Diabetes Prevention Program: NCT00004992; Diabetes Prevention Program Outcomes Study: NCT00038727.

Keywords: diabetes mellitus, type 2; lipids; lipoproteins; prediabetic state.

Conflict of interest statement

Competing interests: None declared.

© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.

Figures

Figure 1
MCC and ROC AUC statistics across all machine learning algorithms and baseline lipid-related prediction models in relation to short-term and long-term diabetes incidence (N=2590). MCC averages are represented by circles and ROC AUC averages are represented by squares. The averages are calculated from the five obtained MCC and ROC AUC values from the five separate test sets in the nested cross-validation framework. The error bars represent SD of the five obtained MCC and ROC AUC values. The left panel shows the discriminative utilities for short-term, while the right panel shows the discriminative utilities for long-term diabetes incidence. Model 1 includes predictors: age at randomization, sex, self-reported ethnicity, all laboratory lipids, lipid-lowering medication use and treatment arm. Model 2 includes all Model 1 predictors and all baseline lipid-related NMR analytes. ANN, artificial neural network; GLM, generalized linear model (refers to logistic regression here); MCC, Matthews Correlation Coefficient; NMR, nuclear magnetic resonance; RF, random forest; ROC AUC, receiver operating characteristic area under the curve; SGB, stochastic gradient boosting; SVM-L, support vector machine with linear kernel; SVM-P, support vector machine with polynomial kernel; SVM-R, support vector machine with radial kernel; T2D, type 2 diabetes.
Figure 2
(A) Univariate discriminative utilities of continuous analytes at baseline in relation to short-term and long-term diabetes incidence (N=2590). The MCC and ROC AUC values are averages, calculated from the five obtained MCC and ROC AUC values from the five separate test sets in the nested cross-validation framework using the GLM method (logistic regression). The black circles represent MCC and ROC AUC values for short-term diabetes, while the red circles represent MCC and ROC AUC values for long-term diabetes. The predictors are sorted according to their MCC values for short-term diabetes. Model 1 includes predictors: age at randomization, sex, self-reported ethnicity, all laboratory lipids, lipid-lowering medication use and treatment arm. Model 2 includes all model 1 predictors and all baseline lipid-related NMR analytes. (B) Distributions of the six best performing univariate predictors for short-term diabetes, stratified by incident diabetes status (N=2590). The upper panel of the figure shows a schematic explanation for distributions that generally indicate good versus poor discriminative utility. The lower panel of the figure shows the density plots of the variables fasting glucose, 2-hour glucose, HbA1c, insulinogenic index, TRL size and glycine. AcAc, acetoacetate; ApoA1, apolipoprotein A1; ApoB, apolipoprotein B; BHB, beta-hydroxy-butyrate; BMI, body mass index; GlycA, glycoprotein acetylation; GLM, generalized linear model; HbA1c, hemoglobin A1c; HDL-C, high-density lipoprotein cholesterol; IFI, reciprocal of the fasting insulin level; LDL-C, low-density lipoprotein cholesterol; MCC, Matthews Correlation Coefficient; NMR, nuclear magnetic resonance; PPD, peak particle density; ROC AUC, receiver operating characteristic area under the curve; SBP, systolic blood pressure; T2D, type 2 diabetes; TC, total cholesterol; TG, triglycerides; TRL, triglyceride rich lipoprotein; TRL-C, TRL-cholesterol; TRL-G, TRL-triacylglycerol.
Figure 3
MCC and ROC AUC statistics across all machine learning algorithms and all prediction models in relation to short-term and long-term diabetes incidence (N=2590). MCC averages are represented by circles and ROC AUC averages are represented by squares. The averages are calculated from the five obtained MCC and ROC AUC values from the five separate test sets in the nested cross-validation framework. The error bars represent SD of the five obtained MCC and ROC AUC values. The left panel shows the discriminative utilities for short-term, while the right panel shows the discriminative utilities for long-term diabetes incidence. This figure demonstrates the model results for all baseline models (N=2590). ANN, artificial neural network; GLM, generalized linear model (refers to logistic regression here); MCC, Matthews Correlation Coefficient; RF, random forest; ROC AUC, receiver operating characteristic area under the curve; SGB, stochastic gradient boosting; SVM-L, support vector machine with linear kernel; SVM-P, support vector machine with polynomial kernel; SVM-R, support vector machine with radial kernel; T2D, type 2 diabetes.
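The figure captions describe averages and SDs taken over five test sets in a nested cross-validation framework. The sketch below illustrates one way such a framework can be laid out: an outer loop holds out each of five test folds, and an inner loop splits the remaining data for hyperparameter tuning. The number of inner folds, the shuffling seed, the helper names and the use of the sample SD are assumptions made for illustration; the source states only that five separate test sets were used.

```python
import random
import statistics


def k_fold_indices(indices, k, seed):
    """Shuffle the given indices and split them into k near-equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Yield (inner_splits, test_idx) for each outer fold.

    inner_splits is a list of (train_idx, validation_idx) pairs drawn only
    from the outer training data, so the outer test fold is never seen
    during hyperparameter tuning. inner_k=5 is an assumed value.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, test_idx in enumerate(outer_folds):
        outer_train = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for v, val_idx in enumerate(inner_folds):
            train_idx = [j for f, fold in enumerate(inner_folds) if f != v for j in fold]
            inner_splits.append((train_idx, val_idx))
        yield inner_splits, test_idx


def summarize(scores):
    """Return (mean, sample SD) of the per-test-set scores, e.g. five MCC values."""
    return statistics.mean(scores), statistics.stdev(scores)
```

In use, a model would be tuned on each set of inner splits, refit on the full outer training data, and scored once on the matching test fold; `summarize` then condenses the five resulting scores into the point and error bar shown per model and algorithm.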

Source: PubMed
