Potential predictors of type-2 diabetes risk: machine learning, synthetic data and wearable health devices

Paola Stolfi, Ilaria Valentini, Maria Concetta Palumbo, Paolo Tieri, Andrea Grignolio, Filippo Castiglione

Abstract

Background: A recent research project investigated the mechanisms involved in the onset of type 2 diabetes in individuals with no family history of the disease. This led to the development of a computational model that recapitulates the aetiology of the disease and simulates the immunological and metabolic alterations linked to type 2 diabetes, given the clinical, physiological, and behavioural features of prototypical human individuals.

Results: We analysed the time course of 46,170 virtual subjects experiencing different lifestyle conditions. We then set up a statistical model able to recapitulate the simulated outcomes.

Conclusions: The resulting machine learning model adequately predicts the synthetic dataset and can therefore be used as a computationally cheaper version of the detailed mathematical model, ready to be implemented on mobile devices to allow self-assessment by informed and aware individuals. The computational model used to generate the dataset of this work is available as a web service at http://kraken.iac.rm.cnr.it/T2DM.
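The emulator idea described above can be illustrated with a minimal sketch. It is not the authors' model or code: `toy_simulator` is a hypothetical stand-in for the detailed simulation, the two inputs and the single output are invented for illustration, and the small bagged regression forest below is a simplified, single-output variant of a random forest written with the Python standard library only.

```python
import random
import statistics

def toy_simulator(intake, activity):
    """Hypothetical stand-in for an expensive simulation (NOT the authors' model)."""
    return 22.0 + 4.0 * intake - 3.0 * activity + 2.0 * intake * (1.0 - activity)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, depth, rng, min_size=5):
    """Grow a regression tree; each row is (features, target)."""
    targets = [y for _, y in rows]
    if depth == 0 or len(rows) < 2 * min_size:
        return statistics.fmean(targets)              # leaf: mean target
    feature = rng.randrange(len(rows[0][0]))          # one random feature per node
    best = None
    for x, _ in rng.sample(rows, min(10, len(rows))): # a few candidate thresholds
        t = x[feature]
        left = [y for f, y in rows if f[feature] <= t]
        right = [y for f, y in rows if f[feature] > t]
        if len(left) < min_size or len(right) < min_size:
            continue
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:
        return statistics.fmean(targets)
    t = best[1]
    left_rows = [r for r in rows if r[0][feature] <= t]
    right_rows = [r for r in rows if r[0][feature] > t]
    return (feature, t,
            build_tree(left_rows, depth - 1, rng, min_size),
            build_tree(right_rows, depth - 1, rng, min_size))

def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are floats."""
    while isinstance(node, tuple):
        feature, t, left, right = node
        node = left if x[feature] <= t else right
    return node

def fit_forest(rows, n_trees=30, depth=6, seed=0):
    """Fit each tree on a bootstrap resample of the training rows."""
    rng = random.Random(seed)
    return [build_tree([rng.choice(rows) for _ in rows], depth, rng)
            for _ in range(n_trees)]

def predict_forest(forest, x):
    """Average the predictions of all trees."""
    return statistics.fmean([predict_tree(tree, x) for tree in forest])

def rmse(pairs):
    """Root-mean-square error over (prediction, truth) pairs."""
    return statistics.fmean([(p - y) ** 2 for p, y in pairs]) ** 0.5

# Synthetic "virtual subjects": random inputs pushed through the toy simulator.
data_rng = random.Random(42)

def sample_rows(n):
    rows = []
    for _ in range(n):
        x = (data_rng.random(), data_rng.random())
        rows.append((x, toy_simulator(*x)))
    return rows

train, test = sample_rows(400), sample_rows(100)
forest = fit_forest(train)

# Out-of-sample comparison against a constant (training-mean) baseline.
mean_target = statistics.fmean([y for _, y in train])
rmse_forest = rmse([(predict_forest(forest, x), y) for x, y in test])
rmse_baseline = rmse([(mean_target, y) for _, y in test])
print(f"forest RMSE: {rmse_forest:.3f}, baseline RMSE: {rmse_baseline:.3f}")
```

Once fitted, calling `predict_forest(forest, x)` costs only a handful of comparisons per tree, which is the property that makes an emulator of this kind attractive for resource-constrained devices. A production version would more likely rely on an established random forest library than on a hand-written one.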

Keywords: Computational modeling; Emulator; Machine learning; Random forest; Synthetic data; T2D.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
The dots represent the correlations between each pair of variables: the larger the dot, the higher the correlation in absolute value. Numerical values follow the colour code in the bar.
Fig. 2
Scatter plots of the independent versus the dependent variables, together with a polynomial fit in orange
Fig. 3
Each row shows the out-of-sample (i.e. test-set) scatter plots of the true versus fitted (i.e. predicted) values of the variable named in each panel's caption (from left to right: BMI, GBL and TNF). Inset plots show the histogram of the out-of-sample residuals (i.e. the prediction errors). The last row shows that the multivariate random forest yields better predictions than linear or polynomial regression
Fig. 4
Impact of each input variable x on the output y. The inset plot shows the same data on a logarithmic y-scale to improve readability (the y-scale is in arbitrary units). This plot offers an at-a-glance readout of the impact of subjects' anthropometric measures and lifestyle patterns on the likelihood of progressing toward a state of higher risk of developing diabetes
Fig. 5
Top twelve pairwise co-influences on y, calculated by the method in [62]

References

    1. World Health Organization. Media Centre. Accessed 27 Sept 2016
    2. Donath MY, Schumann DM, Faulenbach M, Ellingsgaard H, Perren A, Ehses JA. Islet inflammation in type 2 diabetes. Diabetes Care. 2008;31(Supplement 2):161–164. doi: 10.2337/dc08-s243.
    3. Donath MY, Shoelson SE. Type 2 diabetes as an inflammatory disease. Nat Rev Immunol. 2011;11(2):98–107. doi: 10.1038/nri2925.
    4. Gregor MF, Hotamisligil GS. Inflammatory mechanisms in obesity. Annu Rev Immunol. 2011;29(1):415–445. doi: 10.1146/annurev-immunol-031210-101322.
    5. Akash MSH, Rehman K, Chen S. Role of inflammatory mechanisms in pathogenesis of type 2 diabetes mellitus. J Cell Biochem. 2013;114(3):525–531. doi: 10.1002/jcb.24402.
    6. Hotamisligil GS. Inflammation and metabolic disorders. Nature. 2006;444(7121):860–867. doi: 10.1038/nature05485.
    7. Hotamisligil GS, Erbay E. Nutrient sensing and inflammation in metabolic diseases. Nat Rev Immunol. 2008;8(12):923. doi: 10.1038/nri2449.
    8. Donath MY, Dalmas É, Sauter NS, Böni-Schnetzler M. Inflammation in obesity and diabetes: islet dysfunction and therapeutic opportunity. Cell Metab. 2013;17(6):860–872. doi: 10.1016/j.cmet.2013.05.001.
    9. Castiglione F, Tieri P, De Graaf A, Franceschi C, Liò P, Van Ommen B, Mazzà C, Tuchel A, Bernaschi M, Samson C, Colombo T, Castellani GC, Capri M, Garagnani P, Salvioli S, Nguyen VA, Bobeldijk-Pastorova I, Krishnan S, Cappozzo A, Sacchetti M, Morettini M, Ernst M. The onset of type 2 diabetes: proposal for a multi-scale model. JMIR Res Protoc. 2013;2(2):44. doi: 10.2196/resprot.2854.
    10. Sacks J, Welch WJ, Mitchell TJ, Wynn HP. Design and analysis of computer experiments. Stat Sci. 1989;4(4):409–423. doi: 10.1214/ss/1177012413.
    11. Currin C, Mitchell T, Morris M, Ylvisaker D. Bayesian prediction of deterministic functions, with applications to the design and analysis of computer experiments. J Am Stat Assoc. 1991;86(416):953–963. doi: 10.1080/01621459.1991.10475138.
    12. Meert K, Rijckaert M. Intelligent modelling in the chemical process industry with neural networks: a case study. Comput Chem Eng. 1998;22:587–593. doi: 10.1016/S0098-1354(98)00104-5.
    13. Banerjee S, Gelfand AE, Finley AO, Sang H. Gaussian predictive process models for large spatial data sets. J R Stat Soc Ser B (Stat Methodol). 2008;70(4):825–848. doi: 10.1111/j.1467-9868.2008.00663.x.
    14. Reichert P, White G, Bayarri MJ, Pitman EB. Mechanism-based emulation of dynamic simulation models: concept and application in hydrology. Comput Stat Data Anal. 2011;55(4):1638–1655. doi: 10.1016/j.csda.2010.10.011.
    15. Bhosekar A, Ierapetritou M. Advances in surrogate based modeling, feasibility analysis, and optimization: a review. Comput Chem Eng. 2018;108:250–267. doi: 10.1016/j.compchemeng.2017.09.017.
    16. Babic A, Bodemar G, Mathiesen U, Ahlfeldt H, Franzen L, Wigertz O. Machine learning to support diagnostics in the domain of asymptomatic liver disease. Medinfo. MEDINFO. 1995;8:809–813.
    17. Ellis RJ, Wang Z, Genes N, Ma’ayan A. Predicting opioid dependence from electronic health records with machine learning. BioData Min. 2019;12(1):3. doi: 10.1186/s13040-019-0193-0.
    18. Engchuan W, Dimopoulos AC, Tyrovolas S, Caballero FF, Sanchez-Niubo A, Arndt H, Ayuso-Mateos JL, Haro JM, Chatterji S, Panagiotakos DB. Sociodemographic indicators of health status using a machine learning approach and data from the english longitudinal study of aging (elsa). Med Sci Monit Int Med J Exp Clin Res. 2019;25:1994.
    19. Fernandes R, GL RD. A new approach to predict user mobility using semantic analysis and machine learning. J Med Syst. 2017;41(12):188. doi: 10.1007/s10916-017-0837-x.
    20. Fritz BA, Chen Y, Murray-Torres TM, Gregory S, Ben Abdallah A, Kronzer A, McKinnon SL, Budelier T, Helsten DL, Wildes TS, Sharma A, Avidan MS. Using machine learning techniques to develop forecasting algorithms for postoperative complications: protocol for a retrospective study. BMJ Open. 2018;8(4):e020124. doi: 10.1136/bmjopen-2017-020124.
    21. Fuscà E, Bolzon A, Buratin A, Ruffolo M, Berchialla P, Gregori D, Perissinotto E, Baldi I. Measuring caloric intake at the population level (notion): protocol for an experimental study. JMIR Res Protoc. 2019;8(3):12116. doi: 10.2196/12116.
    22. Kang J, Rancati T, Lee S, Oh JH, Kerns SL, Scott JG, Schwartz R, Kim S, Rosenstein BS. Machine learning and radiogenomics: lessons learned and future directions. Front Oncol. 2018;8:228. doi: 10.3389/fonc.2018.00228.
    23. Lacson RC, Baker B, Suresh H, Andriole K, Szolovits P, Lacson J. Eduardo: use of machine-learning algorithms to determine features of systolic blood pressure variability that predict poor outcomes in hypertensive patients. Clin Kidney J. 2018;12(2):206–212. doi: 10.1093/ckj/sfy049.
    24. Belizario GO, Junior RGB, Salvini R, Lafer B, da Silva Dias R. Predominant polarity classification and associated clinical variables in bipolar disorder: a machine learning approach. J Affect Disord. 2019;245:279–282. doi: 10.1016/j.jad.2018.11.051.
    25. Kurasawa H, Hayashi K, Fujino A, Takasugi K, Haga T, Waki K, Noguchi T, Ohe K. Machine-learning-based prediction of a missed scheduled clinical appointment by patients with diabetes. J Diabetes Sci Technol. 2016;10(3):730–736. doi: 10.1177/1932296815614866.
    26. Casanova R, Saldana S, Simpson SL, Lacy ME, Subauste AR, Blackshear C, Wagenknecht L, Bertoni AG. Prediction of incident diabetes in the jackson heart study using high-dimensional machine learning. PLoS ONE. 2016;11(10):e0163942. doi: 10.1371/journal.pone.0163942.
    27. Alghamdi M, Al-Mallah M, Keteyian S, Brawner C, Ehrman J, Sakr S. Predicting diabetes mellitus using smote and ensemble machine learning approach: the henry ford exercise testing (fit) project. PLoS ONE. 2017;12(7):e0179805. doi: 10.1371/journal.pone.0179805.
    28. Choi BG, Rha S-W, Kim SW, Kang JH, Park JY, Noh Y-K. Machine learning for the prediction of new-onset diabetes mellitus during 5-year follow-up in non-diabetic patients with cardiovascular risks. Yonsei Med J. 2019;60(2):191–199. doi: 10.3349/ymj.2019.60.2.191.
    29. Cinar A. Multivariable adaptive artificial pancreas system in type 1 diabetes. Curr Diabetes Rep. 2017;17(10):88. doi: 10.1007/s11892-017-0920-1.
    30. Basu S, Raghavan S, Wexler DJ, Berkowitz SA. Characteristics associated with decreased or increased mortality risk from glycemic therapy among patients with type 2 diabetes and high cardiovascular risk: Machine learning analysis of the accord trial. Diabetes Care. 2018;41(3):604–612. doi: 10.2337/dc17-2252.
    31. Farran B, AlWotayan R, Alkandari H, Al-Abdulrazzaq D, Channanath A, Thanaraj TA. Use of non-invasive parameters and machine-learning algorithms for predicting future risk of type 2 diabetes: a retrospective cohort study of health data from kuwait. Front Endocrinol. 2019;10:624. doi: 10.3389/fendo.2019.00624.
    32. Klonoff DC, Gutierrez A, Fleming A, Kerr D. Real-world evidence should be used in regulatory decisions about new pharmaceutical and medical device products for diabetes. Los Angeles: SAGE Publications; 2019.
    33. Alimadadi A, Aryal S, Manandhar I, Munroe PB, Joe B, Cheng X. Artificial intelligence and machine learning to fight covid-19. Physiol Genom. 2020;52(4):200–202. doi: 10.1152/physiolgenomics.00029.2020.
    34. Tárnok A. Machine learning, covid-19 (2019-ncov), and multi-omics. Cytometry Part A. 2020;97(3):215–216. doi: 10.1002/cyto.a.23990.
    35. Castiglione F, Diaz V, Gaggioli A, Liò P, Mazzà C, Merelli E, Meskers CGM, Pappalardo F, von Ammon R. Physio-environmental sensing and live modeling. Interact J Med Res. 2013;2(1):3. doi: 10.2196/ijmr.2092.
    36. Vodovotz Y, Csete M, Bartels J, Chang S, An G. Translational systems biology of inflammation. PLoS Comput Biol. 2008;4(4):1–6.
    37. Palumbo MC, Morettini M, Tieri P, Diele F, Sacchetti M, Castiglione F. Personalizing physical exercise in a computational model of fuel homeostasis. PLoS Comput Biol. 2018;14(4):e1006073. doi: 10.1371/journal.pcbi.1006073.
    38. Palumbo M, Morettini M, Tieri P, de Graaf A, Krishnan S, Castiglione F. Modeling meal consumption and physical exercise for fuel homeostasis (2020) (in preparation)
    39. Kim J, Saidel GM, Cabrera ME. Multi-scale computational model of fuel homeostasis during exercise: effect of hormonal control. Ann Biomed Eng. 2007;35(1):69–90. doi: 10.1007/s10439-006-9201-x.
    40. Saunders PT, Koeslag JH, Wessels JA. Integral rein control in physiology. J Theor Biol. 1998;194(2):163–173. doi: 10.1006/jtbi.1998.0746.
    41. Roy A, Parker RS. Dynamic modeling of exercise effects on plasma glucose and insulin levels. IFAC Proc Vol. 2006;39(2):509–514. doi: 10.3182/20060402-4-BR-2902.00509.
    42. Kildegaard J, Christensen TF, Johansen MD, Randløv J, Hejlesen OK. Modeling the effect of blood glucose and physical exercise on plasma adrenaline in people with type 1 diabetes. Diabetes Technol Therapeut. 2007;9(6):501–508. doi: 10.1089/dia.2007.0242.
    43. Dalla Man C, Camilleri M, Cobelli C. A system model of oral glucose absorption: validation on gold standard data. IEEE Trans Biomed Eng. 2006;53(12):2472–2478. doi: 10.1109/TBME.2006.883792.
    44. Elashoff JD, Reedy TJ, Meyer JH. Analysis of gastric emptying data. Gastroenterology. 1982;83(6):1306–1312. doi: 10.1016/S0016-5085(82)80145-5.
    45. Palumbo M, Morettini M, Tieri P, de Graaf A, Liò P, Diele F, Castiglione F. An integrated multi-scale model for the simulation and prediction of metabolic and inflammatory processes in the onset and progress of type 2 diabetes (2020) (in preparation)
    46. Mifflin MD, St Jeor ST, Hill LA, Scott BJ, Daugherty SA, Koh YO. A new predictive equation for resting energy expenditure in healthy individuals. Am J Clin Nutr. 1990;51(2):241–247. doi: 10.1093/ajcn/51.2.241.
    47. Westerterp KR, Donkers JHHLM, Fredrix EWHM, Oekhoudt P. Energy intake, physical activity and body weight: a simulation model. Br J Nutr. 1995;73(3):337–347. doi: 10.1079/BJN19950037.
    48. Prana V, Tieri P, Palumbo MC, Mancini E, Castiglione F. Modeling the effect of high calorie diet on the interplay between adipose tissue, inflammation, and diabetes. Comput Math Methods Med. 2019;2019.
    49. Morettini M, Palumbo MC, Sacchetti M, Castiglione F, Mazza C. A system model of the effects of exercise on plasma interleukin-6 dynamics in healthy individuals: role of skeletal muscle and adipose tissue. PLoS ONE. 2017;12(7):e0181224. doi: 10.1371/journal.pone.0181224.
    50. Bernaschi M, Castiglione F. Design and implementation of an immune system simulator. Comput Biol Med. 2001;31(5):303–331. doi: 10.1016/S0010-4825(01)00011-7.
    51. Castiglione F, Duca K, Jarrah A, Laubenbacher R, Hochberg D, Thorley-Lawson D. Simulating Epstein-Barr virus infection with C-ImmSim. Bioinformatics. 2007;23(11):1371–1377. doi: 10.1093/bioinformatics/btm044.
    52. Pappalardo F, Lollini P-L, Castiglione F, Motta S. Modeling and simulation of cancer immunoprevention vaccine. Bioinformatics. 2005;21(12):2891–2897. doi: 10.1093/bioinformatics/bti426.
    53. Mancini E, Quax R, De Luca A, Fidler S, Stohr W, Sloot PM. A study on the dynamics of temporary hiv treatment to assess the controversial outcomes of clinical trials: an in-silico approach. PLoS ONE. 2018;13(7):e0200892. doi: 10.1371/journal.pone.0200892.
    54. Baldazzi V, Paci P, Bernaschi M, Castiglione F. Modeling lymphocyte homing and encounters in lymph nodes. BMC Bioinform. 2009;10(1):387. doi: 10.1186/1471-2105-10-387.
    55. Castiglione F, Tieri P, Palma A, Jarrah AS. Statistical ensemble of gene regulatory networks of macrophage differentiation. BMC Bioinform. 2016;17(19):506. doi: 10.1186/s12859-016-1363-4.
    56. Madonia A, Melchiorri C, Bonamano S, Marcelli M, Bulfon C, Castiglione F, Galeotti M, Volpatti D, Mosca F, Tiscar P-G, Romano N. Computational modeling of immune system of the fish for a more effective vaccination in aquaculture. Bioinformatics. 2017;33(19):3065–3071. doi: 10.1093/bioinformatics/btx341.
    57. Melanson EL, Keadle SK, Donnelly JE, Braun B, King NA. Resistance to exercise-induced weight loss: compensatory behavioral adaptations. Med Sci Sports Exerc. 2013;45(8):1600. doi: 10.1249/MSS.0b013e31828ba942.
    58. Westerterp KR. Diet induced thermogenesis. Nutr Metab. 2004;1(1):5. doi: 10.1186/1743-7075-1-5.
    59. Atwater WO, Bryant AP. The chemical composition of American food materials. Washington: US Government Printing Office; 1906.
    60. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32. doi: 10.1023/A:1010933404324.
    61. Friedman J, Hastie T, Tibshirani R. The elements of statistical learning. New York: Springer; 2001.
    62. Ishwaran H. Variable importance in binary regression trees and forests. Electron J Stat. 2007;1:519–537. doi: 10.1214/07-EJS039.
    63. Franceschi C, Garagnani P, Parini P, Giuliani C, Santoro A. Inflammaging: a new immune-metabolic viewpoint for age-related diseases. Nat Rev Endocrinol. 2018;14(10):576–590. doi: 10.1038/s41574-018-0059-4.
    64. Ishwaran H, Kogalur UB, Gorodeski EZ, Minn AJ, Lauer MS. High-dimensional variable selection for survival data. J Am Stat Assoc. 2010;105(489):205–217. doi: 10.1198/jasa.2009.tm08622.
    65. Ishwaran H, Kogalur UB, Chen X, Minn AJ. Random survival forests for high-dimensional data. Stat Anal Data Min ASA Data Sci J. 2011;4(1):115–132. doi: 10.1002/sam.10103.
    66. Ashrafzadeh S, Hamdy O. Patient-driven diabetes care of the future in the technology era. Cell Metab. 2019;29(3):564–575. doi: 10.1016/j.cmet.2018.09.005.
    67. Basch E, Schrag D. The evolving uses of “real-world” data. JAMA. 2019;321:1359–1360. doi: 10.1001/jama.2019.4064.
    68. Stolfi P, Valentini I, Palumbo MC, Tieri P, Grignolio A, Castiglione F. Potential predictors of type-2 diabetes risk: machine learning, synthetic data and wearable health devices. In: 2019 IEEE international conference on bioinformatics and biomedicine (BIBM), pp 2214–2221 (2019)

Source: PubMed
