Clinical characteristics and prognostic factors for Crohn's disease relapses using natural language processing and machine learning: a pilot study

Fernando Gomollón, Javier P Gisbert, Iván Guerra, Rocío Plaza, Ramón Pajares Villarroya, Luis Moreno Almazán, Mª Carmen López Martín, Mercedes Domínguez Antonaya, María Isabel Vera Mendoza, Jesús Aparicio, Vicente Martínez, Ignacio Tagarro, Alonso Fernández-Nistal, Sara Lumbreras, Claudia Maté, Carmen Montoto, Premonition-CD Study Group

Abstract

Background: The impact of relapses on disease burden in Crohn's disease (CD) warrants searching for predictive factors to anticipate relapses. This requires analysis of large datasets, including elusive free-text annotations from electronic health records. This study aims to describe clinical characteristics and treatment with biologics of CD patients and generate a data-driven predictive model for relapse using natural language processing (NLP) and machine learning (ML).

Methods: We performed a multicenter, retrospective study in which NLP was applied to a previously validated corpus of CD patient data from eight hospitals of the Spanish National Healthcare Network, covering 1 January 2014 to 31 December 2018. Predictive models were created with ML algorithms, namely, logistic regression, decision trees, and random forests.

Results: CD phenotype, analyzed in 5938 CD patients, was predominantly inflammatory, and tobacco smoking appeared as a risk factor, confirming previous clinical studies. We also documented treatments, treatment switches, and time to discontinuation in biologics-treated CD patients. We found correlations between CD and patient family history of gastrointestinal neoplasms. Our predictive model ranked 25 000 variables for their potential as risk factors for CD relapse. Of highest relative importance were past relapses and patients' age, as well as leukocyte, hemoglobin, and fibrinogen levels.

Conclusion: Through NLP, we identified variables such as smoking as a risk factor and described treatment patterns with biologics in CD patients. CD relapse prediction highlighted the importance of patients' age and some biochemistry values, although it proved highly challenging, and the risk factors for relapse merit assessment in a clinical setting.

Trial registration: ClinicalTrials.gov NCT03668249.
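
As an illustration of the kind of variable ranking described in the Methods and Results, the following is a minimal sketch and not the study's pipeline. It fits a plain logistic regression by gradient descent on a small synthetic cohort and ranks the features by the absolute value of their standardized coefficients. The data, the feature names (`past_relapses`, `uninformative_marker`), the effect sizes, and the use of coefficient magnitude as an importance score are all assumptions made for this example. The abstract does not state how relative importance was computed.

```python
import math
import random


def standardize(column):
    """Rescale a list of numbers to zero mean and unit variance."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / std for v in column]


def fit_logistic_regression(rows, labels, lr=0.5, epochs=500):
    """Fit weights and bias by full-batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias


def rank_features(names, weights):
    """Order features by absolute standardized coefficient, largest first."""
    return sorted(zip(names, weights), key=lambda item: abs(item[1]), reverse=True)


# Synthetic toy cohort (NOT study data): relapse probability rises with the
# number of past relapses and is unrelated to the second, hypothetical marker.
rng = random.Random(0)
names = ["past_relapses", "uninformative_marker"]
past_relapses = [rng.randint(0, 3) for _ in range(600)]
marker = [rng.gauss(0.0, 1.0) for _ in range(600)]
labels = []
for count in past_relapses:
    probability = 1.0 / (1.0 + math.exp(-(-1.5 + 1.2 * count)))
    labels.append(1 if rng.random() < probability else 0)

rows = list(zip(standardize(past_relapses), standardize(marker)))
weights, bias = fit_logistic_regression(rows, labels)
ranking = rank_features(names, weights)

for name, weight in ranking:
    print(f"{name}: standardized coefficient {weight:+.2f}")
```

Standardizing the features first puts the coefficients on a comparable scale, which is what makes ranking by magnitude meaningful in this sketch. Tree-based models such as the decision trees and random forests named in the Methods would typically use a different importance measure.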

Conflict of interest statement

Dr. F. Gomollón has received educational grants from Janssen, MSD, Takeda, and Abbvie, and nonpersonal investigation grants from MSD, Janssen, Abbvie, Takeda, and Tillotts. Dr. J.P. Gisbert has served as a speaker, a consultant, and advisory member for, or has received research funding from, MSD, Abbvie, Hospira, Pfizer, Kern Pharma, Biogen, Takeda, Janssen, Roche, Sandoz, Celgene, Ferring, Faes Farma, Shire Pharmaceuticals, Dr. Falk Pharma, Tillotts Pharma, Chiesi, Casen Fleet, Gebro Pharma, Otsuka Pharmaceutical, and Vifor Pharma. Dr. I. Guerra has served as a speaker, a consultant, and advisory member for, or has received research funding from, Kern Pharma, Takeda, and Janssen. Dr. R. Plaza has served as a speaker for Takeda and Janssen. Dr. M.I. Vera has served as a speaker, consultant, and advisory member for, or has received funding from, MSD, Abbvie, Pfizer, Ferring, Shire Pharmaceuticals, Takeda, and Janssen. Jesús Aparicio, Vicente Martínez, Ignacio Tagarro, Alonso Fernández-Nistal, and Carmen Montoto are employees at Takeda Farmacéutica España S.A. S. Lumbreras is an employee at Universidad Pontificia Comillas, Madrid. Dr. C. Maté is an employee at Medsavana S.L. The remaining authors have no conflicts of interest to declare.

Copyright © 2021 The Author(s). Published by Wolters Kluwer Health, Inc.

Figures

Fig. 1
Study design and timeline. For each patient in the database, the Index Date (i.e., Baseline) was defined as the timepoint when diagnostic criteria for CD were first identified. All available EHRs before January 2014 were handled to extract information regarding the clinical history of patients (dotted line). The follow-up period ranged from the index date to the end of the study period or the last data point available. Data from patients’ EHRs were extracted and organized with the EHRead technology. See the Methods section for further details. CD, Crohn’s disease; EHR, electronic health record.

Source: PubMed
