Predicting patient-level new-onset atrial fibrillation from population-based nationwide electronic health records: protocol of FIND-AF for developing a precision medicine prediction model using artificial intelligence

Ramesh Nadarajah, Jianhua Wu, Alejandro F Frangi, David Hogg, Campbell Cowan, Chris Gale

Abstract

Introduction: Atrial fibrillation (AF) is a major cardiovascular health problem: it is common, chronic and incurs substantial healthcare expenditure because of stroke. Oral anticoagulation reduces the risk of thromboembolic stroke in those at higher risk, but for some patients stroke is the first manifestation of undetected AF. There is a rationale for diagnosing AF early, before the first complication occurs, but population-based screening is not recommended. Previous prediction models have been limited by their data sources and methodologies. An accurate model that uses existing routinely collected data is needed to inform clinicians of patient-level risk of AF, inform national screening policy and highlight predictors that may be amenable to primary prevention.

Methods and analysis: We will investigate the application of a range of deep learning techniques, including an adapted convolutional neural network, recurrent neural network and Transformer, on routinely collected primary care data to create a personalised model predicting the risk of new-onset AF over a range of time periods. The Clinical Practice Research Datalink (CPRD)-GOLD dataset will be used for derivation, and the CPRD-AURUM dataset will be used for external geographical validation. Both comprise a sizeable representative population and are linked at patient-level to secondary care databases. The performance of the deep learning models will be compared against classic machine learning and traditional statistical predictive modelling methods. We will only use risk factors accessible in primary care and endow the model with the ability to update risk prediction as it is presented with new data, to make the model more useful in clinical practice.

Ethics and dissemination: Permissions for CPRD-GOLD and CPRD-AURUM datasets were obtained from CPRD (ref no: 19_076). The CPRD ethical approval committee approved the study. The results will be submitted as a research paper for publication to a peer-reviewed journal and presented at peer-reviewed conferences.

Trial registration details: A systematic review to incorporate within the overall project was registered on PROSPERO (registration number CRD42021245093). The study was registered on ClinicalTrials.gov (NCT04657900).

Keywords: adult cardiology; cardiac epidemiology; health informatics; pacing & electrophysiology; primary care.

Conflict of interest statement

Competing interests: None declared.

© Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY. Published by BMJ.

Figures

Figure 1
An example of how a patient’s EHR could be represented as a temporal matrix (A) compared with a sequence (B). In (A), time is on the x dimension and medical events are on the y dimension. In (B), the temporal information, in this example, is represented as intervisit interval through timestamps (eg, t2–t1). EHR, electronic health record.
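
The two representations in Figure 1 can be illustrated with a minimal sketch. It is not the FIND-AF implementation: the toy record, the event names and the helper functions `to_temporal_matrix` and `to_sequence` are all hypothetical, and the matrix uses one column per visit in chronological order as a simplification of the time axis.

```python
from datetime import date

# Hypothetical toy record: (visit date, medical events recorded at that visit).
visits = [
    (date(2020, 1, 10), ["hypertension"]),
    (date(2020, 3, 2), ["diabetes", "statin"]),
    (date(2020, 3, 30), ["hypertension"]),
]


def to_temporal_matrix(visits):
    """Panel (A): rows are medical events (y), columns are visits in time order (x)."""
    ordered = sorted(visits, key=lambda v: v[0])
    events = sorted({code for _, codes in ordered for code in codes})
    # Cell is 1 when the event was recorded at that visit, otherwise 0.
    matrix = [[1 if code in codes else 0 for _, codes in ordered] for code in events]
    return events, matrix


def to_sequence(visits):
    """Panel (B): ordered visits, each paired with the days since the previous visit."""
    ordered = sorted(visits, key=lambda v: v[0])
    sequence = []
    previous = None
    for when, codes in ordered:
        # Intervisit interval (eg, t2 - t1); the first visit has no predecessor, so 0.
        gap = 0 if previous is None else (when - previous).days
        sequence.append((gap, sorted(codes)))
        previous = when
    return sequence


events, matrix = to_temporal_matrix(visits)
sequence = to_sequence(visits)
```

In this sketch the matrix form keeps a fixed grid that would suit a convolutional network, while the sequence form keeps variable-length visits and their intervisit intervals, which would suit a recurrent network or Transformer.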

Source: PubMed
