Accuracy of Consumer Wearable Heart Rate Measurement During an Ecologically Valid 24-Hour Period: Intraindividual Validation Study

Benjamin W Nelson, Nicholas B Allen

Abstract

Background: Wrist-worn smart watches and fitness monitors (ie, wearables) have become widely adopted by consumers and are gaining increased attention from researchers for their potential contribution to naturalistic digital measurement of health in a scalable, mobile, and unobtrusive way. Various studies have examined the accuracy of these devices in controlled laboratory settings (eg, treadmill and stationary bike); however, no studies have investigated the heart rate accuracy of wearables during a continuous and ecologically valid 24-hour period of actual consumer device use conditions.

Objective: The aim of this study was to determine the heart rate accuracy of 2 popular wearable devices, the Apple Watch 3 and Fitbit Charge 2, as compared with the gold standard reference method, an ambulatory electrocardiogram (ECG), during consumer device use conditions in an individual. Data were collected across 5 daily conditions, including sitting, walking, running, activities of daily living (ADL; eg, chores, brushing teeth), and sleeping.

Methods: One participant (the first author; a 29-year-old Caucasian male) completed a 24-hour ecologically valid protocol while wearing 2 popular wrist-worn devices (Apple Watch 3 and Fitbit Charge 2). In addition, an ambulatory ECG (Vrije Universiteit Ambulatory Monitoring System) was used as the gold standard reference method, which resulted in the collection of 102,740 individual heartbeats. A single-subject design was used to hold all variables constant except for the wearable device and to serve as a rapid-response design that gives an initial assessment of wearable accuracy, allowing the research cycle to keep pace with technological advancements. Accuracy of these devices compared with the gold standard ECG was assessed using mean error, mean absolute error, and mean absolute percent error. These data were supplemented with Bland-Altman analyses and the concordance correlation coefficient to assess agreement between devices.
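The accuracy and agreement measures named in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis code. It assumes that error is defined as the device reading minus the ECG reading, that the Bland-Altman limits are the conventional bias ± 1.96 SD, and that the concordance measure is Lin's concordance correlation coefficient. The function names and the sample values are hypothetical and are not study data.

```python
from statistics import mean, stdev


def paired_errors(device_bpm, ecg_bpm):
    """Signed errors, assuming error = device reading minus ECG reading."""
    if len(device_bpm) != len(ecg_bpm):
        raise ValueError("series must be paired and of equal length")
    return [d - e for d, e in zip(device_bpm, ecg_bpm)]


def error_metrics(device_bpm, ecg_bpm):
    """Return (mean error, mean absolute error, mean absolute percent error)."""
    errors = paired_errors(device_bpm, ecg_bpm)
    me = mean(errors)
    mae = mean([abs(err) for err in errors])
    # Percent error is taken relative to the ECG (reference) value.
    mape = 100 * mean([abs(err) / e for err, e in zip(errors, ecg_bpm)])
    return me, mae, mape


def bland_altman_limits(device_bpm, ecg_bpm):
    """Return (lower limit, bias, upper limit) using bias +/- 1.96 SD."""
    errors = paired_errors(device_bpm, ecg_bpm)
    bias = mean(errors)
    spread = 1.96 * stdev(errors)  # sample standard deviation of differences
    return bias - spread, bias, bias + spread


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    mx, my = mean(x), mean(y)
    var_x = mean([(a - mx) ** 2 for a in x])
    var_y = mean([(b - my) ** 2 for b in y])
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


# Hypothetical paired readings in bpm (illustrative only, not study data).
device = [60, 72, 88, 101]
ecg = [62, 70, 90, 100]

me, mae, mape = error_metrics(device, ecg)
lower, bias, upper = bland_altman_limits(device, ecg)
ccc = concordance_correlation(device, ecg)
print(f"ME={me:.2f} bpm, MAE={mae:.2f} bpm, MAPE={mape:.2f}%")
print(f"Bland-Altman bias={bias:.2f}, limits=({lower:.2f}, {upper:.2f})")
print(f"CCC={ccc:.3f}")
```

Under this sign convention, a negative mean error indicates that the device reads lower than the ECG on average.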

Results: The Apple Watch 3 and Fitbit Charge 2 were generally highly accurate across the 24-hour period. Specifically, the Apple Watch 3 had a mean difference of -1.80 beats per minute (bpm), a mean absolute percent error of 5.86%, and a mean agreement of 95% when compared with the ECG across 24 hours. The Fitbit Charge 2 had a mean difference of -3.47 bpm, a mean absolute percent error of 5.96%, and a mean agreement of 91% when compared with the ECG across 24 hours. These findings varied by condition.

Conclusions: The Apple Watch 3 and the Fitbit Charge 2 provided acceptable heart rate accuracy (<±10%) across the 24-hour period and during each activity, except for the Apple Watch 3 during the daily activities condition. Overall, these findings provide preliminary support that these devices appear to be useful for implementing ambulatory measurement of cardiac activity in research studies, especially those where the specific advantages of these methods (eg, scalability, low participant burden) are particularly suited to the population or research question.
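The per-activity acceptability check described in the Conclusions can be sketched as follows. This is an illustration only. It assumes the 10% criterion is applied to the mean absolute percent error computed separately within each condition, and that percent error is taken relative to the ECG value. The row layout, function names, and sample readings are hypothetical and are not study data.

```python
from collections import defaultdict


def mape_by_condition(rows):
    """Mean absolute percent error per condition.

    rows: iterable of (condition, device_bpm, ecg_bpm) tuples (assumed layout).
    """
    grouped = defaultdict(list)
    for condition, device_bpm, ecg_bpm in rows:
        grouped[condition].append(abs(device_bpm - ecg_bpm) / ecg_bpm * 100)
    return {cond: sum(vals) / len(vals) for cond, vals in grouped.items()}


def acceptable(mape, threshold=10.0):
    """Flag each condition as acceptable when its MAPE is below the threshold."""
    return {cond: value < threshold for cond, value in mape.items()}


# Hypothetical paired readings (illustrative only, not study data).
rows = [
    ("sitting", 60, 62),
    ("sitting", 64, 64),
    ("daily activities", 70, 90),
    ("daily activities", 95, 100),
]

mape = mape_by_condition(rows)
for cond, ok in acceptable(mape).items():
    print(f"{cond}: MAPE={mape[cond]:.2f}% acceptable={ok}")
```

Computing the error within each condition, rather than only over the full 24 hours, is what allows a device to pass overall while failing in a single activity.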

Keywords: Apple Watch 3; Fitbit Charge 2; digital health; electrocardiography; heart rate; mobile health; passive sensing; photoplethysmography; wearables.

Conflict of interest statement

Conflicts of Interest: None declared.

©Benjamin W Nelson, Nicholas B Allen. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 11.03.2019.

Figures

Figure 1
Rainbow plot of heart rate observations for electrocardiogram (ECG), Fitbit Charge 2, and Apple Watch 3. bpm: beats per minute.
Figure 2
Fitbit Charge 2 (top) and Apple Watch 3 (bottom) compared to the electrocardiogram (ECG) across 24 hours. bpm: beats per minute.
Figure 3
Mean absolute percent error (MAPE) by device across types of activities. Note: Horizontal line represents threshold for validity.
Figure 4
Bland-Altman plot and density plots across 24 hours of the Apple Watch 3 (left) with 394 heart rate observations and Fitbit Charge 2 (right) with 1425 heart rate observations.
Figure 5
Bland-Altman plots by daily activity. Left: Apple Watch 3 during sitting; right: Fitbit Charge 2 during sitting.
Figure 6
Bland-Altman plots by daily activity. Left: Apple Watch 3 during walking; right: Fitbit Charge 2 during walking.
Figure 7
Bland-Altman plots by daily activity. Left: Apple Watch 3 during running; right: Fitbit Charge 2 during running.
Figure 8
Bland-Altman plots by daily activity. Left: Apple Watch 3 during activities of daily living; right: Fitbit Charge 2 during activities of daily living.
Figure 9
Bland-Altman plots by daily activity. Left: Apple Watch 3 during sleep; right: Fitbit Charge 2 during sleep.

Source: PubMed
