Validation of Fitbit Charge 2 Sleep and Heart Rate Estimates Against Polysomnographic Measures in Shift Workers: Naturalistic Study

Benjamin Stucky, Ian Clark, Yasmine Azza, Walter Karlen, Peter Achermann, Birgit Kleim, Hans-Peter Landolt

Abstract

Background: Multisensor fitness trackers can estimate sleep quality longitudinally in a home environment and may outperform traditional actigraphy. Before these new tools can be used to objectively assess sleep for clinical and research purposes, multisensor wearable devices require careful validation against the gold standard of sleep polysomnography (PSG). Validation under naturalistic conditions is preferable.

Objective: This study aims to validate the Fitbit Charge 2 against portable home PSG in a shift-work population of 59 first-responder police officers and paramedics.

Methods: A reliable comparison between the two measurements was ensured through data-driven alignment of the PSG and Fitbit time series recorded at night. Epoch-by-epoch analyses and Bland-Altman plots were used to assess sensitivity, specificity, accuracy, the Matthews correlation coefficient, bias, and limits of agreement.
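The epoch-by-epoch metrics named above can be computed one stage at a time, treating PSG as the reference and the target stage versus all other stages as a binary problem. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes the two hypnograms are already time-aligned, share a common epoch length, and use a shared label set (for example, N1 and N2 merged into one light-sleep label). The function name `epoch_metrics` and the toy labels are hypothetical.

```python
import math
from typing import Dict, Sequence


def epoch_metrics(psg: Sequence[str], fitbit: Sequence[str], stage: str) -> Dict[str, float]:
    """One-vs-rest epoch-by-epoch agreement for a single sleep stage.

    PSG is the reference. Both hypnograms are assumed to be time-aligned
    and to use the same epoch length and label set (hypothetical preprocessing).
    """
    if len(psg) != len(fitbit) or not psg:
        raise ValueError("hypnograms must be non-empty and aligned epoch by epoch")

    tp = fp = tn = fn = 0
    for ref, est in zip(psg, fitbit):
        ref_pos, est_pos = ref == stage, est == stage
        if ref_pos and est_pos:
            tp += 1  # both devices score the stage
        elif est_pos:
            fp += 1  # Fitbit scores the stage, PSG does not
        elif ref_pos:
            fn += 1  # PSG scores the stage, Fitbit misses it
        else:
            tn += 1  # neither device scores the stage

    nan = float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    accuracy = (tp + tn) / len(psg)
    # Matthews correlation coefficient; set to 0 when any marginal count is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "mcc": mcc}


# Toy example with hypothetical labels (not study data)
psg = ["W", "light", "light", "REM", "REM", "deep"]
fitbit = ["W", "light", "REM", "REM", "W", "deep"]
print(epoch_metrics(psg, fitbit, "REM"))
```

Because accuracy is dominated by the much larger "rest" class when a stage is rare, the MCC is reported alongside it as a more balanced summary.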

Results: Sleep onset and offset, total sleep time, and the durations of rapid eye movement (REM) sleep and non-rapid eye movement sleep stages N1+N2 and N3 showed unbiased estimates with nonnegligible limits of agreement. In contrast, the proprietary Fitbit algorithm overestimated REM sleep latency by 29.4 minutes and wakefulness after sleep onset (WASO) by 37.1 minutes. Epoch-by-epoch analyses indicated better specificity than sensitivity, with higher accuracies for WASO (0.82) and REM sleep (0.86) than for N1+N2 (0.55) and N3 (0.78) sleep. Fitbit heart rate (HR) showed a small underestimation of 0.9 beats per minute (bpm) and a limited capability to capture sudden HR changes because of its lower temporal resolution compared with PSG. The underestimation was smaller in N2, N3, and REM sleep (0.6-0.7 bpm) than in N1 sleep (1.2 bpm) and wakefulness (1.9 bpm), indicating a state-specific bias. Finally, the distribution of sleep episode durations derived from Fitbit differed from that derived from PSG and showed nonbiological discontinuities, indicating potential limitations of the staging algorithm.

Conclusions: We conclude that, with careful data processing, the Fitbit Charge 2 can provide reasonably accurate mean values of sleep and HR estimates in shift workers under naturalistic conditions. Nevertheless, the generally wide limits of agreement hamper precise quantification of individual sleep episodes. The value of this consumer-grade multisensor wearable for tackling clinical and research questions could be enhanced by open-source algorithms, access to raw data, and the ability to blind participants to their own sleep data.

Keywords: actigraphy; mobile phone; multisensory; polysomnography; validation; wearables.

Conflict of interest statement

Conflicts of Interest: None declared.

©Benjamin Stucky, Ian Clark, Yasmine Azza, Walter Karlen, Peter Achermann, Birgit Kleim, Hans-Peter Landolt. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 05.10.2021.

Figures

Figure 1
The consecutive study participant numbers of the entire study sample (higher numbers indicate chronologically later entry into the study) are shown on the x-axis; the data-driven time shift between polysomnography and Fitbit is shown on the y-axis. There was a significant linear relationship between participant number and time shift (P<.001; adjusted R²=0.85). Thus, the recording times of the two devices drifted apart as the study went on, with time misalignment ranging from a minimum of 1.9 minutes to a maximum of 7.5 minutes. PSG: polysomnography.
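The linear relationship described in this caption can be checked by regressing the estimated time shift on participant number and reporting the adjusted R², which penalizes R² for the number of predictors. A minimal sketch, assuming NumPy is available; `fit_drift` is a hypothetical helper, and the demo values are made up rather than taken from the study.

```python
import numpy as np


def fit_drift(participant_ids, shifts_min):
    """Ordinary least-squares line of time shift (minutes) on participant number.

    Returns slope, intercept, and adjusted R^2 for a single predictor.
    """
    x = np.asarray(participant_ids, dtype=float)
    y = np.asarray(shifts_min, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit: highest power first
    fitted = slope * x + intercept
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    n, p = len(x), 1  # p = number of predictors
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return slope, intercept, adj_r2


# Made-up illustration (not study data): a steady drift plus small noise
ids = [4, 20, 40, 60, 80, 104]
shifts = [2.0, 2.8, 4.1, 5.0, 6.2, 7.3]
print(fit_drift(ids, shifts))
```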
Figure 2
Data from the validation night of the first participant in the study (identifying number 004; left column) and the last participant in the study (number 104; right column) are shown. Row A displays the cross-correlation function, which has a clear maximum at the orange vertical line, representing the best alignment between the two devices (PSG and Fitbit). The dashed vertical reference line marks a lag of 0 minutes. Rows B-D share the same x-axis, which denotes hours after PSG-derived sleep onset with N1 criteria; a vertical dashed gray line marks each hour of the recording. Row B shows the HR in bpm derived from PSG (red) and Fitbit (black) before any time alignment, whereas row C presents the HR data after applying the data-driven shift from panel A. After correction for the time difference, the aligned time series show good visual agreement. Fitbit shows reduced variability in the signal but fairly good average correspondence. In panel D, the top row shows the PSG-derived hypnograms for both participants, and the bottom row shows the Fitbit-derived hypnograms. All hypnograms have been time-corrected according to panel A. Fitbit captures the overall sleep structure reasonably well but detects more wake and REM episodes than PSG, and distinguishing light (N1+N2) from deep (N3) sleep often appears particularly challenging for Fitbit. bpm: beats per minute; HR: heart rate; PSG: polysomnography; REM: rapid eye movement; W: wake.
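The data-driven alignment in panel A can be sketched as a search over candidate lags for the one that maximizes the correlation between the PSG and Fitbit HR series. This is a minimal illustration, not the authors' implementation: it assumes both series have already been resampled to a common 1-minute grid with NaN for missing samples, and it uses the Pearson correlation at each lag (a normalized cross-correlation). `best_lag` and `max_lag` are hypothetical names, and the synthetic data are not study data.

```python
import numpy as np


def best_lag(reference, other, max_lag):
    """Lag (in samples) maximizing Pearson r between reference[t] and other[t + lag].

    A positive lag means `other` trails `reference`. Both series are assumed
    to share a 1-minute sampling grid; NaN marks missing samples.
    """
    reference = np.asarray(reference, dtype=float)
    other = np.asarray(other, dtype=float)
    n = min(len(reference), len(other))
    reference, other = reference[:n], other[:n]
    max_lag = min(max_lag, n - 1)  # keep both slices within the series

    best, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = reference[: n - lag], other[lag:]
        else:
            a, b = reference[-lag:], other[: n + lag]
        valid = ~(np.isnan(a) | np.isnan(b))
        if valid.sum() < 3:
            continue  # too few overlapping samples to correlate
        r = np.corrcoef(a[valid], b[valid])[0, 1]
        if r > best_r:
            best, best_r = lag, r
    return best


# Synthetic check: a Fitbit-like series delayed by 4 minutes
rng = np.random.default_rng(0)
psg_hr = rng.normal(60, 5, 300)
fitbit_hr = np.full(300, np.nan)
fitbit_hr[4:] = psg_hr[:-4]
print(best_lag(psg_hr, fitbit_hr, max_lag=10))
```

Shifting one series by the returned lag before any epoch-by-epoch comparison corresponds to moving from row B to row C in the figure.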
Figure 3
The available data of all nights (n=59) were extracted, and the number of heart rate measures they contained was counted. Roughly 28,320 minutes of data were expected (59 study participants who, on average, spent 8×60 minutes asleep); in fact, 28,601 individual minutes of data were recorded. This figure displays the distribution of heart rate measures per minute, which averaged 7.48 measures per minute. Count data for >12 measures per minute and
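The expected-versus-recorded comparison and the per-minute distribution in Figure 3 amount to grouping HR timestamps by clock minute and counting. A minimal sketch, assuming the HR samples are available as Python `datetime` timestamps; the helper names are hypothetical and the toy timestamps are not study data. Minutes without any samples are simply absent in this sketch; the caption does not state how such minutes were handled.

```python
from collections import Counter
from datetime import datetime, timedelta


def hr_samples_per_minute(timestamps):
    """Map each clock minute to the number of HR samples recorded within it.

    Minutes with no samples do not appear, so they do not lower the average.
    """
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


def summarize(timestamps):
    per_minute = hr_samples_per_minute(timestamps)
    histogram = Counter(per_minute.values())  # samples per minute -> number of minutes
    mean_rate = sum(per_minute.values()) / len(per_minute)
    return histogram, mean_rate


# Expected volume from the caption: 59 nights x 8 h x 60 min
expected_minutes = 59 * 8 * 60  # 28,320 minutes

# Toy timestamps (not study data): 12 samples in one minute, 3 in the next
start = datetime(2021, 1, 1, 23, 0)
toy = [start + timedelta(seconds=5 * i) for i in range(12)]
toy += [start + timedelta(minutes=1, seconds=20 * i) for i in range(3)]
print(summarize(toy))
```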

Figure 4
The distribution of sleep stage durations for Fitbit (left panel) and PSG (right panel). Both were computed on the sample of nights used for validation. The plot has been cut off at 40 minutes for visual purposes; the tails continue to decrease, as expected. The Fitbit sleep staging data types "classic" (red) and "stages" (blue) deviate substantially from PSG sleep stages (black). Of note, deep and REM sleep show a nonbiological discontinuity at around 4.5 minutes, and all Fitbit stages have heavier tails. The "restless" stage has a peak of unknown significance at 11 minutes. PSG: polysomnography; REM: rapid eye movement; WASO: wakefulness after sleep onset.
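Episode-duration distributions like those in Figure 4 can be obtained by run-length encoding each hypnogram: every uninterrupted run of one stage counts as one episode. A minimal sketch, assuming the hypnogram sits on a fixed epoch grid (30-s epochs for PSG; Fitbit stage data would first have to be resampled to such a grid, which is an assumption here). The function name `episode_durations` is hypothetical. Pooling the per-stage lists across nights and plotting histograms would give a figure of this kind.

```python
from collections import defaultdict
from itertools import groupby


def episode_durations(hypnogram, epoch_minutes=0.5):
    """Durations (minutes) of every uninterrupted run of each stage.

    `hypnogram` is a sequence of stage labels on a fixed epoch grid;
    0.5 min corresponds to standard 30-s PSG epochs.
    """
    durations = defaultdict(list)
    for stage, run in groupby(hypnogram):  # groups consecutive identical labels
        durations[stage].append(sum(1 for _ in run) * epoch_minutes)
    return dict(durations)


toy = ["W", "W", "N2", "N2", "N2", "REM", "N2"]
print(episode_durations(toy))
```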

Figure 5
Bland-Altman plots for various sleep variables, with sleep onset defined as the first occurrence of N1. The dashed lines denote the lower limit of agreement, the bias, and the upper limit of agreement; the dotted lines are the respective 95% CIs of the limits of agreement. Marginal densities are plotted at the top and right of each panel. The x-axis displays the PSG variables, and the y-axis denotes the difference between the two devices (PSG-Fitbit). N1-derived sleep onset is unbiased. Sleep offset, total sleep time, light sleep (N1+N2) duration, deep sleep (N3) duration, and REMd show no significant bias. WASO and REML show significant bias; that is, the mean difference between the devices deviates significantly from 0. deepd: deep sleep duration; lightd: light sleep duration; PSG: polysomnography; REMd: rapid eye movement sleep duration; REML: rapid eye movement sleep latency; Soff: sleep offset; Son: sleep onset; TST: total sleep time; WASO: wake after sleep onset.
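The bias and limits of agreement in Figure 5 follow the classic Bland-Altman definitions: the bias is the mean of the paired differences, and the limits are the bias ± 1.96 SD of those differences. A minimal sketch using only the Python standard library; it does not reproduce the dotted 95% CIs of the limits, nor the mixed-model variant used for repeated heart rate measures in Figure 6. The function name and the example values are invented for illustration.

```python
import statistics


def bland_altman(psg, fitbit, z=1.96):
    """Bias and 95% limits of agreement for paired measurements.

    Differences are PSG minus Fitbit, matching the sign convention in Figure 5.
    Assumes one independent pair per participant (no repeated measures).
    """
    diffs = [p - f for p, f in zip(psg, fitbit)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, bias - z * sd, bias + z * sd


# Hypothetical total sleep times in minutes (not study data)
tst_psg = [400, 420, 380, 410]
tst_fitbit = [390, 425, 370, 400]
print(bland_altman(tst_psg, tst_fitbit))
```

With the PSG-Fitbit sign convention, a negative bias means Fitbit reports larger values than PSG.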

Figure 6
Bland-Altman plots for heart rate–derived variables. The dashed lines denote the lower limit of agreement, the bias, and the upper limit of agreement, estimated with a mixed model that accounts for the repeated measures. Marginal densities are plotted at the top and right of each panel. The x-axis displays the mean of both devices (ie, [polysomnography + Fitbit]/2), and the y-axis denotes the difference between the two devices (polysomnography-Fitbit). The overall 10%-trimmed heart rate average and 10%-trimmed heart rate variance are calculated in 1-minute intervals from 30 minutes before sleep onset (N1 criteria) to 30 minutes after sleep offset. All other variables are calculated in 1-minute intervals between sleep onset and sleep offset, extracting only the minutes relevant to the designated variable. HR10: 10%-trimmed heart rate average; HRvar10: 10%-trimmed heart rate variance average; REM: rapid eye movement; WASO: wake after sleep onset.
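A 10%-trimmed heart rate average, as used for HR10, discards the most extreme samples before averaging, which dampens artifacts. A minimal sketch, assuming that "10%-trimmed" means cutting 10% of the sorted values from each tail within each 1-minute interval (the caption does not specify the convention); the function names are hypothetical. SciPy's `scipy.stats.trim_mean(a, 0.1)` uses the same per-tail convention if a library routine is preferred.

```python
from collections import defaultdict


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the sorted values from EACH end."""
    data = sorted(values)
    k = int(proportion * len(data))  # number of values cut from each tail
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def per_minute_trimmed_hr(samples, proportion=0.1):
    """Apply the trimmed mean to HR samples grouped into 1-minute bins.

    `samples` is an iterable of (minute_index, bpm) pairs.
    """
    bins = defaultdict(list)
    for minute, bpm in samples:
        bins[minute].append(bpm)
    return {m: trimmed_mean(v, proportion) for m, v in sorted(bins.items())}


# One artifact-laden minute (50 and 120 bpm outliers) and one clean minute
minute0 = [(0, v) for v in [50, 60, 61, 62, 63, 64, 65, 66, 67, 120]]
minute1 = [(1, 58), (1, 60)]
print(per_minute_trimmed_hr(minute0 + minute1))
```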

Source: PubMed
