Neural network analysis of sleep stages enables efficient diagnosis of narcolepsy

Jens B Stephansen, Alexander N Olesen, Mads Olsen, Aditya Ambati, Eileen B Leary, Hyatt E Moore, Oscar Carrillo, Ling Lin, Fang Han, Han Yan, Yun L Sun, Yves Dauvilliers, Sabine Scholz, Lucie Barateau, Birgit Hogl, Ambra Stefani, Seung Chul Hong, Tae Won Kim, Fabio Pizza, Giuseppe Plazzi, Stefano Vandi, Elena Antelmi, Dimitri Perrin, Samuel T Kuna, Paula K Schweitzer, Clete Kushida, Paul E Peppard, Helge B D Sorensen, Poul Jennum, Emmanuel Mignot

Abstract

Analysis of sleep for the diagnosis of sleep disorders such as Type-1 Narcolepsy (T1N) currently requires visual inspection of polysomnography records by trained scoring technicians. Here, we used neural networks in approximately 3,000 normal and abnormal sleep recordings to automate sleep stage scoring, producing a hypnodensity graph, a probability distribution conveying more information than classical hypnograms. Accuracy of sleep stage scoring was validated in 70 subjects assessed by six scorers. The best model performed better than any individual scorer (87% accuracy versus consensus). It also reliably scored sleep at resolutions down to 5 s, instead of the standard 30 s scoring epochs. A T1N marker based on unusual sleep stage overlaps achieved a specificity of 96% and a sensitivity of 91%, validated in independent datasets. Addition of HLA-DQB1*06:02 typing increased specificity to 99%. Our method can reduce time spent in sleep clinics and automates T1N diagnosis. It also opens the possibility of diagnosing T1N using home sleep studies.
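To make the difference between a hypnodensity graph and a classical hypnogram concrete, the following minimal Python sketch collapses per-epoch stage probabilities into a single most-likely stage. The stage order, the function name `to_hypnogram` and the example probabilities are illustrative assumptions, not values or code from the study.

```python
# Minimal sketch: a hypnodensity holds one probability per sleep stage for
# every epoch, while a hypnogram keeps only the single most likely stage.
# Stage order and example numbers are assumptions chosen for illustration.
STAGES = ("wake", "N1", "N2", "N3", "REM")


def to_hypnogram(hypnodensity):
    """Collapse per-epoch stage probabilities into one stage label per epoch."""
    hypnogram = []
    for probs in hypnodensity:
        if len(probs) != len(STAGES):
            raise ValueError("each epoch needs one probability per stage")
        # Index of the largest probability (first one wins on exact ties).
        best = max(range(len(STAGES)), key=lambda i: probs[i])
        hypnogram.append(STAGES[best])
    return hypnogram


# Three hypothetical epochs; the second mixes wake, N1 and REM.
example = [
    (0.90, 0.05, 0.03, 0.01, 0.01),
    (0.40, 0.35, 0.05, 0.00, 0.20),
    (0.05, 0.10, 0.15, 0.00, 0.70),
]
hypnogram = to_hypnogram(example)
print(hypnogram)
```

In this toy input the second epoch is labelled wake in the hypnogram even though it carries substantial N1 and REM probability, which is the kind of stage overlap a hypnodensity retains and a hypnogram discards.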

Conflict of interest statement

E.M. has received Jazz Pharmaceuticals contract, clinical trial and gift funding as principal investigator at Stanford University. He has also consulted for Idorsia and Merck, and has consulted or presented clinical trial results for Jazz Pharmaceuticals at congresses, resulting in trip reimbursements and honoraria never exceeding 5000 dollars per year. G.P. has been on advisory boards for UCB, Jazz, Bioprojet and Idorsia. F.P. received a fee from UCB for speaking at a symposium, and a congress subscription from Bioprojet. The remaining authors declare no competing interests.

Figures

Fig. 1
Accuracy per scorer and by time resolution. a The effect on scoring accuracy as the gold standard is improved. Every combination of N scorers is evaluated in an unweighted manner and the mean is calculated. Accuracy is shown with mean (solid black line) and a 95% confidence interval (gray area). b Predictive performance of the best model at different resolutions. Performance is shown as mean accuracy (solid black line) with a 95% confidence interval (gray area)
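The evaluation in panel a can be sketched as follows: for every combination of N scorers, build a consensus scoring, measure accuracy against it, and average over combinations. This is a hedged illustration only. The majority vote, the tie-breaking rule (first label encountered wins) and all names and data below are assumptions, not the procedure or data used in the study.

```python
# Minimal sketch, assuming a per-epoch majority vote as the consensus.
# Tie handling (first label encountered wins) is an assumption.
from collections import Counter
from itertools import combinations
from statistics import mean


def majority_vote(labels):
    """Most common label; on ties, the label seen first is returned."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, reference):
    """Fraction of epochs where the two scorings agree."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def mean_accuracy_vs_consensus(prediction, scorings, n):
    """Unweighted mean accuracy over every combination of n scorers."""
    accuracies = []
    for group in combinations(scorings, n):
        consensus = [majority_vote(epoch) for epoch in zip(*group)]
        accuracies.append(accuracy(prediction, consensus))
    return mean(accuracies)


# Hypothetical scorings of five epochs by three scorers.
scorings = [
    ["W", "N1", "N2", "N2", "REM"],
    ["W", "N2", "N2", "N3", "REM"],
    ["W", "N1", "N2", "N3", "REM"],
]
prediction = ["W", "N1", "N2", "N3", "REM"]
acc_by_n = {n: mean_accuracy_vs_consensus(prediction, scorings, n) for n in (1, 2, 3)}
print(acc_by_n)
```

With this toy data the measured accuracy is highest when all three scorers form the consensus, illustrating how a better reference changes the apparent accuracy of the same prediction.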
Fig. 2
Hypnodensity example evaluated by multiple scorers and different predictive models. a The figure displays the hypnodensity graph. Displayed models are, in order: multiple scorer assessment (1); ensembles as described in Supplementary Table 8: all models, those with memory (LSTM) and those without memory (FF) (2–4); single models, as described in Supplementary Table 8 (5–7). OCT is octave encoding. Color codes: white, wake; red, N1; light blue, N2; dark blue, N3; black, REM. b The 150 epochs of a recording from the AASM ISR program are analyzed by 16 models with randomly varying parameters, using the CC/SH/LS/LSTM model as a template. These data were also evaluated by 5234 ± 14 different scorers. The distribution of these is shown on top, the average model predictions are shown in the middle, and the model variance is shown at the bottom
Fig. 3
Examples of hypnodensity graphs in subjects with and without narcolepsy. Hypnodensity, i.e., probability distribution per stage of sleep, for a subject without narcolepsy (top) and a subject with narcolepsy (bottom). Color codes: white, wake; red, N1; light blue, N2; dark blue, N3; black, REM
Fig. 4
Diagnostic receiver operating characteristics curves. Diagnostic receiver operating characteristics (ROC) curves, displaying the trade-offs between sensitivity and specificity for our narcolepsy biomarker for a training sample, b testing sample, c replication sample and e high pretest sample. d–f Adding HLA to the model vastly increases specificity. Cut-off thresholds are presented for models with (red dot) and without HLA (green dot)
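As a reminder of how each point on such a ROC curve arises, the sketch below computes sensitivity and specificity at a given cut-off threshold and sweeps the threshold over the observed scores. The scores, labels and function names are hypothetical and only illustrate the general technique, not the biomarker or thresholds used in the study.

```python
# Minimal sketch: sensitivity and specificity of a score at a cut-off.
# A subject is called positive when score >= threshold (an assumption).
def sensitivity_specificity(scores, labels, threshold):
    tp = fn = tn = fp = 0
    for score, is_case in zip(scores, labels):
        called_positive = score >= threshold
        if is_case and called_positive:
            tp += 1
        elif is_case:
            fn += 1
        elif called_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(scores, labels):
    """One (threshold, sensitivity, specificity) triple per distinct score."""
    thresholds = sorted(set(scores), reverse=True)
    return [(t, *sensitivity_specificity(scores, labels, t)) for t in thresholds]


# Hypothetical biomarker scores; label 1 marks a case, 0 a control.
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, 0.5)
curve = roc_points(scores, labels)
print(sens, spec)
```

Raising the threshold trades sensitivity for specificity, which is the trade-off the curves display; the marked cut-offs in the figure correspond to one chosen threshold per model.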
Fig. 5
Overall design of the study. a Pre-processing steps taken to achieve the format of data as it is used in the neural networks. One of the 5 channels is first high-pass filtered with a cut-off at 0.2 Hz, then low-pass filtered with a cut-off at 49 Hz, followed by a re-sampling to 100 Hz to ensure data homogeneity. In the case of EEG signals, a channel selection is employed to choose the channel with the least noise. The data are then encoded using either the CC or the octave encoding. b Steps taken to produce and test the automatic scoring algorithm. A part of the SSC and WSC is randomly selected, as described in Supplementary Table 1. These data are then segmented into 5 min segments and scrambled with segments from other subjects to increase batch similarity during training. A neural network is then trained until convergence (evaluated using a separate validation sample). Once trained, the networks are tested on a separate part of the SSC and WSC along with data from the IS-RC and KHC. c Steps taken to produce and test the narcolepsy detector. Hypnodensities are extracted from data, as described in Supplementary Table 1. These data are separated into a training (60%) and a testing (40%) split. From the training split, 481 potentially relevant features, as described in Supplementary Table 9, are extracted from each hypnodensity. The prominent features are maintained using a recursive selection algorithm, and from these features a GP classifier is created. From the testing split, the same relevant features are extracted, and the GP classifier is evaluated
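Two of the data-handling steps in this caption, cutting recordings into 5 min segments that are scrambled across subjects, and dividing the data into a 60% training and 40% testing split, can be sketched in plain Python. This is an illustration under stated assumptions: trailing partial segments are dropped, the split is made per subject, and the function names, seeds and placeholder signals are hypothetical rather than taken from the study.

```python
# Minimal sketch of segmentation, scrambling and a 60/40 split.
# Assumptions: 100 Hz signals, partial trailing segments are discarded,
# and the train/test split is made at the subject level.
import random

FS_HZ = 100               # sampling rate after re-sampling
SEGMENT_SECONDS = 5 * 60  # 5 min segments


def segment(signal, fs=FS_HZ, seconds=SEGMENT_SECONDS):
    """Cut a signal into consecutive, non-overlapping full-length segments."""
    n = fs * seconds
    return [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]


def scrambled_segments(recordings, seed=0):
    """Pool segments from all subjects and shuffle them together."""
    pool = [(subject, seg)
            for subject, signal in recordings.items()
            for seg in segment(signal)]
    random.Random(seed).shuffle(pool)
    return pool


def train_test_split(subject_ids, train_fraction=0.6, seed=0):
    """Shuffle subject identifiers and split them into train and test lists."""
    ids = sorted(subject_ids)
    random.Random(seed).shuffle(ids)
    cut = round(train_fraction * len(ids))
    return ids[:cut], ids[cut:]


# Placeholder signals: 10 min (plus a 5 s remainder) and 15 min recordings.
recordings = {
    "s1": [0.0] * (FS_HZ * SEGMENT_SECONDS * 2 + 500),
    "s2": [0.0] * (FS_HZ * SEGMENT_SECONDS * 3),
}
pool = scrambled_segments(recordings)
train_ids, test_ids = train_test_split([f"subject{i}" for i in range(10)])
print(len(pool), len(train_ids), len(test_ids))
```

Shuffling segments from different subjects into one pool means that each training batch is likely to mix subjects, which is one plausible reading of the stated aim of increasing batch similarity.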
Fig. 6
Neural network strategy. a An example of the octave and the CC encoding on 10 s of EEG, EOG and EMG data. These processed data are fed into the neural networks in one of the two formats. The data in the octave encoding are offset for visualization purposes. Color scale is unitless. b Simplified network configuration, displaying how data are fed and processed through the networks. A more detailed description can be found in Supplementary Figure 3

Source: PubMed
