Deep Learning-Based Natural Language Processing for Screening Psychiatric Patients

Hong-Jie Dai, Chu-Hsien Su, You-Qian Lee, You-Chen Zhang, Chen-Kai Wang, Chian-Jue Kuo, Chi-Shin Wu

Abstract

The introduction of pre-trained language models in natural language processing (NLP) based on deep learning and the availability of electronic health records (EHRs) present a great opportunity to transfer the "knowledge" learned from data in the general domain to enable the analysis of unstructured textual data in clinical domains. This study explored the feasibility of applying NLP to a small EHR dataset to investigate the power of transfer learning to facilitate the process of patient screening in psychiatry. A total of 500 patients were randomly selected from a medical center database. Three annotators with clinical experience reviewed the notes and assigned diagnoses of major/minor depression, bipolar disorder, schizophrenia, and dementia, forming a small and highly imbalanced corpus. Several state-of-the-art deep learning-based NLP methods, along with pre-trained models based on shallow or deep transfer learning, were adapted to develop models to classify the aforementioned disorders. We hypothesized that the models relying on transferred knowledge would outperform the models learned from scratch. The experimental results demonstrated that the models with pre-trained techniques outperformed the models without transferred knowledge by 0.11 and 0.28 in micro-averaged and macro-averaged F-scores, respectively. Our results also suggested that building multi-label models with the feature dependency strategy, rather than problem transformation, is preferable given its higher performance and simpler training process.
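The two multi-label strategies contrasted above can be summarized with a minimal sketch. This is not the authors' implementation; it assumes PyTorch, the Hugging Face transformers library with a bert-base-uncased encoder, and the five diagnostic labels, purely for illustration. The feature dependency strategy trains one shared model with a sigmoid output per disorder, whereas problem transformation trains an independent binary classifier per disorder.

```python
import torch
import torch.nn as nn
from transformers import AutoModel  # assumption: Hugging Face transformers is available

NUM_LABELS = 5  # major depression, minor depression, bipolar disorder, schizophrenia, dementia

class TransformerClassifier(nn.Module):
    """A pre-trained encoder with a linear classification head on the [CLS] position."""
    def __init__(self, num_outputs, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)  # transferred knowledge
        self.head = nn.Linear(self.encoder.config.hidden_size, num_outputs)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden[:, 0])  # raw logits for the [CLS] token

# Feature dependency: a single model predicts all disorders jointly,
# trained with nn.BCEWithLogitsLoss against a multi-hot label vector.
fd_model = TransformerClassifier(num_outputs=NUM_LABELS)

# Problem transformation: five independent binary classifiers,
# each trained separately on one disorder (a heavier training process).
pt_models = [TransformerClassifier(num_outputs=1) for _ in range(NUM_LABELS)]
```

The joint model is what the abstract refers to as simpler to train: one optimization run instead of one per disorder, while the shared sigmoid head lets the labels be predicted together.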

Keywords: deep learning; natural language processing; patient screening; psychiatric diagnoses; text classification.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2021 Dai, Su, Lee, Zhang, Wang, Kuo and Wu.

Figures

Figure 1. Distributions of the five disorders in the training and test sets.
Figure 2. Flowchart of the screening process of the neural networks based on the two methods: problem transformation and feature dependency.
Figure 3. Deep learning text classification models developed in this study.
Figure 4. Micro- and macro-average F-scores on the test set of all developed models. "Linear (Mix)" signifies the linear model that considers the BH and PME sections in combination, whereas the "Linear (Sep)" model considers them separately. The figure also includes the performance of BERT* listed in Table 2 as a baseline (BERT fine-tuning) for comparison.
Figure 5. Performance comparison of the top-performing models selected from the developed network architectures. PT and FD denote problem transformation and feature dependency, respectively. The fine-tuning approaches (ROBERTa*_PT/FD and Distill*_PT) with the best micro- and macro-avg. F-scores are also included for comparison.
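As a note on the metric used in Figures 4 and 5: micro-averaging pools all label decisions across the five disorders before computing the F-score, whereas macro-averaging computes an F-score per disorder and then averages them, making it more sensitive to the rare classes in this imbalanced corpus. A minimal sketch with scikit-learn (an illustrative choice, not necessarily the authors' tooling) and toy multi-hot matrices:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy ground-truth and prediction matrices of shape (n_patients, 5 disorders); values are hypothetical.
y_true = np.array([[1, 0, 0, 0, 1],
                   [0, 1, 1, 0, 0],
                   [0, 0, 1, 1, 0]])
y_pred = np.array([[1, 0, 0, 0, 1],
                   [0, 1, 0, 0, 0],
                   [0, 1, 1, 1, 0]])

micro_f = f1_score(y_true, y_pred, average="micro")  # pool all decisions, then compute F1
macro_f = f1_score(y_true, y_pred, average="macro")  # compute F1 per disorder, then average
```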
Figure 6. Important terms used by the HAN model for classifying a patient with major depressive disorder. The font size of a word indicates the attention score. Words with higher scores appear in larger font sizes. The rectangle shown in front of each sentence indicates the importance (attention score) of the sentence. Larger rectangles signify more important sentences.
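The per-word weights visualized in Figure 6 come from the word-level attention layer of the HAN architecture. The sketch below is an illustrative PyTorch re-implementation of that layer following the standard HAN formulation, not the authors' code; the hidden size and tensor shapes are assumptions.

```python
import torch
import torch.nn as nn

class WordAttention(nn.Module):
    """Word-level attention: a learned context vector scores each word, and the
    normalized scores correspond to the weights shown as font sizes in Figure 6."""
    def __init__(self, hidden_size):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)
        self.context = nn.Parameter(torch.randn(hidden_size))

    def forward(self, word_states):                       # (batch, words, hidden)
        u = torch.tanh(self.proj(word_states))            # (batch, words, hidden)
        weights = torch.softmax(u @ self.context, dim=1)  # (batch, words) attention scores
        sentence_vec = (weights.unsqueeze(-1) * word_states).sum(dim=1)
        return sentence_vec, weights

# e.g. attn = WordAttention(hidden_size=100); vec, w = attn(torch.randn(1, 12, 100))
```

Sentence-level attention works analogously over the sentence vectors, producing the rectangle sizes shown in front of each sentence in the figure.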
Figure 7. Comparison of the results of the pre-trained model and random initialization. Each sub-figure compares the results of HAN-glove_FD (upper) and HAN-rand_FD (lower). (A) TP vs. FN; (B) TP vs. TP; (C) FP vs. FP.
Figure 8. Comparison of the top-30 similar words for "depressed," "manic," "hallucination," and "zyprexa" obtained from the randomly initialized embedding of HAN-rand (left) and the pre-trained word embedding of the HAN-glove model (right).
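Neighbour lists like those in Figure 8 are typically produced by ranking the vocabulary by cosine similarity to the query word in the embedding space. The sketch below is an assumed procedure, not the authors' code; the embedding dictionaries passed in are hypothetical.

```python
import numpy as np

def top_k_similar(word, embeddings, k=30):
    """Return the k nearest neighbours of `word` by cosine similarity.
    `embeddings` maps each vocabulary word to a 1-D numpy vector."""
    query = embeddings[word]
    def cosine(vec):
        return float(vec @ query) / (np.linalg.norm(vec) * np.linalg.norm(query) + 1e-9)
    scored = [(w, cosine(v)) for w, v in embeddings.items() if w != word]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

# Hypothetical usage: compare the neighbourhoods under the two initializations, e.g.
# top_k_similar("manic", glove_vectors) vs. top_k_similar("manic", random_init_vectors)
```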


