Predicting probable Alzheimer's disease using linguistic deficits and biomarkers

Sylvester O Orimaye, Jojo S-M Wong, Karen J Golden, Chee P Wong, Ireneous N Soyiri

Abstract

Background: Manual diagnosis of neurodegenerative disorders such as Alzheimer's disease (AD) and related dementias remains a challenge. Currently, these disorders are diagnosed using specific clinical diagnostic criteria and neuropsychological examinations. Machine learning (ML) algorithms that build automated diagnostic models from low-level linguistic features of verbal utterances could aid the diagnosis of patients with probable AD in a large population. To this end, we developed several ML models on the DementiaBank clinical dataset of language transcripts, which comprises 99 patients with probable AD and 99 healthy controls.

Results: Our models learned several syntactic, lexical, and n-gram linguistic biomarkers that distinguish the probable AD group from the healthy group. Compared with the healthy group, the probable AD patients used significantly fewer syntactic components and significantly more lexical components in their language. We also observed a significant difference in the use of n-grams: the healthy group were able to identify and make sense of more objects in their n-grams than the probable AD group. Our best diagnostic model, built with Support Vector Machines (SVM), significantly distinguished the probable AD group from the healthy elderly group with a better Area Under the Receiver Operating Characteristic Curve (AUC).
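As a rough illustration of the AUC metric named in the Results, the sketch below computes it as the probability that a randomly chosen probable-AD case receives a higher classifier score than a randomly chosen control, with ties counted as one half. This is the standard Mann-Whitney interpretation of AUC and is not taken from the authors' implementation; the function name `auc_mann_whitney` and all score values are hypothetical.

```python
from itertools import product


def auc_mann_whitney(pos_scores, neg_scores):
    """Return AUC as P(positive score > negative score), counting ties as 0.5."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    # Compare every positive score with every negative score.
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (illustrative values only, not study data).
ad_scores = [0.9, 0.8, 0.6, 0.4]
control_scores = [0.7, 0.3, 0.2, 0.1]

auc = auc_mann_whitney(ad_scores, control_scores)
print(auc)
```

An AUC of 0.5 corresponds to chance-level separation of the two groups, and 1.0 to perfect separation.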

Conclusions: Experimental and statistical evaluations suggest that using ML algorithms to learn linguistic biomarkers from the verbal utterances of elderly individuals could help the clinical diagnosis of probable AD. We emphasise that the best ML model for predicting the disease group combines significant syntactic, lexical and top n-gram features. However, the diagnostic models need to be trained on larger datasets, which could lead to a better AUC and improved clinical diagnosis of probable AD.
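To make the notion of "top n-gram features" concrete, the following minimal sketch counts word n-grams across a set of transcribed utterances and keeps the most frequent ones. It assumes simple whitespace tokenisation and frequency-based ranking, which may differ from the feature selection used in the study; the helper names `ngrams` and `top_ngrams` and the sample utterances are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(utterances, n=2, k=3):
    """Count n-grams over all utterances and return the k most frequent."""
    counts = Counter()
    for utterance in utterances:
        # Assumption: lowercase and split on whitespace as a stand-in tokeniser.
        tokens = utterance.lower().split()
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


# Hypothetical utterances (illustrative only, not taken from DementiaBank).
sample = ["the boy is on the stool", "the boy is taking the cookie"]

top = top_ngrams(sample, n=2, k=2)
print(top)
```

Counts of this kind could then serve as one block of a feature vector, alongside syntactic and lexical measures, for a classifier such as an SVM.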

Keywords: Alzheimer’s disease; Clinical diagnostics; Machine learning; Neurolinguistics; Prediction.

Source: PubMed
