Generalizable spelling using a speech neuroprosthesis in an individual with severe limb and vocal paralysis
Sean L Metzger, Jessie R Liu, David A Moses, Maximilian E Dougherty, Margaret P Seaton, Kaylo T Littlejohn, Josh Chartier, Gopala K Anumanchipalli, Adelyn Tu-Chan, Karunesh Ganguly, Edward F Chang
Abstract
Neuroprostheses have the potential to restore communication to people who cannot speak or type due to paralysis. However, it is unclear whether silent attempts to speak can be used to control a communication neuroprosthesis. Here, we translated direct cortical signals in a clinical-trial participant (ClinicalTrials.gov; NCT03698149) with severe limb and vocal-tract paralysis into single letters to spell out full sentences in real time. We used deep-learning and language-modeling techniques to decode letter sequences as the participant attempted to silently spell using code words that represented the 26 English letters (e.g., "alpha" for "a"). We leveraged broad electrode coverage beyond speech-motor cortex to include supplemental control signals from hand cortex, and we combined complementary information from low- and high-frequency signal components to improve decoding accuracy. We decoded sentences using words from a 1,152-word vocabulary at a median character error rate of 6.13% and a speed of 29.4 characters per minute. In offline simulations, we showed that our approach generalized to large vocabularies containing over 9,000 words (median character error rate of 8.23%). These results illustrate the clinical viability of a silently controlled speech neuroprosthesis to generate sentences from a large vocabulary through a spelling-based approach, complementing previous demonstrations of direct full-word decoding.
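The two performance measures reported above, character error rate (CER) and characters per minute, can be illustrated with a minimal Python sketch. It assumes the conventional definition of CER (character-level Levenshtein distance between the decoded and reference sentences, divided by the reference length, with spaces counted as characters) and a per-sentence CER summarized by its median; the abstract does not state these details, and the function names and example sentences are hypothetical.

```python
from statistics import median


def edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(ref) + 1))  # distances from the empty hypothesis prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            curr.append(min(
                prev[j] + 1,             # delete h from the hypothesis
                curr[j - 1] + 1,         # insert r into the hypothesis
                prev[j - 1] + (h != r),  # substitute (no cost if characters match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(hyp: str, ref: str) -> float:
    """Edit distance normalized by reference length (assumption: spaces count)."""
    if not ref:
        raise ValueError("reference sentence must be non-empty")
    return edit_distance(hyp, ref) / len(ref)


def characters_per_minute(n_chars: int, seconds: float) -> float:
    """Decoding speed: characters produced per minute of elapsed time."""
    return n_chars / (seconds / 60.0)


if __name__ == "__main__":
    # Hypothetical (decoded, reference) sentence pairs.
    pairs = [
        ("hello world", "hello world"),
        ("helo world", "hello world"),
        ("good morming", "good morning"),
    ]
    cers = [character_error_rate(h, r) for h, r in pairs]
    print(f"median CER: {median(cers):.2%}")
    print(f"speed: {characters_per_minute(147, 300):.1f} characters per minute")
```

Taking the median over sentences, rather than pooling all characters, keeps a few badly decoded sentences from dominating the summary; whether the study pooled or summarized per sentence or per block is not stated in the abstract.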
Conflict of interest statement
S.L.M., J.R.L., D.A.M., and E.F.C. are inventors on a pending provisional patent application that is directly relevant to the neural-decoding approach used in this work. G.K.A. and E.F.C. are inventors on patent application PCT/US2020/028926; D.A.M. and E.F.C. are inventors on patent application PCT/US2020/043706; and E.F.C. is an inventor on patent US9905239B2. These patents and applications are broadly relevant to the neural-decoding approach in this work. The remaining authors declare no competing interests.
© 2022. The Author(s).