Machine translation of cortical activity to text with an encoder–decoder framework

Published by nature.com; submitted by Neopterin

PalpatineForEmperor on March 30th, 2020 at 22:49 UTC »

The other day I learned that not all people can hear themselves speak in their mind. I wonder if this would somehow still work for them.

myfingid on March 30th, 2020 at 21:01 UTC »

Going to be fun when they use this for interviews and police interrogations. You can already be compelled to give up your fingerprint and your blood, so why not your thoughts!

Neopterin on March 30th, 2020 at 20:18 UTC »

For those who can't access the Nature article: report from Guardian science