Low-Power Audio Keyword Spotting Using Tsetlin Machines

Authors: Ole-Christoffer Granmo, Rishad A. Shafik, Alex Yakovlev, Akhil Mathur, Fahim Kawsar

DOI: 10.3390/JLPEA11020018

Abstract: The emergence of Artificial Intelligence (AI) driven Keyword Spotting (KWS) technologies has revolutionized human-to-machine interaction. Yet the challenges of end-to-end energy efficiency, memory footprint, and system complexity in current Neural Network (NN) powered AI-KWS pipelines have remained ever present. This paper evaluates KWS utilizing a learning automata algorithm called the Tsetlin Machine (TM). Through a significant reduction in parameter requirements and by choosing logic over arithmetic-based processing, the TM offers new opportunities for low-power KWS while maintaining high efficacy. In this paper, we explore a TM-based keyword spotting pipeline and demonstrate low complexity with a faster rate of convergence compared to NNs. Further, we investigate scalability with an increasing number of keywords and the potential for enabling on-chip KWS.
