A fixed-point neural network for keyword detection on resource constrained hardware

Authors: Mohit Shah, Jingcheng Wang, David Blaauw, Dennis Sylvester, Hun-Seok Kim

DOI: 10.1109/SIPS.2015.7345026

Keywords: Fixed point; Time delay neural network; Computer hardware; Multiplier (economics); Resource constrained; Real-time computing; Computer science; Spoken dialog systems; Artificial neural network; Hardware architecture

Abstract: Keyword detection is typically used as a front-end to trigger automatic speech recognition and spoken dialog systems. The detection engine needs to be continuously listening, which has strong implications on power and memory consumption. In this paper, we devise a neural network architecture for keyword detection and present a set of techniques for reducing the memory requirements in order to make the architecture suitable for resource constrained hardware. Specifically, a fixed-point implementation is considered; aggressively scaling down the precision of the weights lowers the memory compared to a naive floating-point implementation. For further optimization, a node pruning technique is proposed to identify and remove the least active nodes in a neural network. Experiments are conducted over 10 keywords selected from the Resource Management (RM) database. The trade-off between detection performance and memory is assessed for different weight representations. We show that a neural network with as few as 5 bits per weight yields a marginal and acceptable loss in performance, while requiring only 200 kilobytes (KB) of on-board memory and a latency of 150 ms. A hardware architecture using a single multiplier and a power consumption of less than 10 mW is also presented.
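
The two memory-reduction ideas named in the abstract, low-precision fixed-point weights and pruning of the least active nodes, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the split of 5 bits into 1 sign bit, 1 integer bit and 3 fractional bits, the use of mean absolute activation as the "activity" measure, and the weight count of 320,000 are all assumptions chosen for illustration.

```python
def quantize_weights(weights, total_bits=5, frac_bits=3):
    """Round weights to signed fixed-point codes of `total_bits` bits.

    Assumed format: two's complement with `frac_bits` fractional bits.
    Returns the integer codes and the real values they represent.
    """
    scale = 1 << frac_bits                 # number of steps per unit
    q_min = -(1 << (total_bits - 1))       # most negative code
    q_max = (1 << (total_bits - 1)) - 1    # most positive code
    # Round to the nearest step, then saturate to the representable range.
    codes = [max(q_min, min(q_max, round(w * scale))) for w in weights]
    return codes, [c / scale for c in codes]


def prune_least_active(mean_activations, keep_fraction):
    """Return sorted indices of nodes to keep.

    Assumption: a node's activity is its mean absolute activation over
    some data set; the least active nodes are dropped.
    """
    n_keep = max(1, int(len(mean_activations) * keep_fraction))
    ranked = sorted(range(len(mean_activations)),
                    key=lambda i: abs(mean_activations[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])


def weight_memory_kb(num_weights, bits_per_weight):
    """Weight storage in kilobytes (1 KB = 1000 bytes)."""
    return num_weights * bits_per_weight / 8 / 1000


if __name__ == "__main__":
    codes, values = quantize_weights([0.3, -0.26, 5.0, -5.0])
    print(codes, values)
    print(prune_least_active([0.9, 0.01, 0.5, 0.02], keep_fraction=0.5))
    # Hypothetical network size, not a figure from the paper.
    print(weight_memory_kb(320_000, 5), weight_memory_kb(320_000, 32))
```

With the hypothetical 320,000 weights, 5-bit storage needs 200 KB where 32-bit floating point would need 1280 KB, which shows the scale of saving that aggressive precision reduction can give.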
