Recurrent neural networks for voice activity detection

Authors: Thad Hughes, Keir Mierle

DOI: 10.1109/ICASSP.2013.6639096

Abstract: We present a novel recurrent neural network (RNN) model for voice activity detection. Our multi-layer RNN model, in which nodes compute quadratic polynomials, outperforms a much larger baseline system composed of Gaussian mixture models (GMMs) and a hand-tuned state machine (SM) for temporal smoothing. All parameters of our RNN model are optimized together, so that it properly weights its preference for temporal continuity against the acoustic features in each frame. Our RNN uses one tenth the parameters and outperforms the GMM+SM baseline system by a 26% reduction in false alarms, reducing overall speech recognition computation time by 17% while reducing word error rate by 1% relative.
