Authors: Thad Hughes, Keir Mierle
DOI: 10.1109/ICASSP.2013.6639096
Keywords:
Abstract: We present a novel recurrent neural network (RNN) model for voice activity detection. Our multi-layer RNN model, in which nodes compute quadratic polynomials, outperforms a much larger baseline system composed of Gaussian mixture models (GMMs) and a hand-tuned state machine (SM) for temporal smoothing. All parameters of our RNN model are optimized together, so that it properly weights its preference for temporal continuity against the acoustic features in each frame. Our RNN uses one tenth the parameters of the GMM+SM baseline and outperforms it with a 26% reduction in false alarms, reducing overall speech recognition computation time by 17% while reducing word error rate by 1% relative.