A Segmentation Posterior Based Endpointing Algorithm

Authors: YanLu Xie, Yu Shi, Frank K. Soong, BeiQian Dai

DOI: 10.1109/ICASSP.2007.367037

Abstract: A segmentation posterior probability based endpointing algorithm for robust ASR is proposed. First, each speech signal is partitioned into homogeneous segments via auto-segmentation. Then the posterior probabilities of all possible endpoints are computed, based on the likelihoods of the segmentation levels in a selected range. The endpoints with the highest posterior probabilities are finally selected. The new method differs from the previous auto-segmentation and clustering method in that the former considers the hypotheses of several segmentation levels, while the latter depends on only one appropriate segmentation level. Another potential benefit of the proposed method is that any other endpointing or VAD results can be integrated, as hypotheses, into the framework. Experiments on the AURORA2 digit database show the robustness of the new method.
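
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each segmentation level (or external VAD result) contributes one hypothesis consisting of a log-likelihood and a (start, end) frame pair, and that an endpoint's posterior is the normalized sum of the likelihoods of the hypotheses that propose it. The function names `endpoint_posteriors` and `select_endpoints` and the example numbers are hypothetical.

```python
import math
from collections import defaultdict


def endpoint_posteriors(hypotheses):
    """Return posterior tables for start and end frames.

    hypotheses: list of (log_likelihood, (start_frame, end_frame)),
    one entry per segmentation level or external VAD result (assumed format).
    """
    # Subtract the maximum log-likelihood before exponentiating
    # so that the weights do not underflow.
    max_ll = max(ll for ll, _ in hypotheses)
    weights = [math.exp(ll - max_ll) for ll, _ in hypotheses]
    total = sum(weights)

    start_post = defaultdict(float)
    end_post = defaultdict(float)
    for weight, (_, (start, end)) in zip(weights, hypotheses):
        # Hypotheses that agree on an endpoint pool their probability mass.
        start_post[start] += weight / total
        end_post[end] += weight / total
    return dict(start_post), dict(end_post)


def select_endpoints(hypotheses):
    """Pick the start and end frames with the highest posterior.

    Start and end are chosen independently here, which is a simplification;
    a fuller version would also check that start < end.
    """
    start_post, end_post = endpoint_posteriors(hypotheses)
    best_start = max(start_post, key=start_post.get)
    best_end = max(end_post, key=end_post.get)
    return best_start, best_end


# Made-up hypotheses from four segmentation levels.
example = [
    (-120.0, (30, 210)),
    (-118.5, (32, 210)),
    (-119.0, (32, 205)),
    (-125.0, (40, 198)),
]
print(select_endpoints(example))  # (32, 210)
```

In this toy example no single level is trusted on its own: frame 32 wins as the start because two levels propose it, and frame 210 wins as the end for the same reason, which mirrors the contrast the abstract draws with methods that depend on one segmentation level.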
