Use of Microphone Array and Model Adaptation for Hands-Free Speech Acquisition and Recognition

Authors: Jen-Tzung Chien, Jain-Ray Lai

DOI: 10.1023/B:VLSI.0000015093.07192.EB

Abstract: This paper presents a combined microphone array and model adaptation algorithm for hands-free speech acquisition and recognition. Our purpose is to remove the inconvenience of using a head-mounted or hand-held microphone with a conventional speech recognizer. To improve speech quality under car noise interference, a linear microphone array is applied as a robust speech acquisition system. A time-domain coherence measure (TDCM) is used to reliably estimate the time delay between the speech signals collected by different microphones. The estimated time delay is adopted by a delay-and-sum beamformer for speech enhancement. Further, we adapt the hidden Markov models to bring them close to the acoustic conditions of the enhanced test speech. In recognition experiments on connected Chinese digits, we found that TDCM can effectively estimate the time delay, and that increasing the sampling rate helps to determine it. Incorporating the microphone array and the model adaptation scheme significantly reduces recognition errors with moderate computation overhead.
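The acquisition front end described in the abstract (estimate the inter-microphone time delay, then align and sum the channels) can be illustrated with a minimal sketch. The abstract does not define TDCM, so plain time-domain cross-correlation is used here as a stand-in; it is not the authors' measure. The function names, the white-noise test source, the three-microphone setup, and the integer-sample delays are all assumptions made for illustration.

```python
import random


def estimate_delay(ref, sig, max_lag):
    """Return the integer lag (in samples) at which `sig` best matches `ref`.

    A positive lag means `sig` is a delayed copy of `ref`.
    Plain cross-correlation is a stand-in here; it is NOT the paper's TDCM.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            ref[i] * sig[i + lag]
            for i in range(len(ref))
            if 0 <= i + lag < len(sig)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, delays):
    """Advance each channel by its delay and average the aligned samples."""
    n = len(channels[0])
    out = []
    for i in range(n):
        samples = [
            ch[i + d] for ch, d in zip(channels, delays) if 0 <= i + d < len(ch)
        ]
        out.append(sum(samples) / len(samples) if samples else 0.0)
    return out


def simulate(true_delays=(0, 3, 6), n=400, noise_std=0.5, seed=0):
    """Hypothetical test data: one broadband source, one noisy delayed copy per mic."""
    rng = random.Random(seed)
    clean = [rng.gauss(0.0, 1.0) for _ in range(n)]
    channels = [
        [(clean[t - d] if t >= d else 0.0) + rng.gauss(0.0, noise_std)
         for t in range(n)]
        for d in true_delays
    ]
    return clean, channels


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


clean, channels = simulate()
# Use the first microphone as the reference channel.
delays = [estimate_delay(channels[0], ch, max_lag=10) for ch in channels]
enhanced = delay_and_sum(channels, delays)
print("estimated delays:", delays)
print("MSE, single microphone:", mse(clean, channels[0]))
print("MSE, delay-and-sum output:", mse(clean, enhanced))
```

Because the delays in this sketch are whole samples, the delay resolution is one sampling period, which is consistent with the abstract's remark that a higher sampling rate helps to determine the delay.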
