"Polyaural" array processing for automatic speech recognition in degraded environments

Authors: Evandro B. Gouvêa, Richard M. Stern, Govindarajan Thattai

Abstract: In this paper we present a new method of signal processing for robust speech recognition using multiple microphones. The method, loosely based on the human binaural hearing system, consists of passing the speech signals detected by multiple microphones through bandpass filtering and nonlinear halfwave rectification operations, and then cross-correlating the outputs from each channel within each frequency band. These operations provide rejection of off-axis interfering signals. The operations are repeated (in a non-physiological fashion) for the negative of the signal, and an estimate of the desired signal is obtained by combining the positive and negative outputs. We demonstrate that the use of this approach provides substantially better recognition accuracy than delay-and-sum beamforming using the same sensors for target signals in the presence of additive broadband maskers. Improvements in reverberant environments are tangible but more modest.
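
The processing chain described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and the abstract does not give the details, so several choices here are assumptions: the microphone signals are taken to be already time-aligned toward the target, the "cross-correlation" is approximated by a zero-lag, sample-by-sample product across channels followed by a K-th root (K being the number of microphones), the positive and negative outputs are combined by subtraction, and a simple two-pole resonator stands in for the bandpass filterbank. The function names (`bandpass`, `halfwave`, `combine_channels`, `polyaural_sketch`) are hypothetical.

```python
import math


def halfwave(x):
    """Nonlinear halfwave rectification: keep positive samples, zero the rest."""
    return [v if v > 0.0 else 0.0 for v in x]


def bandpass(x, center_hz, fs, r=0.95):
    """Two-pole resonator used as a stand-in bandpass filter (assumption).

    A real system would more likely use a proper auditory filterbank.
    """
    theta = 2.0 * math.pi * center_hz / fs
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    gain = 1.0 - r
    y, y1, y2 = [], 0.0, 0.0
    for v in x:
        out = gain * v + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def combine_channels(channels):
    """Combine K time-aligned channels within one frequency band.

    Positive path: rectify each channel, multiply across channels, take the
    K-th root. Negative path: the same on the negated signals. The band
    estimate is the positive output minus the negative output (assumption).
    """
    k = len(channels)
    pos = [halfwave(c) for c in channels]
    neg = [halfwave([-v for v in c]) for c in channels]
    out = []
    for n in range(len(channels[0])):
        p = math.prod(ch[n] for ch in pos) ** (1.0 / k)
        q = math.prod(ch[n] for ch in neg) ** (1.0 / k)
        out.append(p - q)
    return out


def polyaural_sketch(mics, fs, centers_hz):
    """Filter each microphone into bands, combine per band, sum the bands."""
    total = [0.0] * len(mics[0])
    for fc in centers_hz:
        banded = [bandpass(m, fc, fs) for m in mics]
        band_out = combine_channels(banded)
        total = [t + b for t, b in zip(total, band_out)]
    return total


if __name__ == "__main__":
    fs = 100
    target = [math.sin(2.0 * math.pi * 5.0 * n / fs) for n in range(fs)]
    # Identical (on-axis, aligned) inputs pass through the combiner unchanged.
    on_axis = combine_channels([target, target])
    # Inputs in opposite phase never overlap after rectification: output is zero.
    off_axis = combine_channels([target, [-v for v in target]])
    print(max(abs(a - b) for a, b in zip(on_axis, target)))
    print(max(abs(v) for v in off_axis))
```

Under these assumptions, a component that is identical on all channels survives the product, while a component whose phase differs between channels is attenuated because the rectified signals overlap only partially. That is one way to read the off-axis rejection the abstract describes.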
