Automatic adaptive speech separation using beamformer-output-ratio for voice activity classification

Authors: Thuy Ngoc Tran, William Cowley, André Pollok

DOI: 10.1016/J.SIGPRO.2015.01.015

Abstract: This paper focuses on the practical challenge of adaptation control for speech separation systems. Adaptive beamforming methods, such as minimum variance distortionless response (MVDR), can effectively extract the desired signal from interference and noise. However, to avoid the signal cancellation problem, the beamformer adaptation is halted when the desired speaker is active. An automated scheme for this requires classifying the speakers' voice activity status, which remains a challenge for multi-speaker environments. In this paper, we propose a novel approach to identify the voice activities of two speakers based on a new metric, called the beamformer-output-ratio (BOR). Statistical properties of the BOR are studied and used to develop a hypothesis-based method for voice activity classification (VAC). The classification is further refined using an algorithm for detecting incorrect classification by analysing the changes in the output power of a blind adapting MVDR beamformer. Based on this, we construct an automatic adaptive system to simultaneously separate the two speakers. The separation module uses adaptive beamformers whose adaptation is guided by the classification. Our methods lead to, in some cases, a 20% reduction in classification error and an 8 dB improvement in SINR. The results are verified with both synthesised signals and realistic recordings.

Highlights: We design an automatic adaptive system to separate two speakers. The BOR quantity and its roles in active speaker identification are introduced. The BOR-VAC is developed, in generic form and realisation. We model the beamformer behaviour to detect incorrect adaptation. The proposed systems are tested with synthesised and real recordings.
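The adaptation-control idea summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-microphone array, a single frequency bin, and a simple recursive covariance estimate, and every function and parameter name (`steering_vector`, `mvdr_weights`, `update_covariance`, `alpha`) is hypothetical. The BOR metric itself is not implemented, because the abstract does not give its definition; the `target_active` flag stands in for whatever a voice activity classifier would output.

```python
import cmath


def steering_vector(delay_samples, freq_norm):
    """Two-microphone steering vector for one frequency bin.

    delay_samples: assumed inter-microphone delay of the source.
    freq_norm: normalised angular frequency in radians per sample.
    """
    return [1 + 0j, cmath.exp(-1j * freq_norm * delay_samples)]


def mvdr_weights(R, d):
    """Standard MVDR solution w = R^{-1} d / (d^H R^{-1} d) for a 2x2 R."""
    (a, b), (c, e) = R
    det = a * e - b * c
    # u = R^{-1} d, using the closed-form inverse of a 2x2 matrix.
    u = [(e * d[0] - b * d[1]) / det, (-c * d[0] + a * d[1]) / det]
    denom = d[0].conjugate() * u[0] + d[1].conjugate() * u[1]
    return [u[0] / denom, u[1] / denom]


def update_covariance(R, x, target_active, alpha=0.95):
    """Recursive covariance update that is frozen while the target speaks.

    Halting the update when the desired speaker is active keeps the target
    out of R, so the beamformer does not learn to cancel it.
    """
    if target_active:
        return R
    return [
        [alpha * R[i][j] + (1 - alpha) * x[i] * x[j].conjugate() for j in range(2)]
        for i in range(2)
    ]


def beamform(w, x):
    """Beamformer output y = w^H x for one snapshot x."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))
```

With a covariance dominated by an interferer, the resulting weights pass the target direction with unit gain and strongly attenuate the interferer direction, which is the behaviour that makes correct voice activity classification important for adaptation.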
