Localization of Multiple Sources from a Binaural Head in a Known Noisy Environment

Authors: Alban Portello, Gabriel Bustamante, Patrick Danes, Alexis Mifsud

DOI: 10.1109/IROS.2014.6943001

Keywords: Simulated data; Experimental data; Binaural recording; Noise statistics; Head (linguistics); Mathematics; Speech recognition; Algorithm; Azimuth; Transfer function; Sound sources

Abstract: This paper presents a strategy for the localization of multiple sound sources from a static binaural head. The sources are supposed to be W-Disjoint Orthogonal, and their number is assumed known. Their most likely azimuths are computed by means of an Expectation-Maximization algorithm. The application of the method to simulated data is reported, as well as some evaluations of its HARK implementation on experimental data. Two important properties are observed: scattering effects can be coped with, thanks to the required prior knowledge of the (room-independent) head interaural transfer function; and the environment noise statistics are handled separately.
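The abstract only outlines the estimator. As a rough illustration of the general idea, and not the authors' algorithm, the sketch below treats every time-frequency bin as dominated by a single source (the W-disjoint orthogonality assumption), fits mixture weights over a fixed azimuth grid by EM, and keeps as many peaks as there are sources (the number of sources is assumed known, as in the paper). Several assumptions are swapped in: a free-field two-microphone phase model with a hypothetical spacing `D` replaces the head's interaural transfer function, and a fixed phase-noise variance `var` replaces the known noise statistics. The names `predicted_ipd`, `em_azimuths`, `D` and `var` are hypothetical.

```python
import numpy as np

C = 343.0  # speed of sound in air (m/s)
D = 0.18   # hypothetical inter-microphone distance (m); not taken from the paper


def predicted_ipd(az_deg, freqs):
    """Free-field interaural phase difference for each (bin, azimuth) pair.

    Assumption: this simple model stands in for the head's interaural
    transfer function, which the paper requires to be known.
    """
    tau = D * np.sin(np.deg2rad(az_deg)) / C        # interaural time difference (s)
    return 2 * np.pi * np.outer(freqs, tau)          # shape (n_bins, n_grid)


def em_azimuths(ipd_obs, freqs, n_sources, grid_deg, var=0.05, n_iter=100):
    """Fit mixture weights over a fixed azimuth grid by EM, then keep the
    n_sources strongest peaks. Each bin is assumed to be dominated by a
    single source (W-disjoint orthogonality)."""
    grid_deg = np.asarray(grid_deg, dtype=float)
    z = np.exp(1j * np.asarray(ipd_obs))[:, None]    # observed unit phasors (n_bins, 1)
    h = np.exp(1j * predicted_ipd(grid_deg, freqs))  # predicted phasors (n_bins, n_grid)
    loglik = -np.abs(z - h) ** 2 / var               # log-likelihood up to a constant
    weights = np.full(grid_deg.size, 1.0 / grid_deg.size)
    for _ in range(n_iter):
        # E-step: posterior that bin n was emitted from grid direction k
        log_post = loglik + np.log(weights + 1e-300)
        log_post -= log_post.max(axis=1, keepdims=True)
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: each mixing weight is the average responsibility
        weights = resp.mean(axis=0)
    # Local maxima of the weight profile, strongest first
    padded = np.concatenate(([-np.inf], weights, [-np.inf]))
    is_peak = (padded[1:-1] >= padded[:-2]) & (padded[1:-1] >= padded[2:])
    peaks = np.flatnonzero(is_peak)
    best = peaks[np.argsort(weights[peaks])[::-1][:n_sources]]
    return np.sort(grid_deg[best])


if __name__ == "__main__":
    # Synthetic check: two sources, each bin owned by exactly one of them
    rng = np.random.default_rng(0)
    true_az = np.array([-30.0, 40.0])
    # Frequencies stay below the spatial-aliasing limit C / (2 D), about 953 Hz
    freqs = rng.uniform(300.0, 900.0, 2000)
    owner = rng.integers(0, 2, freqs.size)
    ipd = 2 * np.pi * freqs * D * np.sin(np.deg2rad(true_az[owner])) / C
    ipd += rng.normal(0.0, 0.15, freqs.size)          # additive phase noise (rad)
    print(em_azimuths(ipd, freqs, 2, np.arange(-90.0, 91.0, 1.0)))
```

Running the file prints the two grid azimuths that were retained. Fixing the candidate directions on a grid makes the log-likelihood concave in the mixture weights and avoids initialising continuous azimuths, at the cost of grid-limited resolution. The paper's handling of head scattering and of separately estimated noise statistics is not reproduced here.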

References (13)
S. Argentieri, A. Portello, M. Bernard, P. Danès, B. Gas. "Binaural Systems in Robotics." Springer Berlin Heidelberg, pp. 225-253, 2013. DOI: 10.1007/978-3-642-37762-4_9
A. G. Jaffer. "Maximum likelihood direction finding of stochastic sources: a separable solution." IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 2893-2896, 1988. DOI: 10.1109/ICASSP.1988.197258
V. Lunati, J. Manhes, P. Danes. "A versatile System-on-a-Programmable-Chip for array processing and binaural robot audition." IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 998-1003, 2012. DOI: 10.1109/IROS.2012.6386144
O. Yilmaz, S. Rickard. "Blind separation of speech mixtures via time-frequency masking." IEEE Transactions on Signal Processing, vol. 52, pp. 1830-1847, 2004. DOI: 10.1109/TSP.2004.828896
A. Portello, P. Danes, S. Argentieri, S. Pledel. "HRTF-based source azimuth estimation and activity detection from a binaural sensor." IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2908-2913, 2013. DOI: 10.1109/IROS.2013.6696768
A. Deleforge, R. Horaud. "The cocktail party robot." Proceedings of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI '12), pp. 431-438, 2012. DOI: 10.1145/2157689.2157834
A. P. Dempster, N. M. Laird, D. B. Rubin. "Maximum Likelihood from Incomplete Data via the EM Algorithm." Journal of the Royal Statistical Society: Series B (Methodological), vol. 39, pp. 1-22, 1977. DOI: 10.1111/J.2517-6161.1977.TB01600.X
K. Nakadai, T. Takahashi, H. G. Okuno, H. Nakajima, Y. Hasegawa, H. Tsujino. "Design and Implementation of Robot Audition System 'HARK': Open Source Software for Listening to Three Simultaneous Speakers." Advanced Robotics, vol. 24, pp. 739-761, 2010. DOI: 10.1163/016918610X493561
C. Blandin, A. Ozerov, E. Vincent. "Multi-source TDOA estimation in reverberant audio using angular spectra and clustering." Signal Processing, vol. 92, pp. 1950-1960, 2012. DOI: 10.1016/J.SIGPRO.2011.09.032
T. May, S. van de Par, A. Kohlrausch. "Binaural Localization and Detection of Speakers in Complex Acoustic Scenes." Modern Acoustics and Signal Processing, pp. 397-425, 2013. DOI: 10.1007/978-3-642-37762-4_15