Authors: Israel D. Gebru, Xavier Alameda-Pineda, Radu Horaud, Florence Forbes
DOI: 10.1109/MLSP.2014.6958874
Keywords:
Abstract: In this paper we address the problem of detecting and locating speakers using audiovisual data. We address this problem in the framework of clustering. We propose a novel weighted clustering method based on a finite mixture model which explores the idea of non-uniform weighting of observations. Weighted-data clustering techniques have already been proposed, but not in a generative setting as presented here. We introduce a weighted-data mixture model and formally devise the associated EM procedure. The clustering algorithm is applied to the problem of detecting and localizing a speaker over time using both visual and auditory observations gathered with a single camera and two microphones. Audiovisual fusion is enforced by introducing a cross-modal weighting scheme. We test the robustness of the method with experiments in two challenging scenarios: disambiguating between an active and a non-active speaker, and associating a speech signal with a person.