Nonparametric Bayesian dereverberation of power spectrograms based on infinite-order autoregressive processes

Authors: Akira Maezawa, Katsutoshi Itoyama, Kazuyoshi Yoshii, Hiroshi G. Okuno

DOI: 10.1109/TASLP.2014.2355772

Abstract: This paper describes a monaural audio dereverberation method that operates in the power spectrogram domain. The method is robust to different kinds of source signals, such as speech or music. Moreover, it requires little manual intervention, including for the complexity of the room acoustics. The method is based on a non-conjugate Bayesian model of the power spectrogram. It extends the idea of multi-channel linear prediction to the power spectrogram domain, and formulates reverberation as a non-negative, infinite-order autoregressive process. To this end, the power spectrogram is interpreted as histogram count data, which allows a nonparametric Bayesian model to be used as the prior for the autoregressive process, allowing the effective number of active components to grow, without bound, with the complexity of the data. In order to determine the marginal posterior distribution, a convergent algorithm, inspired by the variational Bayes method, is formulated. It employs the minorization-maximization technique to arrive at an iterative, convergent algorithm that approximates the marginal posterior distribution. Both objective and subjective evaluations show an advantage over other methods based on the power spectrum. We also apply the method to a music information retrieval task and demonstrate its effectiveness.
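As a rough illustration of what a non-negative autoregressive process in the power spectrogram domain can look like, the toy sketch below adds a weighted sum of past reverberant frames to each frame of a dry power spectrogram, independently per frequency bin. It is not the paper's algorithm: the function names `reverberate_power` and `inverse_filter_power` are hypothetical, the order is truncated to a finite `K` rather than being infinite, and the gains are assumed known, whereas the method described above infers them under a nonparametric Bayesian prior.

```python
import numpy as np


def reverberate_power(source, gains):
    """Toy non-negative AR recursion along time, per frequency bin.

    source: (F, T) non-negative dry power spectrogram (assumed input).
    gains:  (F, K) non-negative AR coefficients for lags 1..K (assumed known).
    """
    F, T = source.shape
    K = gains.shape[1]
    wet = np.zeros((F, T), dtype=float)
    for t in range(T):
        wet[:, t] = source[:, t]
        # Add the contribution of up to K past reverberant frames.
        for k in range(1, min(K, t) + 1):
            wet[:, t] += gains[:, k - 1] * wet[:, t - k]
    return wet


def inverse_filter_power(wet, gains):
    """Undo the recursion when the gains are known (illustration only)."""
    F, T = wet.shape
    K = gains.shape[1]
    dry = wet.astype(float).copy()
    for k in range(1, min(K, T - 1) + 1):
        # Subtract lag-k reverberant power from every frame that has a lag-k past.
        dry[:, k:] -= gains[:, k - 1:k] * wet[:, :T - k]
    # Clip tiny negative values caused by floating-point rounding.
    return np.maximum(dry, 0.0)


rng = np.random.default_rng(0)
source = rng.random((4, 50))                            # 4 bins, 50 frames
gains = np.tile(0.3 * 0.6 ** np.arange(1, 9), (4, 1))   # decaying, sum < 1
wet = reverberate_power(source, gains)
recovered = inverse_filter_power(wet, gains)
```

With known gains the inverse filter recovers the dry spectrogram up to rounding; the hard part, which this sketch does not attempt, is estimating the gains and the effective order blindly from the reverberant observation alone.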
