Automatic state discovery for unstructured audio scene classification

Authors: Julian Ramos, Sajid Siddiqi, Artur Dubrawski, Geoffrey Gordon, Abhishek Sharma

DOI: 10.1109/ICASSP.2010.5495605

Keywords: Audio signal processing, Machine learning, Viterbi algorithm, Hidden Markov model, Overfitting, Pattern recognition, Artificial intelligence, Robustness (computer science), Expectation–maximization algorithm, Computer science

Abstract: In this paper we present a novel scheme for unstructured audio scene classification that possesses three highly desirable and powerful features: autonomy, scalability, and robustness. Our approach is based on our recently introduced machine learning algorithm, Simultaneous Temporal And Contextual Splitting (STACS), which discovers the appropriate number of states and efficiently learns accurate Hidden Markov Model (HMM) parameters from the given data. STACS-based algorithms train HMMs up to five times faster than Baum-Welch, avoid the overfitting problem commonly encountered when large state-space models are trained with Expectation Maximization (EM) methods such as Baum-Welch, and achieve superior results on a very diverse dataset with minimal pre-processing. Furthermore, the method has proven effective for building real-world applications and has been integrated into a commercial surveillance system as an event detection component.
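The abstract describes STACS only at a high level; as a rough illustration of the model-selection problem it addresses (choosing the number of HMM states from data rather than fixing it in advance), the sketch below greedily grows the state count and keeps a model only while a BIC score keeps improving. This is not the STACS split-scoring procedure from the paper: it assumes the third-party hmmlearn library (GaussianHMM, a Baum-Welch/EM trainer), an approximate parameter count, and synthetic feature data, all introduced here purely for illustration.

```python
# Minimal sketch (NOT the STACS algorithm): pick the number of HMM states
# by retraining with Baum-Welch (EM) at increasing state counts and keeping
# the model only while a BIC-style criterion improves. Assumes hmmlearn.
import numpy as np
from hmmlearn.hmm import GaussianHMM


def bic(model, X):
    """Approximate BIC for a full-covariance Gaussian HMM."""
    k, d = model.n_components, X.shape[1]
    # start probs + transitions + means + full covariances (rough count)
    n_params = (k - 1) + k * (k - 1) + k * d + k * d * (d + 1) // 2
    return -2.0 * model.score(X) + n_params * np.log(len(X))


def select_hmm(X, max_states=10, seed=0):
    """Grow the state space while the BIC keeps improving."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_states + 1):
        model = GaussianHMM(n_components=k, covariance_type="full",
                            n_iter=50, random_state=seed).fit(X)
        score = bic(model, X)
        if score >= best_bic:  # stop once adding states stops paying off
            break
        best_model, best_bic = model, score
    return best_model


if __name__ == "__main__":
    # Toy "audio features": two well-separated Gaussian regimes over time.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, size=(300, 4)),
                   rng.normal(5.0, 1.0, size=(300, 4))])
    model = select_hmm(X)
    print("selected number of states:", model.n_components)
```

Unlike this brute-force search, STACS avoids retraining from scratch at each candidate size: it scores candidate splits of individual states on the data assigned to them, which is where the reported speed advantage over plain Baum-Welch restarts comes from.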

References (13)
Geoffrey J. Gordon, Andrew W. Moore, Sajid M. Siddiqi, "Fast State Discovery for HMM Model Selection and Learning," International Conference on Artificial Intelligence and Statistics, pp. 492-499, 2007.
Andreas Stolcke, Stephen M. Omohundro, "Best-first Model Merging for Hidden Markov Model Induction," arXiv: Computation and Language, 1994.
Hiroshi G. Okuno, Tetsuya Ogata, Kazunori Komatani, "Computational Auditory Scene Analysis and Its Application to Robot Audition: Five Years Experience," Second International Conference on Informatics Research for Development of Knowledge Society Infrastructure (ICKS'07), pp. 69-76, 2007, DOI: 10.1109/ICKS.2007.7.
M. Ostendorf, H. Singer, "HMM topology design using maximum likelihood successive state splitting," Computer Speech & Language, vol. 11, pp. 17-41, 1997, DOI: 10.1006/CSLA.1996.0021.
David Rybach, Christian Gollan, Ralf Schluter, Hermann Ney, "Audio segmentation for speech recognition using segment features," International Conference on Acoustics, Speech, and Signal Processing, pp. 4197-4200, 2009, DOI: 10.1109/ICASSP.2009.4960554.
Yuxiang Liu, Qiaoliang Xiang, Ye Wang, Lianhong Cai, "Cultural style based music classification of audio signals," International Conference on Acoustics, Speech, and Signal Processing, pp. 57-60, 2009, DOI: 10.1109/ICASSP.2009.4959519.
Isabel Trancoso, Joao Neto, Alberto Abad, Antonio Serralheiro, Jose Portelo, Miguel Bugalho, "Non-speech audio event detection," International Conference on Acoustics, Speech, and Signal Processing, pp. 1973-1976, 2009, DOI: 10.1109/ICASSP.2009.4959998.
L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, pp. 257-286, 1989, DOI: 10.1109/5.18626.
K. Nakadai, T. Ogata, K. Komatani, H. G. Okuno, "Computational auditory scene analysis and its application to robot audition," International Conference on Informatics Research for Development of Knowledge Society Infrastructure (ICKS 2004), pp. 73-80, 2004, DOI: 10.1109/ICKS.2004.7.