Polyphonic sound event detection using multi label deep neural networks

Authors: Emre Cakir, Toni Heittola, Heikki Huttunen, Tuomas Virtanen

DOI: 10.1109/IJCNN.2015.7280624

Abstract: In this paper, multi-label neural networks are proposed for the detection of temporally overlapping sound events in realistic environments. Real-life recordings typically contain many overlapping sound events, which makes it hard to recognize each event with standard sound event detection methods. In this work, frame-wise spectral-domain features are used as inputs to train a deep neural network for multi-label classification. The model is evaluated on recordings from realistic everyday environments and obtains an overall accuracy of 63.8%. The method is compared against a state-of-the-art method that uses non-negative matrix factorization as a pre-processing stage and hidden Markov models as a classifier; the proposed method improves the overall accuracy by 19 percentage points.
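Because a single frame can contain several events at once, multi-label classification needs one independent yes/no decision per event class rather than a single softmax choice. The Python sketch below illustrates that idea under loudly stated assumptions; it is not the authors' network or feature pipeline. A single sigmoid output layer stands in for the deep network, and the three event classes, the four-bin template "spectra", and the additive mixing of overlapping events are hypothetical toy choices. The loss is binary cross-entropy summed over classes, and each output is thresholded at an assumed value of 0.5 to give the frame's event-activity vector.

```python
import itertools
import math

# HYPOTHETICAL toy setup (not from the paper): 3 event classes and
# 4 "spectral" bins. Each class has a fixed template spectrum, and a frame
# with overlapping events is the sum of the active templates (an assumed,
# simplified additive mixing model).
TEMPLATES = [
    [1.0, 0.0, 0.0, 1.0],  # event class 0
    [0.0, 1.0, 0.0, 1.0],  # event class 1
    [0.0, 0.0, 1.0, 0.0],  # event class 2
]
N_CLASSES = len(TEMPLATES)
N_FEATS = len(TEMPLATES[0])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mix(labels):
    """Feature vector of a frame in which the events flagged in `labels` overlap."""
    return [sum(y * t[j] for y, t in zip(labels, TEMPLATES)) for j in range(N_FEATS)]


def forward(x, W, b):
    """Linear layer + element-wise sigmoid: one independent probability per
    class, so several classes can be active in the same frame."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bk)
            for row, bk in zip(W, b)]


def bce(p, y, eps=1e-12):
    """Multi-label loss: binary cross-entropy summed over the classes."""
    return -sum(t * math.log(q + eps) + (1 - t) * math.log(1 - q + eps)
                for q, t in zip(p, y))


def detect(p, threshold=0.5):
    """Binarize per-class probabilities into the frame's event-activity vector."""
    return [1 if q >= threshold else 0 for q in p]


def train(frames, epochs=2000, lr=0.5):
    """Per-frame SGD; for a sigmoid output, d(bce)/d(logit_k) = p_k - y_k."""
    W = [[0.0] * N_FEATS for _ in range(N_CLASSES)]
    b = [0.0] * N_CLASSES
    for _ in range(epochs):
        for x, y in frames:
            p = forward(x, W, b)
            for k in range(N_CLASSES):
                g = p[k] - y[k]
                b[k] -= lr * g
                for j in range(N_FEATS):
                    W[k][j] -= lr * g * x[j]
    return W, b


# Every combination of active events, including polyphonic (overlapping) frames.
FRAMES = [(mix(list(y)), list(y))
          for y in itertools.product([0, 1], repeat=N_CLASSES)]
W, b = train(FRAMES)

for x, y in FRAMES:
    print("true", y, "detected", detect(forward(x, W, b)))
```

Because each class has its own sigmoid, a frame in which all three toy events overlap can be reported as active for all three classes at once. With a softmax output, the class probabilities would be forced to sum to one, so the classes would compete for each frame instead of being detected independently.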
