Temporal coding of local spectrogram features for robust sound recognition

Authors: Jonathan Dennis, Qiang Yu, Huajin Tang, Huy Dat Tran, Haizhou Li

DOI: 10.1109/ICASSP.2013.6637759

Abstract: There is much evidence to suggest that the human auditory system uses localised time-frequency information for the robust recognition of sounds. Despite this, conventional systems typically rely on features extracted from short windowed frames over time, covering the whole frequency spectrum. Such approaches are not inherently robust to noise, as each frame will contain a mixture of spectral information from noise and signal. Here, we propose a novel approach based on the temporal coding of Local Spectrogram Features (LSFs), which generate spikes that are used to train a Spiking Neural Network (SNN) with temporal learning. LSFs represent robust location information in the spectrogram surrounding keypoints, which are detected in a signal-driven manner such that the effect of noise on the temporal coding is reduced. Our experiments demonstrate the robust performance of our approach across a variety of noise conditions, such that it is able to outperform the conventional frame-based baseline methods.
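To make the idea of keypoint-centred local features concrete, the following is a minimal sketch and not the authors' implementation. It assumes that keypoints are local maxima of a log-magnitude spectrogram lying within a fixed margin of the global peak, and that a local feature is simply the square patch around each keypoint. The function names, patch radius, and margin are all hypothetical choices made for illustration. The temporal spike coding and SNN training described in the abstract are not covered.

```python
import numpy as np


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram with shape (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    return np.log(magnitude + 1e-10)


def detect_keypoints(spec, radius=2, margin=3.0):
    """Return (freq, time) indices of strong local maxima.

    Assumption for this sketch: a keypoint is the maximum of its
    (2*radius+1)^2 neighbourhood and lies within `margin` (natural-log
    units) of the global peak.
    """
    size = 2 * radius + 1
    padded = np.pad(spec, radius, mode="constant", constant_values=-np.inf)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    local_max = windows.max(axis=(2, 3))
    mask = (spec == local_max) & (spec >= spec.max() - margin)
    return np.argwhere(mask)


def extract_local_features(spec, keypoints, radius=2):
    """Cut a (2*radius+1)^2 patch centred on each keypoint."""
    size = 2 * radius + 1
    if len(keypoints) == 0:
        return np.empty((0, size, size))
    padded = np.pad(spec, radius, mode="edge")  # replicate edges at borders
    return np.stack([padded[f:f + size, t:t + size] for f, t in keypoints])


if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 1000 * t)  # 1 kHz test tone
    spec = log_spectrogram(tone)
    keypoints = detect_keypoints(spec)
    patches = extract_local_features(spec, keypoints)
    print(spec.shape, keypoints.shape, patches.shape)
```

Because each patch depends only on the spectrogram region around a strong peak, noise that falls in other time-frequency regions leaves it unchanged, which is the intuition the abstract gives for preferring local features over whole-spectrum frames.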

