Learning emotion-based acoustic features with deep belief networks

Authors:

DOI: 10.1109/ASPAA.2011.6082328

Keywords:

Abstract: The medium of music has evolved specifically for the expression of emotions, and it is natural for us to organize music in terms of its emotional associations. But while such organization is a natural process for humans, quantifying it empirically proves to be a very difficult task, and as such no dominant feature representation for music emotion recognition has yet emerged. Much of the difficulty in developing emotion-based features is the ambiguity of the ground-truth. Even using the smallest time window, opinions on the emotion are bound to vary and reflect some disagreement between listeners. In previous work, we have modeled human response labels to music in the arousal-valence (A-V) representation of affect as a time-varying, stochastic distribution. Current methods for automatic detection of emotion in music seek performance increases by combining several feature domains (e.g. loudness, timbre, harmony, rhythm). Such work has focused largely on dimensionality reduction for minor classification performance gains, but has provided little insight into the relationship between audio and emotional associations. In this new work we seek to employ regression-based deep belief networks to learn features directly from magnitude spectra. While the system is applied to the specific problem of music emotion recognition, it could easily be applied to any regression-based audio feature learning problem.
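The pipeline the abstract describes (magnitude spectra in, learned features out, regression onto arousal-valence) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the frame length, layer sizes, learning rate, CD-1 training, the least-squares output layer, and the random placeholder audio and labels are all hypothetical choices, and the supervised fine-tuning that a regression-based deep belief network would normally include is omitted.

```python
import numpy as np

def magnitude_spectra(signal, frame_len=256, hop=128):
    """Frame the signal, apply a Hann window, and return |FFT| per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=20, lr=0.05, rng=None):
    """Train one RBM layer with single-step contrastive divergence (CD-1)."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_samples, n_visible = data.shape
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        h_prob = sigmoid(data @ W + b_hid)                      # positive phase
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = sigmoid(h_sample @ W.T + b_vis)               # reconstruction
        h_recon = sigmoid(v_recon @ W + b_hid)                  # negative phase
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_hid

rng = np.random.default_rng(0)
signal = rng.standard_normal(8000)        # placeholder audio, not real music
spectra = magnitude_spectra(signal)
spectra = spectra / spectra.max()         # scale into [0, 1] for sigmoid units

# Synthetic arousal-valence targets, one (arousal, valence) pair per frame.
av_labels = rng.uniform(-0.5, 0.5, size=(spectra.shape[0], 2))

# Greedy layer-wise pretraining of two stacked RBMs (hypothetical sizes).
W1, b1 = train_rbm(spectra, n_hidden=32, rng=rng)
h1 = sigmoid(spectra @ W1 + b1)
W2, b2 = train_rbm(h1, n_hidden=16, rng=rng)
features = sigmoid(h1 @ W2 + b2)

# Linear regression (least squares with a bias column) from features to A-V.
X = np.hstack([features, np.ones((features.shape[0], 1))])
coef, *_ = np.linalg.lstsq(X, av_labels, rcond=None)
av_pred = X @ coef
```

With real data, `signal` would be an audio excerpt and `av_labels` the mean listener ratings for each frame; the top-layer activations in `features` are the learned representation that stands in for hand-designed loudness, timbre, harmony, or rhythm features.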

References (12)
Youngmoo E. Kim, Erik M. Schmidt, Lloyd Emelle. MoodSwings: A Collaborative Game for Music Mood Label Collection. International Symposium/Conference on Music Information Retrieval, pp. 231-236 (2008).
Youngmoo E. Kim, Erik M. Schmidt. Prediction of Time-Varying Musical Mood Distributions from Audio. International Symposium/Conference on Music Information Retrieval, pp. 465-470 (2010).
Erik M. Schmidt, Douglas Turnbull, Youngmoo E. Kim. Feature selection for content-based, time-varying musical emotion regression. Multimedia Information Retrieval, pp. 267-274 (2010). DOI: 10.1145/1743384.1743431
Geoffrey E Hinton, Ruslan R Salakhutdinov. Reducing the Dimensionality of Data with Neural Networks. Science, vol. 313, pp. 504-507 (2006). DOI: 10.1126/SCIENCE.1127647
Andrew Y. Ng, Honglak Lee, Peter Pham, Yan Largman. Unsupervised feature learning for audio classification using convolutional deep belief networks. Neural Information Processing Systems, vol. 22, pp. 1096-1104 (2009).
Yoshua Bengio, Hugo Larochelle, Pascal Lamblin, Dan Popovici. Greedy Layer-Wise Training of Deep Networks. Neural Information Processing Systems, vol. 19, pp. 153-160 (2006).
Evan C. Smith, Michael S. Lewicki. Efficient auditory coding. Nature, vol. 439, pp. 978-982 (2006). DOI: 10.1038/NATURE04485
Geoffrey E. Hinton, Simon Osindero, Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, vol. 18, pp. 1527-1554 (2006). DOI: 10.1162/NECO.2006.18.7.1527
James A. Russell. A Circumplex Model of Affect. Journal of Personality and Social Psychology, vol. 39, pp. 1161-1178 (1980). DOI: 10.1037/H0077714
Erik M. Schmidt, Youngmoo E. Kim. Prediction of Time-Varying Musical Mood Distributions Using Kalman Filtering. International Conference on Machine Learning and Applications, pp. 655-660 (2010). DOI: 10.1109/ICMLA.2010.101