Author:
DOI: 10.1109/ASPAA.2011.6082328
Keywords:
Abstract: The medium of music has evolved specifically for the expression of emotions, and it is natural for us to organize music in terms of its emotional associations. But while such organization is a natural process for humans, quantifying it empirically proves to be a very difficult task, and as such no dominant feature representation for music emotion recognition has yet emerged. Much of the difficulty in developing emotion-based features lies in the ambiguity of the ground-truth: even using the smallest time window, opinions on the emotion are bound to vary and reflect some disagreement between listeners. In previous work, we modeled human response labels in the arousal-valence (A-V) representation of affect as a time-varying, stochastic distribution. Current methods for automatic detection of emotion in music seek performance increases by combining features from several domains (e.g. loudness, timbre, harmony, rhythm). Such work has focused largely on dimensionality reduction for minor classification gains, but has provided little insight into the relationship between audio and emotion. In this new work, we employ regression-based deep belief networks to learn features directly from magnitude spectra. While the system is applied to the specific problem of music emotion recognition, it could easily be applied to any music feature learning problem.
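
Note: the following is a minimal illustrative sketch of the general approach the abstract describes (an RBM pretrained on magnitude-spectrum frames, with a regression head mapping learned features to 2-D arousal-valence targets), not the authors' implementation. All dimensions, learning rates, the single-RBM depth, and the synthetic data are assumptions made for the example.

# Minimal sketch: RBM feature learning on magnitude spectra + A-V regression.
# This is an illustrative approximation, not the method from the paper.
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic stand-ins for magnitude spectra and A-V labels (assumed) ---
n_frames, n_bins = 2000, 257                      # e.g. frames of a 512-pt STFT
X = np.abs(rng.normal(size=(n_frames, n_bins)))   # fake magnitude spectra
X = (X - X.mean(0)) / (X.std(0) + 1e-8)           # per-bin standardization
Y = rng.uniform(-1, 1, size=(n_frames, 2))        # fake A-V labels in [-1, 1]

# --- RBM parameters ---
n_hidden = 64
W = 0.01 * rng.normal(size=(n_bins, n_hidden))
b_v = np.zeros(n_bins)                            # visible biases
b_h = np.zeros(n_hidden)                          # hidden biases
lr, epochs, batch = 0.01, 10, 100

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- Unsupervised pretraining with one-step contrastive divergence (CD-1) ---
for _ in range(epochs):
    for i in range(0, n_frames, batch):
        v0 = X[i:i + batch]
        ph0 = sigmoid(v0 @ W + b_h)                   # hidden probabilities
        h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sampled hiddens
        v1 = h0 @ W.T + b_v                           # Gaussian visible mean
        ph1 = sigmoid(v1 @ W + b_h)
        # Approximate gradient from one Gibbs step
        W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
        b_v += lr * (v0 - v1).mean(0)
        b_h += lr * (ph0 - ph1).mean(0)

# --- Regression head: least squares from learned features to A-V values ---
H = sigmoid(X @ W + b_h)                              # learned features
H1 = np.hstack([H, np.ones((n_frames, 1))])           # append bias column
coef, *_ = np.linalg.lstsq(H1, Y, rcond=None)
pred = H1 @ coef
print("training MSE:", float(np.mean((pred - Y) ** 2)))

A deep belief network would stack several such RBMs, pretraining each layer greedily on the previous layer's hidden activations before fine-tuning the whole stack on the regression targets; the single-layer version above only illustrates the learn-features-from-spectra idea.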