Automatic detection of speaker state: Lexical, prosodic, and phonetic approaches to level-of-interest and intoxication classification

Authors: William Yang Wang, Fadi Biadsy, Andrew Rosenberg, Julia Hirschberg

DOI: 10.1016/J.CSL.2012.03.004

Abstract: Traditional studies of speaker state focus primarily upon one-stage classification techniques using standard acoustic features. In this article, we investigate multiple novel features and approaches to two recent tasks in speaker state detection: level-of-interest (LOI) detection and intoxication detection. In the task of LOI prediction, we propose a novel Discriminative TFIDF feature to capture important lexical information and a novel Prosodic Event detection approach using AuToBI; we combine these with acoustic features for this task using a new multilevel multistream prediction feedback and similarity-based hierarchical fusion learning approach. Our experimental results outperform the published results of all systems in the 2010 Interspeech Paralinguistic Challenge - Affect Subchallenge. In the intoxication detection task, we evaluate the performance of Prosodic Event-based, phone duration-based, phonotactic, and phonetic-spectral based approaches, finding that a combination of the phonotactic and phonetic-spectral approaches achieves significant improvement over the 2011 Interspeech Speaker State Challenge - Intoxication Subchallenge baseline. We discuss our results using these new features and approaches and their implications for future research.
