Influence of Weak Labels for Emotion Recognition of Tweets

Authors: Olivier Janssens, Steven Verstockt, Erik Mannens, Sofie Van Hoecke, Rik Van de Walle

DOI: 10.1007/978-3-319-13817-6_12

Keywords: Statistical classification, Data mining, Emotion recognition, Crowdsourcing, Feature engineering, Algorithm design, Annotation, Natural language processing, Artificial intelligence, Computer science, Set (abstract data type)

Abstract: Research on emotion recognition of tweets focuses on feature engineering or algorithm design, while dataset labels are barely questioned. Datasets are often labelled manually or via crowdsourcing, which results in strong labels. These methods are time intensive and can be expensive. Alternatively, tweet hashtags can be used as free, inexpensive weak labels. This paper investigates the impact of using weak labels compared to strong labels. The study uses two label sets for a corpus of tweets. The weakly annotated label set is created by employing the hashtags of the tweets, while the strongly annotated label set is created by the use of crowdsourcing. Both label sets are used separately as input to five classification algorithms to determine their classification performance. The results indicate only a 9.25% decrease in f1-score when using weak labels. This performance decrease does not outweigh the benefits of having free labels.
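The abstract describes two mechanisms: deriving weak emotion labels from tweet hashtags, and comparing label sets by f1-score. The following is a minimal Python sketch of both, under assumptions not confirmed by the abstract: the hashtag-to-emotion mapping in `EMOTION_HASHTAGS` is hypothetical, tweets with zero or conflicting emotion hashtags are simply discarded, label-bearing hashtags are stripped from the text, and F1 is macro-averaged. `weak_label` and `macro_f1` are illustrative names, not the authors' code.

```python
import re

# Hypothetical hashtag-to-emotion mapping; the paper's actual hashtag list is not given here.
EMOTION_HASHTAGS = {
    "#happy": "joy",
    "#joy": "joy",
    "#sad": "sadness",
    "#angry": "anger",
    "#scared": "fear",
    "#surprised": "surprise",
}

HASHTAG_RE = re.compile(r"#\w+")


def weak_label(tweet):
    """Return (cleaned_text, label), or None if the tweet has zero or several distinct emotion hashtags."""
    tags = [t.lower() for t in HASHTAG_RE.findall(tweet)]
    labels = {EMOTION_HASHTAGS[t] for t in tags if t in EMOTION_HASHTAGS}
    if len(labels) != 1:
        return None  # assumption: ambiguous or unlabeled tweets are dropped
    # Remove only the label-bearing hashtags so the label cannot be read off the input text.
    cleaned = HASHTAG_RE.sub(
        lambda m: "" if m.group(0).lower() in EMOTION_HASHTAGS else m.group(0), tweet
    )
    return " ".join(cleaned.split()), labels.pop()


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every class seen in gold or pred."""
    classes = set(gold) | set(pred)
    scores = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    print(weak_label("Rain again #sad #monday"))
    print(macro_f1(["joy", "joy", "sadness", "sadness"], ["joy", "sadness", "sadness", "sadness"]))
```

For example, `weak_label("Rain again #sad #monday")` keeps the unrelated `#monday` and yields `("Rain again #monday", "sadness")`. One way to reproduce the study's comparison would be to train the same classifier once on weak labels and once on strong labels and score both with `macro_f1` on a shared test set; the abstract does not state which averaging or split the authors used.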
