Authors: Harith Alani, Yulan He, Miriam Fernández, Hassan Saif
DOI:
Keywords:
Abstract: Sentiment analysis over Twitter offers organisations and individuals a fast and effective way to monitor the public's feelings towards them and their competitors. To assess the performance of sentiment analysis methods over Twitter, a small set of evaluation datasets have been released in the last few years. In this paper we present an overview of eight publicly available and manually annotated evaluation datasets for Twitter sentiment analysis. Based on this review, we show that a common limitation of most of these datasets, when assessing sentiment analysis at the target (entity) level, is the lack of distinctive sentiment annotations among the tweets and the entities contained in them. For example, the tweet "I love iPhone, but I hate iPad" can be annotated with a mixed sentiment label, but the entity iPhone within this tweet should be annotated with a positive sentiment label. Aiming to overcome this limitation, and to complement current evaluation datasets, we present STS-Gold, a new evaluation dataset where tweets and targets (entities) are annotated individually and therefore may present different sentiment labels. This paper also provides a comparative study of the various datasets along several dimensions including: total number of tweets, vocabulary size and sparsity. We also investigate the pair-wise correlation among these dimensions as well as their correlations to the sentiment classification performance on different datasets.