Sentiment Analysis in Czech Social Media Using Supervised Machine Learning

Authors: Josef Steinberger, Ivan Habernal, Tomáš Ptáček

Abstract: This article provides an in-depth study of machine learning methods for sentiment analysis of Czech social media. Whereas in English, Chinese, or Spanish this field has a long history and evaluation datasets for various domains are widely available, in the case of the Czech language there has not yet been any systematic research conducted. We tackle this issue and establish a common ground for further research by providing a large human-annotated Czech social media corpus. Furthermore, we evaluate state-of-the-art supervised machine learning methods for sentiment analysis. We explore different pre-processing techniques and employ various features and classifiers. Moreover, in addition to our newly created social media dataset, we also report results on other popular domains, such as movie and product reviews. We believe that this article will not only extend the current sentiment analysis research to another family of languages, but also encourage competition, which potentially leads to the production of high-end commercial solutions.
