Sentiment Analysis of Iraqi Arabic Dialect on Facebook Based on Distributed Representations of Documents

Authors: Anwar Alnawas, Nursal Arici

DOI: 10.1145/3278605

Abstract: Nowadays, social media is used by many people to express their opinions about a variety of topics. Opinion Mining, or Sentiment Analysis, techniques extract opinions from user-generated content. Over the years, a multitude of studies have been done on the English language, with a deficiency of research in all other languages. Unfortunately, Arabic is one of the languages that seems to lack substantial research, despite the rapid growth of its use on social media outlets. Furthermore, specific Arabic dialects should be studied, not just Modern Standard Arabic. In this paper, we experiment with sentiment analysis of the Iraqi Arabic dialect using word embedding. First, we made a large corpus from previous works to learn word representations. Second, we generated a word embedding model by training Doc2Vec representations on the corpus, based on the Distributed Memory Model of Paragraph Vectors (DM-PV) architecture. Lastly, the learned representations were used as features for four binary classifiers (Logistic Regression, Decision Tree, Support Vector Machine, and Naive Bayes) to detect sentiment. We also experimented with different values of the parameters (window size, dimension, and negative samples). In light of the experiments, it can be concluded that our approach achieves better performance with Logistic Regression than with the other classifiers.
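
The pipeline described in the abstract (document vectors learned with a distributed-memory paragraph-vector model, then fed to a binary classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a toy, standard-library-only approximation written for illustration, whereas a real experiment would use an established Doc2Vec library. The function names (`train_pv_dm`, `train_logreg`, `predict_proba`), the four-document English placeholder corpus, the labels, and all hyperparameter values are assumptions. The `dim`, `window`, and `negative` arguments correspond to the three parameters the abstract says were varied. Negatives are drawn uniformly here for brevity, which is a simplification.

```python
import math
import random


def sigmoid(x):
    # Clamp the score so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def train_pv_dm(docs, dim=8, window=2, negative=3, epochs=50, lr=0.05, seed=0):
    """Toy distributed-memory paragraph vectors with negative sampling.

    The document vector is averaged with the context word vectors and the
    result is trained to predict the centre word.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}

    def init():
        return [(rng.random() - 0.5) / dim for _ in range(dim)]

    doc_vecs = [init() for _ in docs]         # one vector per document
    word_vecs = [init() for _ in vocab]       # input word vectors
    out_vecs = [[0.0] * dim for _ in vocab]   # output (prediction) vectors

    for _ in range(epochs):
        for d, doc in enumerate(docs):
            ids = [index[w] for w in doc]
            for pos, target in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                context = [ids[j] for j in range(lo, hi) if j != pos]
                inputs = [doc_vecs[d]] + [word_vecs[c] for c in context]
                hidden = [sum(v[k] for v in inputs) / len(inputs) for k in range(dim)]
                # One positive sample plus `negative` uniformly drawn negatives.
                samples = [(target, 1.0)]
                samples += [(rng.randrange(len(vocab)), 0.0) for _ in range(negative)]
                err = [0.0] * dim
                for w, label in samples:
                    if label == 0.0 and w == target:
                        continue  # skip a negative that collides with the target
                    g = lr * (label - sigmoid(dot(hidden, out_vecs[w])))
                    for k in range(dim):
                        err[k] += g * out_vecs[w][k]
                        out_vecs[w][k] += g * hidden[k]
                # Propagate the accumulated error to the document and context vectors.
                for v in inputs:
                    for k in range(dim):
                        v[k] += err[k]
    return doc_vecs


def train_logreg(X, y, epochs=200, lr=0.5):
    """Plain stochastic-gradient logistic regression on dense feature vectors."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            step = lr * (t - sigmoid(b + dot(w, x)))
            w = [wi + step * xi for wi, xi in zip(w, x)]
            b += step
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + dot(w, x))


# Hypothetical toy corpus: 1 = positive, 0 = negative.
docs = [
    ["service", "good", "very", "good"],
    ["food", "good", "nice"],
    ["service", "bad", "very", "bad"],
    ["food", "bad", "awful"],
]
labels = [1, 1, 0, 0]

doc_vecs = train_pv_dm(docs, dim=8, window=2, negative=3)
w, b = train_logreg(doc_vecs, labels)
probs = [predict_proba(w, b, v) for v in doc_vecs]
print(probs)
```

Swapping `train_logreg` for a decision tree, support vector machine, or Naive Bayes classifier over the same `doc_vecs` would mirror the four-classifier comparison the abstract describes. A corpus this small says nothing about which classifier performs best.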
