How Many Users Are Enough? Exploring Semi-Supervision and Stylometric Features to Uncover a Russian Troll Farm

Authors: Nayeema Nasrin, Kim-Kwang Raymond Choo, Myung Ko, Anthony Rios

DOI: 10.18653/V1/D19-5003

Abstract: Social media has reportedly been (ab)used by Russian troll farms to promote political agendas. Specifically, state-affiliated actors disguise themselves as native citizens of the United States to promote discord and their political motives. Therefore, developing methods to automatically detect Russian trolls can ensure fair elections and possibly reduce political extremism by stopping trolls that produce discord. While data exists for some troll organizations (e.g., the Internet Research Agency), it is challenging to collect ground-truth accounts for new troll farms in a timely fashion. In this paper, we study the impact that the number of labeled troll accounts has on detection performance. We analyze the use of self-supervision with fewer than 100 troll accounts as training data, and we improve classification performance by nearly 4% F1. Furthermore, in combination with self-supervision, we also explore novel features for troll detection grounded in stylometry. Intuitively, we assume that writing style is consistent across troll accounts because a single troll organization employee may control multiple user accounts. Overall, we improve on models based on words by ~9%.
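The two ideas named in the abstract, stylometric features and self-supervision from a small labeled set, can be illustrated with a minimal sketch. This is not the authors' method: this record does not specify the paper's features, classifier, or training procedure. The function names (`stylometric_features`, `self_train`), the four features, and the nearest-centroid classifier are all assumptions chosen for brevity.

```python
import string
from statistics import mean


def stylometric_features(text):
    """Return a small, illustrative style vector for one text (assumed features)."""
    words = text.split()
    n_chars = max(len(text), 1)
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,                  # mean word length
        sum(c in string.punctuation for c in text) / n_chars,  # punctuation rate
        sum(c.isupper() for c in text) / n_chars,              # uppercase rate
        len({w.lower() for w in words}) / n_words,             # type-token ratio
    ]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def self_train(labeled, unlabeled, rounds=3, per_round=1):
    """Grow a labeled set by pseudo-labeling the most confident unlabeled items.

    labeled:   list of (features, label) pairs, label in {0, 1};
               must contain at least one example of each class.
    unlabeled: list of feature vectors.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        # Refit the (hypothetical) nearest-centroid classifier on current labels.
        cents = {y: centroid([f for f, l in labeled if l == y]) for y in (0, 1)}
        scored = []
        for i, f in enumerate(pool):
            d0, d1 = sq_dist(f, cents[0]), sq_dist(f, cents[1])
            # Confidence is the margin between the two centroid distances.
            scored.append((abs(d0 - d1), i, 0 if d0 <= d1 else 1))
        scored.sort(reverse=True)
        chosen = scored[:per_round]
        for _, i, y in chosen:
            labeled.append((pool[i], y))
        taken = {i for _, i, _ in chosen}
        pool = [f for i, f in enumerate(pool) if i not in taken]
    return labeled
```

In use, each account's posts would be turned into one feature vector, a handful of known troll and non-troll accounts would seed `labeled`, and the remaining accounts would go in `unlabeled`. Adding only the highest-margin predictions each round is one common way to limit the noise that pseudo-labels introduce.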
