Don't follow me: Spam detection in Twitter

Author: Alex Hai Wang

DOI:

Keywords:

Abstract: The rapidly growing social network Twitter has been infiltrated by a large amount of spam. In this paper, a spam detection prototype system is proposed to identify suspicious users on Twitter. A directed social graph model is proposed to explore the “follower” and “friend” relationships among Twitter users. Based on Twitter's spam policy, novel content-based features and graph-based features are also proposed to facilitate spam detection. A Web crawler is developed relying on API methods provided by Twitter. Around 25K users, 500K tweets, and 49M follower/friend relationships in total are collected from publicly available data on Twitter. A Bayesian classification algorithm is applied to distinguish suspicious behaviors from normal ones. I analyze the data set and evaluate the performance of the detection system. Classic evaluation metrics are used to compare the performance of various traditional classification methods. Experiment results show that the Bayesian classifier has the best overall performance in terms of F-measure. The trained classifier is also applied to the entire data set. The result shows that the spam detection system can achieve 89% precision.
