A Quantitative Study of Forum Spamming Using Context-based Analysis.

Authors: Yuan Niu, Yi-Min Wang, Francis Hsu, Ming Ma, Hao Chen

Abstract: Forum spamming has become a major means of search engine spamming. To evaluate the impact of forum spamming on search quality, we have conducted a comprehensive study from three perspectives: that of the search user, the spammer, and the forum hosting site. We examine spam blogs and spam comments in both legitimate and honey forums. Our study shows that forum spamming is a widespread problem. Spammed forums, powered by the most popular software, show up in the top 20 search results for all 189 popular keywords. On two blog sites, more than half of the blogs (75% and 54% respectively) are spam, and even on a major and reputably well maintained blog site, 8.1% of the blogs are spam. The observation on our honey forums confirms that spammers target abandoned pages and that most comment spam is meant to increase page rank rather than generate immediate traffic. We propose context-based analyses, consisting of redirection and cloaking analysis, to detect spam automatically and to overcome shortcomings of content-based analyses. Our study shows that these analyses are very effective in identifying spam pages.
