Survey on web spam detection

Authors: Nikita Spirin, Jiawei Han

DOI: 10.1145/2207243.2207252

Keywords: Content farm, Spamdexing, Computer science, Algorithm, Information retrieval, Pruning (decision trees), Spambot, Link farm, Adversarial information retrieval, PageRank, Artificial intelligence, Cluster analysis, Machine learning

Abstract: Search engines have become the de facto place to start information acquisition on the Web. However, due to the web spam phenomenon, search results are not always as good as desired. Moreover, spam evolves, which makes the problem of providing high-quality search even more challenging. Over the last decade, research on adversarial information retrieval has gained a lot of interest from both academia and industry. In this paper we present a systematic review of web spam detection techniques, with a focus on algorithms and underlying principles. We categorize all existing algorithms into three categories based on the type of information they use: content-based methods, link-based methods, and methods based on non-traditional data such as user behaviour, clicks, and HTTP sessions. In turn, we subcategorize the link-based category into five groups based on the ideas and principles used: labels propagation, link pruning and reweighting, labels refinement, graph regularization, and feature-based. We also define the concept of web spam numerically and provide a brief survey of various spam forms. Finally, we summarize the observations and underlying principles applied for web spam detection.
