Authors: Nikita Spirin, Jiawei Han
Keywords: Content farm, Spamdexing, Computer science, Algorithm, Information retrieval, Pruning (decision trees), Spambot, Link farm, Adversarial information retrieval, PageRank, Artificial intelligence, Cluster analysis, Machine learning
Abstract: Search engines have become the de facto place to start information acquisition on the Web. However, due to the web spam phenomenon, search results are not always as good as desired. Moreover, spam evolves, which makes the problem of providing high-quality search even more challenging. Over the last decade, research on adversarial information retrieval has gained a lot of interest from both academia and industry. In this paper we present a systematic review of web spam detection techniques with a focus on algorithms and underlying principles. We categorize all existing algorithms into three categories based on the type of information they use: content-based methods, link-based methods, and methods based on non-traditional data such as user behaviour, clicks, and HTTP sessions. In turn, we perform a subcategorization of the link-based category into five groups based on the ideas and principles used: label propagation, link pruning and reweighting, label refinement, graph regularization, and feature-based methods. We also define the concept of web spam numerically and provide a brief survey of various spam forms. Finally, we summarize the observations and underlying principles applied for web spam detection.