Using Rank Propagation and Probabilistic Counting for Link-Based Spam Detection

Authors: R. Baeza-Yates, L. Becchetti, C. Castillo, S. Leonardi, D. Donato

Abstract: This paper describes a link-based technique for automating the detection of Web spam, that is, pages using deceptive techniques to obtain an undeservedly high score in search engines. The problem of Web spam is widespread and difficult to solve, mostly due to the large size of the Web, which makes many algorithms infeasible in practice. We propose to consider only the link structure of the Web, regardless of page contents. In particular, we compute statistics of the links in the vicinity of every Web page by applying rank propagation and probabilistic counting over the Web graph. These statistical features are used to build a classifier that is tested over a large collection of Web spam. After ten-fold cross-validation, our best classifier can detect about 80% of the spam hosts with a false-positive rate of 2%. This is competitive with state-of-the-art spam classifiers that use content attributes.
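
The abstract mentions probabilistic counting over the Web graph without giving the procedure. As a rough illustration only, the sketch below estimates how many distinct pages can reach each page within a few hops, using generic Flajolet-Martin style bitmasks that are OR-ed along in-links. This is an assumed, textbook-style variant and not necessarily the authors' estimator; the function name `estimate_supporters`, its parameters, and the correction constant 0.77351 (the standard Flajolet-Martin factor) are choices made for this example.

```python
import random


def estimate_supporters(in_links, nodes, max_dist, num_sketches=64, bits=32, seed=0):
    """Roughly estimate, per node, how many distinct nodes reach it within max_dist hops.

    in_links maps a node to the nodes that link to it (hypothetical input format).
    """
    rng = random.Random(seed)

    def random_bit():
        # Bit i is picked with probability 2^-(i+1), as in Flajolet-Martin sketches.
        i = 0
        while i < bits - 1 and rng.random() < 0.5:
            i += 1
        return 1 << i

    # Each node starts with num_sketches bitmasks that mark only itself.
    masks = {v: [random_bit() for _ in range(num_sketches)] for v in nodes}

    # After round t, a node's masks summarize every node within t hops upstream.
    for _ in range(max_dist):
        new_masks = {}
        for v in nodes:
            merged = list(masks[v])
            for u in in_links.get(v, ()):
                for k in range(num_sketches):
                    merged[k] |= masks[u][k]
            new_masks[v] = merged
        masks = new_masks

    def lowest_zero_bit(mask):
        r = 0
        while mask & (1 << r):
            r += 1
        return r

    estimates = {}
    for v in nodes:
        mean_r = sum(lowest_zero_bit(m) for m in masks[v]) / num_sketches
        # Standard Flajolet-Martin correction; biased for very small counts.
        estimates[v] = (2 ** mean_r) / 0.77351
    return estimates
```

Calling `estimate_supporters({"b": ["a"], "c": ["b"]}, ["a", "b", "c"], max_dist=2)` on a three-page chain yields estimates that never decrease along the chain, because each page's bitmasks contain those of the pages linking to it. The appeal of this kind of counting is that each round is a single pass over the links with fixed memory per page, which matters at Web scale.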
