Authors: R. Baeza-Yates, L. Becchetti, C. Castillo, S. Leonardi, D. Donato
DOI:
Keywords:
Abstract: This paper describes a link-based technique for automating the detection of Web spam, that is, pages that use deceptive techniques to obtain an undeservedly high score in search engines. The problem of Web spam is widespread and difficult to solve, mostly because the large size of the Web makes many algorithms infeasible in practice. We propose to consider only the link structure of the Web, regardless of page contents. In particular, we compute statistics of the links in the vicinity of every Web page by applying rank propagation and probabilistic counting over the Web graph. These statistical features are used to build a classifier that is tested over a collection of Web spam. After ten-fold cross-validation, our best classifier can detect about 80% of the spam hosts with a false-positive rate of 2%. This is competitive with state-of-the-art classifiers that use content attributes.
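The probabilistic-counting step mentioned in the abstract estimates, for every page, how many distinct pages can reach it within a few link hops, without storing those sets of pages explicitly. The following is a minimal sketch that assumes the classic Flajolet-Martin estimator with bitmask OR-propagation along links; the authors' exact variant and parameters may differ. The functions `estimate_supporters`, `exact_supporters`, and `random_fm_bit`, along with their defaults (64 trials, 32-bit masks), are hypothetical and for illustration only.

```python
import random
from collections import deque
from typing import List, Tuple

# Hypothetical sketch: classic Flajolet-Martin probabilistic counting, used to
# estimate how many distinct nodes reach each node within `distance` link hops.
# This is an assumption for illustration, not the paper's exact variant.

PHI = 0.77351  # Flajolet-Martin bias-correction constant


def random_fm_bit(rng: random.Random, max_bits: int) -> int:
    """Return a mask with one bit set; bit i is chosen with probability 2^-(i+1)."""
    i = 0
    while i < max_bits - 1 and rng.random() < 0.5:
        i += 1
    return 1 << i


def estimate_supporters(num_nodes: int, edges: List[Tuple[int, int]], distance: int,
                        trials: int = 64, max_bits: int = 32, seed: int = 0) -> List[float]:
    """Estimate, for every node, the number of nodes with a path of length <= distance to it."""
    rng = random.Random(seed)
    # masks[v][k] is node v's bitmask in independent trial k (initially: v itself).
    masks = [[random_fm_bit(rng, max_bits) for _ in range(trials)] for _ in range(num_nodes)]
    for _ in range(distance):
        new_masks = [row[:] for row in masks]
        for u, v in edges:  # a link u -> v makes u a supporter of v
            for k in range(trials):
                new_masks[v][k] |= masks[u][k]
        masks = new_masks
    estimates = []
    for v in range(num_nodes):
        total = 0
        for m in masks[v]:
            r = 0  # index of the lowest zero bit in this trial's mask
            while m & (1 << r):
                r += 1
            total += r
        # Average the lowest-zero index over trials, then invert 2^R ~ PHI * n.
        estimates.append(2 ** (total / trials) / PHI)
    return estimates


def exact_supporters(num_nodes: int, edges: List[Tuple[int, int]], distance: int) -> List[int]:
    """Exact counts (including the node itself) via reverse BFS, to check estimates."""
    incoming: List[List[int]] = [[] for _ in range(num_nodes)]
    for u, v in edges:
        incoming[v].append(u)
    counts = []
    for target in range(num_nodes):
        seen = {target}
        frontier = deque([(target, 0)])
        while frontier:
            node, dist = frontier.popleft()
            if dist == distance:
                continue
            for src in incoming[node]:
                if src not in seen:
                    seen.add(src)
                    frontier.append((src, dist + 1))
        counts.append(len(seen))
    return counts


if __name__ == "__main__":
    n = 501
    edges = [(i, 0) for i in range(1, n)]  # 500 pages linking to hub page 0
    est = estimate_supporters(n, edges, distance=1)
    exact = exact_supporters(n, edges, distance=1)
    print(f"hub: exact={exact[0]}, estimated={est[0]:.0f}")
```

On the star graph in the usage block, the hub has 501 supporters within one hop, counting itself. Averaging over 64 independent trials typically keeps the estimate within a few tens of percent of that value. The design trades exactness for memory: each page keeps a fixed number of small bitmasks instead of an explicit set of supporters, and each hop of propagation is a single pass over the links.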