Authors: Ricardo A. Baeza-Yates, Luca Becchetti, Carlos Castillo, Stefano Leonardi, Debora Donato
DOI:
Keywords:
Abstract: We perform a statistical analysis of a large collection of Web pages, focusing on spam detection. We study several metrics, such as degree correlations, number of neighbors, rank propagation through links, TrustRank, and others, to build automatic web spam classifiers. This paper presents the performance of each of these classifiers alone, as well as their combined performance. Using this approach, we are able to detect 80.4% of the spam in our sample, with only 1.1% false positives.