Authors: Zoltán Gyöngyi, Hector Garcia-Molina, Jan Pedersen
DOI: 10.1016/B978-012088469-8.50052-8
Keywords:
Abstract: Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of fewer than 200 sites.
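
The discovery step described in the abstract (start from expert-approved seed pages, then follow the link structure to find other pages that are likely good) can be illustrated with a minimal sketch. It assumes a PageRank-style iteration biased toward the seed set. The abstract itself does not give a propagation formula, so the function name `propagate_trust`, the `damping` and `iterations` parameters, and the toy graph are all illustrative assumptions, not the authors' implementation.

```python
def propagate_trust(out_links, good_seeds, damping=0.85, iterations=20):
    """Spread trust from seed pages along outgoing links (illustrative sketch)."""
    # Collect every page that appears as a source or a target.
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)

    # Seed distribution: uniform over the expert-approved seeds, zero elsewhere.
    seed_score = {
        p: (1.0 / len(good_seeds) if p in good_seeds else 0.0) for p in pages
    }
    trust = dict(seed_score)

    for _ in range(iterations):
        # Each round, a fixed share of trust is re-injected at the seeds.
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for page, targets in out_links.items():
            if not targets:
                continue  # simplification: pages without out-links pass nothing on
            # A page splits its damped trust evenly among the pages it links to.
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical toy web: three reputable pages and two spam pages.
# Spam pages link to a good page, but no good page links back to them.
toy_links = {
    "good_a": ["good_b", "good_c"],
    "good_b": ["good_c"],
    "good_c": ["good_a"],
    "spam_x": ["spam_y", "good_a"],
    "spam_y": ["spam_x"],
}
scores = propagate_trust(toy_links, good_seeds={"good_a"})
for page in sorted(scores):
    print(page, round(scores[page], 4))
```

In this toy graph trust flows only along links that start at trusted pages, so the two spam pages, which no good page links to, end with a score of zero while every good page ends with a positive score. A real deployment would also need a seed selection strategy and a threshold for separating good pages from spam, neither of which this sketch attempts.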