Combating Web Spam with TrustRank

Authors: Zoltán Gyöngyi, Hector Garcia-Molina, Jan Pedersen

DOI: 10.1016/B978-012088469-8.50052-8

Keywords:

Abstract: Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine's results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
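The discovery step described in the abstract, spreading trust from expert-approved seed pages along the link structure, can be illustrated with a minimal sketch. The abstract does not give the propagation formula, so the code below assumes a PageRank-style iteration biased toward the seed set. The function name `propagate_trust`, the damping value 0.85, the iteration count, and the toy graph are all hypothetical choices made for illustration, not details taken from this record.

```python
def propagate_trust(out_links, good_seeds, damping=0.85, iterations=20):
    """Spread trust from seed pages along outgoing links (illustrative sketch).

    out_links maps each page to the list of pages it links to; every link
    target is assumed to appear as a key as well.
    """
    pages = list(out_links)
    # Static trust: uniform over the expert-approved seeds, zero elsewhere.
    seed_score = {
        p: (1.0 / len(good_seeds) if p in good_seeds else 0.0) for p in pages
    }
    trust = dict(seed_score)
    for _ in range(iterations):
        # Each round, a page keeps a fraction of its static seed trust ...
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        # ... and receives an equal share of the trust of pages linking to it.
        for page, targets in out_links.items():
            if not targets:
                continue  # pages without out-links pass nothing on
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical toy web: "seed" was judged good by an expert;
# "spam1" and "spam2" only link to each other.
toy_web = {
    "seed": ["a", "b"],
    "a": ["b"],
    "b": ["seed"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
scores = propagate_trust(toy_web, good_seeds={"seed"})
```

In this toy graph the two spam pages receive no links from the trusted region, so their score stays at zero, while pages reachable from the seed accumulate trust. Ranking or thresholding pages by such a score is one way the filtering mentioned in the abstract could be realized.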

References (13)
T. Haveliwala, Efficient Computation of PageRank. Stanford (1999)
Mehran Sahami, Susan Dumais, Eric Horvitz, David Heckerman, A Bayesian Approach to Filtering Junk E-Mail. National Conference on Artificial Intelligence (1998)
R. Baeza-Yates, B. Ribeiro-Neto, Modern Information Retrieval (1999)
Rajeev Motwani, Terry Winograd, Lawrence Page, Sergey Brin, The PageRank Citation Ranking: Bringing Order to the Web. The Web Conference, vol. 98, pp. 161-172 (1999)
Rajeev Motwani, John E. Hopcroft, Jeffrey D. Ullman, Introduction to Automata Theory, Languages, and Computation (1979)
Rajeev Motwani, John E. Hopcroft, Jeffrey D. Ullman, Introduction to Automata Theory, Languages, and Computation, 3rd Edition (2012)
Amy N. Langville, Carl D. Meyer, Deeper Inside PageRank. Internet Mathematics, vol. 1, pp. 335-380 (2004), 10.1080/15427951.2004.10129091
Jon M. Kleinberg, Authoritative Sources in a Hyperlinked Environment. Journal of the ACM, vol. 46, pp. 604-632 (1999), 10.1145/324133.324140
Sepandar D. Kamvar, Mario T. Schlosser, Hector Garcia-Molina, The EigenTrust Algorithm for Reputation Management in P2P Networks. Proceedings of the Twelfth International Conference on World Wide Web - WWW '03, pp. 640-651 (2003), 10.1145/775152.775242
Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, Gene H. Golub, Extrapolation Methods for Accelerating PageRank Computations. Proceedings of the Twelfth International Conference on World Wide Web - WWW '03, pp. 261-270 (2003), 10.1145/775152.775190