On characterizing and computing the diversity of hyperlinks for anti-spamming page ranking

Authors: Bo Yang, Hechang Chen, Xuehua Zhao, Masato Naka, Jing Huang

DOI: 10.1016/J.KNOSYS.2014.12.028

Keywords:

Abstract: With the advent of the big data era, efficiently and effectively querying useful information on the Web, the largest heterogeneous data source in the world, is becoming increasingly challenging. Page ranking is an essential component of search engines because it determines the presentation sequence of the tens of millions of returned pages associated with a single query. It therefore plays a significant role in regulating the search quality and user experience for information retrieval. When measuring the authority of a web page, most methods focus on the quantity and the quality of the neighborhood pages that direct to it using inbound hyperlinks. However, these methods ignore the diversity of such neighborhood pages, which we believe is an important metric for objectively evaluating web page authority. In comparison with true authorities, which usually contain a large number of inbound hyperlinks from a wide variety of sources, it is difficult for fake authorities, which boost their rank using techniques such as link farms, to occupy high diversity due to prohibitively high costs. We propose a probabilistic counting-based method to quantitatively compute the diversity of inbound hyperlinks. We then propose a novel link-based ranking algorithm, named Drank, to rank pages by simultaneously analyzing the quantity, quality, and diversity of their inbound hyperlinks. The validations on both synthetic and real-world data show that Drank outperforms other state-of-the-art methods in terms of both finding high-quality pages and suppressing web spam.
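The abstract does not spell out how the probabilistic counting step works. As a rough illustration only, the sketch below uses the classic linear-time probabilistic counting idea (a fixed-size bitmap plus a hash function) to estimate how many distinct sources link to a page. It is not the Drank algorithm or the authors' diversity measure; the function name `linear_count`, the bitmap size `m`, the choice of SHA-256, and the host names are all assumptions made for this example.

```python
import hashlib
import math


def linear_count(items, m=1024):
    """Estimate the number of distinct strings in `items` using an m-bit bitmap.

    Each item is hashed to one of m slots; the estimate is -m * ln(V),
    where V is the fraction of slots that stayed empty.
    """
    bitmap = [False] * m
    for item in items:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        slot = int.from_bytes(digest[:8], "big") % m
        bitmap[slot] = True
    zeros = bitmap.count(False)
    if zeros == 0:
        raise ValueError("bitmap saturated; use a larger m")
    return -m * math.log(zeros / m)


# Hypothetical inbound-link sources for two pages (illustrative data only).
# Both pages receive 200 inbound links, but from very different source sets.
diverse_sources = [f"site{i}.example" for i in range(200)]  # 200 distinct hosts
farm_sources = [f"farm{i % 5}.example" for i in range(200)]  # only 5 distinct hosts

diverse_estimate = linear_count(diverse_sources)
farm_estimate = linear_count(farm_sources)
print(f"diverse page: about {diverse_estimate:.0f} distinct sources")
print(f"link-farm page: about {farm_estimate:.0f} distinct sources")
```

Both hypothetical pages have the same number of inbound links, yet the estimated number of distinct sources differs sharply, which is the kind of signal a diversity-aware ranking could use. The bitmap keeps memory fixed regardless of how many links are scanned, at the cost of an approximate answer.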

References (31)
Vinay Goel, Baoning Wu, Brian D. Davison, Propagating Trust and Distrust to Demote Web Spam. MTW (2006)
Zoltán Gyöngyi, Hector Garcia-Molina, Jan Pedersen, Combating Web Spam with TrustRank. Very Large Data Bases, pp. 576-587 (2004), 10.1016/B978-012088469-8.50052-8
Monika R. Henzinger, Rajeev Motwani, Craig Silverstein, Challenges in Web Search Engines. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 36, pp. 11-22 (2002), 10.1145/792550.792553
Gerard Salton, Christopher Buckley, Term Weighting Approaches in Automatic Text Retrieval. Information Processing and Management, vol. 24, pp. 323-328 (1988), 10.1016/0306-4573(88)90021-0
Luca Becchetti, Carlos Castillo, Debora Donato, Ricardo Baeza-Yates, Stefano Leonardi, Link Analysis for Web Spam Detection. ACM Transactions on the Web, vol. 2, pp. 1-42 (2008), 10.1145/1326561.1326563
R. Lambiotte, M. Rosvall, Ranking and Clustering of Nodes in Networks with Smart Teleportation. Physical Review E, vol. 85, 056107 (2012), 10.1103/PHYSREVE.85.056107
Rohit Kaul, Yeogirl Yun, Seong-Gon Kim, Ranking Billions of Web Pages Using Diodes. Communications of the ACM, vol. 52, pp. 132-136 (2009), 10.1145/1536616.1536649
Soumen Chakrabarti, Byron Dom, Prabhakar Raghavan, Sridhar Rajagopalan, David Gibson, Jon Kleinberg, Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text. The Web Conference, vol. 30, pp. 65-74 (1998), 10.1016/S0169-7552(98)00087-7
Kyu-Young Whang, Brad T. Vander-Zanden, Howard M. Taylor, A Linear-Time Probabilistic Counting Algorithm for Database Applications. ACM Transactions on Database Systems, vol. 15, pp. 208-229 (1990), 10.1145/78922.78925