Efficient link-based similarity search in web networks

Authors: Mingxi Zhang, Hao Hu, Zhenying He, Liping Gao, Liujie Sun

DOI: 10.1016/J.ESWA.2015.07.042

Keywords: Similarity (network science), Data mining, Mathematics, Recommender system, Pruning (decision trees), Learning to rank, Reduction (complexity), Nearest neighbor search, SimRank, Theoretical computer science, Network analysis

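Highlights: The pre-computation cost in the off-line stage is significantly reduced. The efficiency of query processing is optimized by proposing a pruning algorithm. The accuracy loss of the pruning algorithm is controlled by tuning the threshold. The returned results are effective and acceptable.

Abstract: Similarity search in web networks, which aims to find entities similar to a given entity, is one of the core tasks in network analysis. With the proliferation of applications, including recommendation systems, SimRank has become a well-known measure for evaluating entity similarity in a network. However, existing work computes SimRank iteratively over a huge similarity matrix, which is expensive in terms of time and space and cannot efficiently support similarity search over large networks. In this paper, we propose a link-based similarity search method, WebSim, for efficiently finding similar entities in web networks. WebSim defines the similarity between entities as the 2-hop similarity of SimRank. To reduce the computation cost, we divide the search process into two stages: an off-line stage and an on-line stage. In the off-line stage, the 1-hop similarities are computed, and an optimized algorithm is designed to reduce unnecessary accumulation operations on zero similarities. In the on-line stage, a pruning algorithm is developed to support fast query processing by searching similar entries from a partial sums index derived from the 1-hop similarities; similarity items that are lower than a given threshold are skipped to reduce the search space. Compared with iterative SimRank computation, the computation cost is significantly reduced, since WebSim maintains only the 1-hop similarity matrix, which is much smaller than the multi-hop one. Experiments in comparison with SimRank and its optimized algorithms demonstrate that WebSim achieves on average a 99.83% reduction in time cost and a 92.12% reduction in space cost, and achieves 99.98% NDCG.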
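
The two-stage scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes that "1-hop similarity" means one SimRank iteration starting from the identity matrix, that "2-hop similarity" means a second iteration over those values, and that the partial sums are accumulated over the query's in-neighbours. The function names `one_hop` and `query`, the decay factor `c=0.8`, and the dictionary-of-in-neighbours graph layout are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def one_hop(in_nbrs, c=0.8):
    """Off-line stage (sketch): one SimRank iteration from the identity.

    in_nbrs maps every node to a collection of its distinct in-neighbours.
    Only pairs that share an in-neighbour are touched, so zero similarities
    are never accumulated.
    """
    out_nbrs = defaultdict(list)
    for v, srcs in in_nbrs.items():
        for u in srcs:
            out_nbrs[u].append(v)

    # Count common in-neighbours for every pair that has at least one.
    common = defaultdict(lambda: defaultdict(int))
    for targets in out_nbrs.values():
        for a in targets:
            for b in targets:
                if a != b:
                    common[a][b] += 1

    s1 = defaultdict(dict)
    for a, row in common.items():
        for b, cnt in row.items():
            s1[a][b] = c * cnt / (len(in_nbrs[a]) * len(in_nbrs[b]))
    for v in set(in_nbrs) | set(out_nbrs):
        s1[v][v] = 1.0  # every node is fully similar to itself
    return s1, out_nbrs


def query(q, in_nbrs, out_nbrs, s1, c=0.8, threshold=0.0, k=10):
    """On-line stage (sketch): 2-hop scores for q from the 1-hop matrix."""
    srcs = in_nbrs.get(q, ())
    if not srcs:
        return []

    # Partial sums: partial[j] = sum of s1(i, j) over in-neighbours i of q.
    # Entries below the threshold are skipped (assumed pruning rule).
    partial = defaultdict(float)
    for i in srcs:
        for j, s in s1[i].items():
            if s >= threshold:
                partial[j] += s

    # Push each partial sum to the nodes that j points to.
    scores = defaultdict(float)
    for j, p in partial.items():
        for b in out_nbrs.get(j, ()):
            if b != q:
                scores[b] += p

    ranked = [
        (b, c * total / (len(srcs) * len(in_nbrs[b])))
        for b, total in scores.items()
    ]
    ranked.sort(key=lambda item: -item[1])
    return ranked[:k]
```

In this sketch a larger `threshold` shrinks the search space at the price of dropping some candidates, which mirrors the accuracy-versus-efficiency trade-off the abstract mentions; the exact pruning rule and index layout used by WebSim are not given in this record.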
