A scalable pattern mining approach to web graph compression with communities

作者: Gregory Buehrer , Kumar Chellapilla

DOI: 10.1145/1341531.1341547

关键词:

摘要: A link server is a system designed to support efficient implementations of graph computations on the web graph. In this work, we present compression scheme for specifically accommodate community queries and other random access algorithms servers. We use frequent pattern mining approach extract meaningful connectivity formations. Our Virtual Node Miner achieves without sacrificing by generating virtual nodes from itemsets in vertex adjacency lists. The phase guarantees scalability bounding complexity O(E log E). facilitate global mining, relaxing requirement be sorted URL, enabling discovery both inter-domain as well intra-domain patterns. As consequence, allows incremental updates. Further, it not only facilitates but can also expedite such PageRank local walks implementing them directly compressed demonstrate effectiveness proposed several publicly available large data sets. Experimental results indicate that algorithm 10- 15-fold most real word sets

参考文章(28)
Padhraic Smyth, Scott White, A Spectral Clustering Approach To Finding Communities in Graph. siam international conference on data mining. pp. 274- 285 ,(2005)
Ramakrishnan Srikant, Rakesh Agrawal, Fast algorithms for mining association rules very large data bases. pp. 580- 592 ,(1998)
Piotr Indyk, Aristides Gionis, Rajeev Motwani, Similarity Search in High Dimensions via Hashing very large data bases. pp. 518- 529 ,(1999)
Ramakrishnan Srikant, Rakesh Agrawal, Fast Algorithms for Mining Association Rules in Large Databases very large data bases. pp. 487- 499 ,(1994)
Fan R K Chung, Spectral Graph Theory ,(1996)
Hannu Toivonen, Sampling Large Databases for Association Rules very large data bases. pp. 134- 145 ,(1996)
Andrew Tomkins, David Gibson, Ravi Kumar, Discovering large dense subgraphs in massive graphs very large data bases. pp. 721- 732 ,(2005)
Edith Cohen, Size-Estimation Framework with Applications to Transitive Closure and Reachability Journal of Computer and System Sciences. ,vol. 55, pp. 441- 453 ,(1997) , 10.1006/JCSS.1997.1534
Gary William Flake, Steve Lawrence, C. Lee Giles, Efficient identification of Web communities knowledge discovery and data mining. pp. 150- 160 ,(2000) , 10.1145/347090.347121
P. Boldi, S. Vigna, The webgraph framework I Proceedings of the 13th conference on World Wide Web - WWW '04. pp. 595- 602 ,(2004) , 10.1145/988672.988752