Scalable mining of large disk-based graph databases

作者: Chen Wang , Wei Wang , Jian Pei , Yongtai Zhu , Baile Shi

DOI: 10.1145/1014052.1014088

关键词: Adjacency listMolecule miningGraphLinear subspaceComputer scienceGraph databaseGraph (abstract data type)ScalabilityData mining

摘要: Mining frequent structural patterns from graph databases is an interesting problem with broad applications. Most of the previous studies focus on pruning unfruitful search subspaces effectively, but few them address mining large, disk-based databases. As many in applications cannot be held into main memory, scalable remains a challenging problem. In this paper, we develop effective index structure, ADI (for adjacency index), to support various over large that memory. The simple and efficient build. Moreover, new structure can easily adopted existing pattern algorithms. example, adapt well-known gSpan algorithm by using structure. experimental results show enables one set experiments, method mine million graphs, while original only handle up 300 thousand graphs. our faster than when both run

参考文章(14)
Lawrence B. Holder, Diane J. Cook, Surnjani Djoko, Substructure discovery in the SUBDUE system knowledge discovery and data mining. pp. 169- 180 ,(1994)
TH Cormen, RL Rivest, CE Leiserson, C Stein, Introduction to Algorithms, 2nd edition. ,(2001)
Akihiro Inokuchi, Takashi Washio, Hiroshi Motoda, An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data european conference on principles of data mining and knowledge discovery. pp. 13- 23 ,(2000) , 10.1007/3-540-45372-5_2
Denis M. Bayada, Richard W. Simpson, A. Peter Johnson, Claude Laurenco, An algorithm for the multiple common subgraph problem Journal of Chemical Information and Computer Sciences. ,vol. 32, pp. 680- 685 ,(1992) , 10.1021/CI00010A015
Mohammed J. Zaki, Efficiently mining frequent trees in a forest Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02. pp. 71- 80 ,(2002) , 10.1145/775047.775058
Yoshimasa TAKAHASHI, Yuzuru SATOH, Hajime SUZUKI, Shin-ichi SASAKI, Recognition of Largest Common Structural Fragment among a Variety of Chemical Structures Analytical Sciences. ,vol. 3, pp. 23- 28 ,(1987) , 10.2116/ANALSCI.3.23
Xifeng Yan, Jiawei Han, CloseGraph: mining closed frequent graph patterns knowledge discovery and data mining. pp. 286- 295 ,(2003) , 10.1145/956750.956784
Xifeng Yan, Philip S. Yu, Jiawei Han, Graph indexing: a frequent structure-based approach international conference on management of data. pp. 335- 346 ,(2004) , 10.1145/1007568.1007607
C. Borgelt, M.R. Berthold, Mining molecular fragments: finding relevant substructures of molecules international conference on data mining. pp. 51- 58 ,(2002) , 10.1109/ICDM.2002.1183885
M. Kuramochi, G. Karypis, Frequent subgraph discovery international conference on data mining. pp. 313- 320 ,(2001) , 10.1109/ICDM.2001.989534