On mining cross-graph quasi-cliques

Authors: Jian Pei, Daxin Jiang, Aidong Zhang

DOI: 10.1145/1081870.1081898

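Abstract: Joint mining of multiple data sets can often discover interesting, novel, and reliable patterns which cannot be obtained solely from any single source. For example, in cross-market customer segmentation, a group of customers who behave similarly in multiple markets should be considered as a more coherent and more reliable cluster than clusters found in a single market. As another example, in bioinformatics, by joint mining of gene expression data and protein interaction data, we can find clusters of genes which show coherent expression patterns and also produce interacting proteins. Such clusters may be potential pathways. In this paper, we investigate a novel data mining problem, mining cross-graph quasi-cliques, which is generalized from several interesting applications such as cross-market customer segmentation and joint mining of gene expression data and protein interaction data. We build a general model for mining cross-graph quasi-cliques, show why the complete set of cross-graph quasi-cliques cannot be found by previous data mining methods, and study the complexity of the problem. While the problem is difficult, we develop an efficient algorithm, Crochet, which exploits several interesting and effective techniques and heuristics to efficaciously mine cross-graph quasi-cliques. A systematic performance study is reported on both synthetic and real data sets. We demonstrate some interesting and meaningful cross-graph quasi-cliques in bioinformatics. The experimental results also show that algorithm Crochet is efficient and scalable.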

References (37)
Andrew Ng, Michael Jordan, Yair Weiss, On Spectral Clustering: Analysis and an Algorithm. Neural Information Processing Systems, vol. 14, pp. 849-856 (2001)
Xifeng Yan, Jiawei Han, gSpan: Graph-Based Substructure Pattern Mining. International Conference on Data Mining, pp. 721-724 (2002), 10.1109/ICDM.2002.1184038
Amir Ben-Dor, Ron Shamir, Zohar Yakhini, Clustering Gene Expression Patterns. Journal of Computational Biology, vol. 6, pp. 281-297 (1999), 10.1089/106652799318274
Arno J. Knobbe, Multi-Relational Data Mining (2006)
Michael Randolph Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness (1999)
Lawrence B. Holder, Diane J. Cook, Surnjani Djoko, Substructure Discovery in the SUBDUE System. Knowledge Discovery and Data Mining, pp. 169-180 (1994)
George M. Church, Yizong Cheng, Biclustering of Expression Data. Intelligent Systems in Molecular Biology, vol. 8, pp. 93-103 (2000)
Ron Rymon, Search through Systematic Set Enumeration. Principles of Knowledge Representation and Reasoning, pp. 539-550 (1992)
Ron Shamir, Roded Sharan, CLICK: A Clustering Algorithm for Gene Expression Analysis. Intelligent Systems in Molecular Biology (2000)
Mauricio G. C. Resende, Sandra Sudarsky, James Abello, Massive Quasi-Clique Detection. Latin American Symposium on Theoretical Informatics, pp. 598-612 (2002), 10.1007/3-540-45995-2_51