MaPle: a fast algorithm for maximal pattern-based clustering

Authors: Jian Pei, Xiaoling Zhang, Moonjung Cho, Haixun Wang, P.S. Yu

DOI: 10.1109/ICDM.2003.1250928

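Abstract: Pattern-based clustering is important in many applications, such as DNA micro-array data analysis, automatic recommendation systems and target marketing systems. However, pattern-based clustering in large databases is challenging. On the one hand, there can be a huge number of clusters, and many of them can be redundant and thus make the pattern-based clustering ineffective. On the other hand, the previously proposed methods may not be efficient or scalable in mining large databases. We study the problem of maximal pattern-based clustering. Redundant clusters are avoided completely by mining only the maximal pattern-based clusters. MaPle, an efficient and scalable mining algorithm, is developed. It conducts a depth-first, divide-and-conquer search and prunes unnecessary branches smartly. Our extensive performance study on both synthetic data sets and real data sets shows that maximal pattern-based clustering is effective. It reduces the number of clusters substantially. Moreover, MaPle is more efficient and scalable than the previously proposed pattern-based clustering algorithms in mining large databases.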
