The minimum consistent subset cover problem and its applications in data mining

作者: Byron J. Gao , Martin Ester , Jin-Yi Cai , Oliver Schulte , Hui Xiong

DOI: 10.1145/1281192.1281228

关键词:

摘要: In this paper, we introduce and study the Minimum Consistent Subset Cover (MCSC) problem. Given a finite ground set X constraint t, find minimum number of consistent subsets that cover X, where subset is if it satisfies t. The MCSC problem generalizes traditional covering has Clique Partition, dual graph coloring, as an instance. Many practical data mining problems in areas rule learning, clustering, frequent pattern can be formulated instances. particular, discuss Rule Set minimizes model complexity decision rules well some converse k-clustering minimize clusters satisfying certain distance constraints. We also show how applications summarization. For any these formulations, our proposed novel graph-based generic algorithm CAG directly applicable. starts by constructing maximal optimal partial solution, then performs example-driven specific-to-general search on dynamically maintained bipartite assignment to simultaneously learn with small cardinality set. Our experiments benchmark datasets achieves good results compared existing popular heuristics.

参考文章(22)
Steven L. Salzberg, Alberto Segre, Programs for Machine Learning ,(1994)
Jaroslaw Pietrzykowski, Janusz Wojtusiak, Kenneth A. Kaufman, Ryszard S. Michalski, Multitype Pattern Discovery Via AQ21: A Brief Description of the Method and Its Novel Features ,(2006)
Dan Pelleg, Andrew W. Moore, X-means: Extending K-means with Efficient Estimation of the Number of Clusters international conference on machine learning. pp. 727- 734 ,(2000)
Nicolas Pasquier, Yves Bastide, Rafik Taouil, Lotfi Lakhal, Discovering Frequent Closed Itemsets for Association Rules international conference on database theory. ,vol. 1540, pp. 398- 416 ,(1999) , 10.1007/3-540-49257-7_25
Johannes Fürnkranz, Separate-and-Conquer Rule Learning Artificial Intelligence Review. ,vol. 13, pp. 3- 54 ,(1999) , 10.1023/A:1006524209794
Heikki Mannila, Roni Khardon, Dimitrios Gunopulos, Hannu Toivonen, Data mining, hypergraph transversals, and machine learning symposium on principles of database systems. pp. 209- 216 ,(1997)
Sreerama K. Murthy, Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey Data Mining and Knowledge Discovery. ,vol. 2, pp. 345- 389 ,(1998) , 10.1023/A:1009744630224
Se June Hong, R-MINI: An iterative approach for generating minimal rules from examples IEEE Transactions on Knowledge and Data Engineering. ,vol. 9, pp. 709- 717 ,(1997) , 10.1109/69.634750
Moses Charikar, Chandra Chekuri, Tomas Feder, Rajeev Motwani, Incremental Clustering and Dynamic Information Retrieval SIAM Journal on Computing. ,vol. 33, pp. 1417- 1440 ,(2004) , 10.1137/S0097539702418498