Odkrivanje podskupin z uporabo algoritmov za učenje pravil (Subgroup Discovery Using Rule Learning Algorithms)

Document type: Doctoral dissertation (Slovenian: Doktorska disertacija)

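Abstract: This dissertation investigates how to adapt standard classification rule learning approaches to subgroup discovery. The goal of subgroup discovery is to find rules describing subsets of a selected population that are sufficiently large and statistically unusual in terms of class distribution. The dissertation presents a subgroup discovery algorithm, CN2-SD, developed by modifying parts of the CN2 classification rule learner: its covering algorithm, search heuristic, probabilistic classification of instances, and evaluation measures. Experimental evaluation of CN2-SD on a number of data sets shows a substantial reduction in the number of induced rules, increased rule coverage and significance, and increased overall coverage of the target concept, as well as slight improvements in the area under the ROC curve, when compared with the classification rule learning algorithms CN2 and RIPPER. An application to a traffic accident data set confirms these findings. The dissertation also presents the subgroup discovery algorithm APRIORI-SD, developed by adapting association rule learning to subgroup discovery. This was achieved by building on the classification rule learner APRIORI-C, enhanced with a novel post-processing mechanism, a new quality measure for induced rules (weighted relative accuracy), and probabilistic classification of instances. The results show similar behavior for APRIORI-SD, i.e., similar improvements when compared with CN2, RIPPER and APRIORI-C. An optimization approach based on ROC analysis is also presented and implemented in an adaptation of a subgroup discovery algorithm. The implications of the “number-of-rules–unusualness–coverage” trade-off are investigated through an experimental evaluation of the adapted algorithms on the data sets, with results presented in the form of 2D graphs depicting the dependencies between the number of rules, unusualness, coverage and accuracy of the original …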
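The abstract names two of the CN2-SD modifications, weighted covering and the weighted relative accuracy (WRAcc) quality measure. The Python sketch that follows illustrates both under stated assumptions: WRAcc is taken as p(Cond) · (p(Class | Cond) − p(Class)) with every probability estimated from summed example weights, and each positive example receives the additive weight 1/(i + 1), where i counts how many already selected rules cover it. A fixed list of candidate conditions stands in for CN2's beam search. The names `wracc`, `weighted_covering`, `min_wracc` and the toy attributes are hypothetical; this is not the dissertation's implementation.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical representation: attribute name -> value, with the label under "class".
Example = Dict[str, str]
Condition = Callable[[Example], bool]


def wracc(cond: Condition, target: str,
          examples: Sequence[Example], weights: Sequence[float]) -> float:
    """WRAcc(target <- cond) = p(Cond) * (p(target | Cond) - p(target)),
    with every probability estimated from summed example weights."""
    total = sum(weights)
    covered = sum(w for e, w in zip(examples, weights) if cond(e))
    if total == 0 or covered == 0:
        return 0.0
    pos_total = sum(w for e, w in zip(examples, weights) if e["class"] == target)
    pos_covered = sum(w for e, w in zip(examples, weights)
                      if cond(e) and e["class"] == target)
    return (covered / total) * (pos_covered / covered - pos_total / total)


def weighted_covering(candidates: Sequence[Condition], target: str,
                      examples: Sequence[Example], max_rules: int = 3,
                      min_wracc: float = 0.0) -> List[int]:
    """Greedy weighted covering with additive weights w = 1 / (i + 1).

    Covered positive examples are not removed; their weight shrinks each time a
    selected rule covers them, steering later rules toward less-covered examples.
    Negative examples keep weight 1. Returns indices of the chosen candidates.
    """
    times_covered = [0] * len(examples)
    chosen: List[int] = []
    for _ in range(max_rules):
        weights = [1.0 / (i + 1) for i in times_covered]
        best_idx: Optional[int] = None
        best_score = min_wracc
        for idx, cond in enumerate(candidates):
            if idx in chosen:
                continue
            score = wracc(cond, target, examples, weights)
            if score > best_score:  # keep only rules strictly above the threshold
                best_idx, best_score = idx, score
        if best_idx is None:
            break
        chosen.append(best_idx)
        # Only positive examples covered by the new rule have their count raised.
        for j, e in enumerate(examples):
            if e["class"] == target and candidates[best_idx](e):
                times_covered[j] += 1
    return chosen


if __name__ == "__main__":
    data = [
        {"sex": "m", "age": "young", "class": "yes"},
        {"sex": "m", "age": "old", "class": "yes"},
        {"sex": "f", "age": "young", "class": "yes"},
        {"sex": "f", "age": "old", "class": "no"},
    ]
    rules = [
        lambda e: e["sex"] == "m",
        lambda e: e["age"] == "young",
        lambda e: True,
        lambda e: e["sex"] == "f",
    ]
    print(wracc(rules[0], "yes", data, [1.0] * len(data)))  # 0.125
    print(weighted_covering(rules, "yes", data))            # [0, 1]
```

On the toy data, the first rule covers two positives, whose weights then drop to 0.5; `age == "young"` still wins the second round because it also reaches the positive example that is not yet covered, while the trivial rule and `sex == "f"` never score above zero. A multiplicative scheme (w = γ^i for some 0 < γ < 1) is a common alternative and would only change the line that computes `weights`.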

References (first 10 of 64)
Ian H. Witten, Eibe Frank. Data mining. ACM SIGMOD Record, vol. 31, pp. 76-77, 2002. doi:10.1145/507338.507355
Huan Liu, Farhad Hussain, Chew Lim Tan, Manoranjan Dash. Discretization: An Enabling Technique. Data Mining and Knowledge Discovery, vol. 6, pp. 393-423, 2002. doi:10.1023/A:1016304305535
Peter Clark, Tim Niblett. The CN2 Induction Algorithm. Machine Learning, vol. 3, pp. 261-283, 1989. doi:10.1023/A:1022641700528
N. Lavrač, P. Flach, B. Kavšek, L. Todorovski. Adapting classification rule induction to subgroup discovery. International Conference on Data Mining (ICDM), pp. 266-273, 2002. doi:10.1109/ICDM.2002.1183912
Johannes Grabmeier, Andreas Rudolph. Techniques of Cluster Algorithms in Data Mining. Data Mining and Knowledge Discovery, vol. 6, pp. 303-360, 2002. doi:10.1023/A:1016308404627
Peter A. Flach. The geometry of ROC space: understanding machine learning metrics through ROC isometrics. International Conference on Machine Learning (ICML), pp. 194-201, 2003.
Branko Kavšek, Nada Lavrač. Using Subgroup Discovery to Analyze the UK Traffic Data. 2004.
Bing Liu, Wynne Hsu, Yiming Ma. Integrating classification and association rule mining. Knowledge Discovery and Data Mining (KDD), pp. 80-86, 1998.
Rakesh Agrawal, Tomasz Imieliński, Arun Swami. Mining association rules between sets of items in large databases. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD '93), vol. 22, pp. 207-216, 1993. doi:10.1145/170035.170072
Ronald L. Rivest. Learning Decision Lists. Machine Learning, vol. 2, pp. 229-246, 1987. doi:10.1023/A:1022607331053