Irrelevant Feature and Rule Removal for Structural Associative Classification Using Structure-Preserving Flat Representation

Authors: Izwan Nizal Mohd Shaharanee, Fedja Hadzic

DOI: 10.1007/978-3-662-45620-0_10

Abstract: Practical applications of association rule mining often suffer from the overwhelming number of rules that are generated, many of which are not interesting or useful for the application in question. Removing irrelevant features and/or rules comprised of irrelevant features can significantly improve overall performance. Many statistical and constraint-based measures are used to discard unnecessary rules when vectorial or tabular data is used. In contrast, the use of such measures is limited in the tree-structured domain, due to structural aspects that are not easily incorporated. In this chapter, we explore the use of a feature subset selection measure, as well as common interestingness measures, via a recently proposed structure-preserving flat representation for tree-structured data such as XML. A feature subset selection is used prior to association rule generation. Once the initial set of rules is obtained, irrelevant rules are determined as those that are comprised of attributes not determined to be statistically significant for the classification task. The experiments are performed using real-world web access trees and a property management dataset. The results indicate that where the dataset has a more standard structure, a large number of insignificant rules will be discarded and accuracy will increase. However, where tree instances vary greatly in terms of structure and label distribution among nodes, while the number of rules removed increases, there is a reduction in the coverage rate of the rule set.
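The two-stage pruning described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the chapter's implementation: the abstract does not name the statistical test, so a Pearson chi-square test is assumed here, along with the rule that a rule is dropped as soon as any antecedent attribute is non-significant. The attribute names `X0` and `X1`, the `cls` column, the toy rows, and the helper functions are all hypothetical. The default threshold 3.841 is the 5% critical value for one degree of freedom, so it only fits a two-valued attribute against a two-valued class.

```python
from collections import Counter

def chi_square(rows, feature, target):
    """Pearson chi-square statistic of the feature/target contingency table."""
    n = len(rows)
    joint = Counter((r[feature], r[target]) for r in rows)
    feature_totals = Counter(r[feature] for r in rows)
    target_totals = Counter(r[target] for r in rows)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for t, t_count in target_totals.items():
            expected = f_count * t_count / n
            stat += (joint[(f, t)] - expected) ** 2 / expected
    return stat

def significant_features(rows, features, target, critical=3.841):
    """Attributes whose statistic exceeds the (assumed) critical value."""
    return {f for f in features if chi_square(rows, f, target) > critical}

def prune_rules(rules, keep):
    """Assumption: drop a rule if any antecedent attribute is outside `keep`."""
    return [r for r in rules if set(r["antecedent"]) <= keep]

# Hypothetical flat table: one row per tree instance, one column per attribute.
rows = [
    {"X0": "a", "X1": "p", "cls": "yes"},
    {"X0": "a", "X1": "q", "cls": "yes"},
    {"X0": "a", "X1": "p", "cls": "yes"},
    {"X0": "a", "X1": "q", "cls": "yes"},
    {"X0": "b", "X1": "p", "cls": "no"},
    {"X0": "b", "X1": "q", "cls": "no"},
    {"X0": "b", "X1": "p", "cls": "no"},
    {"X0": "b", "X1": "q", "cls": "no"},
]

# Hypothetical rules, as if produced by an association rule miner.
rules = [
    {"antecedent": {"X0": "a"}, "consequent": "yes"},
    {"antecedent": {"X1": "p"}, "consequent": "yes"},
    {"antecedent": {"X0": "b", "X1": "q"}, "consequent": "no"},
]

keep = significant_features(rows, ["X0", "X1"], "cls")
kept_rules = prune_rules(rules, keep)
```

In this toy table `X0` determines the class and `X1` is independent of it, so only the rule built solely from `X0` is retained. The sample is far too small for a chi-square test to be trustworthy in practice; it serves only to show the flow from feature selection to rule removal.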
