Feature subset selection problem on microarray data

Author: Nihan Özşamlı

Abstract: Recent advances in technology have given rise to tools such as microarray chips. The use of microarray chips has enabled scientists to measure the amount of protein production from the genes in a cell, known as gene expression data. The classification of cell samples by means of gene expression data is a hot research area. The data used for analysis is massive, and therefore the features, i.e., genes, must be reduced to a reasonable level due to the computational cost of experiments and the possibility of being misled by irrelevant genes. Therefore, analysis based on gene expression data usually includes a feature subset selection phase. This thesis aims to develop a tool that can be used during the feature subset selection phase of such analyses. Three novel algorithms are proposed for the feature subset selection problem, based on basic association rule mining. The first algorithm starts with fuzzy partitioning of the gene expression data and discovers highly confident IF-THEN rules that enable the classification of sample tissues. The second algorithm searches the possible IF-THEN rules with a heuristic pruning approach based on the beam search algorithm. Finally, the third algorithm focuses on the hierarchical information carried through gene expressions by constructing decision trees based on different performance measures. We found satisfactory results on the Leukemia Dataset. In addition, on the colon cancer dataset, decision tree construction showed good performance.
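The idea of selecting genes through highly confident IF-THEN rules can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes crisp (non-fuzzy) expression levels in place of fuzzy partitioning, considers only single-gene rules, and uses hypothetical names throughout (`rule_confidence`, `select_genes`, the gene names `g1` and `g2`, and the toy class labels).

```python
from typing import Dict, List

# Hypothetical toy representation: each sample maps a gene name to a
# discretized expression level ("low" or "high"). Crisp levels are an
# assumption made here for brevity; the abstract describes fuzzy partitioning.
Sample = Dict[str, str]


def rule_confidence(samples: List[Sample], labels: List[str],
                    antecedent: Dict[str, str], consequent: str) -> float:
    """Confidence of the rule IF antecedent THEN class == consequent.

    confidence = (# samples matching antecedent with that class)
                 / (# samples matching antecedent)
    """
    matched = [lab for s, lab in zip(samples, labels)
               if all(s.get(g) == lvl for g, lvl in antecedent.items())]
    if not matched:
        return 0.0
    return sum(1 for lab in matched if lab == consequent) / len(matched)


def select_genes(samples: List[Sample], labels: List[str],
                 levels: List[str], min_conf: float) -> List[str]:
    """Keep genes that appear in at least one single-gene rule whose
    confidence reaches min_conf (an illustrative selection criterion)."""
    genes = sorted({g for s in samples for g in s})
    classes = sorted(set(labels))
    selected = []
    for g in genes:
        best = max(rule_confidence(samples, labels, {g: lvl}, c)
                   for lvl in levels for c in classes)
        if best >= min_conf:
            selected.append(g)
    return selected


# Toy data, invented for illustration only.
samples = [
    {"g1": "high", "g2": "low"},
    {"g1": "high", "g2": "high"},
    {"g1": "low", "g2": "low"},
    {"g1": "low", "g2": "high"},
]
labels = ["ALL", "ALL", "AML", "AML"]

print(rule_confidence(samples, labels, {"g1": "high"}, "ALL"))
print(select_genes(samples, labels, ["low", "high"], min_conf=0.9))
```

In this toy data, `g1` perfectly separates the two classes while `g2` carries no class information, so only `g1` survives the confidence threshold. A realistic version would also apply a minimum support threshold, since a rule matched by very few samples can reach high confidence by chance.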
