Combining Feature Selection and Feature Construction to Improve Concept Learning for High Dimensional Data

Author: Blaise Hanczar

DOI: 10.1007/11527862_19

Abstract: This paper describes and experimentally analyses a new dimension reduction method for microarray data. Microarrays, which allow simultaneous measurement of the expression level of thousands of genes in a given situation (tissue, cell or time), produce data that poses particular machine-learning problems. The disproportion between the number of attributes (tens of thousands) and the number of examples (hundreds) requires a reduction in dimension. While gene/class mutual information is often used to filter genes, we propose an approach that takes into account gene-pair/class information. A gene selection heuristic based on this principle is proposed, as well as an automatic feature-construction procedure forcing learning algorithms to make use of these pairs. We report significant improvements in accuracy on several public databases.
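
The contrast between gene/class and gene-pair/class information can be illustrated with a small sketch. This is not the paper's heuristic or feature-construction procedure. It only computes empirical mutual information on invented toy data, assuming expression levels have already been discretized to 0/1. The function name `mutual_information` and the variables `gene_a`, `gene_b` and `labels` are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y), in bits, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) combination
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Invented toy data: two discretized genes whose XOR defines the class.
gene_a = [0, 0, 1, 1, 0, 0, 1, 1]
gene_b = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [a ^ b for a, b in zip(gene_a, gene_b)]

# Gene/class information: each gene considered alone.
mi_a = mutual_information(gene_a, labels)
mi_b = mutual_information(gene_b, labels)

# Gene-pair/class information: the pair treated as one joint variable.
mi_pair = mutual_information(list(zip(gene_a, gene_b)), labels)

print(f"I(A; class)    = {mi_a:.3f} bits")
print(f"I(B; class)    = {mi_b:.3f} bits")
print(f"I(A, B; class) = {mi_pair:.3f} bits")
```

In this toy case each gene alone carries no information about the class, so a filter that scores genes one at a time would discard both, while the pair determines the class completely. Real microarray data would need a discretization step and far more samples for such estimates to be reliable.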
