Improving Feature Subset Selection Using a Genetic Algorithm for Microarray Gene Expression Data

Authors: Feng Tan, Xuezheng Fu, Yanqing Zhang, A.G. Bourgeois

DOI: 10.1109/CEC.2006.1688623

Keywords: Dimensionality reduction, Small number, Pattern recognition, Machine learning, Computer science, Minimum redundancy feature selection, Classifier (UML), Artificial intelligence, Feature extraction, Genetic algorithm, Feature selection, Truncation selection

Abstract: Microarray data usually contain a huge number of genes (features) and a comparatively small number of samples, which makes accurate classification or prediction of diseases challenging. Feature selection techniques can help identify important features and remove irrelevant (unimportant) ones by applying certain criteria. However, different feature selection algorithms, based on various theoretical arguments, often produce different results when applied to the same data set. This makes selecting an optimal or near-optimal feature subset for a data set difficult. In this paper, we propose using a genetic algorithm to improve feature subset selection by combining valuable outcomes from multiple feature selection methods. The goal of our approach is to achieve a balance between classification accuracy and the size of the feature subsets selected. The advantages of the approach include the ability to accommodate multiple feature selection criteria and to find small feature subsets that perform well for the particular inductive learning algorithm of interest used to build the classifier. Experimental results demonstrate that the approach finds feature subsets with higher classification accuracy and/or smaller size compared with each individual feature selection algorithm.
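The abstract describes the method only at a high level, so the following is one hypothetical way to combine several feature selectors with a genetic algorithm; it is a sketch under stated assumptions, not the authors' implementation. Assumptions not given in the abstract: each chromosome is a bit mask over the union of features chosen by the individual methods; the initial population is seeded with those methods' subsets; fitness is a weighted sum of accuracy and compactness with a hypothetical weight `w`; parents are chosen by truncation selection (listed among the keywords, although the abstract does not say how it is used), then recombined by one-point crossover and bit-flip mutation. The names `fitness`, `ga_select` and the `evaluate` callback are illustrative; `evaluate` stands in for whatever accuracy estimate the chosen classifier provides.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Sequence, Set, Tuple


def fitness(mask: Sequence[int], accuracy: float, w: float = 0.8) -> float:
    """Weighted sum of accuracy and compactness; w is a hypothetical trade-off weight."""
    compactness = 1.0 - sum(mask) / len(mask)  # 1.0 = no features kept, 0.0 = all kept
    return w * accuracy + (1.0 - w) * compactness


def ga_select(
    candidate_subsets: List[Set[int]],
    evaluate: Callable[[FrozenSet[int]], float],
    pop_size: int = 20,
    generations: int = 30,
    keep_frac: float = 0.5,
    mutation_rate: float = 0.05,
    w: float = 0.8,
    seed: int = 0,
) -> FrozenSet[int]:
    """Hypothetical GA that seeds its population with the outputs of several selectors."""
    rng = random.Random(seed)
    # Assumption: the search space is the union of features picked by the individual methods.
    pool = sorted(set().union(*candidate_subsets))
    n = len(pool)
    if n == 0:
        return frozenset()

    def to_mask(subset: Set[int]) -> List[int]:
        return [1 if f in subset else 0 for f in pool]

    def decode(mask: Sequence[int]) -> FrozenSet[int]:
        return frozenset(f for f, bit in zip(pool, mask) if bit)

    cache: Dict[Tuple[int, ...], float] = {}

    def score(mask: List[int]) -> float:
        key = tuple(mask)
        if key not in cache:  # classifier evaluation is the expensive step, so memoise it
            subset = decode(mask)
            acc = evaluate(subset) if subset else 0.0
            cache[key] = fitness(mask, acc, w)
        return cache[key]

    # Initial population: one chromosome per method's subset, topped up with random masks.
    population = [to_mask(s) for s in candidate_subsets]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(n)])

    n_parents = max(2, int(keep_frac * pop_size))
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        # Truncation selection: only the top n_parents breed; they are also carried over,
        # so the best fitness in the population never decreases.
        parents = population[:n_parents]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation toggles each gene independently.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children

    return decode(max(population, key=score))


if __name__ == "__main__":
    # Toy stand-in for a classifier: features 1 and 4 are the only informative ones.
    def toy_accuracy(subset: FrozenSet[int]) -> float:
        return 0.5 + 0.25 * len(subset & {1, 4})

    seeds = [{1, 2, 3}, {4, 5, 6}, {1, 4, 7}]  # e.g. outputs of three different selectors
    print(sorted(ga_select(seeds, toy_accuracy)))
```

In a real pipeline `evaluate` would typically return the cross-validated accuracy of the inductive learner of interest (for example an SVM), and `w` controls how strongly smaller subsets are favoured over small accuracy gains. Because the parents are retained each generation, the returned subset is never worse under this fitness than the best subset produced by any single seeding method.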
