Redundancy based feature selection for microarray data

Authors: Lei Yu, Huan Liu

DOI: 10.1145/1014052.1014149

Keywords:

Abstract: In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes. The problem becomes particularly challenging due to the large number of features (genes) and the small sample size. Traditional gene selection methods often select the top-ranked genes according to their individual discriminative power, without handling the high degree of redundancy among the genes. Recent research shows that removing redundant genes from among the selected ones can achieve a better representation of the characteristics of the targeted phenotypes and lead to improved classification accuracy. Hence, in this paper we study the relationship between feature relevance and redundancy and propose an efficient method that can effectively remove redundant genes. The efficiency and effectiveness of our method, in comparison with representative methods, have been demonstrated through an empirical study using public microarray data sets.
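The abstract separates relevance (how strongly a gene tracks the phenotype) from redundancy (how strongly genes track each other). The sketch below illustrates that two-step idea. It rests on several assumptions:

- Expression values have already been discretized.
- Symmetrical uncertainty (SU) is the correlation measure.
- Redundancy is tested in the style of the fast correlation-based filter of Liu and Yu (2003), listed in the references.

It is an illustration, not necessarily the exact algorithm of this paper. The names `select_non_redundant` and `threshold` are hypothetical.

```python
from collections import Counter
from math import log2

# Illustrative sketch only. Assumptions: discretized genes, SU as the
# correlation measure, and an FCBF-style redundancy test. This is not
# necessarily the exact method of this paper.


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables constant: treat as sharing no information
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy     # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


def select_non_redundant(features, labels, threshold=0.0):
    """Return relevant, non-redundant gene names by decreasing relevance.

    features:  dict mapping gene name -> list of discretized values
    labels:    list of class labels, same length as each feature list
    threshold: hypothetical cut-off on SU(gene, class) for relevance
    """
    # Step 1: relevance analysis. Rank genes by their SU with the class.
    relevance = {g: symmetrical_uncertainty(v, labels) for g, v in features.items()}
    ranked = sorted(
        (g for g in relevance if relevance[g] > threshold),
        key=lambda g: relevance[g],
        reverse=True,
    )
    # Step 2: redundancy analysis. Drop a gene if some already-kept (and
    # therefore at least as relevant) gene correlates with it at least as
    # strongly as the class does.
    selected = []
    for g in ranked:
        redundant = any(
            symmetrical_uncertainty(features[s], features[g]) >= relevance[g]
            for s in selected
        )
        if not redundant:
            selected.append(g)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    genes = {
        "gene_a": [0, 0, 1, 1],  # perfectly tracks the class
        "gene_b": [0, 0, 1, 1],  # exact copy of gene_a, hence redundant
        "gene_c": [0, 1, 0, 1],  # independent of the class, hence irrelevant
    }
    print(select_non_redundant(genes, labels))  # ['gene_a']
```

Each gene is checked only against already-kept genes that are at least as relevant. This gives a single pass over the ranked list instead of a search over gene subsets.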

References (28)
David D. Jensen, Paul R. Cohen. Multiple Comparisons in Induction Algorithms. Machine Learning, vol. 38, pp. 309-338, 2000. DOI: 10.1023/A:1007631014630
Marko Robnik-Šikonja, Igor Kononenko. Theoretical and Empirical Analysis of ReliefF and RReliefF. Machine Learning, vol. 53, pp. 23-69, 2003. DOI: 10.1023/A:1025667309714
Steven L. Salzberg, Alberto Segre. Programs for Machine Learning. 1994.
Mark Andrew Hall. Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning. International Conference on Machine Learning, pp. 359-366, 2000.
Mark A. Hall, Ian H. Witten, Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. 1999.
Eric P. Xing, Richard M. Karp, Michael I. Jordan. Feature Selection for High-Dimensional Genomic Microarray Data. International Conference on Machine Learning, pp. 601-608, 2001.
George H. John, Ron Kohavi, Karl Pfleger. Irrelevant Features and the Subset Selection Problem. Machine Learning Proceedings 1994, pp. 121-129, 1994. DOI: 10.1016/B978-1-55860-335-6.50023-4
Huan Liu, Lei Yu. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. International Conference on Machine Learning, pp. 856-863, 2003.
Mehran Sahami, Daphne Koller. Toward Optimal Feature Selection. International Conference on Machine Learning, pp. 284-292, 1996.