Improving PLS-RFE based gene selection for microarray data classification

Authors: Aiguo Wang, Ning An, Guilin Chen, Lian Li, Gil Alterovitz

DOI: 10.1016/J.COMPBIOMED.2015.04.011

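Abstract: Gene selection plays a crucial role in constructing efficient classifiers for microarray data classification, since microarray data is characterized by high dimensionality and small sample sizes and contains irrelevant and redundant genes. In practical use, partial least squares-based gene selection approaches can obtain gene subsets of good quality, but are considerably time-consuming. In this paper, we propose to integrate partial least squares based recursive feature elimination (PLS-RFE) with two feature elimination schemes, simulated annealing and square root, respectively, to speed up the feature selection process. Inspired by the strategy of the annealing schedule, the two proposed approaches eliminate a number of features, rather than the one least informative feature, during each iteration, and the number of removed features decreases as the iteration proceeds. To verify the effectiveness and efficiency of the proposed approaches, we perform extensive experiments on six publicly available microarray datasets with three typical classifiers, including Naive Bayes, K-Nearest-Neighbor and Support Vector Machine, and compare our approaches with ReliefF, PLS and PLS-RFE feature selectors in terms of classification accuracy and running time. Experimental results demonstrate that the two proposed approaches accelerate the feature selection process impressively without degrading the classification accuracy and obtain more compact feature subsets for both two-category and multi-category problems. Further experimental comparisons in feature subset consistency show that the proposed approach with the simulated annealing scheme not only has better time performance, but also obtains slightly better feature subset consistency than the one with the square root scheme.

We propose two approaches to classify microarray data. Two dynamic feature elimination schemes are combined with PLS-RFE. The proposed approaches select gene subsets similar to those of PLS-RFE. Experimental results demonstrate their effectiveness in actual use.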
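The idea of removing several features per iteration, with the removal count shrinking as the iteration proceeds, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact schedules, so the sketch assumes a square-root rule that drops floor(sqrt(m)) of the m remaining features per round, and it ranks features by the absolute first-component PLS weight (proportional to the centered X^T y), which may differ from the ranking measure used in the paper. The names `pls1_weights` and `sqrt_rfe` are hypothetical.

```python
import math


def pls1_weights(X, y):
    """Score each feature by |Xc^T yc|, the unnormalized first PLS weight.

    X is a list of sample rows, y a list of numeric labels; both are
    mean-centered before the cross-product is taken.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    return [
        abs(sum((X[i][j] - col_means[j]) * (y[i] - y_mean) for i in range(n)))
        for j in range(p)
    ]


def sqrt_rfe(X, y, n_keep, score=pls1_weights):
    """Recursive feature elimination with an assumed square-root schedule.

    Each round re-scores the remaining features and drops the
    floor(sqrt(m)) lowest-scored ones, where m is the number remaining,
    without going below n_keep. Returns (kept feature indices,
    number of features removed in each round).
    """
    remaining = list(range(len(X[0])))
    removed_per_round = []
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in X]
        scores = score(sub, y)
        # Batch size shrinks as fewer features remain; always at least 1.
        k = max(1, min(math.isqrt(len(remaining)), len(remaining) - n_keep))
        order = sorted(range(len(remaining)), key=lambda i: scores[i])
        drop = set(order[:k])
        remaining = [f for i, f in enumerate(remaining) if i not in drop]
        removed_per_round.append(k)
    return remaining, removed_per_round
```

Starting from 16 features and keeping 2, this schedule removes 4, 3, 3, 2 and 2 features in successive rounds, so it needs 5 re-rankings instead of the 14 that one-at-a-time elimination would require. A simulated annealing style schedule would replace only the line that computes `k`.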

References (48)
Ludmila I. Kuncheva, "A stability index for feature selection," Conference on Artificial Intelligence for Applications, pp. 390-395 (2007)
Marko Robnik-Šikonja, Igor Kononenko, "Theoretical and Empirical Analysis of ReliefF and RReliefF," Machine Learning, vol. 53, pp. 23-69 (2003), DOI: 10.1023/A:1025667309714
Wengang Zhou, Julie A. Dickerson, "A novel class dependent feature selection method for cancer biomarker discovery," Computers in Biology and Medicine, vol. 47, pp. 66-75 (2014), DOI: 10.1016/J.COMPBIOMED.2014.01.014
Sijmen de Jong, "SIMPLS: an alternative approach to partial least squares regression," Chemometrics and Intelligent Laboratory Systems, vol. 18, pp. 251-263 (1993), DOI: 10.1016/0169-7439(93)85002-X
Ryan Gosselin, Denis Rodrigue, Carl Duchesne, "A Bootstrap-VIP approach for selecting wavelength intervals in spectral imaging applications," Chemometrics and Intelligent Laboratory Systems, vol. 100, pp. 12-21 (2010), DOI: 10.1016/J.CHEMOLAB.2009.09.005
Aiguo Wang, Ning An, Guilin Chen, Lian Li, Gil Alterovitz, "Accelerating incremental wrapper based gene selection with K-Nearest-Neighbor," Bioinformatics and Biomedicine, pp. 21-23 (2014), DOI: 10.1109/BIBM.2014.6999395
Marc C. Robini, Pierre-Jean Reissman, "From simulated annealing to stochastic continuation: a new trend in combinatorial optimization," Journal of Global Optimization, vol. 56, pp. 185-215 (2013), DOI: 10.1007/S10898-012-9860-0
Gregory Piatetsky-Shapiro, Pablo Tamayo, "Microarray data mining: facing the challenges," SIGKDD Explorations, vol. 5, pp. 1-5 (2003), DOI: 10.1145/980972.980974
Jianping Huang, Hong Fang, Xiaohui Fan, "Decision forest for classification of gene expression data," Computers in Biology and Medicine, vol. 40, pp. 698-704 (2010), DOI: 10.1016/J.COMPBIOMED.2010.06.004
Kim-Anh Lê Cao, Simon Boitard, Philippe Besse, "Sparse PLS discriminant analysis: biologically relevant feature selection and graphical displays for multiclass problems," BMC Bioinformatics, vol. 12, pp. 253-253 (2011), DOI: 10.1186/1471-2105-12-253