Feature selection with a general hybrid algorithm

Author: Jerffeson Teixeira De Souza

DOI: 10.20381/RUOR-19633

Abstract: The Feature Selection problem involves discovering a subset of features such that a classifier built only with this subset would have better predictive accuracy than a classifier built from the entire set of features. A large number of algorithms have already been proposed for the feature selection problem. Although they differ significantly with regard to (1) the search strategy they use to determine the right features and (2) how each candidate subset is evaluated, they are usually classified into three general groups: Filters, Wrappers, and Hybrid solutions. In this thesis, we propose a new hybrid feature selection system for machine learning. The idea behind the algorithm, FortalFS, is to extract and combine the best characteristics of filters and wrappers in one algorithm. FortalFS uses the results of another algorithm as a starting point to search through subsets, which are evaluated by a learning algorithm. With an efficient heuristic, it can decrease the number of subsets to be evaluated, consequently decreasing the computational effort, while still being able to select an accurate subset. We also designed a variant of the original algorithm that attempts to work with feature weighting in order to evaluate subsets. Experiments were run in which FortalFS was compared with well-known filter and wrapper algorithms, such as Focus, Relief, and LVF, among others, over several datasets from the UCI Repository. Results showed that FortalFS outperforms most of them significantly. However, it presents a time-consuming performance similar to wrappers. Additional experiments using specially designed artificial datasets demonstrated that FortalFS can identify and remove both irrelevant and redundant features, as well as randomly class-correlated ones. The time-consumption issue was addressed with parallelism: a parallel version based on the master/slave design pattern was implemented and evaluated. In several experiments, it was able to achieve near optimal speedups.
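The abstract's core idea, using a filter's output as the starting point for a wrapper-style search, can be illustrated with a minimal sketch. This is not the actual FortalFS algorithm (whose details are not given in the abstract): the filter score, the toy nearest-centroid classifier, and the greedy forward pass over the filter ranking are all simplified stand-ins chosen for illustration.

```python
def filter_rank(X, y):
    """Filter phase: rank features by the absolute difference of per-class
    means (a crude stand-in for Relief-style relevance weights)."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), j))
    return [j for _, j in sorted(scores, reverse=True)]

def wrapper_accuracy(X, y, subset):
    """Wrapper phase: score a candidate subset by the training accuracy of a
    toy nearest-centroid classifier restricted to those features."""
    if not subset:
        return 0.0
    project = lambda row: [row[j] for j in subset]
    centroids = {}
    for label in (0, 1):
        rows = [project(r) for r, l in zip(X, y) if l == label]
        centroids[label] = [sum(col) / len(col) for col in zip(*rows)]
    correct = 0
    for row, label in zip(X, y):
        p = project(row)
        pred = min((0, 1), key=lambda l: sum(
            (a - b) ** 2 for a, b in zip(p, centroids[l])))
        correct += pred == label
    return correct / len(y)

def hybrid_select(X, y):
    """Hybrid: walk the filter's ranking, keeping a feature only if it
    improves wrapper accuracy -- far fewer subsets than exhaustive search."""
    best_subset, best_acc = [], 0.0
    for j in filter_rank(X, y):
        acc = wrapper_accuracy(X, y, best_subset + [j])
        if acc > best_acc:
            best_subset, best_acc = best_subset + [j], acc
    return best_subset, best_acc
```

On a toy dataset where feature 0 tracks the class and feature 1 is noise, `hybrid_select([[1, 5], [1, 3], [0, 5], [0, 3]], [1, 1, 0, 0])` keeps only feature 0. The point of the hybrid design, as the abstract notes, is that the filter ranking constrains the expensive wrapper evaluations to a small, promising sequence of subsets.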

References (10)
Hussein Almuallim, Thomas G. Dietterich, Learning with Many Irrelevant Features, National Conference on Artificial Intelligence, pp. 547-552, (1991)
Kenji Kira, Larry A. Rendell, A Practical Approach to Feature Selection, International Conference on Machine Learning, pp. 249-256, (1992), 10.1016/B978-1-55860-247-2.50037-1
Sanmay Das, Filters, Wrappers and a Boosting-Based Hybrid for Feature Selection, International Conference on Machine Learning, pp. 74-81, (2001)
Eric P. Xing, Richard M. Karp, Michael I. Jordan, Feature Selection for High-Dimensional Genomic Microarray Data, International Conference on Machine Learning, pp. 601-608, (2001)
George H. John, Ron Kohavi, Karl Pfleger, Irrelevant Features and the Subset Selection Problem, Machine Learning Proceedings 1994, pp. 121-129, (1994), 10.1016/B978-1-55860-335-6.50023-4
J. Bala, K. De Jong, J. Huang, H. Vafaie, H. Wechsler, Using Learning to Facilitate the Evolution of Features for Recognizing Visual Concepts, Evolutionary Computation, vol. 4, pp. 297-311, (1996), 10.1162/EVCO.1996.4.3.297
Avrim L. Blum, Pat Langley, Selection of Relevant Features and Examples in Machine Learning, Artificial Intelligence, vol. 97, pp. 245-271, (1997), 10.1016/S0004-3702(97)00063-5
C. L. Blake, UCI Repository of Machine Learning Databases, www.ics.uci.edu/~mlearn/MLRepository.html, (1998)
Marc Sebban, Richard Nock, A Hybrid Filter/Wrapper Approach of Feature Selection Using Information Theory, Pattern Recognition, vol. 35, pp. 835-846, (2002), 10.1016/S0031-3203(01)00084-X
H. Vafaie, K. De Jong, Genetic Algorithms as a Tool for Feature Selection in Machine Learning, International Conference on Tools with Artificial Intelligence, pp. 200-203, (1992), 10.1109/TAI.1992.246402