Balanced accuracy for feature subset selection with genetic algorithms

Authors: M.R. Peterson, M.L. Raymer, G.B. Lamont

DOI: 10.1109/CEC.2005.1555008

Abstract: The relevance of a set of measured features describing labeled patterns within a problem domain affects classifier performance. Feature subset selection algorithms employing a wrapper approach typically assess the fitness of a feature subset simply as the accuracy of a given classifier over a set of available patterns using the candidate feature set. For datasets with many patterns for some classes and few for others, relatively high accuracy may be achieved by simply labeling unknown patterns according to the largest class. Feature selection wrappers that only emphasize accuracy typically follow this bias. Class bias may be mitigated by emphasizing well-balanced accuracy during the optimization algorithm. This paper proposes adding selective pressure for balanced accuracy to mitigate class bias during feature set evolution. Experiments compare the selection performance of genetic algorithms using various fitness functions varying in terms of accuracy, balance, and feature parsimony. Several feature selection algorithms, including greedy, genetic, filter, and hybrid filter/GA approaches, are then compared using the best fitness function. The experiments employ a naive Bayes classifier and public domain datasets. The experimental results suggest that improvements to class balance and feature subset size can be made without compromising overall accuracy or run-time efficiency.
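
The class bias described in the abstract can be illustrated with a small sketch. The abstract does not give the paper's fitness function, so everything below is an assumption for illustration: balanced accuracy is taken as the mean of per-class recall, and the hypothetical `fitness` function is a weighted sum of overall accuracy, balanced accuracy, and feature parsimony with arbitrary weights `w_acc`, `w_bal`, and `w_par`.

```python
from collections import Counter


def overall_accuracy(y_true, y_pred):
    """Fraction of patterns whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall (one common definition, assumed here)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


def fitness(y_true, y_pred, mask, w_acc=1.0, w_bal=1.0, w_par=0.1):
    """Hypothetical wrapper fitness: accuracy + balance + parsimony.

    mask is a 0/1 list marking which features the candidate subset keeps.
    The weights are illustrative assumptions, not values from the paper.
    """
    parsimony = 1.0 - sum(mask) / len(mask)  # fewer features scores higher
    return (w_acc * overall_accuracy(y_true, y_pred)
            + w_bal * balanced_accuracy(y_true, y_pred)
            + w_par * parsimony)


# An imbalanced dataset: 90 patterns of class "a", 10 of class "b".
y_true = ["a"] * 90 + ["b"] * 10

# A classifier that always predicts the largest class.
majority = ["a"] * 100

# A classifier that makes some errors in both classes.
mixed = ["a"] * 80 + ["b"] * 10 + ["b"] * 8 + ["a"] * 2

print(overall_accuracy(y_true, majority))   # 0.9
print(balanced_accuracy(y_true, majority))  # 0.5
```

Under these assumptions the majority-class predictor scores 0.9 on plain accuracy but only 0.5 on balanced accuracy, so a fitness function that includes the balance term ranks the `mixed` predictions above it even though their plain accuracy is slightly lower.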
