Feature Selection through Minimization of the VC dimension

Authors: Siddharth Sabharwal, Jayadeva, Sanjit S. Batra

DOI:

Keywords:

Abstract: Feature selection involves identifying the most relevant subset of input features, with a view to improving the generalization of predictive models by reducing overfitting. Directly searching for the most relevant combination of attributes is NP-hard. Variable selection is of critical importance in many applications, such as micro-array data analysis, where selecting a small number of discriminative features is crucial to developing useful models of disease mechanisms, as well as to prioritizing targets for drug discovery. The recently proposed Minimal Complexity Machine (MCM) provides a way to learn a hyperplane classifier by minimizing an exact ($\Theta$) bound on its VC dimension. It is well known that a lower VC dimension contributes to good generalization. For a linear hyperplane classifier in the input space, the VC dimension is upper bounded by the number of features; hence, a linear classifier with a small VC dimension is parsimonious in the set of features it employs. In this paper, we use the linear MCM to learn a classifier in which a large number of weights are zero; features with non-zero weights are the ones that are chosen. The selected features are then used to train a kernel SVM classifier. On benchmark datasets, the features chosen by the MCM yield comparable or better test set accuracy than those chosen by methods such as ReliefF and FCBF. The MCM typically chooses about one-tenth as many features as the other methods; on some very high dimensional datasets it selects only about $0.6\%$ of the features, whereas ReliefF and FCBF choose 70 to 140 times more, demonstrating that minimizing the VC dimension may provide a new and effective route to feature selection and to learning sparse representations.
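For concreteness, the linear MCM underlying this pipeline is, in its hard-margin form, the linear program $\min_{w, b, h} \; h$ subject to $h \ge y_i (w^\top x_i + b) \ge 1$ for $i = 1, \ldots, M$, whose optimal $h$ controls the bound on the VC dimension of the learned hyperplane. The sketch below is a minimal illustration, not the authors' implementation: it assumes linearly separable data and the hard-margin LP above (the paper works with a soft-margin variant that adds slack terms), and the helper name mcm_select and the synthetic demo are hypothetical. It solves the LP with SciPy, keeps the features whose weights are non-zero, and trains a kernel SVM on them.

import numpy as np
from scipy.optimize import linprog
from sklearn.svm import SVC

def mcm_select(X, y, tol=1e-6):
    # Hypothetical helper: hard-margin linear MCM as an LP.
    #   minimise h  subject to  h >= y_i (w.x_i + b) >= 1
    # Decision variables are stacked as z = [w_1 .. w_n, b, h].
    m, n = X.shape
    c = np.zeros(n + 2)
    c[-1] = 1.0                                   # objective: minimise h
    Yx = y[:, None] * X                           # rows are y_i * x_i
    # y_i (w.x_i + b) >= 1   ->   -y_i x_i . w - y_i b      <= -1
    A_lo = np.hstack([-Yx, -y[:, None], np.zeros((m, 1))])
    # y_i (w.x_i + b) <= h   ->    y_i x_i . w + y_i b - h  <=  0
    A_hi = np.hstack([Yx, y[:, None], -np.ones((m, 1))])
    res = linprog(c,
                  A_ub=np.vstack([A_lo, A_hi]),
                  b_ub=np.concatenate([-np.ones(m), np.zeros(m)]),
                  bounds=[(None, None)] * (n + 1) + [(1.0, None)],
                  method="highs")
    if not res.success:                           # e.g. data not separable
        raise RuntimeError(res.message)
    w = res.x[:n]
    return np.flatnonzero(np.abs(w) > tol)        # indices of selected features

# Illustrative usage on synthetic data where only features 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = np.where(X[:, 0] + X[:, 3] > 0, 1.0, -1.0)
selected = mcm_select(X, y)
clf = SVC(kernel="rbf").fit(X[:, selected], y)    # kernel SVM on chosen features

A vertex (basic) solution of this LP has at most as many non-zero variables as there are active constraints, so when the number of features far exceeds the number of samples the optimal $w$ is necessarily sparse; this is what makes the non-zero-weight criterion usable as a feature selector.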

References (28)
Shinichi Nakajima, Derin Babacan, Masashi Sugiyama. On Bayesian PCA: Automatic Dimensionality Selection and Analytic Solution. International Conference on Machine Learning, pp. 497-504 (2011).
Kenji Kira, Larry A. Rendell. The feature selection problem: traditional methods and a new algorithm. National Conference on Artificial Intelligence, pp. 129-134 (1992).
Huan Liu, Lei Yu. Feature selection for high-dimensional data: a fast correlation-based filter solution. International Conference on Machine Learning, pp. 856-863 (2003).
James Theiler, Simon Perkins, Kevin Lacker. Grafting: fast, incremental feature selection by gradient descent in function space. Journal of Machine Learning Research, vol. 3, pp. 1333-1356 (2003).
Ran El-Yaniv, Yoad Winter, Naftali Tishby, Ron Bekkerman. Distributional word clusters vs. words for text categorization. Journal of Machine Learning Research, vol. 3, pp. 1183-1208 (2003).
Ron Kohavi, George H. John. Wrappers for feature subset selection. Artificial Intelligence, vol. 97, pp. 273-324 (1997). DOI: 10.1016/S0004-3702(97)00043-X
R. B. O'Hara, M. J. Sillanpää. A review of Bayesian variable selection methods: what, how and which. Bayesian Analysis, vol. 4, pp. 85-117 (2009). DOI: 10.1214/09-BA403
Jun Li, Dacheng Tao. On Preserving Original Variables in Bayesian PCA With Application to Image Analysis. IEEE Transactions on Image Processing, vol. 21, pp. 4830-4843 (2012). DOI: 10.1109/TIP.2012.2211372
C. M. Bishop. Variational principal components. 9th International Conference on Artificial Neural Networks (ICANN '99), vol. 1, pp. 509-514 (1999). DOI: 10.1049/CP:19991160
John Shawe-Taylor, Peter L. Bartlett, Robert C. Williamson, Martin Anthony. A framework for structural risk minimisation. Proceedings of the Ninth Annual Conference on Computational Learning Theory (COLT '96), pp. 68-76 (1996). DOI: 10.1145/238061.238070