Consensus group stable feature selection

Authors: Steven Loscalzo, Lei Yu, Chris Ding

DOI: 10.1145/1557019.1557084

Keywords: Computer science; Sample size determination; Stability (learning theory); Data mining; Pattern recognition; Feature (computer vision); Dimensionality reduction; Artificial intelligence; Feature selection; Generalization; Clustering high-dimensional data

Abstract: Stability is an important yet under-addressed issue in feature selection from high-dimensional and small sample data. In this paper, we show that stability of feature selection has a strong dependency on sample size. We propose a novel framework for stable feature selection which first identifies consensus feature groups from subsampling of training samples, and then performs feature selection by treating each consensus feature group as a single entity. Experiments on both synthetic and real-world data sets show that an algorithm developed under this framework is effective at alleviating the problem of small sample size and leads to more stable feature selection results and comparable or better generalization performance than state-of-the-art feature selection algorithms. Synthetic data sets and algorithm source code are available at http://www.cs.binghamton.edu/~lyu/KDD09/.
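The two-stage idea in the abstract (find feature groups that recur across subsamples of the training set, then select whole groups rather than single features) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how groups are found or ranked, so the sketch assumes correlation-threshold grouping, a co-association matrix with a majority vote to form consensus groups, and ranking by the absolute correlation between each group's mean and the label. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np


def connected_components(adjacency):
    """Label the connected components of a boolean adjacency matrix."""
    n = adjacency.shape[0]
    labels = np.full(n, -1)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in np.flatnonzero(adjacency[node]):
                if labels[neighbour] == -1:
                    labels[neighbour] = current
                    stack.append(neighbour)
        current += 1
    return labels


def correlation_groups(X, threshold):
    """Assumed grouping step: link features whose correlation >= threshold."""
    corr = np.corrcoef(X, rowvar=False)
    return connected_components(corr >= threshold)


def consensus_groups(X, n_subsamples=20, fraction=0.8, threshold=0.7,
                     agreement=0.5, seed=0):
    """Group features that land in the same group in most subsamples."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    together = np.zeros((n_features, n_features))
    size = max(2, int(fraction * n_samples))
    for _ in range(n_subsamples):
        rows = rng.choice(n_samples, size=size, replace=False)
        labels = correlation_groups(X[rows], threshold)
        # Co-association: count how often each feature pair shares a group.
        together += labels[:, None] == labels[None, :]
    return connected_components(together / n_subsamples >= agreement)


def select_groups(X, y, group_labels, k):
    """Treat each group as one entity (its mean) and keep the top-k groups."""
    groups = [np.flatnonzero(group_labels == g) for g in np.unique(group_labels)]
    scores = [abs(np.corrcoef(X[:, members].mean(axis=1), y)[0, 1])
              for members in groups]
    top = np.argsort(scores)[::-1][:k]
    return [groups[i].tolist() for i in top]


def make_toy_data(n_samples=60, seed=1):
    """Toy data: 3 latent signals x 4 noisy copies each, plus 8 noise features."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n_samples, 3))
    copies = np.repeat(latent, 4, axis=1) + 0.2 * rng.normal(size=(n_samples, 12))
    noise = rng.normal(size=(n_samples, 8))
    X = np.hstack([copies, noise])
    y = (latent[:, 0] > 0).astype(float)  # label depends on the first signal only
    return X, y


X, y = make_toy_data()
labels = consensus_groups(X)
print(select_groups(X, y, labels, k=1))
```

Selecting at the group level is what makes the result less sensitive to which samples happen to be drawn: any single feature from a correlated group may win on one subsample and lose on another, while the group as a whole is more likely to be recovered each time.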

References (27)
Andrew Y. Ng, On Feature Selection: Learning with Exponentially Many Irrelevant Features as Training Examples. International Conference on Machine Learning, pp. 404-412 (1998).
Kees Jong, Jérémie Mary, Antoine Cornuéjols, Elena Marchiori, Michèle Sebag, Ensemble feature ranking. European Conference on Principles of Data Mining and Knowledge Discovery, pp. 267-278 (2004). DOI: 10.1007/978-3-540-30116-5_26
Umesh V. Vazirani, Michael J. Kearns, An Introduction to Computational Learning Theory (1994).
Mark A. Hall, Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques (1999).
Xue-wen Chen, Jong Cheol Jeong, Minimum reference set based feature selection for small sample classifications. International Conference on Machine Learning, pp. 153-160 (2007). DOI: 10.1145/1273496.1273516
Ron Kohavi, George H. John, Wrappers for feature subset selection. Artificial Intelligence, vol. 97, pp. 273-324 (1997). DOI: 10.1016/S0004-3702(97)00043-X
Yizong Cheng, Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, pp. 790-799 (1995). DOI: 10.1109/34.400568
Lei Yu, Chris Ding, Steven Loscalzo, Stable feature selection via dense feature groups. Proceeding of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 08), pp. 803-811 (2008). DOI: 10.1145/1401890.1401986
Alexandros Kalousis, Julien Prados, Melanie Hilario, Stability of feature selection algorithms: a study on high-dimensional spaces. Knowledge and Information Systems, vol. 12, pp. 95-116 (2007). DOI: 10.1007/S10115-006-0040-8
Alexander Strehl, Joydeep Ghosh, Cluster ensembles --- a knowledge reuse framework for combining multiple partitions. Journal of Machine Learning Research, vol. 3, pp. 583-617 (2003). DOI: 10.1162/153244303321897735