Authors: Steven Loscalzo, Lei Yu, Chris Ding
Keywords: Computer science, Sample size determination, Stability (learning theory), Data mining, Pattern recognition, Feature (computer vision), Dimensionality reduction, Artificial intelligence, Feature selection, Generalization, Clustering high-dimensional data
Abstract: Stability is an important yet under-addressed issue in feature selection from high-dimensional and small sample data. In this paper, we show that stability of feature selection has a strong dependency on sample size. We propose a novel framework for stable feature selection which first identifies consensus feature groups from subsampling of training samples, and then performs feature selection by treating each consensus feature group as a single entity. Experiments on both synthetic and real-world data sets show that an algorithm developed under this framework is effective at alleviating the problem of small sample size and leads to more stable feature selection results and comparable or better generalization performance than state-of-the-art feature selection algorithms. Synthetic data sets and algorithm source code are available at http://www.cs.binghamton.edu/~lyu/KDD09/.
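
The two-stage idea in the abstract (find feature groups that recur across subsamples, then select whole groups rather than single features) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not say how groups are found or scored, so the sketch assumes a simple correlation-threshold grouping and scores each group by the absolute correlation of its mean feature with the target. All function names and parameters (`consensus_groups`, `select_groups`, `corr_threshold`, `agreement`, and so on) are hypothetical.

```python
import numpy as np


def connected_components(adjacency):
    """Label the connected components of a boolean adjacency matrix."""
    n = adjacency.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in np.flatnonzero(adjacency[node]):
                if labels[neighbor] < 0:
                    labels[neighbor] = current
                    stack.append(neighbor)
        current += 1
    return labels


def correlation_groups(X, threshold):
    """Assumed grouping rule: link features with |correlation| >= threshold."""
    adjacency = np.abs(np.corrcoef(X, rowvar=False)) >= threshold
    return connected_components(adjacency)


def consensus_groups(X, n_subsamples=20, subsample_frac=0.8,
                     corr_threshold=0.7, agreement=0.5, seed=0):
    """Stage 1: keep feature pairs grouped together in at least an
    `agreement` fraction of subsamples, then merge them into groups."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    size = max(2, int(subsample_frac * n_samples))
    co_occurrence = np.zeros((n_features, n_features))
    for _ in range(n_subsamples):
        rows = rng.choice(n_samples, size=size, replace=False)
        labels = correlation_groups(X[rows], corr_threshold)
        co_occurrence += labels[:, None] == labels[None, :]
    return connected_components(co_occurrence / n_subsamples >= agreement)


def select_groups(X, y, group_labels, k):
    """Stage 2: treat each group as one entity (its mean feature), score it
    by |correlation with y|, and return the feature indices of the top k."""
    scored = []
    for g in np.unique(group_labels):
        members = np.flatnonzero(group_labels == g)
        representative = X[:, members].mean(axis=1)
        score = abs(np.corrcoef(representative, y)[0, 1])
        scored.append((score, members.tolist()))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [members for _, members in scored[:k]]
```

Calling `consensus_groups(X)` on a samples-by-features array returns one group label per feature, and `select_groups(X, y, labels, k)` returns the feature indices of the `k` highest-scoring groups. Because selection operates on groups that persist across subsamples, small changes to the training set are less likely to swap one correlated feature for another, which is the stability effect the abstract describes.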