Authors: Lei Yu, Chris Ding, Steven Loscalzo
Keywords: Selection (genetic algorithm), Stability (learning theory), Minimum redundancy feature selection, Feature (computer vision), k-nearest neighbors algorithm, Dimensionality reduction, Artificial intelligence, Clustering high-dimensional data, Data mining, Pattern recognition, Mathematics, Feature selection
Abstract: Many feature selection algorithms have been proposed in the past, most of them focusing on improving classification accuracy. In this work, we point out the importance of stable feature selection for knowledge discovery from high-dimensional data, and identify two causes of instability in feature selection algorithms: selection of a minimum subset without redundant features, and small sample size. We propose a general framework for stable feature selection that emphasizes both good generalization and stability of the feature selection results. The framework identifies dense feature groups based on kernel density estimation and treats the features in each dense group as a coherent entity for feature selection. An efficient algorithm, DRAGS (Dense Relevant Attribute Group Selector), is developed under this framework. We also introduce a general measure for assessing the stability of feature selection algorithms. Our empirical study on microarray data verifies that dense feature groups remain stable under random sample hold-out, and that DRAGS is effective in identifying a set of feature groups that exhibit both high classification accuracy and high stability.
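The abstract does not spell out how dense feature groups are found or how stability is measured, so the sketch below only illustrates the general ideas under stated assumptions. It assumes each feature is treated as a point in sample space (its vector of values over the training samples) and that dense groups are obtained by Gaussian mean shift, i.e. hill-climbing each feature to a mode of a kernel density estimate and grouping features that reach the same mode. The function names, `bandwidth`, and `merge_tol` are hypothetical, and this is not the authors' DRAGS implementation. The stability function swaps in average pairwise Jaccard similarity, a common generic measure, and is not the measure introduced in the paper.

```python
import math
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_shift_mode(start, features, bandwidth, tol=1e-6, max_iter=500):
    """Climb from `start` to a mode of a Gaussian kernel density estimate
    built over `features` (hypothetical helper, illustration only)."""
    y = list(start)
    for _ in range(max_iter):
        weights = [math.exp(-sq_dist(y, f) / (2 * bandwidth ** 2)) for f in features]
        total = sum(weights)
        if total == 0.0:  # every kernel underflowed; stay where we are
            return y
        # Mean-shift update: kernel-weighted mean of all feature points.
        new_y = [sum(w * f[d] for w, f in zip(weights, features)) / total
                 for d in range(len(y))]
        if sq_dist(new_y, y) < tol ** 2:
            return new_y
        y = new_y
    return y


def dense_feature_groups(features, bandwidth, merge_tol=None):
    """Group feature indices whose mean-shift runs end at the same mode."""
    if merge_tol is None:
        merge_tol = bandwidth / 2
    modes, groups = [], []
    for idx, f in enumerate(features):
        m = mean_shift_mode(f, features, bandwidth)
        for g, mode in enumerate(modes):
            if math.sqrt(sq_dist(m, mode)) < merge_tol:
                groups[g].append(idx)
                break
        else:
            modes.append(m)
            groups.append([idx])
    return groups


def jaccard(a, b):
    """Jaccard similarity of two selected-feature sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def average_pairwise_stability(subsets):
    """Mean Jaccard similarity over all pairs of selections, e.g. one
    selection per random hold-out run (generic measure, not the paper's)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

For example, with five features measured on three samples, where features 0 to 2 are nearly identical and features 3 and 4 are nearly identical, `dense_feature_groups(features, 0.5)` should place them in two groups. Treating each group as one unit, rather than picking a single representative from it, is the kind of choice the abstract argues makes selections less sensitive to which samples are held out.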