Measuring Stability of Feature Selection in Biomedical Datasets

Authors: Shyam Visweswaran, Jonathan L. Lustgarten, Vanathi Gopalakrishnan

Keywords: Dimensionality reduction; Bayes' theorem; Feature selection; Feature (machine learning); Data mining; Computer science; Robustness (computer science); Stability (learning theory); Feature (computer vision); Pattern recognition; Property (programming); Measure (mathematics); Artificial intelligence

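Abstract: An important step in the analysis of high-dimensional biomedical data is feature selection. Typically, a subset of features selected by a feature selection method is evaluated for relevance towards a task such as prediction or classification. Another important property of a feature selection method is stability, which refers to the robustness of the selected features to perturbations in the data. In biomarker discovery, for example, domain experts prefer a parsimonious subset of features that are relatively robust to slight changes in the data. We present a stability measure called the adjusted stability measure that computes the robustness of a feature selection method with respect to random feature selection. This measure is useful for comparing the robustness of feature selection methods and is superior to similar measures that do not account for random feature selection. We demonstrate the application of this measure for comparing the stability of several feature selection methods on a biomedical dataset.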
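
The abstract does not give the formula for the adjusted stability measure, so the following is only a sketch of the general idea of correcting subset overlap for chance. It uses Kuncheva's consistency index as a stand-in, not the authors' measure. The function names, the equal-size restriction, and the averaging over all pairs of subsets are assumptions made for illustration.

```python
from itertools import combinations


def kuncheva_index(a, b, n_features):
    """Chance-corrected overlap of two equal-size feature subsets.

    Stand-in for illustration (Kuncheva's consistency index), not the
    adjusted stability measure described in the abstract.
    """
    a, b = set(a), set(b)
    k = len(a)
    if len(b) != k:
        raise ValueError("subsets must have the same size")
    if not 0 < k < n_features:
        raise ValueError("subset size must satisfy 0 < k < n_features")
    r = len(a & b)  # observed overlap
    # Two random size-k subsets overlap in k*k/n features on average;
    # subtracting that and rescaling gives 1 for identical subsets
    # and about 0 for chance-level agreement.
    return (r * n_features - k * k) / (k * (n_features - k))


def mean_pairwise_stability(subsets, n_features):
    """Average the index over all pairs of selected subsets,
    e.g. one subset per resampled version of the data."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two subsets")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)
```

In use, `subsets` would hold the features chosen by one selection method on each perturbed copy of the data (for instance, each cross-validation fold), and the resulting averages could be compared across methods. A raw overlap count would reward methods that select many features, which is the kind of bias a chance correction is meant to remove.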
