Gene and sample selection for cancer classification with support vectors based t-statistic

Authors: Piyushkumar A. Mundra, Jagath C. Rajapakse

DOI: 10.1016/J.NEUCOM.2010.02.025

Keywords: t-statistic, Pattern recognition, Task (project management), Filter (signal processing), Artificial intelligence, Benchmark (computing), Mathematics, Data mining, Feature selection, Sample selection, Gene, Data point

Abstract: The t-statistic is widely used for gene ranking in the analysis of microarray expression data. Such a filter-based criterion is generally computed using all training samples, which, however, may not be equally important for the classification task. In this paper, we decompose the t-statistic into two parts, corresponding to relevant and irrelevant data points. The relevant points are selected as support vectors and then used to compute the t-statistic for feature selection. By simultaneously selecting samples and genes, significantly better results are achieved on synthetic as well as several benchmark cancer datasets.
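The core idea in the abstract, computing the t-statistic on a selected subset of samples rather than on all of them, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `t_statistic` and `rank_genes`, the Welch-style form of the statistic, the toy data, and the index list `sv_idx` are all assumptions made for illustration. In practice the indices would come from a trained support vector machine (its support vectors); here they are simply given by hand.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(pos, neg):
    """Absolute two-sample t-statistic (unequal-variance form) for one gene.

    Each class needs at least two values, since the sample variance is used.
    """
    denom = sqrt(variance(pos) / len(pos) + variance(neg) / len(neg))
    return abs(mean(pos) - mean(neg)) / denom if denom > 0 else 0.0


def rank_genes(X, y, keep=None):
    """Rank genes by |t| computed only on the samples listed in `keep`.

    X    : list of samples, each a list of expression values (one per gene)
    y    : list of class labels, +1 or -1
    keep : indices of the samples to use (for example, support vectors);
           None means all samples, i.e. the conventional t-statistic
    """
    idx = list(range(len(X))) if keep is None else list(keep)
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        pos = [X[i][g] for i in idx if y[i] == 1]
        neg = [X[i][g] for i in idx if y[i] == -1]
        scores.append(t_statistic(pos, neg))
    # Genes with the largest |t| come first.
    order = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return order, scores


# Toy data (made up): 6 samples, 2 genes. Gene 0 separates the classes.
X = [[1.0, 5.0], [1.2, 5.1], [0.9, 4.8],
     [3.0, 5.0], [3.1, 5.2], [2.8, 4.9]]
y = [1, 1, 1, -1, -1, -1]

# Hypothetical support-vector indices, assumed to come from an SVM
# trained elsewhere; they are hard-coded here for illustration only.
sv_idx = [1, 2, 3, 5]

order, scores = rank_genes(X, y, keep=sv_idx)
order_all, scores_all = rank_genes(X, y)  # baseline using every sample
```

Passing `keep=None` reproduces the conventional filter criterion on all training samples, which makes it easy to compare the two rankings on the same data.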
