Reliable gene signatures for microarray classification: assessment of stability and performance

Authors: C. A. Davis, F. Gerick, V. Hintermair, C. C. Friedel, K. Fundel

DOI: 10.1093/BIOINFORMATICS/BTL400

Keywords: Gene, Gene selection, Classification methods, Cartilage sample, Data mining, Microarray, Pairwise comparison, Biology, Classifier (UML), R package

Abstract: Motivation: Two important questions for the analysis of gene expression measurements from different sample classes are (1) how to classify samples and (2) how to identify meaningful gene signatures (ranked gene lists) exhibiting the differences between classes and sample subsets. Solutions to both questions have immediate biological and biomedical applications. To achieve optimal classification performance, a suitable combination of classifier and gene selection method needs to be specifically selected for a given dataset. The selected gene signatures can be unstable and the resulting classification accuracy unreliable, particularly when considering different subsets of samples. Both unstable gene signatures and overestimated classification accuracy can impair biological conclusions. Methods: We address these two issues by repeatedly evaluating the classification performance of all models, i.e. pairwise combinations of various gene selection and classification methods, for random subsets of arrays (sampling). A model score is used to select the most appropriate model for the given dataset. Consensus gene signatures are constructed by extracting those genes frequently selected over many samplings. Sampling additionally permits measurement of the stability of the classification performance for each model, which serves as a measure of model reliability. Results: We analyzed a large gene expression dataset with 78 measurements of four different cartilage sample classes. Classifiers trained on subsets of measurements frequently produce models with highly variable performance. Our approach provides reliable classification performance estimates via sampling. In addition to reliable classification performance, we determined stable consensus signatures (i.e. gene lists) for sample classes. Manual literature screening showed that these genes are highly relevant to our gene expression experiment with osteoarthritic cartilage. We compared our approach to others based on a publicly available dataset on breast cancer. Availability: The R package is available at http://www.bio.ifi.lmu.de/~davis/edaprakt Contact: ralf.zimmer@bio.ifi.lmu.de
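
The sampling procedure outlined in the Methods part of the abstract can be illustrated with a minimal Python sketch. It is not the authors' R package: the gene selection rule (absolute difference of class means), the nearest-centroid classifier, the two-class toy data, and all function names (`select_genes`, `predict`, `stratified_split`, `evaluate_by_sampling`, `make_toy_data`) are assumptions chosen for brevity. The paper's model score is not specified here, so the sketch reports only the mean accuracy and its standard deviation over samplings as a rough stand-in for performance and stability.

```python
import random
import statistics
from collections import Counter


def select_genes(X, y, k):
    """Stand-in gene selection: rank genes by |difference of the two class means|."""
    a, b = sorted(set(y))  # this sketch assumes exactly two classes

    def class_mean(label, g):
        vals = [row[g] for row, lab in zip(X, y) if lab == label]
        return sum(vals) / len(vals)

    scores = {g: abs(class_mean(a, g) - class_mean(b, g)) for g in range(len(X[0]))}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def predict(X_train, y_train, genes, x):
    """Stand-in classifier: nearest class centroid on the selected genes."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
        dist = sum((x[g] - m) ** 2 for g, m in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def stratified_split(y, train_frac, rng):
    """Randomly split sample indices per class so every class is in the training set."""
    train, test = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        cut = max(1, int(train_frac * len(idx)))
        train += idx[:cut]
        test += idx[cut:]
    return train, test


def evaluate_by_sampling(X, y, k=2, n_samplings=50, train_frac=0.7, seed=0):
    """Repeat selection + classification on random subsets of arrays."""
    rng = random.Random(seed)
    selection_counts = Counter()
    accuracies = []
    for _ in range(n_samplings):
        train, test = stratified_split(y, train_frac, rng)
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        genes = select_genes(X_tr, y_tr, k)
        selection_counts.update(genes)  # how often each gene is selected
        hits = sum(predict(X_tr, y_tr, genes, X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    # Consensus signature: the k genes selected most frequently over all samplings.
    consensus = [g for g, _ in selection_counts.most_common(k)]
    return consensus, statistics.mean(accuracies), statistics.pstdev(accuracies)


def make_toy_data(n_per_class=10, n_genes=6, shift=4.0, seed=1):
    """Synthetic data (not from the paper): genes 0 and 1 differ between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                row[0] += shift
                row[1] += shift
            X.append(row)
            y.append(label)
    return X, y


X, y = make_toy_data()
consensus, mean_acc, sd_acc = evaluate_by_sampling(X, y)
print(sorted(consensus), round(mean_acc, 3), round(sd_acc, 3))
```

To compare several models in the spirit of the abstract, one would call `evaluate_by_sampling` once per combination of selection method and classifier and prefer combinations with high mean accuracy and low spread; how the paper weighs the two in its model score is not stated in this record.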
