Authors: C. A. Davis, F. Gerick, V. Hintermair, C. C. Friedel, K. Fundel
DOI: 10.1093/BIOINFORMATICS/BTL400
Keywords: Gene, Gene selection, Classification methods, Cartilage sample, Data mining, Microarray, Pairwise comparison, Biology, Classifier (UML), R package
Abstract: Motivation: Two important questions for the analysis of gene expression measurements from different sample classes are (1) how to classify samples and (2) how to identify meaningful gene signatures (ranked gene lists) exhibiting the differences between classes and sample subsets. Solutions to both questions have immediate biological and biomedical applications. To achieve optimal classification performance, a suitable combination of classifier and gene selection method needs to be specifically selected for a given dataset. The selected gene signatures can be unstable and the resulting classification accuracy unreliable, particularly when considering different subsets of samples. Both unstable gene signatures and overestimated classification accuracy can impair biological conclusions. Methods: We address these two issues by repeatedly evaluating the classification performance of all models, i.e. pairwise combinations of various gene selection and classification methods, for random subsets of arrays (sampling). A model score is used to select the most appropriate model for the given dataset. Consensus gene signatures are constructed by extracting those genes frequently selected over many samplings. Sampling additionally permits measurement of the stability of the classification performance for each model, which serves as a measure of model reliability. Results: We analyzed a large gene expression dataset with 78 measurements of four different cartilage sample classes. Classifiers trained on subsets of measurements frequently produce models with highly variable performance. Our approach provides reliable classification performance estimates via sampling. In addition to reliable classification performance, we determined stable consensus signatures (i.e. gene lists) for sample classes. Manual literature screening showed that these genes are highly relevant to our gene expression experiment with osteoarthritic cartilage. We compared our approach to others based on a publicly available dataset on breast cancer. Availability: The R package is available at http://www.bio.ifi.lmu.de/~davis/edaprakt Contact: ralf.zimmer@bio.ifi.lmu.de
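The sampling procedure described under Methods can be illustrated with a minimal Python sketch. It is not the authors' R package. The two gene selection methods, the two classifiers, the model score (mean accuracy minus its standard deviation across samplings), the 80% selection-frequency threshold, and the synthetic two-class data are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import random
import statistics
from collections import Counter
from itertools import product

def mean(xs):
    return sum(xs) / len(xs)

def _rank(X, y, k, stat):
    """Return indices of the k genes with the highest two-class statistic."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, c in zip(X, y) if c == 0]
        b = [row[g] for row, c in zip(X, y) if c == 1]
        scores.append(stat(a, b))
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:k]

# Hypothetical gene selection methods (stand-ins, not the paper's methods).
def select_mean_diff(X, y, k):
    return _rank(X, y, k, lambda a, b: abs(mean(a) - mean(b)))

def select_t_like(X, y, k):
    return _rank(X, y, k, lambda a, b: abs(mean(a) - mean(b))
                 / (statistics.pstdev(a) + statistics.pstdev(b) + 1e-9))

def sq_dist(u, v):
    return sum((p - q) ** 2 for p, q in zip(u, v))

# Hypothetical classifiers: train on (Xtr, ytr), return predictions for Xte.
def nearest_centroid(Xtr, ytr, Xte):
    cents = {c: [mean(col) for col in zip(*[r for r, l in zip(Xtr, ytr) if l == c])]
             for c in set(ytr)}
    return [min(cents, key=lambda c: sq_dist(x, cents[c])) for x in Xte]

def one_nn(Xtr, ytr, Xte):
    return [ytr[min(range(len(Xtr)), key=lambda i: sq_dist(x, Xtr[i]))] for x in Xte]

def evaluate_models(X, y, selectors, classifiers, k=5, n_samplings=30,
                    train_frac=0.7, seed=0):
    """Evaluate every (selector, classifier) pair on repeated random splits."""
    rng = random.Random(seed)
    acc = {m: [] for m in product(selectors, classifiers)}
    picked = {s: Counter() for s in selectors}
    n, cut = len(X), int(train_frac * len(X))
    for _ in range(n_samplings):
        idx = list(range(n))
        rng.shuffle(idx)
        tr, te = idx[:cut], idx[cut:]
        ytr, yte = [y[i] for i in tr], [y[i] for i in te]
        for s_name, select in selectors.items():
            # Genes are selected on the training arrays only, to avoid
            # overestimating accuracy on the held-out arrays.
            genes = select([X[i] for i in tr], ytr, k)
            picked[s_name].update(genes)
            Xtr = [[X[i][g] for g in genes] for i in tr]
            Xte = [[X[i][g] for g in genes] for i in te]
            for c_name, classify in classifiers.items():
                pred = classify(Xtr, ytr, Xte)
                acc[(s_name, c_name)].append(mean([p == t for p, t in zip(pred, yte)]))
    return acc, picked

def model_score(accs):
    """Assumed score: mean accuracy penalised by its spread over samplings."""
    return mean(accs) - statistics.pstdev(accs)

def consensus_signature(counter, n_samplings, min_freq=0.8):
    """Genes selected in at least min_freq of all samplings."""
    return sorted(g for g, c in counter.items() if c / n_samplings >= min_freq)

def make_data(n_per_class=20, n_genes=50, informative=(0, 1, 2), shift=3.0, seed=1):
    """Synthetic expression matrix: class 1 is shifted on the informative genes."""
    rng = random.Random(seed)
    X, y = [], []
    for c in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_genes)]
            if c == 1:
                for g in informative:
                    row[g] += shift
            X.append(row)
            y.append(c)
    return X, y

X, y = make_data()
selectors = {"mean_diff": select_mean_diff, "t_like": select_t_like}
classifiers = {"centroid": nearest_centroid, "1nn": one_nn}
acc, picked = evaluate_models(X, y, selectors, classifiers)
best = max(acc, key=lambda m: model_score(acc[m]))
signature = consensus_signature(picked[best[0]], n_samplings=30)
print("best model:", best, "consensus genes:", signature)
```

In this sketch the spread of the per-sampling accuracies plays the role of the stability measure, and the consensus list keeps only genes that recur across samplings rather than those chosen in a single split.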