Evaluating Correlation Coefficients for Clustering Gene Expression Profiles of Cancer

Authors: Pablo A. Jaskowiak, Ricardo J. G. B. Campello, Ivan G. Costa

DOI: 10.1007/978-3-642-31927-3_11

Abstract: Cluster analysis is usually the first step adopted to unveil information from gene expression data. One of its common applications is the clustering of cancer samples, associated with the detection of previously unknown cancer subtypes. Although guidelines have been established concerning the choice of appropriate clustering algorithms, little attention has been given to the subject of proximity measures. Whereas the Pearson correlation coefficient appears as the de facto proximity measure in this scenario, no comprehensive study analyzing other correlation coefficients as alternatives to it has been conducted. Considering these facts, we evaluated five correlation coefficients (along with Euclidean distance) for the clustering of cancer samples. Our evaluation was conducted on 35 publicly available datasets and covered both (i) the intrinsic separation ability and (ii) the predictive ability of the correlation coefficients. The results support the view that correlation coefficients rarely considered in the literature may provide results competitive with those of more generally employed ones. Finally, we show that a recently introduced measure arises as a promising alternative to the commonly employed Pearson coefficient, even providing superior results to it.
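The abstract contrasts the Pearson correlation coefficient, the de facto proximity measure for clustering cancer samples, with other correlation coefficients and with Euclidean distance. The following is a minimal sketch in plain Python. It assumes that each sample is a list of expression values measured over the same genes, and it illustrates how these measures differ. Pearson ignores shifts and rescaling of a profile. Spearman's rank correlation ignores any monotone transformation. Euclidean distance depends on absolute magnitudes. Spearman is used only as a familiar example of an alternative coefficient, because this record does not name the five coefficients the authors evaluated. The helper names and the conversion from a correlation r to the dissimilarity 1 - r are assumptions for illustration, not the authors' code.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance; sensitive to absolute expression levels."""
    if len(x) != len(y):
        raise ValueError("profiles must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def correlation_dissimilarity(r: float) -> float:
    """Hypothetical helper: map r in [-1, 1] to a dissimilarity in [0, 2]."""
    return 1.0 - r


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]           # hypothetical sample profile
    b = [2 * v + 1 for v in a]         # same shape, shifted and rescaled
    c = [v ** 3 for v in a]            # monotone but nonlinear transform of a
    print(pearson(a, b))               # 1.0: shift/scale are ignored
    print(euclidean(a, b))             # > 0: magnitudes differ
    print(pearson(a, c))               # below 1.0: relation is not linear
    print(spearman(a, c))              # 1.0: the ranks are identical
```

Under Pearson (and hence 1 - r), the scaled profile `b` is at zero dissimilarity from `a`, whereas Euclidean distance still separates them. This kind of behavioural difference is what makes the choice of proximity measure matter for clustering.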

References (37)
C. Spearman. The proof and measurement of association between two things. American Journal of Psychology, vol. 100, pp. 441- (1987). Originally published 1904.
R. Giancarlo, G. Lo Bosco, L. Pinello, F. Utro. The three steps of clustering in the post-genomic era: a synopsis. In: Computational Intelligence Methods for Bioinformatics and Biostatistics, pp. 13-30 (2010). DOI: 10.1007/978-3-642-21946-7_2
Robert Gentleman, Vincent J. Carey, Wolfgang Huber, Rafael A. Irizarry, Sandrine Dudoit. Bioinformatics and Computational Biology Solutions Using R and Bioconductor (2006).
Janez Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, vol. 7, pp. 1-30 (2006).
David J. Hand, Robert J. Till. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, vol. 45, pp. 171-186 (2001). DOI: 10.1023/A:1010920819831
Richard C. Dubes, Anil K. Jain. Algorithms for Clustering Data (1988).
Laurie J. Heyer, Semyon Kruglyak, Shibu Yooseph. Exploring expression data: identification and analysis of coexpressed genes. Genome Research, vol. 9, pp. 1106-1115 (1999). DOI: 10.1101/GR.9.11.1106
Young Sook Son, Jangsun Baek. A modified correlation coefficient based similarity measure for clustering time-course gene expression data. Pattern Recognition Letters, vol. 29, pp. 232-242 (2008). DOI: 10.1016/J.PATREC.2007.09.015
Ido Priness, Oded Maimon, Irad Ben-Gal. Evaluation of gene-expression clustering via mutual information distance measure. BMC Bioinformatics, vol. 8, p. 111 (2007). DOI: 10.1186/1471-2105-8-111
N. Bolshakova, F. Azuaje. Cluster validation techniques for genome expression data. Signal Processing, vol. 83, pp. 825-833 (2003). DOI: 10.1016/S0165-1684(02)00475-9