Clustering algorithms: on learning, validation, performance, and applications to genomics.

Authors: Lori Dalton, Virginia Ballarin, Marcel Brun

DOI: 10.2174/138920209789177601

Keywords: Cluster analysis, DNA microarray, Microarray analysis techniques, Data mining, Profiling (information science), Computer science, Image processing, Genomics, SIMPLE algorithm, Gene chip analysis

Abstract: The development of microarray technology has enabled scientists to measure the expression of thousands of genes simultaneously, resulting in a surge of interest in several disciplines throughout biology and medicine. While data clustering has been used for decades in image processing and pattern recognition, in recent years it has joined this wave of activity as a popular technique to analyze microarrays. To illustrate its application to genomics, clustering applied to genes from a set of microarrays groups together those genes whose expression levels exhibit similar behavior throughout the samples, and when applied to samples it offers the potential to discriminate pathologies based on their differential patterns of gene expression. Although clustering has now been used for many years in the context of gene expression microarrays, it has remained highly problematic. The choice of a clustering algorithm and validation index is not a trivial one, more so when applying them to high throughput biological or medical data. Factors to consider when choosing an algorithm include the nature of the application, the characteristics of the objects to be analyzed, the expected number and shape of the clusters, and the complexity of the problem versus the computational power available. In some cases a very simple algorithm may be appropriate to tackle a problem, but many situations may require a more complex and powerful algorithm better suited for the job at hand. In this paper, we will cover the theoretical aspects of clustering, including error and learning, followed by an overview of popular clustering algorithms and classical validation indices. We also discuss the relative performance of these algorithms and indices and conclude with examples of the application of clustering to computational biology.
