Authors: A. Bertoni, G. Valentini
DOI: 10.1109/IJCNN.2005.1555821
Keywords:
Abstract: Clustering analysis of gene expression is characterized by the very high dimensionality and low cardinality of the data, and two important related topics are the validation and the estimate of the number of the obtained clusters. In this paper we focus on the estimate of the stability of the clusters. Our approach to this problem is based on random projections obeying the Johnson-Lindenstrauss lemma, by which the data may be projected into randomly selected low dimensional subspaces, approximately preserving pairwise distances between examples. We experiment with different types of random projections, comparing empirical and theoretical distortions induced by randomized embeddings between Euclidean metric spaces, and we present cluster-stability measures that may be used to validate and to quantitatively assess the reliability of the clusters obtained by a large class of clustering algorithms. Experimental results with high dimensional synthetic and DNA microarray data show the effectiveness of the proposed approach.
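
The distance-preservation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one common choice of random projection (a Gaussian matrix scaled by 1/sqrt(d)) and one commonly quoted form of the Johnson-Lindenstrauss bound on the target dimension; the abstract does not say which projections or bounds the paper uses. All function names (`jl_min_dim`, `gaussian_projection`, `pairwise_sq_dists`, `distortion_range`) are hypothetical.

```python
import numpy as np


def jl_min_dim(n, eps):
    """Target dimension from one commonly quoted form of the JL bound,
    d >= 4 ln(n) / (eps^2/2 - eps^3/3). Assumption: the paper may use
    a different constant or a different form of the bound."""
    return int(np.ceil(4.0 * np.log(n) / (eps**2 / 2.0 - eps**3 / 3.0)))


def gaussian_projection(X, d, rng):
    """Project the rows of X (n examples x D features) into d dimensions
    using a D x d matrix of N(0, 1) entries scaled by 1/sqrt(d)."""
    D = X.shape[1]
    R = rng.standard_normal((D, d)) / np.sqrt(d)
    return X @ R


def pairwise_sq_dists(X):
    """Squared Euclidean distances for all unordered pairs of rows."""
    i, j = np.triu_indices(X.shape[0], k=1)
    diff = X[i] - X[j]
    return np.einsum("ij,ij->i", diff, diff)


def distortion_range(X, Y):
    """Smallest and largest ratio of projected to original squared
    distances; values close to 1 mean distances are well preserved."""
    ratios = pairwise_sq_dists(Y) / pairwise_sq_dists(X)
    return ratios.min(), ratios.max()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for expression data: few examples, many features.
    X = rng.standard_normal((20, 5000))
    Y = gaussian_projection(X, 1000, rng)
    print(distortion_range(X, Y))
```

Comparing the empirical ratios from `distortion_range` with the interval [1 - eps, 1 + eps] implied by a chosen eps is one simple way to contrast empirical and theoretical distortion. A stability measure would additionally cluster several independent projections and compare the resulting partitions, which this sketch does not attempt.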