Ensembles based on random projections to improve the accuracy of clustering algorithms

Authors: Alberto Bertoni, Giorgio Valentini

DOI: 10.1007/11731177_5

Abstract: We present an algorithmic scheme for unsupervised cluster ensembles, based on randomized projections between metric spaces, by which a substantial dimensionality reduction is obtained. Multiple clusterings are performed on random subspaces, approximately preserving the distances between the projected data, and then they are combined using a pairwise similarity matrix; in this way the accuracy of each “base” clustering is maintained, and the diversity between them is improved. The proposed approach is effective for clustering problems characterized by high dimensional data, as shown by our preliminary experimental results.
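The scheme outlined in the abstract can be illustrated with a minimal sketch: project the data several times with random matrices, cluster each projection, accumulate a pairwise similarity (co-association) matrix, and read a consensus partition from it. This is not the authors' implementation. The abstract does not name the base clustering algorithm or the final combination rule, so the k-means base learner, the Gaussian projection matrix, the 0.5 similarity threshold, and all function and parameter names below are assumptions made for illustration.

```python
import math
import random


def random_projection(data, d_out, rng):
    """Project each row of `data` onto d_out dimensions with a Gaussian matrix."""
    d_in = len(data[0])
    scale = 1.0 / math.sqrt(d_out)
    # Entries ~ N(0, 1/d_out): squared distances are preserved in expectation.
    proj = [[rng.gauss(0.0, 1.0) * scale for _ in range(d_out)]
            for _ in range(d_in)]
    return [[sum(x[i] * proj[i][j] for i in range(d_in)) for j in range(d_out)]
            for x in data]


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, rng, iters=20):
    """Lloyd's k-means with farthest-first initialisation (assumed base learner)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def coassociation(data, k, d_out, n_runs, seed=0):
    """Fraction of base clusterings in which each pair of points is co-clustered."""
    rng = random.Random(seed)
    n = len(data)
    counts = [[0] * n for _ in range(n)]
    for _ in range(n_runs):
        labels = kmeans(random_projection(data, d_out, rng), k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / n_runs for c in row] for row in counts]


def consensus_labels(sim, threshold=0.5):
    """Connected components of the graph linking pairs with sim > threshold.

    Assumed combination rule; other rules (e.g. hierarchical clustering on
    1 - sim) could be used instead.
    """
    n = len(sim)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated blobs in 50 dimensions.
    rng = random.Random(1)
    data = ([[rng.gauss(0.0, 0.5) for _ in range(50)] for _ in range(10)]
            + [[rng.gauss(10.0, 0.5) for _ in range(50)] for _ in range(10)])
    sim = coassociation(data, k=2, d_out=8, n_runs=5)
    print(consensus_labels(sim))
```

Each base clustering sees a different low-dimensional view of the same data, which is where the diversity among ensemble members comes from, while the approximate distance preservation of the projection is what keeps each individual clustering reasonably accurate.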
