Authors: Zhiwei Qin, Donghui Yan, Songxiang Gu, Ying Xu
DOI:
Keywords: Kernel (statistics), Computer science, Spectral clustering, Simple (abstract algebra), Benchmark (computing), Cluster analysis, Kernel (linear algebra), Ensemble learning, Random projection, Similarity (network science), Pattern recognition, Artificial intelligence
Abstract: Similarity plays a fundamental role in many areas, including data mining, machine learning, statistics and various applied domains. Inspired by the success of ensemble methods and the flexibility of trees, we propose to learn a similarity kernel called rpf-kernel through random projection forests (rpForests). Our theoretical analysis reveals a highly desirable property of rpf-kernel: far-away (dissimilar) points have a low similarity value while nearby (similar) points would have a high similarity, and the similarities have a native interpretation as the probability of points remaining in the same leaf nodes during the growth of rpForests. The learned rpf-kernel leads to an effective clustering algorithm, rpfCluster. On a wide variety of real and benchmark datasets, rpfCluster compares favorably to K-means clustering, spectral clustering and a state-of-the-art clustering ensemble algorithm, Cluster Forests. Our approach is simple to implement and readily adapts to the geometry of the underlying data. Given its competitive empirical performance when applied to clustering, we expect rpf-kernel to be applicable to many problems of an unsupervised nature or as a regularizer in some supervised or weakly supervised settings.
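The abstract's reading of similarity as the probability that two points remain in the same leaf can be illustrated with a minimal sketch. It is not the authors' implementation: the split rule (a Gaussian random direction with a cut point drawn uniformly over the projected range), the stopping rule, and the names `rp_tree_leaf_ids`, `rpf_kernel`, `n_trees`, and `max_leaf_size` are all assumptions made for illustration.

```python
import numpy as np


def rp_tree_leaf_ids(X, max_leaf_size, rng):
    """Grow one random projection tree; return a leaf id for every point.

    Assumed split rule: project the node's points onto a random Gaussian
    direction and cut at a point drawn uniformly over the projected range.
    """
    n, d = X.shape
    leaf_ids = np.empty(n, dtype=int)
    next_leaf = 0
    stack = [np.arange(n)]  # index sets of nodes still to be processed
    while stack:
        idx = stack.pop()
        if len(idx) > max_leaf_size:
            proj = X[idx] @ rng.standard_normal(d)
            lo, hi = proj.min(), proj.max()
            if hi > lo:
                left = proj < rng.uniform(lo, hi)
                if left.any() and not left.all():
                    stack.append(idx[left])
                    stack.append(idx[~left])
                    continue
        # Node is small enough, or the split was degenerate: make it a leaf.
        leaf_ids[idx] = next_leaf
        next_leaf += 1
    return leaf_ids


def rpf_kernel(X, n_trees=100, max_leaf_size=10, seed=0):
    """Estimate similarity as the fraction of trees in which two points
    end up in the same leaf (an empirical co-leaf probability)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = np.zeros((n, n))
    for _ in range(n_trees):
        leaf_ids = rp_tree_leaf_ids(X, max_leaf_size, rng)
        K += leaf_ids[:, None] == leaf_ids[None, :]
    return K / n_trees
```

The resulting matrix is symmetric with ones on the diagonal and entries in [0, 1], so it could be passed as a precomputed affinity to a spectral clustering routine. Whether rpfCluster does exactly this is not stated in the abstract.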