Similarity Kernel and Clustering via Random Projection Forests.

Authors: Zhiwei Qin, Donghui Yan, Songxiang Gu, Ying Xu

DOI:

Keywords: Kernel (statistics), Computer science, Spectral clustering, Simple (abstract algebra), Benchmark (computing), Cluster analysis, Kernel (linear algebra), Ensemble learning, Random projection, Similarity (network science), Pattern recognition, Artificial intelligence

Abstract: Similarity plays a fundamental role in many areas, including data mining, machine learning, statistics and various applied domains. Inspired by the success of ensemble methods and the flexibility of trees, we propose to learn a similarity kernel called rpf-kernel through random projection forests (rpForests). Our theoretical analysis reveals a highly desirable property of rpf-kernel: far-away (dissimilar) points have a low similarity value while nearby (similar) points have a high similarity, and the similarities have a native interpretation as the probability of points remaining in the same leaf nodes during the growth of rpForests. The learned rpf-kernel leads to an effective clustering algorithm, rpfCluster. On a wide variety of real and benchmark datasets, rpfCluster compares favorably to K-means clustering, spectral clustering and a state-of-the-art algorithm, Cluster Forests. Our approach is simple to implement and readily adapts to the geometry of the underlying data. Given its competitive empirical performance when applied to clustering, we expect rpf-kernel to be applicable to problems of an unsupervised nature or as a regularizer in some supervised or weakly supervised settings.
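The leaf co-occurrence interpretation of the kernel can be illustrated with a small sketch. This is not the authors' implementation: the function names (`tree_leaf_labels`, `rpf_kernel`), the parameters (`n_trees`, `max_leaf_size`, `seed`), the Gaussian projection direction, and the uniformly random split point are all assumptions made for illustration. The sketch grows several random projection trees and estimates the similarity of two points as the fraction of trees in which they end up in the same leaf.

```python
import numpy as np


def tree_leaf_labels(X, rng, max_leaf_size):
    """Grow one random projection tree and return a leaf label per point.

    Assumed splitting rule (illustrative only): project the node's points
    onto a random Gaussian direction and split at a point drawn uniformly
    between the smallest and largest projection.
    """
    n = X.shape[0]
    labels = np.empty(n, dtype=int)
    next_label = 0
    stack = [np.arange(n)]  # index sets of nodes still to be processed
    while stack:
        idx = stack.pop()
        if len(idx) > max_leaf_size:
            direction = rng.standard_normal(X.shape[1])
            proj = X[idx] @ direction
            split = rng.uniform(proj.min(), proj.max())
            left = proj < split
            # Only split when both children are non-empty.
            if left.any() and not left.all():
                stack.append(idx[left])
                stack.append(idx[~left])
                continue
        # Node is small enough (or cannot be split): make it a leaf.
        labels[idx] = next_label
        next_label += 1
    return labels


def rpf_kernel(X, n_trees=50, max_leaf_size=5, seed=0):
    """Estimate similarity as the fraction of trees in which two points share a leaf."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = np.zeros((n, n))
    for _ in range(n_trees):
        labels = tree_leaf_labels(X, rng, max_leaf_size)
        K += labels[:, None] == labels[None, :]
    return K / n_trees


# Hypothetical usage on synthetic data: two well-separated Gaussian blobs.
demo_rng = np.random.default_rng(42)
demo_X = np.vstack([
    demo_rng.normal(0.0, 1.0, size=(15, 2)),
    demo_rng.normal(50.0, 1.0, size=(15, 2)),
])
demo_K = rpf_kernel(demo_X, n_trees=30, max_leaf_size=4)
```

The resulting matrix is symmetric with ones on the diagonal, so it could be passed as a precomputed affinity to a spectral clustering routine; whether rpfCluster proceeds exactly this way is not stated in the abstract.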
