GMM-ClusterForest: a novel indexing approach for multi-features based similarity search in high-dimensional spaces

Authors: Yuchai Wan, Xiabi Liu, Kunqi Tong, Xue Wei, Yi Wu

DOI: 10.1007/978-3-642-34481-7_26

Keywords: Gaussian, Mathematics, Nearest neighbor search, Minimum description length, Artificial intelligence, Mixture model, Data mining, Tree (data structure), Search engine indexing, Image retrieval, Pattern recognition, Cluster analysis

Abstract: This paper proposes a novel clustering-based indexing approach called GMM-ClusterForest for supporting multi-features similarity search in high-dimensional spaces. We fit a Gaussian Mixture Model (GMM) to the data, using the Expectation-Maximization (EM) algorithm to estimate the GMM parameters and the Minimum Description Length (MDL) criterion to select the GMM structure. Each Gaussian component is taken as a cluster center, and each data point is assigned to a cluster according to the Bayesian decision rule. By performing this method hierarchically, an index tree is constructed for the corresponding type of features. Multi-features similarity search is then fulfilled by fusing the index trees of all types of features considered. We evaluated the proposed approach by applying it to example-based image retrieval and conducting experiments on the Corel 1000 dataset and a self-collected large dataset. The experimental results show that our approach is effective and promising.
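The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes diagonal covariances, a deterministic initialisation, and a BIC-style two-part MDL score (the abstract does not state the exact criterion). All function names (`fit_gmm`, `select_gmm`, `assign`, `build_tree`), the parameters `leaf_size` and `max_k`, and the toy data are hypothetical. The fusion of trees across feature types is not shown, because the abstract does not describe how it is done.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def posteriors(x, gmm):
    """Return (component posteriors P(j | x), log p(x)) for one point."""
    weights, means, variances = gmm
    logs = [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps], top + math.log(total)

def fit_gmm(data, k, iters=60, min_var=1e-3):
    """Fit a k-component diagonal GMM with EM. Returns (gmm, log-likelihood)."""
    n, d = len(data), len(data[0])
    ordered = sorted(data)
    # Assumed deterministic start: means at evenly spaced ranks of the sorted data.
    means = [list(ordered[(2 * j + 1) * n // (2 * k)]) for j in range(k)]
    centre = [sum(x[i] for x in data) / n for i in range(d)]
    spread = [max(min_var, sum((x[i] - centre[i]) ** 2 for x in data) / n)
              for i in range(d)]
    weights, variances = [1.0 / k] * k, [list(spread) for _ in range(k)]
    for _ in range(iters):
        # E-step: responsibilities of each component for each point.
        resp = [posteriors(x, (weights, means, variances))[0] for x in data]
        # M-step: re-estimate weights, means and (floored) variances.
        for j in range(k):
            nk = max(sum(r[j] for r in resp), 1e-9)
            weights[j] = nk / n
            means[j] = [sum(r[j] * x[i] for r, x in zip(resp, data)) / nk
                        for i in range(d)]
            variances[j] = [max(min_var, sum(r[j] * (x[i] - means[j][i]) ** 2
                                             for r, x in zip(resp, data)) / nk)
                            for i in range(d)]
    gmm = (weights, means, variances)
    return gmm, sum(posteriors(x, gmm)[1] for x in data)

def description_length(loglik, k, d, n):
    """Assumed two-part MDL score: -log L + (free parameters / 2) * log n."""
    n_params = (k - 1) + 2 * k * d
    return -loglik + 0.5 * n_params * math.log(n)

def select_gmm(data, max_k=3):
    """Pick the number of components with the smallest description length."""
    n, d = len(data), len(data[0])
    fits = [fit_gmm(data, k) for k in range(1, min(max_k, n) + 1)]
    return min(fits, key=lambda f: description_length(f[1], len(f[0][0]), d, n))[0]

def assign(x, gmm):
    """Bayes decision rule: index of the component with the largest posterior."""
    post = posteriors(x, gmm)[0]
    return post.index(max(post))

def build_tree(data, leaf_size=25, max_k=3):
    """Recursively cluster the data; inner nodes keep their GMM and children."""
    if len(data) <= leaf_size:
        return {"points": data, "children": []}
    gmm = select_gmm(data, max_k)
    groups = {}
    for x in data:
        groups.setdefault(assign(x, gmm), []).append(x)
    if len(groups) < 2:  # no useful split: stop here
        return {"points": data, "children": []}
    return {"gmm": gmm, "points": data,
            "children": [build_tree(g, leaf_size, max_k) for g in groups.values()]}

# Toy data: two well-separated 5x4 grids of 2-D points.
grid = [(0.5 * (i % 5), 0.5 * (i // 5)) for i in range(20)]
data = grid + [(x + 10.0, y + 10.0) for x, y in grid]
tree = build_tree(data)
print(len(tree["children"]))
```

In this sketch one tree would be built per feature type. A query would descend each tree by applying `assign` at every inner node, and the per-tree candidates would then be combined by whatever fusion rule the full paper defines.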

References (13)
Ben Wang, John Q. Gan. "Integration of Projected Clusters and Principal Axis Trees for High-Dimensional Data Indexing and Query." Intelligent Data Engineering and Automated Learning, pp. 191-196 (2004). DOI: 10.1007/978-3-540-28651-6_28
Hans-Jörg Schek, Stephen Blott, Roger Weber. "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces." Very Large Data Bases, pp. 194-205 (1998).
Kristin P. Bennett, Usama Fayyad, Dan Geiger. "Density-based indexing for approximate nearest-neighbor queries." Knowledge Discovery and Data Mining, pp. 233-243 (1999). DOI: 10.1145/312129.312236
Mark H Hansen, Bin Yu. "Model Selection and the Principle of Minimum Description Length." Journal of the American Statistical Association, vol. 96, pp. 746-774 (2001). DOI: 10.1198/016214501753168398
Hongli Xu, Dantong Yu, De Xu, Aidong Zhang. "SS-ClusterTree." Proceedings of the 2008 international conference on Content-based image and video retrieval - CIVR '08, pp. 95-104 (2008). DOI: 10.1145/1386352.1386369
Wenbing Tao, Hai Jin, Feng Luo, Kun Wu. "Integrating image clustering and memory indexing for large scale content-based image retrieval." MIPPR 2009: Remote Sensing and GIS Data Processing and Other Applications, vol. 7498, p. 749853 (2009). DOI: 10.1117/12.834309
Dantong Yu, Aidong Zhang. "ClusterTree: integration of cluster representation and nearest-neighbor search for large data sets with high dimensions." IEEE Transactions on Knowledge and Data Engineering, vol. 15, pp. 1316-1337 (2003). DOI: 10.1109/TKDE.2003.1232281
J.Z. Wang, Jia Li, G. Wiederhold. "SIMPLIcity: semantics-sensitive integrated matching for picture libraries." IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, pp. 947-963 (2001). DOI: 10.1109/34.955109
N. Vlassis, A. Likas. "A kurtosis-based dynamic approach to Gaussian mixture modeling." Systems Man and Cybernetics, vol. 29, pp. 393-399 (1999). DOI: 10.1109/3468.769758
C. Li, G. Biswas. "Unsupervised learning with mixed numeric and nominal data." IEEE Transactions on Knowledge and Data Engineering, vol. 14, pp. 673-690 (2002). DOI: 10.1109/TKDE.2002.1019208