Extension of similarity measures in VSM: From orthogonal coordinate system to affine coordinate system

Authors: Junyu Xuan, Jie Lu, Guangquan Zhang, Xiangfeng Luo

DOI: 10.1109/IJCNN.2014.6889693

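Abstract: Similarity measures are the foundation of many research areas, e.g. information retrieval, recommender systems and machine learning algorithms. Driven by these application scenarios, a number of similarity measures have been proposed. In state-of-the-art measures, vector-based representation is widely accepted, based on the Vector Space Model (VSM), in which an object is represented as a vector composed of its features. The similarity between two objects is then evaluated by operations on the two corresponding vectors, such as cosine, extended Jaccard, extended Dice and so on. However, there is an assumption that the features are independent of one another. This assumption is apparently unrealistic: normally there are relations between features, for example the co-occurrence relations between keywords in the text mining area. In this paper, a space geometry-based method is proposed to extend the VSM from the orthogonal coordinate system (OVSM) to the affine coordinate system (AVSM), and OVSM is proved to be a special case of AVSM. Unit coordinate vectors in AVSM are inferred from the relations between features, which are considered as angles between these unit coordinate vectors. Finally, five different similarity measures are extended from OVSM to AVSM using the unit coordinate vectors of AVSM. Among the numerous application fields of similarity measures, the task of text clustering is selected as the evaluation criterion. Documents are represented as vectors in OVSM and AVSM, respectively. The clustering results show that AVSM outperforms OVSM.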
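The idea of measuring similarity in a non-orthogonal coordinate system can be illustrated with a minimal sketch. It is not the authors' formulation. It only uses the standard geometric fact that, when the unit coordinate vectors are not orthogonal, the inner product of two coordinate vectors x and y is x^T G y, where G[i][j] is the cosine of the angle between unit coordinate vectors i and j. The function names, the two-feature documents and the 60-degree angle below are assumptions chosen for illustration; the abstract does not state how the angles are derived from feature relations.

```python
import math


def affine_inner(x, y, gram):
    """Inner product of coordinate vectors x and y in a basis of unit
    vectors whose pairwise cosines are given by the matrix `gram`."""
    return sum(
        x[i] * gram[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )


def affine_cosine(x, y, gram):
    """Cosine similarity computed with the inner product above."""
    norm_x = math.sqrt(affine_inner(x, x, gram))
    norm_y = math.sqrt(affine_inner(y, y, gram))
    return affine_inner(x, y, gram) / (norm_x * norm_y)


# Orthogonal case: features are treated as independent (G is the identity).
identity = [[1.0, 0.0],
            [0.0, 1.0]]

# Hypothetical affine case: the two feature axes meet at 60 degrees,
# so the off-diagonal entries are cos(60 degrees) = 0.5.
related = [[1.0, 0.5],
           [0.5, 1.0]]

doc_a = [1.0, 0.0]  # document containing only feature 0
doc_b = [0.0, 1.0]  # document containing only feature 1

print(affine_cosine(doc_a, doc_b, identity))  # 0.0
print(affine_cosine(doc_a, doc_b, related))   # 0.5
```

With the identity matrix the function reduces to the ordinary cosine measure, which matches the abstract's statement that OVSM is a special case of AVSM. With related features, two documents that share no feature can still receive a non-zero similarity.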
