An Improved Feature Selection Method for Chinese Short Texts Clustering Based on HowNet

Authors: Xin Chen, Yuqing Zhang, Long Cao, Donghui Li

DOI: 10.1007/978-3-319-01766-2_73

Abstract: Short texts play an important role in the field of text data mining. Because of the problems arising from the complexity of Chinese semantics and from sparseness, which is an obvious characteristic of short texts, it is necessary to explore new semantic-based methods to cluster short texts. In this paper, an improved approach to feature selection based on HowNet is applied to address the sparseness problem. By redefining the Vector Space Model at the semantic level and merging generalized synonymy features, we present a feature generation strategy. Experimental results show that, by merging similar features, our method is effective in dimension reduction and achieves better clustering performance. The proposed HowNet-based feature selection method is suitable for Chinese short text clustering.
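
The idea of merging synonymous features in a Vector Space Model can be illustrated with a minimal sketch. This is not the authors' implementation: the synonym groups below are hypothetical hand-written stand-ins for what a HowNet lookup might return, and the function names (`build_concept_map`, `merge_features`, `to_vectors`) are invented for illustration. The abstract does not say how generalized synonymy is decided, so that step is simply assumed here.

```python
from collections import Counter

# ASSUMPTION: hand-written synonym groups standing in for HowNet-derived ones.
SYNONYM_GROUPS = [
    {"电脑", "计算机"},    # both mean "computer"
    {"手机", "移动电话"},  # both mean "mobile phone"
]


def build_concept_map(groups):
    """Map every word in a synonym group to one representative feature."""
    concept_of = {}
    for group in groups:
        representative = min(group)  # any deterministic choice works
        for word in group:
            concept_of[word] = representative
    return concept_of


def merge_features(tokens, concept_of):
    """Count term frequencies after replacing each token by its concept."""
    return Counter(concept_of.get(tok, tok) for tok in tokens)


def to_vectors(docs, concept_of):
    """Build a shared vocabulary and one term-frequency vector per document."""
    counts = [merge_features(doc, concept_of) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[t] for t in vocab] for c in counts]


# Three already-segmented short texts (hypothetical data).
docs = [
    ["电脑", "价格"],            # computer, price
    ["计算机", "价格", "手机"],  # computer, price, mobile phone
    ["移动电话", "信号"],        # mobile phone, signal
]

raw_vocab, raw_vectors = to_vectors(docs, {})  # no merging
concept_map = build_concept_map(SYNONYM_GROUPS)
merged_vocab, merged_vectors = to_vectors(docs, concept_map)

print(len(raw_vocab), "features before merging,", len(merged_vocab), "after")
```

In this toy case the first two texts share only the "price" feature before merging, but share the "computer" feature as well once synonyms are merged, so the vectors are both shorter and less sparse. Whether this carries over to real data depends on the quality of the synonym groups.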
