A phrase-based method for hierarchical clustering of web snippets

Authors: Xindong Wu, Zhao Li

Keywords: Information retrieval; Hierarchy (mathematics); Cluster analysis; Hierarchical clustering of networks; Document clustering; Phrase; Index (publishing); Computer science; Brown clustering; Hierarchical clustering

Abstract: Document clustering has been applied in web information retrieval, where it makes browsing faster by organizing retrieved results into groups. A tree-like hierarchical structure is particularly well suited to users' needs. We therefore introduce a new method for the hierarchical clustering of web snippets that exploits a phrase-based document index. In our method, the hierarchy is built on phrases instead of on all of the snippets, and the snippets are then assigned to the corresponding clusters, which consist of phrases. We show that, as opposed to traditional clustering, our method not only presents meaningful cluster labels but also improves performance.
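The abstract's two-stage idea (build the hierarchy over phrases, then assign snippets to phrase clusters) can be illustrated with a minimal sketch. This is not the authors' algorithm: the phrase extractor (word n-grams), the document-frequency threshold `min_df`, the merging of phrases that cover identical snippet sets, and the subsumption-style parent rule (a phrase's parent is the smallest phrase cluster whose snippet set strictly contains its own) are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from collections import defaultdict


def extract_phrases(text, max_len=2):
    """Hypothetical phrase extractor: all word n-grams of length 1..max_len."""
    words = text.lower().split()
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            phrases.add(" ".join(words[i:i + n]))
    return phrases


def build_phrase_index(snippets, max_len=2, min_df=2):
    """Phrase-based index: phrase -> set of snippet ids, keeping phrases in >= min_df snippets."""
    index = defaultdict(set)
    for sid, text in enumerate(snippets):
        for phrase in extract_phrases(text, max_len):
            index[phrase].add(sid)
    return {p: ids for p, ids in index.items() if len(ids) >= min_df}


def merge_equivalent(index):
    """Assumed step: phrases covering the same snippets form one cluster labeled by the longest phrase."""
    by_set = defaultdict(list)
    for phrase, ids in index.items():
        by_set[frozenset(ids)].append(phrase)
    merged = {}
    for ids, phrases in by_set.items():
        label = max(phrases, key=lambda p: (len(p.split()), p))
        merged[label] = set(ids)
    return merged


def build_phrase_hierarchy(clusters):
    """Assumed subsumption rule: parent = smallest cluster whose snippet set strictly contains this one."""
    parent = {}
    for child, child_ids in clusters.items():
        best = None
        for cand, cand_ids in clusters.items():
            if cand != child and child_ids < cand_ids:
                if best is None or len(cand_ids) < len(clusters[best]):
                    best = cand
        parent[child] = best
    return parent


def assign_snippets(clusters, parent, n_snippets):
    """Assign each snippet to the most specific phrase clusters that contain it."""
    children = defaultdict(list)
    for cluster, par in parent.items():
        if par is not None:
            children[par].append(cluster)
    assignment = {}
    for sid in range(n_snippets):
        assignment[sid] = sorted(
            c for c, ids in clusters.items()
            if sid in ids and not any(sid in clusters[k] for k in children[c])
        )
    return assignment


if __name__ == "__main__":
    snippets = [
        "apple iphone review",
        "apple iphone price",
        "apple pie recipe",
        "apple pie baking tips",
    ]
    clusters = merge_equivalent(build_phrase_index(snippets))
    parent = build_phrase_hierarchy(clusters)
    print(parent)
    print(assign_snippets(clusters, parent, len(snippets)))
```

On this toy input, "apple" becomes the root cluster with "apple iphone" and "apple pie" as its children, and each snippet lands in one of the two child clusters. Because the hierarchy is computed over the much smaller set of surviving phrases rather than over all snippet pairs, the phrase labels double as readable cluster names.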