Authors: Xindong Wu, Zhao Li
DOI:
Keywords: Information retrieval, Hierarchy (mathematics), Cluster analysis, Hierarchical clustering of networks, Document clustering, Phrase, Index (publishing), Computer science, Brown clustering, Hierarchical clustering
Abstract: Document clustering has been applied in web information retrieval, where it facilitates users' quick browsing by organizing retrieved results into different groups. Meanwhile, a tree-like hierarchical structure is well suited to users' needs. In this regard, we introduce a new method for hierarchically clustering snippets by exploiting a phrase-based document index. In our method, the hierarchy is built based on phrases instead of all snippets, and the snippets are then assigned to the corresponding clusters consisting of phrases. We show that, as opposed to traditional hierarchical clustering, our method not only presents meaningful cluster labels but also improves clustering performance.
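The abstract describes the pipeline only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes that phrases are word n-grams, that two phrases are similar when their snippet sets overlap (Jaccard similarity), and that phrases are merged by simple greedy agglomeration. All function names, parameters, and the threshold value are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_phrase_index(snippets, max_len=2, min_df=2):
    """Map each word n-gram (1..max_len words) to the ids of snippets containing it."""
    index = defaultdict(set)
    for sid, text in enumerate(snippets):
        words = text.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                index[" ".join(words[i:i + n])].add(sid)
    # Assumption: only phrases shared by at least min_df snippets are useful.
    return {p: ids for p, ids in index.items() if len(ids) >= min_df}


def jaccard(a, b):
    """Overlap of two non-empty snippet-id sets."""
    return len(a & b) / len(a | b)


def cluster_phrases(index, threshold=0.6):
    """Greedily merge the most similar phrase clusters; return the forest roots."""
    nodes = [{"phrases": [p], "ids": set(ids), "children": []}
             for p, ids in sorted(index.items())]
    while len(nodes) > 1:
        i, j = max(combinations(range(len(nodes)), 2),
                   key=lambda ij: jaccard(nodes[ij[0]]["ids"], nodes[ij[1]]["ids"]))
        if jaccard(nodes[i]["ids"], nodes[j]["ids"]) < threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = {"phrases": nodes[i]["phrases"] + nodes[j]["phrases"],
                  "ids": nodes[i]["ids"] | nodes[j]["ids"],
                  "children": [nodes[i], nodes[j]]}
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes


def label(node):
    """Use the longest phrase in a cluster as its human-readable label."""
    return max(node["phrases"], key=lambda p: (len(p.split()), p))


def assign_snippets(roots, index, n_snippets):
    """Assign each snippet to the phrase cluster sharing the most phrases with it."""
    assignment = defaultdict(list)
    for sid in range(n_snippets):
        counts = [sum(sid in index[p] for p in r["phrases"]) for r in roots]
        if counts and max(counts) > 0:
            assignment[label(roots[counts.index(max(counts))])].append(sid)
    return dict(assignment)


if __name__ == "__main__":
    snippets = ["jaguar car dealer", "jaguar car price",
                "jaguar cat habitat", "jaguar cat diet"]
    index = build_phrase_index(snippets)
    roots = cluster_phrases(index)
    print(assign_snippets(roots, index, len(snippets)))
```

In this sketch the clustering cost depends on the number of retained phrases rather than the number of snippets, and each cluster's label is one of its own phrases. That is one plausible reading of why a phrase-based hierarchy could yield meaningful labels.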