DOI: 10.1016/0306-4573(86)90097-X
Keywords: Cluster analysis, Hierarchical clustering, Computer science, Technical report, Data mining, Exploit, Document retrieval, Implementation, Information retrieval
Abstract: Searching hierarchically clustered document collections can be effective, but creating the cluster hierarchies is expensive since there are both many documents and many terms. However, the information in the document-term matrix is sparse: documents are usually indexed by relatively few terms. This paper describes implementations of three agglomerative hierarchic clustering algorithms that exploit this sparsity so that collections much larger than the algorithms' worst-case running times would suggest can be clustered. The implementations described have been used to cluster a collection of 12,000 documents.
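
The abstract does not say how the sparsity is exploited. As an illustration only, and not the paper's implementation, the following minimal Python sketch shows one common way to benefit from a sparse document-term matrix: build an inverted index and compute overlaps only for document pairs that share at least one term, so pairs with nothing in common are never visited. The function name `sparse_pairwise_overlap`, the shared-term count used as a similarity, and the toy data are all assumptions made for this example.

```python
from collections import defaultdict


def sparse_pairwise_overlap(docs):
    """Count shared terms for each document pair sharing at least one term.

    docs: dict mapping a document id to the set of terms indexing it.
    Returns a dict mapping (id_a, id_b), with id_a < id_b, to the
    number of terms the two documents have in common.
    """
    # Inverted index: term -> list of ids of documents containing it.
    postings = defaultdict(list)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].append(doc_id)

    # Walk each posting list; only co-occurring pairs are ever touched.
    overlap = defaultdict(int)
    for doc_ids in postings.values():
        doc_ids.sort()
        for i, a in enumerate(doc_ids):
            for b in doc_ids[i + 1:]:
                overlap[(a, b)] += 1
    return dict(overlap)


# Hypothetical toy collection: four documents, few terms each.
docs = {
    1: {"cluster", "retrieval"},
    2: {"cluster", "sparse"},
    3: {"matrix", "sparse"},
    4: {"hierarchy"},
}
pairs = sparse_pairwise_overlap(docs)
```

With four documents there are six possible pairs, but only the pairs that share a term appear in `pairs`. An agglomerative method could then start from these nonzero similarities rather than from a full similarity matrix.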