Implementing agglomerative hierarchic clustering algorithms for use in document retrieval

Author: Ellen M. Voorhees

DOI: 10.1016/0306-4573(86)90097-X

Keywords: Cluster analysis, Hierarchical clustering, Computer science, Technical report, Data mining, Exploit, Document retrieval, Implementation, Information retrieval

Abstract: Searching hierarchically clustered document collections can be effective, but creating the cluster hierarchies is expensive since there are both many documents and many terms. However, the information in the document-term matrix is sparse: documents are usually indexed by relatively few terms. This paper describes the implementations of three agglomerative hierarchic clustering algorithms that exploit this sparsity so that collections much larger than the algorithms' worst case running times would suggest can be clustered. The implementations described have been used to cluster a collection of 12,000 documents.
