Constant interaction-time scatter/gather browsing of very large document collections

Authors: Douglass R. Cutting, David R. Karger, Jan O. Pedersen

DOI: 10.1145/160688.160706

Abstract: The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document clustering algorithms to establish the feasibility of this method over moderately large collections. However, even linear-time algorithms are too slow to support interactive browsing of very large collections such as Tipster, the DARPA standard text retrieval evaluation collection. We present a scheme that supports constant interaction-time Scatter/Gather of arbitrarily large collections after near-linear time preprocessing. This involves the construction of a cluster hierarchy. A modification of Scatter/Gather employing this scheme, and an example of its use over the Tipster collection, are presented.
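
Illustrative sketch (not from the paper): one way a precomputed cluster hierarchy can make the per-interaction work independent of collection size is to expand the user's selected clusters only until a fixed number of summary nodes is reached, and to cluster those nodes instead of the underlying documents. The names `Node`, `expand_focus`, and `budget` below are hypothetical, and the expansion rule (always split the largest node) is an assumption chosen for simplicity rather than the authors' stated procedure.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node of a precomputed cluster hierarchy."""
    label: str
    size: int                                   # documents summarised by this node
    children: list = field(default_factory=list)


def expand_focus(selected, budget):
    """Expand the selected nodes into at most `budget` summary nodes.

    Assumption: the largest expandable node is split first. The number of
    heap operations is bounded by the budget and the tree depth, not by
    the number of documents the nodes stand for.
    """
    heap = []      # max-heap on size; the counter breaks ties between nodes
    counter = 0
    for node in selected:
        heapq.heappush(heap, (-node.size, counter, node))
        counter += 1

    done = []      # nodes that are leaves or cannot be split within budget
    while heap and len(heap) + len(done) < budget:
        _, _, node = heapq.heappop(heap)
        would_have = len(heap) + len(done) + len(node.children)
        if not node.children or would_have > budget:
            done.append(node)
            continue
        for child in node.children:
            heapq.heappush(heap, (-child.size, counter, child))
            counter += 1

    return done + [node for _, _, node in heap]


# Tiny hand-built hierarchy standing in for the preprocessing output.
root = Node("root", 8, [
    Node("a", 5, [Node("a1", 3), Node("a2", 2)]),
    Node("b", 3, [Node("b1", 2), Node("b2", 1)]),
])
focus = expand_focus([root], budget=3)
print(sorted(n.label for n in focus))
```

The returned nodes would then be handed to whatever clustering step produces the on-screen outline; because their count is capped by `budget`, that step's cost stays fixed however large the collection is.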

References (3)
Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey. Scatter/Gather: a cluster-based approach to browsing large document collections. International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 51, pp. 318-329 (1992). DOI: 10.1145/3130348.3130362
Peter Willett. Document clustering using an inverted file approach. Journal of Information Science, vol. 2, pp. 223-231 (1980). DOI: 10.1177/016555158000200503
R. Sibson. SLINK: An optimally efficient algorithm for the single-link cluster method. The Computer Journal, vol. 16, pp. 30-34 (1973). DOI: 10.1093/COMJNL/16.1.30