Authors: Douglass R. Cutting, David R. Karger, Jan O. Pedersen
Keywords:
Abstract: The Scatter/Gather document browsing method uses fast document clustering to produce table-of-contents-like outlines of large document collections. Previous work [1] developed linear-time document clustering algorithms to establish the feasibility of this method over moderately large collections. However, even linear-time algorithms are too slow to support interactive browsing of very large collections such as Tipster, the DARPA standard text retrieval evaluation collection. We present a scheme that supports constant interaction-time Scatter/Gather of arbitrarily large collections after near-linear time preprocessing. This involves the construction of a cluster hierarchy. A modification of Scatter/Gather employing this scheme, and an example of its use over the Tipster collection, are presented.