Authors: W.M. Shaw, Robert Burgin, Patrick Howell
DOI: 10.1016/S0306-4573(96)00043-X
Keywords:
Abstract: Low performance standards for the group of queries in 13 retrieval test collections have been computed. Derived from the random graph hypothesis, these standards represent the highest levels of retrieval effectiveness that can be obtained from meaningless clustering structures. Operational levels of cluster-based performance reported in selected sources during the past 20 years have been compared to the standards. Comparisons show that typical levels of operational cluster-based retrieval can be explained on the basis of chance. Indeed, most operational results are lower than those predicted by random graph theory. A tentative explanation for the poor performance of cluster-based retrieval reveals weaknesses in both fundamental assumptions and operational implementations. The cluster hypothesis offers no guarantee that relevant documents are naturally grouped together, clustering algorithms may not reveal the inherent structure in a set of documents, and retrieval strategies do not reliably retrieve the most effective cluster or clusters of documents. That most implementations implicitly rely on topical relatedness being equivalent to a relevance relationship contributes to poor performance. Clustering strategies capable of adapting to relevance information may succeed where static clustering techniques have failed.