Performance standards and evaluations in IR test collections: cluster-based retrieval models

Authors: W.M. Shaw, Robert Burgin, Patrick Howell

DOI: 10.1016/S0306-4573(96)00043-X

Abstract: Low performance standards for the group of queries in 13 retrieval test collections have been computed. Derived from the random graph hypothesis, these standards represent the highest levels of retrieval effectiveness that can be obtained from meaningless clustering structures. Operational levels of cluster-based performance reported in selected sources during the past 20 years are compared to the standards. The comparisons show that typical operational performance can be explained on the basis of chance. Indeed, most operational results are lower than those predicted by the theory. A tentative explanation for the poor performance reveals weaknesses in both the fundamental assumptions and the implementations of cluster-based retrieval. The cluster hypothesis offers no guarantee that relevant documents are naturally grouped together. Clustering algorithms may not reveal the inherent structure of a set of documents. Cluster-based retrieval strategies do not reliably retrieve effective clusters or clusters of relevant documents. Poor performance also stems from the fact that implementations implicitly treat topical relatedness as equivalent to the relevance relationship. Clustering techniques capable of adapting to information needs may succeed where static techniques have failed.
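To make the idea of a chance-level standard concrete, the following Python sketch uses Monte Carlo simulation. It estimates how well the best cluster of a random, meaningless partition can match a set of relevant documents. This is not the authors' random-graph computation. The function names, the fixed cluster size, and the use of van Rijsbergen's F-measure (beta = 1) as the effectiveness score are all illustrative assumptions.

```python
import random


def f_measure(cluster, relevant):
    """F-measure (beta = 1) of one cluster against the set of relevant documents."""
    hits = len(cluster & relevant)
    if hits == 0:
        return 0.0
    precision = hits / len(cluster)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)


def random_cluster_baseline(n_docs, relevant, cluster_size, trials=1000, seed=0):
    """Mean best-cluster F-measure over random partitions (hypothetical helper).

    Documents are the integers 0..n_docs-1. Each trial shuffles them and cuts
    the list into consecutive clusters of cluster_size (the last cluster may be
    smaller). The clusters carry no information about relevance, so the score
    of the best cluster approximates what an optimal cluster-selection strategy
    could achieve by chance alone.
    """
    rng = random.Random(seed)
    docs = list(range(n_docs))
    total = 0.0
    for _ in range(trials):
        rng.shuffle(docs)
        clusters = [set(docs[i:i + cluster_size]) for i in range(0, n_docs, cluster_size)]
        total += max(f_measure(c, relevant) for c in clusters)
    return total / trials


if __name__ == "__main__":
    relevant = set(range(10))  # hypothetical query with 10 relevant documents
    print(random_cluster_baseline(n_docs=1000, relevant=relevant, cluster_size=20))
```

Under these assumptions, an operational cluster-based result that does not clearly exceed such a baseline cannot be distinguished from chance. The paper's own standards are derived from the random graph hypothesis rather than from random partitions of this kind.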

References (43)
Richard Dubes, A.K. Jain. Clustering Methodologies in Exploratory Data Analysis. Advances in Computers, vol. 19, pp. 113-228 (1980). DOI: 10.1016/S0065-2458(08)60034-0
Robert F. Ling. The Expected Number of Components in Random Linear Graphs. Annals of Probability, vol. 1, pp. 876-881 (1973). DOI: 10.1214/AOP/1176996856
C.J. van Rijsbergen, K. Sparck Jones. A Test for the Separation of Relevant and Non-Relevant Documents in Experimental Retrieval Collections. Journal of Documentation, vol. 29, pp. 251-257 (1973). DOI: 10.1108/EB026557
W. John Wilbur, Leona Coffee. The Effectiveness of Document Neighboring in Search Enhancement. Information Processing and Management, vol. 30, pp. 253-266 (1994). DOI: 10.1016/0306-4573(94)90068-X
Lee R. Dice. Measures of the Amount of Ecologic Association Between Species. Ecology, vol. 26, pp. 297-302 (1945). DOI: 10.2307/1932409
K. Sparck Jones, C.J. van Rijsbergen. Information Retrieval Test Collections. Journal of Documentation, vol. 32, pp. 59-72 (1976). DOI: 10.1108/EB026616
J.H. Curtiss, John F. Kenney. Mathematics of Statistics. The American Mathematical Monthly, vol. 47, from p. 309 (1940). DOI: 10.2307/2302695
C.J. van Rijsbergen. Further Experiments with Hierarchic Clustering in Document Retrieval. Information Storage and Retrieval, vol. 10, pp. 1-14 (1974). DOI: 10.1016/0020-0271(74)90038-2
W.M. Shaw. Term-Relevance Computations and Perfect Retrieval Performance. Information Processing and Management, vol. 31, pp. 491-498 (1995). DOI: 10.1016/0306-4573(95)00011-5