Cluster-based collection selection in uncooperative distributed information retrieval

Author: B. van Voorst

Abstract: The focus of this research is collection selection for distributed information retrieval. The collection descriptions that are necessary for selecting the most relevant collections are often created from information gathered by random sampling. Collection selection based on an incomplete index, constructed using random sampling instead of a full index, leads to inferior results. Contributions: In this research we propose the use of collection clustering to compensate for the incompleteness of the indexes. When collection clustering is used, we select not only the collections that are considered relevant based on their descriptions, but also collections that have similar content in their indexes. Most existing cluster algorithms require specification of the number of clusters prior to execution. We describe a new clustering algorithm that allows us to specify the sizes of the produced clusters. Conclusions: Our experiments show that collection clustering can indeed improve the performance of distributed information retrieval systems. There is not much difference between our algorithm and the well-known k-means algorithm. We suggest using the proposed algorithm because it is more scalable.
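The selection idea in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the scoring function (summed sampled term frequencies), the names `score`, `select_with_clusters`, `descriptions`, and `cluster_of`, and the toy data are all assumptions made for illustration, and the cluster assignment is taken as given instead of being computed.

```python
def score(query_terms, description):
    """Toy relevance score (an assumption): summed sampled term frequencies."""
    return sum(description.get(term, 0) for term in query_terms)


def select_with_clusters(query_terms, descriptions, cluster_of, top_k=1):
    """Select the top_k collections by score, then add their cluster mates.

    descriptions: collection name -> {term: frequency} built from a sample.
    cluster_of:   collection name -> cluster id (assumed precomputed).
    """
    # Rank by descending score; break ties by name for a stable order.
    ranked = sorted(
        descriptions,
        key=lambda name: (-score(query_terms, descriptions[name]), name),
    )
    # Seeds are the collections that look relevant from their descriptions.
    seeds = [n for n in ranked[:top_k] if score(query_terms, descriptions[n]) > 0]
    seed_clusters = {cluster_of[n] for n in seeds}
    # Cluster mates are added even if their sampled description missed the query.
    mates = [n for n in ranked if n not in seeds and cluster_of[n] in seed_clusters]
    return seeds + mates


# Hypothetical sampled descriptions: the sample of "B" missed the term "jaguar".
descriptions = {
    "A": {"jaguar": 3, "engine": 2},
    "B": {"engine": 4, "car": 1},
    "C": {"recipe": 5},
}
cluster_of = {"A": 0, "B": 0, "C": 1}

selected = select_with_clusters(["jaguar"], descriptions, cluster_of, top_k=1)
print(selected)
```

In this toy setup, collection "B" is selected only because it shares a cluster with "A", which is the kind of compensation for incomplete sampled indexes that the abstract describes.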
