Author: B. van Voorst
DOI:
Keywords:
Abstract: The focus of this research is collection selection for distributed information retrieval. The collection descriptions that are necessary for selecting the most relevant collections are often created from information gathered by random sampling. Collection selection based on an incomplete index constructed using random sampling instead of a full index leads to inferior results. Contributions: In this research we propose the use of collection clustering to compensate for the incompleteness of the indexes. When collection clustering is used, we select not only the collections that are considered relevant based on their collection descriptions, but also collections that have similar content in their indexes. Most existing clustering algorithms require the number of clusters to be specified prior to execution. We describe a new clustering algorithm that allows us to specify the sizes of the produced clusters instead. Conclusions: Our experiments show that collection clustering can indeed improve the performance of distributed information retrieval systems. There is not much difference between our clustering algorithm and the well-known k-means algorithm. We suggest using the proposed algorithm because it is more scalable.
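
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name `expand_selection`, the score dictionary, and the precomputed cluster lists are all hypothetical, and it assumes collection scores and clusters have already been produced by some other component. It shows only the idea of adding cluster neighbours to the collections chosen from their descriptions.

```python
from typing import Dict, List


def expand_selection(scores: Dict[str, float],
                     clusters: List[List[str]],
                     top_n: int) -> List[str]:
    """Pick the top_n collections by score, then add their cluster neighbours.

    scores   -- hypothetical relevance score per collection (from its description)
    clusters -- hypothetical precomputed clusters of collection names
    """
    # Rank by descending score; break ties by name for a deterministic order.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    selected = ranked[:top_n]

    # Map each collection to the index of the cluster that contains it.
    cluster_of = {name: i for i, members in enumerate(clusters) for name in members}

    result = list(selected)
    seen = set(selected)
    for name in selected:
        idx = cluster_of.get(name)
        if idx is None:
            continue  # collection is not assigned to any cluster
        for member in clusters[idx]:
            if member not in seen:
                seen.add(member)
                result.append(member)
    return result


# Illustrative data only; these scores and clusters are made up.
scores = {"news": 0.9, "sports": 0.2, "politics": 0.1, "recipes": 0.05}
clusters = [["news", "politics"], ["sports"], ["recipes"]]
print(expand_selection(scores, clusters, top_n=1))  # ['news', 'politics']
```

Here "politics" scores poorly on its own sampled description but is still selected because it shares a cluster with "news", which is the compensation effect the abstract describes.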