Authors: Donald Metzler, Susan Dumais, Christopher Meek
DOI: 10.1007/978-3-540-71496-5_5
Keywords:
Abstract: Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, there is a growing number of tasks that require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.