Similarity measures for short segments of text

Authors: Donald Metzler, Susan Dumais, Christopher Meek

DOI: 10.1007/978-3-540-71496-5_5

Abstract: Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, there are a growing number of tasks that require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.
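The data sparseness problem mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so the Jaccard coefficient over word sets is used here as an assumed stand-in for a "purely lexical" measure; the function names `tokens` and `jaccard` and the example queries are hypothetical and not taken from the paper.

```python
def tokens(text):
    """Lowercase a short text and split it on whitespace into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap of the word sets of two short texts (assumed lexical measure)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # convention chosen here: two empty texts count as identical
    return len(ta & tb) / len(ta | tb)


# Shared words give a high score.
print(jaccard("new york hotels", "hotels in new york"))

# Related queries with no word in common score zero,
# which is the sparseness issue for very short texts.
print(jaccard("automobile repair", "car mechanic"))
```

With only two or three words per query, a measure of this kind has almost no evidence to work with, which is presumably why the authors also consider stemming and language modeling-based representations.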
