A Generative Blog Post Retrieval Model that Uses Query Expansion based on External Collections

Authors: Krisztian Balog, Maarten de Rijke, Wouter Weerkamp

DOI: 10.3115/1690219.1690294

Keywords: Information needs; Vocabulary; Artificial intelligence; Computer science; User-generated content; Generative grammar; Query expansion; Ranking (information retrieval); Natural language processing; Information retrieval; Generative model; Blogosphere

Abstract: User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's information need and documents in a specific user generated content environment, the blogosphere, we apply a form of query expansion, i.e., adding and reweighing query terms. Since the blogosphere is noisy, query expansion on the collection itself is rarely effective, but external, edited collections are more suitable. We propose a generative model for expanding queries using external collections, in which dependencies between queries, documents, and expansion documents are explicitly modeled. Different instantiations of our model are discussed and make different (in)dependence assumptions. Results using two external collections (news and Wikipedia) show that external expansion for retrieval of user generated content is effective; besides, conditioning the external collection on the query is very beneficial, and making candidate expansion terms dependent on just the document seems sufficient.
