Authors: Krisztian Balog, Maarten de Rijke, Wouter Weerkamp
Keywords: Information needs, Vocabulary, Artificial intelligence, Computer science, User-generated content, Generative grammar, Query expansion, Ranking (information retrieval), Natural language processing, Information retrieval, Generative model, Blogosphere
Abstract: User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's information need and documents in a specific user generated content environment, the blogosphere, we apply a form of query expansion, i.e., adding and reweighing query terms. Since the blogosphere is noisy, query expansion on the collection itself is rarely effective, but external, edited collections are more suitable. We propose a generative model for expanding queries using external collections in which dependencies between queries, documents, and expansion documents are explicitly modeled. Different instantiations of our model are discussed and make different (in)dependence assumptions. Results using two external collections (news and Wikipedia) show that external expansion for retrieval of user generated content is effective; besides, conditioning the external collection on the query is very beneficial, and making candidate expansion terms dependent on just the document seems sufficient.