Abstract: Broadly speaking, information retrieval (IR) starts from a corpus of documents and a user’s information need (such as “Who won the 2012 US presidential election?”), which the user expresses as a query submitted to a search engine (“2012 US elections”). IR is concerned with efficiently retrieving and ranking the documents in response to that query. A ranking is considered to be of high quality if the top-ranked documents are relevant, that is, if they help the user satisfy their information need. Traditionally, text-based information retrieval research has focussed on corpora which, though diverse in the type of documents (such as news articles, Web pages, or patents), have a number of commonalities: