NTLM: a time-enhanced language model based ranking approach for web search

Authors: Xiaowen Li, Peiquan Jin, Xujian Zhao, Hong Chen, Lihua Yue

DOI: 10.1007/978-3-642-24396-7_13

Abstract: Time plays an important role in Web search, because most Web pages contain time information and many queries are time-related. However, traditional search engines give little consideration to the time information in Web pages. In particular, they do not take it into account when ranking search results. In this paper, we present NTLM, a new time-enhanced language model based ranking algorithm for Web search. First, we propose an effective method to extract 〈keyword, content time〉 pairs from Web pages, which associates each keyword in a page with an appropriate content time. Then we introduce the concept of temporal tf, a time-constrained term frequency of a keyword. After that, we propose a measure of the similarity between temporal-textual queries and Web pages, based on a combination of textual relevance and temporal relevance. We conduct comparison experiments between NTLM and five competitor algorithms, using two datasets, different types of queries, and metrics such as MRR and NDCG to evaluate performance. The experimental results show that NTLM reaches a high precision of 93.2% in the step of extracting 〈keyword, content time〉 pairs, and achieves the best performance with respect to NDCG in the ranking step.
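The abstract does not give NTLM's formulas, so the following Python sketch only illustrates the general idea under explicitly stated assumptions. Content times are modeled as closed year ranges (the `Interval` type is hypothetical). Temporal tf is assumed to count a keyword's occurrences whose content time overlaps the query time. Both the plain and the temporal term frequencies feed a Dirichlet-smoothed query-likelihood score (the smoothing method studied by Zhai and Lafferty in the reference list), and the two scores are linearly interpolated with a hypothetical weight `lam`. None of `temporal_tf`, `combined_score`, `lam`, or `mu` is claimed to be the paper's actual definition.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical content-time representation: a closed year range [start, end]."""
    start: int
    end: int

    def overlaps(self, other: "Interval") -> bool:
        # Two closed intervals overlap iff each starts no later than the other ends.
        return self.start <= other.end and other.start <= self.end


def temporal_tf(pairs, keyword, query_time):
    """Assumed temporal tf: occurrences of `keyword` whose content time overlaps `query_time`.

    `pairs` is a list of (keyword, Interval) tuples extracted from one page.
    """
    return sum(1 for kw, t in pairs if kw == keyword and t.overlaps(query_time))


def dirichlet_log_likelihood(tf, doc_len, collection_prob, query_terms, mu=2000.0):
    """Query log-likelihood under a Dirichlet-smoothed document language model.

    Each collection_prob[term] must be > 0 so that log() is defined for unseen terms.
    """
    score = 0.0
    for term in query_terms:
        p = (tf.get(term, 0) + mu * collection_prob[term]) / (doc_len + mu)
        score += math.log(p)
    return score


def combined_score(page_terms, pairs, query_terms, query_time, collection_prob,
                   lam=0.5, mu=2000.0):
    """Hypothetical NTLM-style score: interpolate textual and temporal log-likelihoods."""
    doc_len = len(page_terms)
    textual_tf = {t: page_terms.count(t) for t in query_terms}
    temporal = {t: temporal_tf(pairs, t, query_time) for t in query_terms}
    textual_score = dirichlet_log_likelihood(textual_tf, doc_len, collection_prob, query_terms, mu)
    # Assumption: the temporal model reuses the page length as its normalizer.
    temporal_score = dirichlet_log_likelihood(temporal, doc_len, collection_prob, query_terms, mu)
    return lam * textual_score + (1 - lam) * temporal_score


# Usage: the same text scores higher when its keywords' content times match the query time.
terms = ["olympics", "beijing", "olympics"]
pairs_2008 = [("olympics", Interval(2008, 2008)), ("olympics", Interval(2012, 2012)),
              ("beijing", Interval(2008, 2008))]
pairs_1990 = [(kw, Interval(1990, 1990)) for kw, _ in pairs_2008]
cp = {"olympics": 0.01, "beijing": 0.01}
q, qt = ["olympics", "beijing"], Interval(2008, 2008)
print(combined_score(terms, pairs_2008, q, qt, cp) > combined_score(terms, pairs_1990, q, qt, cp))  # True
```

Linear interpolation is chosen here only because it is the simplest way to combine a textual and a temporal relevance signal; the weighting NTLM actually uses may differ.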

References (25)
Stephen E. Robertson, Steve Walker. Okapi/Keenbow at TREC-8. Text Retrieval Conference (TREC), pp. 151-162 (1999).
Martin Wechsler. The Probability Ranking Principle Revisited. Information Retrieval, vol. 3, pp. 217-227 (2000). DOI: 10.1023/A:1026516825764
Ricardo Baeza-Yates, Berthier Ribeiro-Neto. Modern Information Retrieval (1999).
Taro Tezuka, Katsumi Tanaka. Temporal and Spatial Attribute Extraction from Web Documents and Time-Specific Regional Web Search System. Web and Wireless Geographical Information Systems, pp. 14-25 (2005). DOI: 10.1007/11427865_2
Thomas Neumann, Gerhard Weikum, Klaus Berberich, Srikanta Bedathur. A time machine for text search. International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 519-526 (2007). DOI: 10.1145/1277741.1277831
Chengxiang Zhai, John Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems, vol. 22, pp. 179-214 (2004). DOI: 10.1145/984321.984322
Xiaoyan Li, W. Bruce Croft. Time-based language models. Proceedings of the Twelfth International Conference on Information and Knowledge Management (CIKM '03), pp. 469-475 (2003). DOI: 10.1145/956863.956951
Masaharu Yoshioka, Makoto Haraguchi. On a combination of probabilistic and boolean IR models for WWW document retrieval. ACM Transactions on Asian Language Information Processing, vol. 4, pp. 340-356 (2005). DOI: 10.1145/1111667.1111674
Wisam Dakka, Luis Gravano, Panagiotis Ipeirotis. Answering General Time-Sensitive Queries. IEEE Transactions on Knowledge and Data Engineering, vol. 24, pp. 220-235 (2012). DOI: 10.1109/TKDE.2010.187