Author: Simon Jonassen
Abstract: Web search engines have to deal with a rapidly increasing amount of information, high query loads and tight performance constraints. The success of a search engine depends on the speed with which it answers queries (efficiency) and the quality of its answers (effectiveness). These two metrics have a large impact on the operational costs of the search engine and on overall user satisfaction, which together determine its revenue. In this context, any improvement in query processing efficiency can reduce operational costs and improve user satisfaction, and hence benefit the search engine.

In this thesis, we elaborate on query processing efficiency, address several problems within partitioned query processing, pruning and caching, and propose several novel techniques:

First, we look at term-wise partitioned indexes and address the main limitations of state-of-the-art query processing methods. Our first approach combines the advantages of pipelined and traditional (non-pipelined) query processing. This approach assumes one disk access per posting list and term-at-a-time processing. For the second approach, we follow an alternative direction and look at document-at-a-time processing of sub-queries and skipping. Subsequently, we present several skipping extensions to pipelined query processing, which, as we show, can improve query processing performance and/or the quality of results. Then, we extend these methods with intra-query parallelism, which can improve performance at low query loads.

Second, we look at several query processing optimizations designed for a monolithic index. We present an efficient self-skipping inverted index designed for modern index compression methods, together with several query processing optimizations. We show that these optimizations provide a significant speed-up compared to full (non-pruned) evaluation and reduce the performance gap between disjunctive (OR) and conjunctive (AND) queries. We also present a linear programming optimization that further improves the I/O, decompression and computation efficiency of Max-Score.

Third, we elaborate on caching in two independent contributions. First, we present an analytical model that finds the optimal split in a static memory-based two-level cache. Second, we present several strategies for selecting, ordering and scheduling prefetch queries and demonstrate that they can improve the efficiency and effectiveness of search engines.

We carefully evaluate our ideas either using a real implementation or by simulation with real-world text collections and query logs. Most of the proposed techniques are found to improve on the state of the art in the conducted empirical studies. However, their implications and applicability in practice need further evaluation in real-life settings.

This dissertation was completed at the Department of Computer and Information Science at the Norwegian University of Science and Technology (NTNU), advised by Prof. Svein Erik Bratsberg, Dr. Oystein Torbjornsen and Magnus Lie Hetland. Some of the work was done in collaboration with Yahoo! Research Barcelona, mentored by Ricardo Baeza-Yates and B. Barla Cambazoglu. Alistair Moffat (University of Melbourne), Christina Lioma (University of Copenhagen) and Kjell Bratsbergsengen served as committee members.

Available online at: http://www.idi.ntnu.no/research/doctor_theses/simonj.pdf.
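The abstract names Max-Score as the dynamic pruning method that the linear programming optimization builds on. For readers unfamiliar with it, below is a minimal sketch of a common document-at-a-time formulation of Max-Score. This formulation splits query terms into "essential" and "non-essential" posting lists. The sketch assumes in-memory posting lists given as Python lists of (doc_id, score) pairs sorted by doc_id, with precomputed positive scores. The function name `max_score_topk` and this data layout are hypothetical. The sketch does not reproduce the thesis's self-skipping index, compression, or linear programming optimization.

```python
import bisect
import heapq


def max_score_topk(postings, k):
    """Hypothetical sketch of document-at-a-time Max-Score top-k retrieval.

    postings: dict term -> non-empty list of (doc_id, score), sorted by doc_id,
    with all scores > 0. Returns up to k (score, doc_id) pairs, best first.
    """
    # Order terms by maximum score (ascending) so low-impact lists come first.
    terms = sorted(postings, key=lambda t: max(s for _, s in postings[t]))
    docs = [[d for d, _ in postings[t]] for t in terms]
    scores = [[s for _, s in postings[t]] for t in terms]
    n = len(terms)
    # ub[i]: upper bound on the score of a document found only in lists 0..i.
    ub, total = [], 0.0
    for s in scores:
        total += max(s)
        ub.append(total)
    pos = [0] * n          # cursor into each posting list
    heap = []              # min-heap of (score, doc_id) holding the current top-k
    threshold = 0.0        # score of the k-th best result once the heap is full
    first_ess = 0          # lists first_ess..n-1 are "essential"
    while first_ess < n:
        # Next candidate: smallest unprocessed doc among the essential lists.
        candidate = min((docs[i][pos[i]] for i in range(first_ess, n)
                         if pos[i] < len(docs[i])), default=None)
        if candidate is None:
            break
        score = 0.0
        for i in range(first_ess, n):
            if pos[i] < len(docs[i]) and docs[i][pos[i]] == candidate:
                score += scores[i][pos[i]]
                pos[i] += 1
        # Probe non-essential lists (highest bound first); stop once even the
        # best case cannot beat the current threshold.
        for i in range(first_ess - 1, -1, -1):
            if score + ub[i] <= threshold:
                break
            j = bisect.bisect_left(docs[i], candidate, pos[i])  # skip forward
            pos[i] = j
            if j < len(docs[i]) and docs[i][j] == candidate:
                score += scores[i][j]
        if len(heap) < k:
            heapq.heappush(heap, (score, candidate))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, candidate))
        if len(heap) == k:
            threshold = heap[0][0]
            # Lists whose cumulative bound cannot beat the threshold become
            # non-essential: documents only in them can never enter the top-k.
            while first_ess < n and ub[first_ess] <= threshold:
                first_ess += 1
    return sorted(heap, key=lambda x: (-x[0], x[1]))
```

As the threshold rises, documents that occur only in non-essential lists are never generated as candidates. A candidate's remaining lists are also skipped as soon as its upper bound can no longer beat the current k-th score. This is where the savings in I/O, decompression and scoring come from, compared with a full evaluation.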