Improving the performance of pipelined query processing with skipping

Authors: Simon Jonassen, Svein Erik Bratsberg

DOI: 10.1007/978-3-642-35063-4_1

Keywords: Computer science, Scalability, Query optimization, Search engine, Parallel computing, Query expansion, Inverted index, Theoretical computer science

Abstract: Web search engines need to provide high throughput and short query latency. Recent results show that pipelined query processing over a term-wise partitioned inverted index may have superior throughput. However, the latency and the scalability with respect to the collection size are the main challenges associated with this method. In this paper, we evaluate the effect of skipping on the performance of pipelined query processing. Further, we introduce a novel idea of using Max-Score pruning within pipelined query processing and a new term assignment heuristic, partitioning by Max-Score. Our current results indicate a significant improvement over the state-of-the-art approach and lead to several further optimizations, which include dynamic load balancing, intra-query concurrent processing, and a hybrid combination between pipelined and non-pipelined execution.
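The abstract relies on two ingredients, skipping and Max-Score pruning. As a rough illustration only, the sketch below shows a single-node, document-at-a-time Max-Score top-k evaluation along the lines of the Max-Score strategy described by Turtle and Flood (listed in the references). It is not the authors' pipelined, term-partitioned implementation. Assumptions: posting lists are in-memory lists of (docID, precomputed score contribution) pairs sorted by docID, contributions are non-negative, and a binary search (`bisect`) stands in for the skip pointers of a compressed index. The names `Cursor`, `next_geq`, and `max_score_topk` are hypothetical.

```python
import bisect
import heapq
from typing import Dict, List, Tuple

Posting = Tuple[int, float]  # (docID, precomputed score contribution)
END = float("inf")           # sentinel docID for an exhausted list


class Cursor:
    """Cursor over one docID-sorted posting list (hypothetical helper)."""

    def __init__(self, postings: List[Posting]) -> None:
        self.docs = [d for d, _ in postings]
        self.scores = [s for _, s in postings]
        self.pos = 0
        # Upper bound on this term's contribution to any document's score.
        self.max_score = max(self.scores, default=0.0)

    def doc(self) -> float:
        return self.docs[self.pos] if self.pos < len(self.docs) else END

    def score(self) -> float:
        return self.scores[self.pos]

    def advance(self) -> None:
        self.pos += 1

    def next_geq(self, target: int) -> None:
        # Skip forward to the first docID >= target; binary search stands in
        # for the skip pointers of a compressed inverted index (assumption).
        self.pos = bisect.bisect_left(self.docs, target, lo=self.pos)


def max_score_topk(lists: Dict[str, List[Posting]], k: int) -> List[Tuple[float, int]]:
    """Return the top-k (score, docID) pairs, highest score first."""
    if k <= 0:
        return []
    # Order lists by increasing upper bound; prefix[i] bounds the combined
    # contribution of lists 0..i.
    cursors = sorted((Cursor(p) for p in lists.values()), key=lambda c: c.max_score)
    prefix: List[float] = []
    total = 0.0
    for c in cursors:
        total += c.max_score
        prefix.append(total)

    heap: List[Tuple[float, int]] = []  # min-heap holding the current top-k
    threshold = 0.0                     # k-th best score once the heap is full
    first_essential = 0                 # lists before this index are non-essential
    while first_essential < len(cursors):
        essential = cursors[first_essential:]
        doc = min(c.doc() for c in essential)
        if doc == END:
            break  # every essential list is exhausted
        # Only essential lists generate candidates; score them first.
        score = 0.0
        for c in essential:
            if c.doc() == doc:
                score += c.score()
                c.advance()
        # Probe non-essential lists, largest bound first, and stop as soon as
        # the remaining bound cannot lift the candidate above the threshold.
        for i in range(first_essential - 1, -1, -1):
            if score + prefix[i] <= threshold:
                break
            cursors[i].next_geq(doc)
            if cursors[i].doc() == doc:
                score += cursors[i].score()
        if len(heap) < k:
            heapq.heappush(heap, (score, doc))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, doc))
        if len(heap) == k:
            threshold = heap[0][0]
            # A document found only in lists 0..first_essential cannot beat
            # the threshold, so those lists stop generating candidates.
            while first_essential < len(cursors) and prefix[first_essential] <= threshold:
                first_essential += 1
    return sorted(heap, reverse=True)
```

Once the top-k heap is full, lists whose cumulative upper bound cannot exceed the current k-th score become non-essential: they no longer generate candidates and are only probed with `next_geq`, which is where skipping saves decoding work. For example, with `k = 2` and the lists `a = [(1, 0.5), (3, 0.2), (7, 0.9)]`, `b = [(2, 1.5), (3, 1.0), (9, 0.4)]`, `c = [(3, 2.0), (7, 0.1), (8, 3.0)]`, the sketch returns documents 3 and 8 without ever scoring document 9.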

References (17)
Simon Jonassen, Svein Erik Bratsberg. Intra-query concurrent pipelined processing for distributed full-text retrieval. European Conference on Information Retrieval, pp. 413-425 (2012). DOI: 10.1007/978-3-642-28997-2_35
Nicholas Lester, Alistair Moffat, William Webber, Justin Zobel. Space-Limited ranked query evaluation using adaptive pruning. Web Information Systems Engineering, pp. 470-477 (2005). DOI: 10.1007/11581062_37
Peter Triantafillou, Torsten Suel, Lei Chen. Web Information Systems Engineering - WISE 2010 (2011).
Simon Jonassen, Svein Erik Bratsberg. Efficient Compressed Inverted Index Skipping for Disjunctive Text-Queries. Lecture Notes in Computer Science, pp. 530-542 (2011). DOI: 10.1007/978-3-642-20161-5_53
Marianne Lykke, Birger Larsen, Haakon Lund, Peter Ingwersen. Developing a Test Collection for the Evaluation of Integrated Search. Lecture Notes in Computer Science, pp. 627-630 (2010). DOI: 10.1007/978-3-642-12275-0_63
Raffaele Perego, Claudio Lucchese, Salvatore Orlando, Fabrizio Silvestri. Mining query logs to optimize index partitioning in parallel web search engines. Scalable Information Systems, p. 43 (2007). DOI: 10.5555/1366804.1366860
Alistair Moffat, William Webber, Justin Zobel. Load balancing for term-distributed parallel retrieval. Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '06), pp. 348-355 (2006). DOI: 10.1145/1148170.1148232
Howard Turtle, James Flood. Query evaluation: strategies and optimizations. Information Processing and Management, vol. 31, pp. 831-850 (1995). DOI: 10.1016/0306-4573(95)00020-H
B. Barla Cambazoglu, Enver Kayaaslan, Simon Jonassen, Cevdet Aykanat. A term-based inverted index partitioning model for efficient distributed query processing. ACM Transactions on the Web, vol. 7, pp. 15- (2013). DOI: 10.1145/2516633.2516637