From keywords to keyqueries: content descriptors for the web

Authors: Tim Gollub, Matthias Hagen, Maximilian Michel, Benno Stein

DOI: 10.1145/2484028.2484181

Keywords: Search analytics, Document clustering, Dynamic web page, Information retrieval, Pruning (decision trees), Data mining, World Wide Web, Computer science, Search engine, Index (publishing), Digital content

Abstract: We introduce the concept of keyqueries as dynamic content descriptors for documents. Keyqueries are defined implicitly by the index and the retrieval model of a reference search engine: keyqueries for a document are the minimal queries that return the document in the top result ranks. Besides applications in the fields of information retrieval and data mining, keyqueries have the potential to form the basis of a dynamic classification system for future digital libraries, the modern version of keywords for content description. To determine the keyqueries for a document, we present an exhaustive search algorithm along with effective pruning strategies. For applications where a small number of diverse keyqueries is sufficient, two tailored search strategies are proposed. Our experiments emphasize the role of the reference search engine and show the potential of keyqueries as innovative document descriptors for large, fast evolving bodies of digital content such as the web.
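The definition in the abstract (minimal queries that return the document in the top result ranks) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy corpus `index`, the conjunctive term-frequency ranking in `search`, the rank cutoff `k`, the length bound `max_len`, and the single pruning rule (skip any superset of an already found keyquery) stand in for the reference search engine and for the pruning strategies the paper actually proposes.

```python
from itertools import combinations

def search(index, query):
    """Toy reference engine (assumption): rank documents that contain every
    query term by summed term frequency; ties are broken by document id."""
    hits = []
    for doc_id, text in index.items():
        tokens = text.lower().split()
        if all(term in tokens for term in query):
            score = sum(tokens.count(term) for term in query)
            hits.append((-score, doc_id))
    return [doc_id for _, doc_id in sorted(hits)]

def keyqueries(index, doc_id, vocabulary, k=1, max_len=3):
    """Enumerate candidate queries by increasing length and keep those that
    return doc_id within the top k results. Supersets of an already found
    keyquery are skipped, which enforces minimality."""
    found = []
    for size in range(1, max_len + 1):
        for terms in combinations(sorted(vocabulary), size):
            query = frozenset(terms)
            if any(kq <= query for kq in found):
                continue  # a subset already succeeds, so this query is not minimal
            if doc_id in search(index, query)[:k]:
                found.append(query)
    return found

# Hypothetical four-document corpus.
index = {
    "d1": "web search engine",
    "d2": "web documents clustering clustering",
    "d3": "search documents descriptors",
    "d4": "web search documents",
}
vocab = {"web", "search", "documents"}

strict = keyqueries(index, "d4", vocab, k=1)   # d4 must be ranked first
relaxed = keyqueries(index, "d4", vocab, k=2)  # d4 may be ranked first or second
```

With `k=1` only the full three-term query ranks `d4` first in this toy setup, while `k=2` already admits each two-term query. The result depends entirely on the index and ranking function, which mirrors the abstract's point that keyqueries are defined relative to a reference search engine.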

References (12)
Matthias Hagen, Benno Stein. Candidate Document Retrieval for Web-Scale Text Reuse Detection. String Processing and Information Retrieval, pp. 356-367 (2011). DOI: 10.1007/978-3-642-24583-1_35
Ramakrishnan Srikant, Rakesh Agrawal. Fast Algorithms for Mining Association Rules in Large Databases. Very Large Data Bases, pp. 487-499 (1994)
Matthias Hagen, Benno Maria Stein. Capacity-constrained query formulation. European Conference on Research and Advanced Technology for Digital Libraries, pp. 384-388 (2010). DOI: 10.1007/978-3-642-15464-5_38
Rada Mihalcea, Paul Tarau. TextRank: Bringing Order into Text. Empirical Methods in Natural Language Processing, pp. 404-411 (2004)
Otis Gospodnetić, Erik Hatcher, Doug Cutting. Lucene in Action (2004)
Matthias Hagen, Martin Potthast, Anna Beyer, Benno Stein. Towards optimum query segmentation. Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM '12), pp. 1015-1024 (2012). DOI: 10.1145/2396761.2398398
Norbert Fuhr, Marc Lechtenfeld, Benno Stein, Tim Gollub. The optimum clustering framework: implementing the cluster hypothesis. Information Retrieval, vol. 15, pp. 93-115 (2012). DOI: 10.1007/S10791-011-9173-9
Olena Medelyan, Su Nam Kim, Min-Yen Kan, Timothy Baldwin. SemEval-2010 Task 5: Automatic Keyphrase Extraction from Scientific Articles. Meeting of the Association for Computational Linguistics, pp. 21-26 (2010)
Francesco Bonchi, Carlos Castillo, Debora Donato, Aristides Gionis. Topical query decomposition. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08), pp. 52-60 (2008). DOI: 10.1145/1401890.1401902
Leif Azzopardi, Vishwa Vinay. Retrievability. Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM '08), pp. 561-570 (2008). DOI: 10.1145/1458082.1458157