Authors: Tim Gollub, Matthias Hagen, Maximilian Michel, Benno Stein
Keywords: Search analytics, Document clustering, Dynamic web page, Information retrieval, Pruning (decision trees), Data mining, World Wide Web, Computer science, Search engine, Index (publishing), Digital content
Abstract: We introduce the concept of keyqueries as dynamic content descriptors for documents. Keyqueries are defined implicitly by the index and the retrieval model of a reference search engine: keyqueries for a document are the minimal queries that return the document in the top result ranks. Besides applications in the fields of information retrieval and data mining, keyqueries have the potential to form the basis of a dynamic classification system for future digital libraries, the modern version of keywords for content description. To determine the keyqueries for a document, we present an exhaustive search algorithm along with effective pruning strategies. For applications where a small number of diverse keyqueries is sufficient, two tailored search strategies are proposed. Our experiments emphasize the role of the reference search engine and show the potential of keyqueries as innovative document descriptors for large, fast evolving bodies of digital content such as the web.