Authors: Marco Maggini, Marco Pellegrini, Filippo Geraci, Fabrizio Sebastiani
DOI:
Keywords: Information retrieval, Cluster analysis, Benchmark (computing), Fuzzy clustering, Snippet, Document clustering, Computer science, Relevance (information retrieval), Clustering high-dimensional data, Metric (mathematics), Data mining
Abstract: This paper describes Armil, a meta-search engine that groups into disjoint labelled clusters the Web snippets returned by auxiliary search engines. The cluster labels generated by Armil provide the user with a compact guide to assessing the relevance of each cluster to her information need. Striking the right balance between running time and cluster well-formedness was a key point in the design of our system. Both the clustering and the labelling tasks are performed on the fly by processing only the snippets provided by the auxiliary search engines, and use no external sources of knowledge. Clustering is performed by means of a fast version of the furthest-point-first algorithm for metric k-center clustering. Cluster labelling is achieved by combining intra-cluster and inter-cluster term extraction based on a variant of the information gain measure. We have tested the clustering effectiveness of Armil against Vivisimo, the de facto industrial standard in Web snippet clustering, using as benchmark a comprehensive set of snippets obtained from the Open Directory Project hierarchy. According to two widely accepted "external" metrics of clustering quality, Armil achieves better performance levels by 10%. We also report the results of a thorough evaluation of both the clustering and the cluster labelling algorithms.
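The furthest-point-first heuristic for metric k-center clustering named in the abstract can be illustrated with a minimal sketch. This is the basic greedy version, not the fast variant used in Armil, and it makes several assumptions that the abstract does not state: points are numeric tuples, the metric is Euclidean, and the first center is simply the point at index 0. The function name `furthest_point_first` is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def furthest_point_first(points, k):
    """Basic greedy k-center sketch: returns (center indices, cluster assignment)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")

    # Assumption: the first center is chosen arbitrarily as index 0.
    centers = [0]
    # nearest[i] = distance from point i to its closest chosen center so far.
    nearest = [dist(p, points[0]) for p in points]
    # assign[i] = position (in `centers`) of the center that point i belongs to.
    assign = [0] * len(points)

    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        c = max(range(len(points)), key=nearest.__getitem__)
        centers.append(c)
        # Reassign any point that is closer to the new center.
        for i, p in enumerate(points):
            d = dist(p, points[c])
            if d < nearest[i]:
                nearest[i] = d
                assign[i] = len(centers) - 1

    return centers, assign


points = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 8)]
centers, assign = furthest_point_first(points, 3)
print(centers, assign)
```

Each round costs one pass over the points, so the sketch runs in O(nk) distance computations. In a snippet-clustering setting the tuples would be replaced by snippet representations and `dist` by a suitable metric on them.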