Authors: Humberto Mossri de Almeida, Marcos André Gonçalves, Marco Cristo, Pável Calado
Keywords:
Abstract: In this paper, we propose a new method to discover collection-adapted ranking functions based on Genetic Programming (GP). Our Combined Component Approach (CCA) is based on the combination of several term-weighting components (i.e., term frequency, collection frequency, normalization) extracted from well-known ranking functions. In contrast to related work, the GP terminals in our CCA are not based on simple statistical information of a document collection, but on meaningful, effective, and proven components. Experimental results show that our approach was able to outperform standard TF-IDF, BM25, and another GP-based approach in two different collections. CCA obtained improvements in mean average precision of up to 40.87% for the TREC-8 collection and 24.85% for the WBR99 collection (a large Brazilian Web collection) over the baseline functions. The CCA evolution process was also able to reduce the overtraining commonly found in machine learning methods, especially genetic programming, and to converge faster than the other GP-based approach used for comparison.