Authors: Weijian Ni, Tong Liu, Qingtian Zeng
DOI: 10.1007/978-3-642-32281-5_38
Keywords:
Abstract: The task of automatic keyphrase extraction is usually formalized as a supervised learning problem, and various learning algorithms have been utilized. However, most of the existing approaches assume that samples are uniformly distributed between the positive (keyphrase) and negative (non-keyphrase) classes, which may not hold in real settings. In this paper, we investigate supervised keyphrase extraction in the more common case where the candidate phrases are highly imbalanced between the two classes. Motivated by the observation that the saliency of a phrase can be described from the perspectives of both morphology and occurrence, a multi-view under-sampling approach, named co-sampling, is proposed. In co-sampling, two classifiers are learned separately using two disjoint sets of features, and the redundant candidate phrases reliably predicted by one classifier are removed from the training set of the peer classifier. Through this iterative and interactive under-sampling process, useless samples are continuously identified and removed while the performance of the classifiers is boosted. Experimental results show that co-sampling outperforms several existing approaches on a keyphrase extraction dataset.
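
The co-sampling loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the nearest-centroid classifier, the margin-based notion of "reliably predicted", the per-round removal budget, the fixed number of rounds, and all names (`co_sample`, `view_a`, `view_b`, `drop_per_round`) are assumptions made for illustration. The two views stand in for the morphology and occurrence feature sets, and the data is synthetic.

```python
import random
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]
Model = Tuple[List[float], List[float]]  # (positive centroid, negative centroid)


def centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroids(view: List[Vector], labels: List[int], idx: Set[int]) -> Model:
    """Stand-in classifier (assumption): one centroid per class."""
    pos = [view[i] for i in idx if labels[i] == 1]
    neg = [view[i] for i in idx if labels[i] == 0]
    return centroid(pos), centroid(neg)


def negative_margin(model: Model, x: Vector) -> float:
    """Positive when x is closer to the negative centroid; larger = more confident."""
    pos_c, neg_c = model
    return sq_dist(x, pos_c) - sq_dist(x, neg_c)


def co_sample(view_a, view_b, labels, rounds=5, drop_per_round=10):
    """Each view's classifier prunes confident negatives from the peer's training set."""
    train_a = set(range(len(labels)))
    train_b = set(range(len(labels)))
    for _ in range(rounds):
        model_a = fit_centroids(view_a, labels, train_a)
        model_b = fit_centroids(view_b, labels, train_b)
        for model, view, peer in ((model_a, view_a, train_b),
                                  (model_b, view_b, train_a)):
            negs = [i for i in peer if labels[i] == 0]
            n_pos = sum(1 for i in peer if labels[i] == 1)
            confident = [i for i in negs if negative_margin(model, view[i]) > 0]
            confident.sort(key=lambda i: negative_margin(model, view[i]), reverse=True)
            # Assumed stopping rule: never drop below a 1:1 class ratio.
            budget = min(drop_per_round, max(0, len(negs) - n_pos))
            peer.difference_update(confident[:budget])
    return train_a, train_b


# Synthetic, imbalanced toy data: 5 keyphrases vs. 95 non-keyphrases.
rng = random.Random(0)
labels = [1] * 5 + [0] * 95
view_a = [[rng.gauss(2.0 if y else 0.0, 1.0) for _ in range(2)] for y in labels]
view_b = [[rng.gauss(2.0 if y else 0.0, 1.0) for _ in range(3)] for y in labels]
train_a, train_b = co_sample(view_a, view_b, labels)
print(len(train_a), len(train_b))
```

Only negative samples are ever removed, so every keyphrase example stays in both training sets while the class ratio moves toward balance. In practice, any classifier that yields a confidence score could replace the centroid model.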