An Under-Sampling Approach to Imbalanced Automatic Keyphrase Extraction

Authors: Weijian Ni, Tong Liu, Qingtian Zeng

DOI: 10.1007/978-3-642-32281-5_38

Abstract: The task of automatic keyphrase extraction is usually formalized as a supervised learning problem, and various learning algorithms have been utilized. However, most of the existing approaches make the assumption that the samples are uniformly distributed between the positive (keyphrase) and negative (non-keyphrase) classes, which may not hold in real keyphrase extraction settings. In this paper, we investigate the problem of supervised keyphrase extraction considering a more common case where the candidate phrases are highly imbalanced between the classes. Motivated by the observation that the saliency of a candidate phrase can be described from the perspectives of both morphology and occurrence, a multi-view under-sampling approach, named co-sampling, is proposed. In co-sampling, two classifiers are learned separately using two disjoint sets of features, and the redundant candidate phrases reliably predicted by one classifier are removed from the training set of the peer classifier. Through the iterative and interactive under-sampling process, useless samples are continuously identified and removed while the performance of the classifiers is boosted. Experimental results show that co-sampling outperforms several existing under-sampling approaches on a keyphrase extraction dataset.
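The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the classifiers, the confidence measure, or the stopping rule, so everything below is an assumption made for illustration. In particular, the nearest-centroid classifier, the `threshold` and `per_round` parameters, the rule of stopping once the classes are balanced, and all function names are hypothetical. The sketch also assumes both classes are present in the data.

```python
import math
import random


def centroid(rows):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit(view, labels, train_idx):
    """Toy stand-in classifier (an assumption): one centroid per class."""
    pos = centroid([view[i] for i in train_idx if labels[i] == 1])
    neg = centroid([view[i] for i in train_idx if labels[i] == 0])
    return pos, neg


def neg_confidence(model, x):
    """Confidence in [0, 1] that x is a negative (non-keyphrase) sample."""
    pos, neg = model
    d_pos, d_neg = math.dist(x, pos), math.dist(x, neg)
    return 0.5 if d_pos + d_neg == 0 else d_pos / (d_pos + d_neg)


def prune(model, view, labels, peer_idx, threshold, per_round):
    """Drop from peer_idx the negatives that `model` predicts most reliably."""
    n_pos = sum(1 for i in peer_idx if labels[i] == 1)
    n_neg = len(peer_idx) - n_pos
    # Assumed stopping rule: never remove negatives past class balance.
    budget = min(per_round, max(0, n_neg - n_pos))
    scored = sorted(
        ((neg_confidence(model, view[i]), i) for i in peer_idx if labels[i] == 0),
        reverse=True,
    )
    drop = {i for conf, i in scored[:budget] if conf >= threshold}
    return [i for i in peer_idx if i not in drop]


def co_sampling(view_a, view_b, labels, rounds=10, threshold=0.7, per_round=20):
    """Return the under-sampled training indices for the two views."""
    train_a = list(range(len(labels)))
    train_b = list(range(len(labels)))
    for _ in range(rounds):
        model_a = fit(view_a, labels, train_a)
        model_b = fit(view_b, labels, train_b)
        # Each classifier prunes the training set of its peer, not its own.
        new_b = prune(model_a, view_a, labels, train_b, threshold, per_round)
        new_a = prune(model_b, view_b, labels, train_a, threshold, per_round)
        if len(new_a) == len(train_a) and len(new_b) == len(train_b):
            break  # nothing left that either classifier predicts reliably
        train_a, train_b = new_a, new_b
    return train_a, train_b


if __name__ == "__main__":
    # Synthetic, imbalanced toy data: 10 positives and 200 negatives.
    rng = random.Random(0)
    labels = [1] * 10 + [0] * 200
    view_a = [[rng.gauss(3 if y else 0, 0.5) for _ in range(2)] for y in labels]
    view_b = [[rng.gauss(3 if y else 0, 0.5) for _ in range(2)] for y in labels]
    train_a, train_b = co_sampling(view_a, view_b, labels)
    print(len(train_a), len(train_b))
```

Here `view_a` and `view_b` stand in for the two disjoint feature sets (morphology and occurrence in the abstract's terms). Positive samples are never removed, so only the majority class shrinks from one round to the next.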
