Authors: Maolong Li, Zhixu Li, Qiang Yang, Zhigang Chen, Pengpeng Zhao
DOI: 10.1007/S11280-019-00736-3
Keywords: Task (project management), Sample (statistics), Named-entity recognition, Online encyclopedia, Selection (linguistics), Computer science, Empirical research, Set (abstract data type), Machine learning, Artificial intelligence
Abstract: Named Entity Recognition (NER) is a core task of NLP. State-of-the-art supervised NER models rely heavily on a large amount of high-quality annotated data, which is quite expensive to obtain. Various approaches have been proposed to reduce this heavy reliance on training data, but only with limited effect. In this paper, we propose a crowd-efficient learning approach for NER that makes full use of online encyclopedia pages. In our approach, we first define three criteria (representativeness, informativeness, and diversity) to help select a much smaller set of samples for crowd labeling. We then propose a data augmentation method that uses the structured knowledge of online encyclopedias to generate many more samples, greatly augmenting the training set. After training the model on the augmented sample set, we re-select some new samples for crowd labeling to refine the model. We repeat this training and selection procedure iteratively until the model cannot be improved further or its performance meets the requirement. Our empirical study on several real data collections shows that the approach reduces manual annotations by 50% while achieving almost the same performance as a fully trained model.
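The sample-selection step can be illustrated with a minimal greedy sketch. The abstract names the three criteria but not their formulas, so every definition below is an assumption rather than the authors' method: representativeness is taken as the mean cosine similarity of a sentence embedding to the rest of the unlabeled pool, informativeness as the mean per-token entropy of the current model's predicted label distribution, and diversity as one minus the maximum similarity to samples already chosen. The three terms are combined by a weighted sum with hypothetical weights `alpha`, `beta`, `gamma`; the names `select_samples`, `cosine`, and `entropy` are illustrative only.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one token's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_samples(
    embeddings: List[Sequence[float]],
    label_probs: List[List[Sequence[float]]],
    k: int,
    alpha: float = 1.0,  # hypothetical weight for representativeness
    beta: float = 1.0,   # hypothetical weight for informativeness
    gamma: float = 1.0,  # hypothetical weight for diversity
) -> List[int]:
    """Greedily pick up to k sample indices for crowd labeling (assumed scoring)."""
    n = len(embeddings)

    # Assumed representativeness: mean similarity to every other pool sample.
    rep = []
    for i in range(n):
        sims = [cosine(embeddings[i], embeddings[j]) for j in range(n) if j != i]
        rep.append(sum(sims) / len(sims) if sims else 0.0)

    # Assumed informativeness: mean token-level entropy of model predictions.
    info = [
        sum(entropy(tok) for tok in sent) / len(sent) if sent else 0.0
        for sent in label_probs
    ]

    selected: List[int] = []
    remaining = set(range(n))
    while remaining and len(selected) < k:

        def score(i: int) -> float:
            # Assumed diversity: penalize closeness to anything already picked.
            if selected:
                div = 1.0 - max(cosine(embeddings[i], embeddings[j]) for j in selected)
            else:
                div = 1.0
            return alpha * rep[i] + beta * info[i] + gamma * div

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Two near-duplicate sentences (0 and 1) and one distinct sentence (2).
    emb = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
    probs = [[[0.5, 0.5]]] * 3  # equal uncertainty for all three
    print(select_samples(emb, probs, 2))  # → [1, 2]
```

In this toy pool the diversity term keeps the greedy loop from spending its budget on both near-duplicates. The encyclopedia-based augmentation and the iterative retrain-and-reselect loop described in the abstract are not sketched here.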