Authors: Chien-Lung Chou, Chia-Hui Chang
DOI: 10.1007/978-3-319-12844-3_21
Keywords: Pattern recognition, Task (project management), Initialization, Knowledge engineering, Named entity, Natural language processing, Artificial intelligence, Personal name, Named-entity recognition, Computer science, Sequence labeling, Selection method
Abstract: Detecting named entities in documents is one of the most important tasks in knowledge engineering. Previous studies rely on annotated training data, which is quite expensive to obtain in large quantities, limiting the effectiveness of recognition. In this research, we propose a semi-supervised learning approach for named entity recognition (NER) via automatic labeling and tri-training, which makes use of unlabeled data and structured resources containing known named entities. By modifying tri-training for sequence labeling and deriving a proper initialization, we can train an NER model for Web news articles automatically with satisfactory performance. In the task of Chinese personal name extraction from 8,672 news articles (with 364,685 sentences and 54,449 (11,856 distinct) person names), an F-measure of 90.4% can be achieved.
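
The abstract does not spell out how automatic labeling works. One common reading is that a list of known names is matched against unlabeled sentences to produce training tags for a sequence labeler. The following is a minimal sketch under that assumption, not the authors' procedure: the function name `auto_label`, the character-level B/I/O tag set, and the longest-match rule are all hypothetical choices made for illustration.

```python
def auto_label(sentence, known_names):
    """Tag each character B/I/O by longest-match lookup of known names.

    Illustrative assumption: names are matched as exact substrings and
    the longest known name starting at a position wins.
    """
    tags = ["O"] * len(sentence)
    max_len = max((len(name) for name in known_names), default=0)
    i = 0
    while i < len(sentence):
        match_len = 0
        # Try the longest candidate first so a full name beats its prefix.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + length] in known_names:
                match_len = length
                break
        if match_len:
            tags[i] = "B"                      # first character of the name
            for j in range(i + 1, i + match_len):
                tags[j] = "I"                  # remaining characters
            i += match_len
        else:
            i += 1
    return tags


if __name__ == "__main__":
    names = {"王小明", "王小"}  # made-up example names, not from the paper
    for ch, tag in zip("王小明今天訪問台北", auto_label("王小明今天訪問台北", names)):
        print(ch, tag)
```

Labels produced this way are noisy (a known name may be missed or matched in the wrong context), which is one reason such data would be combined with a semi-supervised method like tri-training rather than used alone.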