Authors: Sepideh Mesbah, Christoph Lofi, Manuel Valle Torre, Alessandro Bozzon, Geert-Jan Houben
DOI: 10.1007/978-3-030-00671-6_8
Keywords: Natural language processing, Set (abstract data type), Training set, Named-entity recognition, Specific knowledge, Semantic expansion, Task (project management), Workaround, Artificial intelligence, Entity type, Computer science
Abstract: Named Entity Recognition and Typing (NER/NET) is a challenging task, especially with long-tail entities such as the ones found in scientific publications. These entities (e.g. “WebKB”, “StatSnowball”) are rare, often relevant only in specific knowledge domains, yet important for retrieval and exploration purposes. State-of-the-art NER approaches employ supervised machine learning models, trained on expensive type-labeled data laboriously produced by human annotators. A common workaround is the generation of labeled training data from knowledge bases; this approach is not suitable for entity types that are, by definition, scarcely represented in KBs. This paper presents an iterative approach for training NET classifiers on scientific publications that relies on minimal input, namely a small seed set of instances of the targeted entity type. We introduce different strategies for training data extraction, semantic expansion, and result filtering. We evaluate our approach on scientific publications, focusing on the entity types Datasets and Methods in computer science publications and Proteins in biomedical publications.