Authors: Iñigo Jauregi Unanue, Ehsan Zare Borzeshi, Massimo Piccardi
DOI: 10.1016/J.JBI.2017.11.007
Keywords: Word (computer architecture), Artificial neural network, Artificial intelligence, Conditional random field, Natural language processing, Support vector machine, Machine learning, Feature engineering, Recurrent neural network, Computer science, Named-entity recognition, Deep learning
Abstract: Past approaches to health-domain NER have mainly used manual features and conventional classifiers. In this paper, we explore a neural network approach (B-LSTM-CRF) that can learn the features automatically. In addition, initializing the features with pre-trained embeddings can lead to higher accuracy. We pre-train the features using a critical care database (MIMIC-III). Experiments have been carried out over three contemporary datasets for health-domain NER, outperforming past systems. Background: Previous state-of-the-art systems on Drug Name Recognition (DNR) and Clinical Concept Extraction (CCE) have focused on a combination of text feature engineering and conventional machine learning algorithms such as conditional random fields and support vector machines. However, developing good features is inherently heavily time-consuming. Conversely, more modern machine learning approaches such as recurrent neural networks (RNNs) have proved capable of automatically learning effective features from either random assignments or automated word embeddings. Objectives: (i) To create a highly accurate DNR and CCE system that avoids conventional, time-consuming feature engineering. (ii) To create richer, more specialized word embeddings by using health-domain datasets such as MIMIC-III. (iii) To evaluate our systems over three contemporary datasets. Methods: Two deep learning methods, namely the Bidirectional LSTM and the Bidirectional LSTM-CRF, are evaluated. A CRF model is set as the baseline to compare the deep learning systems to a traditional machine learning approach. The same features are used for all the models. Results: We obtained the best results with the Bidirectional LSTM-CRF model, which has outperformed all previously proposed systems. The specialized embeddings have helped to cover unusual words in DrugBank and MedLine, but not in the i2b2/VA dataset. Conclusions: We present a state-of-the-art system for DNR and CCE. Automated word embeddings have allowed us to avoid costly feature engineering and achieve higher accuracy. Nevertheless, the embeddings need to be retrained over datasets that are adequate for the domain, in order to adequately cover the domain-specific vocabulary.
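The abstract contrasts two ways of initializing word features: random assignment and pre-trained embeddings, with vocabulary coverage as the deciding factor. The following is a minimal sketch of that initialization step, not the authors' implementation. The function name `build_embedding_table`, the uniform range of ±0.25, and the toy vectors are all assumptions made for illustration.

```python
import random


def build_embedding_table(vocab, pretrained, dim, seed=0):
    """Map each vocabulary word to a vector of length `dim`.

    Words found in `pretrained` keep their pre-trained vector; all other
    words receive a small random vector (the "random assignment" case).
    Returns the table and the fraction of the vocabulary that was covered.
    """
    rng = random.Random(seed)
    table = {}
    hits = 0
    for word in vocab:
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            table[word] = list(vec)  # copy so the source vectors stay untouched
            hits += 1
        else:
            # Assumed fallback range; the abstract does not specify one.
            table[word] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
    coverage = hits / len(vocab) if vocab else 0.0
    return table, coverage


# Toy, made-up vectors standing in for embeddings trained on a health corpus.
pretrained = {"aspirin": [0.1, 0.2, 0.3], "dose": [0.0, -0.1, 0.4]}
vocab = ["aspirin", "dose", "warfarin", "mg"]
table, coverage = build_embedding_table(vocab, pretrained, dim=3)
```

In this toy setup, half of the vocabulary is covered by the pre-trained vectors. A domain-specific corpus would be expected to raise that fraction for specialized terms, which is the effect the abstract reports for DrugBank and MedLine.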