Authors: Eva Pettersson, Pavel Král, Helena Hubková
DOI:
Keywords:
Abstract: As the number of digitized archival documents increases very rapidly, named entity recognition (NER) in historical documents has become important for information extraction and data mining. This task requires an annotated corpus, which has until now been missing for Czech. In this paper, we present a new collection for historical NER, composed of Czech newspapers. The corpus is freely available for research purposes at http://chnec.kiv.zcu.cz/. For this corpus, we have defined relevant domain-specific named entity types and created an annotation manual for labelling. We further conducted experiments using recurrent neural networks in order to establish baseline results on the dataset. We experimented with randomly initialized embeddings as well as static and dynamic fastText word embeddings. We achieved an F1 score of 0.73 with a bidirectional LSTM model.
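The abstract reports its baseline as an F1 score without saying how that score is computed. A minimal sketch of entity-level (exact-match) F1 follows. It assumes that entities are encoded with BIO tags and that a predicted entity counts as correct only when its type and boundaries match the gold annotation exactly, as in CoNLL-style evaluation. Those are assumptions. The tag names `PER` and `LOC` and the helpers `bio_to_spans` and `entity_f1` are hypothetical illustrations, not the corpus's actual entity types or the authors' evaluation script. F1 is the harmonic mean of precision P and recall R: 2PR / (P + R).

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed encoding)."""
    spans: Set[Span] = set()
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        # An I- tag of the currently open type extends the entity.
        if tag.startswith("I-") and tag[2:] == etype:
            continue
        # Anything else ends the open entity, if there is one.
        if etype is not None:
            spans.add((etype, start, i))
            etype = None
        if tag != "O":
            # B-X opens a new entity; a stray I-X is leniently treated as B-X.
            etype, start = tag[2:], i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged exact-match entity F1 over a list of tagged sentences."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)  # correct type and boundaries
        fp += len(pred_spans - gold_spans)  # predicted but not in gold
        fn += len(gold_spans - pred_spans)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "O"]]  # finds the person, misses the location
    print(round(entity_f1(gold, pred), 4))  # precision 1.0, recall 0.5 -> 0.6667
```

Exact-match scoring is strict: a prediction with the right type but a boundary off by one token counts as both a false positive and a false negative. Token-level or partial-match variants would typically yield different numbers, so a reported F1 is only comparable across systems that use the same matching rule.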