Czech Historical Named Entity Corpus v 1.0

Authors: Eva Pettersson, Pavel Král, Helena Hubková

Abstract: As the number of digitized archival documents increases very rapidly, named entity recognition (NER) in historical documents has become important for information extraction and data mining. For this task an annotated corpus is needed, which has up to now been missing for Czech. In this paper we present a new collection for historical NER, composed of Czech newspapers. This collection is freely available for research purposes at http://chnec.kiv.zcu.cz/. For this corpus, we have defined relevant domain-specific named entity types and created an annotation manual for labelling. We further conducted some experiments on this corpus using recurrent neural networks in order to show baseline results on the dataset. We experimented with randomly initialized embeddings and with static and dynamic fastText word embeddings. We achieved an F1 score of 0.73 with a bidirectional LSTM model.
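
The abstract reports an F1 score but does not state the evaluation protocol. The sketch below assumes the common CoNLL-style convention: entities are encoded with BIO tags, and a predicted entity counts as correct only if its type and exact token span match a gold entity. The tag labels PER and LOC are hypothetical placeholders, not the corpus's actual domain-specific entity types, and the functions `bio_to_spans` and `entity_f1` are illustrative names, not part of any released tooling.

```python
from typing import List, Optional, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence (assumed to use "O", "B-X", "I-X") into typed spans."""
    spans: Set[Span] = set()
    current_type: Optional[str] = None
    start = 0
    # A sentinel "O" at the end closes any entity still open.
    for i, tag in enumerate(tags + ["O"]):
        # An I- tag of the same type extends the currently open entity.
        if tag.startswith("I-") and tag[2:] == current_type:
            continue
        # Anything else closes the open entity, if there is one.
        if current_type is not None:
            spans.add((current_type, start, i))
            current_type = None
        # B- tags (and stray I- tags with no matching open entity) start a new one.
        if tag != "O":
            current_type = tag[2:]
            start = i
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact span and type match)."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
        tp += len(g & p)   # predicted entities that exactly match a gold entity
        fp += len(p - g)   # predicted entities with no exact gold match
        fn += len(g - p)   # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical sentence: the second entity is found but assigned the wrong type.
    gold = [["B-PER", "I-PER", "O", "B-LOC"]]
    pred = [["B-PER", "I-PER", "O", "B-PER"]]
    print(entity_f1(gold, pred))  # → (0.5, 0.5, 0.5)
```

Under this convention a type error counts twice, once as a false positive and once as a false negative, which is why entity-level F1 is usually stricter than token-level accuracy.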