Corpus vs. Lexicon Supervision in Morphosyntactic Tagging: the Case of Slovene

DOI:

Keywords: Slavic languages, Natural language processing, Computer science, Lexicon, Artificial intelligence

Abstract: In this paper we present a tagger developed for inflectionally rich languages for which both a training corpus and a lexicon are available. We do not constrain the tagger by the lexicon entries, allowing for both lexicon incompleteness and noisiness. By using the lexicon indirectly through features, we allow known and unknown words to be tagged in the same manner. We test our tagger on Slovene data, obtaining a 25% error reduction over the best previous results on both known and unknown words. Given that Slovene is, in comparison to some other Slavic languages, a well-resourced language, we perform experiments on the impact of token (corpus) vs. type (lexicon) supervision, obtaining useful insights into how to balance the effort of extending resources to yield better tagging results.
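The idea of using the lexicon indirectly through features, rather than as a hard constraint on the allowed tags, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function name `token_features`, the feature names, the toy lexicon, and the tag strings are all hypothetical. The point is only that a known word receives its lexicon tags as extra features, while an unknown word goes through the same extraction and falls back on suffix features.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicon: lowercased word form -> tags it may carry.
# The tag strings are illustrative placeholders only.
TOY_LEXICON: Dict[str, Set[str]] = {
    "hiša": {"Ncfsn"},
    "je": {"Va-r3s-n", "Pp3fsa--y"},
}


def token_features(
    tokens: List[str], i: int, lexicon: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Build a feature dict for tokens[i] (assumed feature design)."""
    lower = tokens[i].lower()
    feats: Dict[str, float] = {"bias": 1.0, "lower=" + lower: 1.0}

    # Suffix features are available for every word, known or unknown.
    for n in range(1, 5):
        if len(lower) >= n:
            feats[f"suffix{n}=" + lower[-n:]] = 1.0

    # Lexicon tags enter as soft evidence (features), not as a filter
    # on the tags the model may predict, so a missing or noisy entry
    # cannot rule out the correct tag.
    entries = lexicon.get(lower)
    if entries:
        for tag in sorted(entries):
            feats["lex=" + tag] = 1.0
    else:
        feats["lex=NONE"] = 1.0
    return feats


known = token_features(["Hiša", "je", "velika"], 1, TOY_LEXICON)
unknown = token_features(["Hišica", "je", "velika"], 0, TOY_LEXICON)
```

Feature dicts of this shape could be passed to any feature-based sequence model; which learner the authors actually use is not stated in this record.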

References (13)

Adam Radziszewski, A Tiered CRF Tagger for Polish. Intelligent Tools for Building a Scientific Information Platform, pp. 215-230 (2013), 10.1007/978-3-642-35647-6_16

Adwait Ratnaparkhi, A Maximum Entropy Model for Part-Of-Speech Tagging. Empirical Methods in Natural Language Processing (1996)

Péter Halácsy, András Kornai, Csaba Oravecz, HunPos: an open source trigram tagger. Meeting of the Association for Computational Linguistics, pp. 209-212 (2007), 10.3115/1557769.1557830

Kristina Toutanova, Dan Klein, Christopher D. Manning, Yoram Singer, Feature-rich part-of-speech tagging with a cyclic dependency network. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - NAACL '03, pp. 173-180 (2003), 10.3115/1073445.1073478

Pascal Denis, Benoît Sagot, Coupling an annotated corpus and a lexicon for state-of-the-art POS tagging. Language Resources and Evaluation, vol. 46, pp. 721-736 (2012), 10.1007/S10579-012-9193-0

Miroslav Spousta, Drahomíra "johanka" Spoustová, Jan Hajič, Jan Raab, Semi-Supervised Training for the Averaged Perceptron POS Tagger. Meeting of the Association for Computational Linguistics, pp. 763-771 (2009), 10.3115/1609067.1609152

Edward M. McCreight, A Space-Economical Suffix Tree Construction Algorithm. Journal of the ACM, vol. 23, pp. 262-272 (1976), 10.1145/321941.321946

Eduard Bejček, Pavel Straňák, Zdeněk Žabokrtský, Magda Ševčíková, Jan Štěpánek, Jan Popelka, Jarmila Panevová, Prague Dependency Treebank 2.5 -- a Revisited Version of PDT 2.0. International Conference on Computational Linguistics, pp. 231-246 (2012)

Nikola Ljubešić, Željko Agić, Danijela Merkler, Lemmatization and Morphosyntactic Tagging of Croatian and Serbian. Meeting of the Association for Computational Linguistics, pp. 48-57 (2013)

Łukasz Kobyliński, PoliTa: A multitagger for Polish. Language Resources and Evaluation, pp. 2949-2954 (2014)
