Authors: Nikola Ljubesic, Tomaz Erjavec
DOI:
Keywords: Slavic languages, Natural language processing, Computer science, Lexicon, Artificial intelligence
Abstract: In this paper we present a tagger developed for inflectionally rich languages for which both a training corpus and a lexicon are available. We do not constrain the tagger by the lexicon entries, allowing for both lexicon incompleteness and noisiness. By using the lexicon indirectly through features, we allow known and unknown words to be tagged in the same manner. We test our tagger on Slovene data, obtaining a 25% error reduction over the best previous results on both known and unknown words. Given that Slovene is, in comparison to some other Slavic languages, a well-resourced language, we perform experiments on the impact of token (corpus) vs. type (lexicon) supervision, obtaining useful insights into how to balance the effort of extending resources to yield better tagging results.
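
The idea of using the lexicon indirectly through features, rather than as a hard constraint on the tag set, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `token_features`, the feature names, and the toy lexicon with its MULTEXT-East-style tags are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Toy lexicon mapping lowercased word forms to candidate tags.
# Entries and tags are illustrative only, not taken from the paper.
LEXICON: Dict[str, Set[str]] = {
    "miza": {"Ncfsn"},
    "je": {"Va-r3s-n", "Pp3fsg--y"},
}


def token_features(tokens: List[str], i: int,
                   lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """Build a feature dict for tokens[i].

    The lexicon only contributes features; it never restricts which
    tags a downstream classifier may predict, so an incomplete or
    noisy lexicon cannot rule out the correct tag.
    """
    word = tokens[i]
    lower = word.lower()
    feats: Dict[str, float] = {
        "bias": 1.0,
        "lower=" + lower: 1.0,
        "suffix3=" + lower[-3:]: 1.0,
        "is_title": float(word.istitle()),
    }
    # Each tag hypothesis from the lexicon becomes one feature.
    tags = lexicon.get(lower, set())
    for tag in sorted(tags):
        feats["lex_tag=" + tag] = 1.0
    # Words missing from the lexicon get the same surface features
    # plus an explicit marker, so known and unknown words share one
    # feature space and are tagged in the same manner.
    if not tags:
        feats["lex_unknown"] = 1.0
    return feats


sentence = ["Miza", "je", "xyzzy"]
known = token_features(sentence, 0, LEXICON)
unknown = token_features(sentence, 2, LEXICON)
```

The resulting feature dicts could be fed to any sequence classifier. The point of the design is that the lexicon shifts the classifier's evidence without ever excluding a tag outright.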