Authors: Noah A. Smith, David A. Smith, Roy W. Tromble
Keywords: Context (language use), Sentence, Lemmatisation, Artificial intelligence, Arabic, Czech, Morpheme, Word (computer architecture), Morphological dictionary, Computer science, Speech recognition, Natural language processing
Abstract: Finite-state approaches have been highly successful at describing the morphological processes of many languages. Such approaches have largely focused on modeling the phone- or character-level processes that generate candidate lexical types, rather than tokens in context. For the full analysis of words in context, disambiguation is also required (Hakkani-Tür et al., 2000; Hajič et al., 2001). In this paper, we apply a novel source-channel model to the problem of morphological disambiguation (segmentation into morphemes, lemmatization, and POS tagging) for concatenative, templatic, and inflectional languages. The channel model exploits an existing morphological dictionary, constraining each word's analysis to be linguistically valid. The source model is a factored, conditionally-estimated random field (Lafferty et al., 2001) that learns to disambiguate the full sentence by modeling local contexts. Compared with baseline state-of-the-art methods, our method achieves statistically significant error rate reductions on Korean, Arabic, and Czech, for various training set sizes and accuracy measures.