Morphosyntactic Tagging of Slovene: Evaluating Taggers and Tagsets.

Authors: Jakub Zavrel, Saso Dzeroski, Tomaz Erjavec

Abstract: The paper evaluates tagging techniques on a corpus of Slovene, where we are faced with a large number of possible word-class tags and only a small (hand-tagged) dataset. We report on training and testing four different taggers on the Slovene MULTEXT-East corpus, containing about 100,000 words and 1000 morphosyntactic tags. Results show, first of all, that training times of the Maximum Entropy Tagger and the Rule Based Tagger are unacceptably long, while they are negligible for the Memory Based Taggers and the TnT tri-gram tagger. Results on a random split show that tagging accuracy varies between 86% and 89% overall, between 92% and 95% on known words, and between 54% and 55% on unknown words. Best results are obtained by TnT. The paper also investigates performance in relation to our EAGLES-based tagset. Here we compare the per-feature accuracy on the full tagset with accuracies on these features when the tagset is reduced. PoS accuracy is quite high, while accuracy on Case is lowest. Tagset reduction helps improve accuracy, but less than might be expected.
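The three accuracy figures reported in the abstract (overall, known words, unknown words) can be illustrated with a minimal Python sketch. It assumes that a "known" word is one that occurs in the training data, and that the first character of a MULTEXT-East style tag encodes the part of speech. The function names, the toy sentences, and the toy tags are hypothetical and are not taken from the paper or the corpus.

```python
from typing import Dict, Iterable, List, Tuple

# Each test item is (word, gold_tag, predicted_tag).
Item = Tuple[str, str, str]


def accuracy_by_known(train_words: Iterable[str],
                      test_items: List[Item]) -> Dict[str, float]:
    """Full-tag accuracy overall, on known words, and on unknown words.

    Assumption: a word is "known" if it occurs in the training data.
    """
    vocab = set(train_words)
    # Each entry holds [number correct, number seen].
    counts = {"overall": [0, 0], "known": [0, 0], "unknown": [0, 0]}
    for word, gold, pred in test_items:
        group = "known" if word in vocab else "unknown"
        for key in ("overall", group):
            counts[key][0] += int(gold == pred)
            counts[key][1] += 1
    return {k: (c / n if n else float("nan")) for k, (c, n) in counts.items()}


def pos_accuracy(test_items: List[Item]) -> float:
    """Accuracy on the PoS feature only.

    Assumption: the first character of the tag is the part of speech.
    """
    correct = sum(gold[:1] == pred[:1] for _, gold, pred in test_items)
    return correct / len(test_items)


# Toy data for illustration only; not real corpus entries or tagger output.
train_words = ["hiša", "je", "velika"]
test_items = [
    ("hiša", "Ncfsn", "Ncfsn"),      # known word, full tag correct
    ("je", "Vcip3s", "Vcip3s"),      # known word, full tag correct
    ("velika", "Afpfsn", "Afpfsa"),  # known word, only the last feature wrong
    ("mizica", "Ncfsn", "Ncfsg"),    # unknown word, only the last feature wrong
]

scores = accuracy_by_known(train_words, test_items)
pos_score = pos_accuracy(test_items)
```

In this toy example both errors affect only the final feature of the tag, so PoS accuracy stays at 100% while full-tag accuracy drops, which mirrors the kind of gap between per-feature and full-tag accuracy that the abstract describes.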


