Morphosyntactic Tagging of Slovene Legal Language

Authors: Tomaz Erjavec, Bence Sárossy

Abstract: Part-of-speech tagging or, more accurately, morphosyntactic tagging, is a procedure that assigns to each word token appearing in a text its morphosyntactic description, e.g. “masculine singular common noun in the genitive case”. Morphosyntactic tagging is an important component of many language technology applications, such as machine translation, speech synthesis, or information extraction. In this paper we report on an experiment in tagging Slovene, on a sample of Slovene legal language. We evaluate the accuracy of the TnT tagger, which had been trained on the MULTEXT-East resources for Slovene. The test data come from the freely available parallel English-Slovene corpus SVEZ-IJS, which contains translations of European Union acts. Presented are the details of the manually corrected data and an analysis of the errors. The paper also discusses a simple transformation-based program that fixes some of the errors, and concludes with directions for future work. Povzetek (translated from Slovene): The paper describes an experiment in morphosyntactic tagging on a sample of Slovene legal texts.

References (8)

Tomaz Erjavec. The English-Slovene ACQUIS corpus. Language Resources and Evaluation, pp. 2138-2141 (2006).

Camelia Ignat, Bruno Pouliquen, Ralf Steinberger, Tomaz Erjavec. Massive multi lingual corpus compilation: Acquis Communautaire and totale. Archives of Control Sciences, vol. 15, pp. 529-540 (2005).

Eric Brill. A Simple Rule-Based Part of Speech Tagger. Conference on Applied Natural Language Processing, pp. 152-155 (1992). doi:10.3115/974499.974526

Thorsten Brants. TnT -- A Statistical Part-of-Speech Tagger. Conference on Applied Natural Language Processing, pp. 224-231 (2000). doi:10.3115/974147.974178

Syntactic Wordclass Tagging. Dordrecht: Kluwer (1999). doi:10.1007/978-94-015-9273-4

Richard Tobin, Claire Grover. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC) (2006).

Tomaz Erjavec. MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora. Language Resources and Evaluation (2004).

Jan Hajic, Barbora Hladká. Tagging inflective languages: prediction of morphological categories for a rich, structured tagset. The 36th annual meeting, pp. 483- (1998). doi:10.3115/980845.980927

Similar articles (5)

The JOS Morphosyntactically Tagged Corpus of Slovene

Ripple Down Rule learning for automated word lemmatisation

Language engineering for syntactic knowledge transfer

A Statistical Based Part of Speech Tagger for Urdu Language

LemmaGen: Multilingual Lemmatisation with Induced Ripple-Down Rules