Morphosyntactic Tagging of Slovene Legal Language

Authors: Tomaz Erjavec, Bence Sárossy

DOI: 10.31449/INF.V30I4.120

Abstract: Part-of-speech tagging or, more accurately, morphosyntactic tagging, is a procedure that assigns to each word token appearing in a text its morphosyntactic description, e.g. "masculine singular common noun in the genitive case". Morphosyntactic tagging is an important component of many language technology applications, such as machine translation, speech synthesis, or information extraction. In this paper we report on an experiment in tagging Slovene, carried out on a sample of Slovene legal language. We evaluate the accuracy of the TnT tagger, which had been trained on the MULTEXT-East resources for Slovene. The test data come from the freely available parallel English-Slovene corpus SVEZ-IJS, which contains translations of European Union acts. Presented are details of the manually corrected sample and an analysis of the errors. The paper also discusses a simple transformation-based program that fixes some of the errors, and concludes with directions for future work. Summary (Povzetek): The paper describes an experiment in morphosyntactic tagging on a sample of Slovene legal texts.
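
To make the idea of a morphosyntactic description concrete, the following is a minimal Python sketch of how a compact positional tag might be expanded into a readable description. It assumes a MULTEXT-East-style noun tag in which the first character marks the category and the following positions encode type, gender, number and case. The attribute tables are a simplified illustration, and the function names `decode_noun_msd` and `describe` are hypothetical; none of this is the tagset definition or the code used in the paper.

```python
# Simplified, assumed attribute tables for noun tags only. A full
# morphosyntactic specification covers many more categories and values.
NOUN_POSITIONS = [
    ("type", {"c": "common", "p": "proper"}),
    ("gender", {"m": "masculine", "f": "feminine", "n": "neuter"}),
    ("number", {"s": "singular", "d": "dual", "p": "plural"}),
    ("case", {"n": "nominative", "g": "genitive", "d": "dative",
              "a": "accusative", "l": "locative", "i": "instrumental"}),
]


def decode_noun_msd(msd: str) -> dict:
    """Expand a positional noun tag such as 'Ncmsg' into named features."""
    if not msd or msd[0] != "N":
        raise ValueError(f"not a noun tag: {msd!r}")
    features = {"category": "noun"}
    # Pair each position after the category letter with its attribute table.
    for (name, values), code in zip(NOUN_POSITIONS, msd[1:]):
        if code == "-":  # '-' marks an attribute that is not specified
            continue
        features[name] = values.get(code, f"unknown({code})")
    return features


def describe(msd: str) -> str:
    """Render the decoded features as an English phrase."""
    features = decode_noun_msd(msd)
    parts = [features[k] for k in ("gender", "number", "type") if k in features]
    text = " ".join(parts + ["noun"])
    if "case" in features:
        text += f" in the {features['case']} case"
    return text


print(describe("Ncmsg"))
```

Under these assumptions, the tag `Ncmsg` expands to the description quoted in the abstract. A tagger's task is to choose one such tag for every token in context, which is harder for Slovene than for English because the number of distinct tags is much larger.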

References (8)
Tomaz Erjavec, The English-Slovene ACQUIS corpus. Language Resources and Evaluation, pp. 2138-2141 (2006)
Camelia Ignat, Bruno Pouliquen, Ralf Steinberger, Tomaz Erjavec, Massive multilingual corpus compilation: Acquis Communautaire and totale. Archives of Control Sciences, vol. 15, pp. 529-540 (2005)
Eric Brill, A Simple Rule-Based Part of Speech Tagger. Conference on Applied Natural Language Processing, pp. 152-155 (1992), DOI: 10.3115/974499.974526
Thorsten Brants, TnT -- A Statistical Part-of-Speech Tagger. Conference on Applied Natural Language Processing, pp. 224-231 (2000), DOI: 10.3115/974147.974178