Improving Morphosyntactic Tagging of Slovene Language through Meta-tagging

Authors: Tomaz Erjavec, Jan Rupnik, Miha Grcar

Keywords: Natural language processing, Speech recognition, Hidden Markov model, Classifier (linguistics), Artificial intelligence, Slovene language, Computer science, Statistical classification, Feature (machine learning), Task (project management), Language technology

Abstract: Part-of-speech (PoS) or, better, morphosyntactic tagging is the process of assigning categories to words in a text, an important pre-processing step for most human language technology applications. PoS-tagging of Slovene texts is a challenging task, since the size of the tagset is over one thousand tags (as opposed to English, where it is typically around sixty) and state-of-the-art accuracy is still below the levels desired. The paper describes an experiment aimed at improving tagging accuracy for Slovene by combining the outputs of two taggers: a proprietary rule-based tagger developed by the Amebis HLT company, and TnT, a tri-gram HMM tagger trained on a hand-annotated corpus of Slovene. The two taggers have comparable accuracy, but there are many cases where, if their predictions differ, one of the two does assign the correct tag. We investigate training a classifier on top of the outputs of both taggers that predicts which of the two is correct. We experiment with selecting different classification algorithms and constructing different feature sets, and show that some yield a meta-tagger with a significant increase in accuracy compared to either tagger in isolation.
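The meta-tagging idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which classification algorithms or feature sets were used, so this example swaps in a simple frequency-based arbiter that uses only the pair of disagreeing tags as its feature. The function names `train_meta_tagger` and `meta_tag`, the labels "A" and "B" for the two taggers, and the toy tags are all hypothetical.

```python
from collections import Counter, defaultdict


def train_meta_tagger(examples):
    """Learn, for each disagreement pattern, which tagger is more often right.

    `examples` is an iterable of (tag_a, tag_b, gold) triples, where tag_a and
    tag_b are the outputs of two taggers (hypothetically labelled "A" and "B")
    and gold is the hand-annotated tag.
    """
    votes = defaultdict(Counter)
    for tag_a, tag_b, gold in examples:
        if tag_a == tag_b:
            continue  # the taggers agree, so there is nothing to arbitrate
        if gold == tag_a:
            votes[(tag_a, tag_b)]["A"] += 1
        elif gold == tag_b:
            votes[(tag_a, tag_b)]["B"] += 1
        # if neither tagger is correct, the example carries no signal here
    return {pair: counts.most_common(1)[0][0] for pair, counts in votes.items()}


def meta_tag(model, tag_a, tag_b, default="A"):
    """Return the tag chosen by the meta-tagger for one word."""
    if tag_a == tag_b:
        return tag_a
    choice = model.get((tag_a, tag_b), default)  # unseen pattern: fall back
    return tag_a if choice == "A" else tag_b


# Toy training data (made-up tags, for illustration only).
train = [
    ("Ncmsn", "Ncmsa", "Ncmsn"),
    ("Ncmsn", "Ncmsa", "Ncmsn"),
    ("Ncmsn", "Ncmsa", "Ncmsa"),
    ("Vmip3s", "Ncfsn", "Ncfsn"),
    ("Agpmsn", "Agpmsn", "Agpmsn"),
]
model = train_meta_tagger(train)
print(meta_tag(model, "Ncmsn", "Ncmsa"))
print(meta_tag(model, "Vmip3s", "Ncfsn"))
```

A real meta-tagger along the lines the abstract describes would presumably use richer features and a trained classifier; the lookup table here only shows the shape of the task, which is choosing between two taggers' outputs when they disagree.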
