Unknown Word Guessing and Part-of-Speech Tagging Using Support Vector Machines.

Authors: Yuji Matsumoto, Taku Kudo, Tetsuji Nakagawa

DOI:

Keywords:

Abstract: The accuracy of part-of-speech (POS) tagging for unknown words is substantially lower than that for known words. Considering the high accuracy rate of up-to-date statistical POS taggers, unknown words account for a non-negligible portion of the errors. This paper describes POS prediction for unknown words using Support Vector Machines. We achieve high accuracy in POS tag prediction using substrings and surrounding context as features. Furthermore, we integrate this method with a practical English POS tagger and achieve an accuracy of 97.1%, higher than conventional approaches.
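As a rough illustration of the feature types the abstract names (substrings of the word and its surrounding context), the sketch below builds sparse string features for one token. It is a minimal sketch under stated assumptions, not the authors' feature set: the function name `unknown_word_features`, the affix length limit `max_affix`, the context `window`, and the `<pad>` marker are all hypothetical choices made for this example. Training a Support Vector Machine on such features is outside the scope of the sketch.

```python
from typing import List


def unknown_word_features(tokens: List[str], i: int,
                          max_affix: int = 4, window: int = 2) -> List[str]:
    """Return sparse string features for the (presumed unknown) word tokens[i].

    Hypothetical helper: the affix length and window size are assumptions,
    not values taken from the paper.
    """
    word = tokens[i]
    feats: List[str] = []

    # Substring features: prefixes and suffixes of up to max_affix characters.
    for n in range(1, min(max_affix, len(word)) + 1):
        feats.append(f"prefix={word[:n]}")
        feats.append(f"suffix={word[-n:]}")

    # Context features: lowercased neighbouring words within the window,
    # with a padding marker for positions outside the sentence.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        neighbor = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
        feats.append(f"word[{offset:+d}]={neighbor}")

    return feats


# Example: features for the made-up word "flumpish" in a short sentence.
features = unknown_word_features(["The", "flumpish", "cat", "slept"], 1)
```

Each feature string would typically be mapped to one dimension of a binary vector before being passed to a classifier; that encoding step is an assumption about usage and is left out here.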

References (17)
Jason Weston, Chris Watkins. Support vector machines for multi-class pattern recognition. European Symposium on Artificial Neural Networks, pp. 219-224 (1999).
Jakub Zavrel, Saso Dzeroski, Tomaz Erjavec. Morphosyntactic Tagging of Slovene: Evaluating Taggers and Tagsets. Language Resources and Evaluation (2000).
Andrei Mikheev. Automatic rule induction for unknown-word guessing. Computational Linguistics, vol. 23, pp. 405-423 (1997).
Jeff Palmucci, Lance Ramshaw, Richard Schwartz, Marie Meteer, Ralph Weischedel. Coping with ambiguity and unknown words through probabilistic models. Computational Linguistics, vol. 19, pp. 361-382 (1993).
Adwait Ratnaparkhi. A Maximum Entropy Model for Part-Of-Speech Tagging. Empirical Methods in Natural Language Processing (1996).
Giorgos S. Orphanos, Dimitris N. Christodoulakis. POS disambiguation and unknown word guessing with decision trees. Proceedings of the Ninth Conference on European Chapter of the Association for Computational Linguistics, pp. 134-141 (1999). DOI: 10.3115/977035.977054
Taku Kudoh, Yuji Matsumoto. Use of support vector learning for chunk identification. Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning, pp. 142-144 (2000). DOI: 10.3115/1117601.1117635
Silviu Cucerzan, David Yarowsky. Language independent, minimally supervised induction of lexical probabilities. Proceedings of the 38th Annual Meeting on Association for Computational Linguistics (ACL '00), pp. 270-277 (2000). DOI: 10.3115/1075218.1075253
Scott M. Thede. Predicting Part-of-Speech Information about Unknown Words using Statistical Methods. Meeting of the Association for Computational Linguistics, pp. 1505-1507 (1998). DOI: 10.3115/980691.980821