Handling Unknown Words in Statistical Latent-Variable Parsing Models for Arabic, English and French

Authors: Josef van Genabith, Jennifer Foster, Mohammed Attia, Deirdre Hogan, Joseph Le Roux

Abstract: This paper presents a study of the impact of using simple and complex morphological clues to improve the classification of rare and unknown words for parsing. We compare this approach to a language-independent technique often used in parsers, which is based solely on word frequencies. The study is applied to three languages that exhibit different levels of morphological expressiveness: Arabic, French and English. We integrate information about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically from an annotated corpus.
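
The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the suffix list, the class names (`UNK`, `UNK-ed`, and so on), the frequency threshold, and the function names are all hypothetical and chosen only to show the idea. The frequency-only baseline maps every rare word to a single unknown-word token, while the clue-based variant maps rare words to coarser classes derived from surface morphology.

```python
from collections import Counter

# Hypothetical English suffix clues, for illustration only (not from the paper).
SUFFIXES = ("ing", "ed", "ly", "s")


def signature(word):
    """Map a word to a coarse unknown-word class using simple surface clues."""
    if word[:1].isupper():
        return "UNK-CAP"
    if any(ch.isdigit() for ch in word):
        return "UNK-NUM"
    for suffix in SUFFIXES:
        # Require a non-empty stem so that "s" itself is not treated as stem + suffix.
        if word.endswith(suffix) and len(word) > len(suffix):
            return "UNK-" + suffix
    return "UNK"


def replace_rare(sentences, threshold=1, use_clues=False):
    """Replace words seen at most `threshold` times in `sentences`.

    With use_clues=False every rare word becomes the single token "UNK"
    (the frequency-only baseline); with use_clues=True it becomes the
    class returned by signature().
    """
    counts = Counter(word for sent in sentences for word in sent)
    result = []
    for sent in sentences:
        new_sent = []
        for word in sent:
            if counts[word] > threshold:
                new_sent.append(word)
            elif use_clues:
                new_sent.append(signature(word))
            else:
                new_sent.append("UNK")
        result.append(new_sent)
    return result


corpus = [
    ["the", "dog", "barked"],
    ["the", "cat", "sleeps"],
    ["the", "dog", "ran"],
]
baseline = replace_rare(corpus)
with_clues = replace_rare(corpus, use_clues=True)
```

In this toy corpus the baseline collapses "barked", "cat", "sleeps" and "ran" into one token, whereas the clue-based variant keeps "barked" and "sleeps" apart from the other rare words. A parser trained on the rewritten corpus can then estimate separate tag distributions for each class.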
