Three-Dimensional Parametrization for Parsing Morphologically Rich Languages

Authors: Reut Tsarfaty, Khalil Sima'an

DOI: 10.3115/1621410.1621429

Abstract: Current parameters of accurate unlexicalized parsers based on Probabilistic Context-Free Grammars (PCFGs) form a two-dimensional grid in which rewrite events are conditioned on both horizontal (head-outward) and vertical (parental) histories. In Semitic languages, where arguments may move around rather freely and phrase-structures are often shallow, there are additional morphological factors that govern the generation process. Here we propose that agreement features percolated up the parse-tree form a third dimension of parametrization that is orthogonal to the previous two. This dimension differs from mere "state-splits" as it applies to a whole set of categories rather than to individual ones and encodes linguistically motivated co-occurrences between them. This paper presents extensive experiments with extensions of unlexicalized PCFGs for parsing Modern Hebrew in which tuning the parameters in three dimensions gradually leads to improved performance. Our best result introduces a new, stronger, lower bound on the performance of treebank grammars for parsing Modern Hebrew, and is on a par with current results for parsing Modern Standard Arabic obtained by a fully lexicalized parser trained on a much larger treebank.
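
To make the three dimensions concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy tree type (`Node`), a head index supplied with each phrase, and an illustrative agreement string `"FS"` (feminine singular) on preterminals; all of these names and values are hypothetical. Agreement features are percolated from the head child (third dimension), the left-hand side is annotated with its parent category (vertical dimension), and siblings are generated head-outward conditioned on at most `h` previously generated siblings (horizontal dimension).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    label: str                      # syntactic category, e.g. "NP"
    children: List["Node"] = field(default_factory=list)
    head: int = 0                   # index of the head child (assumed to be given)
    feats: str = ""                 # agreement features; set by hand on preterminals


def percolate(node: Node) -> None:
    """Third dimension: copy agreement features up from each head child."""
    for child in node.children:
        percolate(child)
    if node.children:
        node.feats = node.children[node.head].feats


def symbol(node: Node) -> str:
    """Category name enriched with agreement features, if any."""
    return node.label + ("." + node.feats if node.feats else "")


Event = Tuple[str, str, Tuple[str, ...], str]


def events(node: Node, parent: str = "ROOT", h: int = 1) -> List[Event]:
    """Collect head-outward rewrite events (lhs, side, history, generated symbol)."""
    if not node.children:
        return []
    lhs = symbol(node) + "^" + parent          # vertical (parental) history
    kids = [symbol(c) for c in node.children]
    out: List[Event] = [(lhs, "HEAD", (), kids[node.head])]
    left = kids[:node.head][::-1]              # sisters left of the head, nearest first
    right = kids[node.head + 1:]               # sisters right of the head, nearest first
    for side, seq in (("L", left), ("R", right)):
        for i, kid in enumerate(seq):
            history = tuple(seq[max(0, i - h):i])   # horizontal history of length <= h
            out.append((lhs, side, history, kid))
    for child in node.children:
        out.extend(events(child, node.label, h))
    return out


# Toy clause S -> NP VP whose subject and verb agree (both "FS").
tree = Node("S", [Node("NP", [Node("NN", feats="FS")]),
                  Node("VP", [Node("VB", feats="FS")])], head=1)
percolate(tree)
ev = events(tree)
for e in ev:
    print(e)
```

Counting such events over a treebank and normalizing per conditioning context would give relative-frequency estimates; setting `h`, dropping the `^parent` suffix, or leaving `feats` empty switches each dimension on or off independently, which is the sense in which the three are orthogonal.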
