Authors: Reut Tsarfaty, Khalil Sima'an
Keywords:
Abstract: Current parameters of accurate unlexicalized parsers based on Probabilistic Context-Free Grammars (PCFGs) form a two-dimensional grid in which rewrite events are conditioned on both horizontal (head-outward) and vertical (parental) histories. In Semitic languages, where arguments may move around rather freely and phrase-structures are often shallow, there are additional morphological factors that govern the generation process. Here we propose that agreement features percolated up the parse-tree form a third dimension of parametrization that is orthogonal to the previous two. This dimension differs from mere "state-splits" as it applies to a whole set of categories rather than to individual ones, and it encodes linguistically motivated co-occurrences between them. This paper presents extensive experiments with extensions of unlexicalized PCFGs for parsing Modern Hebrew, in which tuning the parameters in three dimensions gradually leads to improved performance. Our best result introduces a new, stronger lower bound on the performance of treebank grammars for parsing Modern Hebrew, and is on a par with current results for parsing Modern Standard Arabic obtained by a fully lexicalized parser trained on a much larger treebank.