Authors: Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein
Keywords:
Abstract: We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various terminals to different degrees, as appropriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more accurate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, higher than fully lexicalized systems.