Authors: Takuya Matsuzaki, Yusuke Miyao, Jun'ichi Tsujii
Keywords:
Abstract: This paper defines a generative probabilistic model of parse trees, which we call PCFG-LA. This model is an extension of PCFG in which non-terminal symbols are augmented with latent variables. Fine-grained CFG rules are automatically induced from a parsed corpus by training a PCFG-LA model with the EM algorithm. Because exact parsing with a PCFG-LA is NP-hard, several approximations are described and empirically compared. In experiments on the Penn WSJ corpus, our trained model achieved 86.6% F1 (sentences ≤ 40 words), which is comparable to that of an unlexicalized parser created through extensive manual feature selection.
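To make the latent-variable idea concrete, here is a minimal Python sketch under loudly stated assumptions: the toy grammar, its probabilities, the binary-plus-lexical rule format, and the names `PI`, `BINARY`, `LEXICAL`, `inside`, `tree_prob`, `annotated_trees`, and `best_single_annotation` are all hypothetical and not taken from the paper. For one fixed tree, its PCFG-LA probability sums over every assignment of latent values to the tree's nodes. A bottom-up inside pass computes that sum in time linear in the tree size. A brute-force enumeration over all annotated trees is included only as a cross-check.

```python
# Toy PCFG-LA sketch (hypothetical grammar and numbers, for illustration only).
# Each nonterminal A is split into annotated symbols A[x], x in range(H).

H = 2  # number of latent values per nonterminal (assumed)

# PI[x] = P(root is S[x])
PI = [0.6, 0.4]

# BINARY[(A, B, C)][x][y][z] = P(A[x] -> B[y] C[z]); each S[x] row sums to 1
BINARY = {
    ("S", "NP", "VP"): [
        [[0.5, 0.1], [0.3, 0.1]],  # S[0] -> NP[y] VP[z]
        [[0.1, 0.2], [0.2, 0.5]],  # S[1] -> NP[y] VP[z]
    ],
}

# LEXICAL[(A, word)][x] = P(A[x] -> word)
LEXICAL = {
    ("NP", "he"): [0.7, 0.2],
    ("VP", "runs"): [0.4, 0.9],
}


def is_preterminal(tree):
    """A node like ("NP", "he") whose single child is a word."""
    return len(tree) == 2 and isinstance(tree[1], str)


def inside(tree):
    """b[x] = probability of the subtree given its root is annotated x,
    summed over the latent values of all nodes below it."""
    label = tree[0]
    if is_preterminal(tree):
        return list(LEXICAL[(label, tree[1])])
    left, right = tree[1], tree[2]
    bl, br = inside(left), inside(right)
    rule = BINARY[(label, left[0], right[0])]
    return [
        sum(rule[x][y][z] * bl[y] * br[z] for y in range(H) for z in range(H))
        for x in range(H)
    ]


def tree_prob(tree):
    """P(T) = sum_x PI[x] * b_root[x]: the latent annotation is marginalized out."""
    b = inside(tree)
    return sum(PI[x] * b[x] for x in range(H))


def annotated_trees(tree):
    """Yield (x, p) for every full latent annotation of `tree`: x is the root's
    latent value, p the product of its rule probabilities (exponential; check only)."""
    label = tree[0]
    if is_preterminal(tree):
        for x in range(H):
            yield x, LEXICAL[(label, tree[1])][x]
        return
    left, right = tree[1], tree[2]
    rule = BINARY[(label, left[0], right[0])]
    for x in range(H):
        for y, pl in annotated_trees(left):
            for z, pr in annotated_trees(right):
                yield x, rule[x][y][z] * pl * pr


def tree_prob_brute_force(tree):
    """Same quantity as tree_prob, by explicit enumeration of annotated trees."""
    return sum(PI[x] * p for x, p in annotated_trees(tree))


def best_single_annotation(tree):
    """Probability of the single most likely annotated version of `tree`."""
    return max(PI[x] * p for x, p in annotated_trees(tree))


if __name__ == "__main__":
    T = ("S", ("NP", "he"), ("VP", "runs"))
    print(tree_prob(T), tree_prob_brute_force(T), best_single_annotation(T))
```

Scoring one given tree is easy, as the sketch shows. The hard part is finding the most probable unannotated tree for a sentence, which is the problem the abstract reports as NP-hard. The gap between `tree_prob` (sum over annotations) and `best_single_annotation` (max over annotations) hints at why approximations are needed. This comparison is only illustrative; it is not a claim about which approximations the paper evaluates.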