Probabilistic CFG with Latent Annotations

Authors: Takuya Matsuzaki, Yusuke Miyao, Jun'ichi Tsujii

DOI: 10.3115/1219840.1219850

Abstract: This paper defines a generative probabilistic model of parse trees, which we call PCFG-LA. This model is an extension of PCFG in which non-terminal symbols are augmented with latent variables. Fine-grained CFG rules are automatically induced from a parsed corpus by training a PCFG-LA model using an EM-algorithm. Because exact parsing with a PCFG-LA is NP-hard, several approximations are described and empirically compared. In experiments using the Penn WSJ corpus, our automatically trained model gave a performance of 86.6% (F1, sentences ≤ 40 words), which is comparable to that of an unlexicalized PCFG parser created using extensive manual feature selection.
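
To make the idea of latent annotations concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a binarized tree, two latent values per non-terminal, and invented toy probabilities (`ROOT`, `LEX`, `BIN`, and the example sentence are all hypothetical). It computes the probability of one observed tree by summing over every latent annotation of its nodes, using a bottom-up pass instead of explicit enumeration.

```python
H = 2  # number of latent values per non-terminal (illustrative assumption)

# Toy tree: (label, word) for a preterminal, (label, left, right) otherwise.
TREE = ("S", ("NP", "she"), ("VP", "runs"))

# Hypothetical parameters, invented for this example only.
ROOT = {("S", 0): 0.6, ("S", 1): 0.4}              # P(S[x]) at the root
LEX = {("NP", 0, "she"): 0.5, ("NP", 1, "she"): 0.2,
       ("VP", 0, "runs"): 0.3, ("VP", 1, "runs"): 0.7}  # P(A[x] -> word)
BIN = {  # P(S[x] -> NP[y] VP[z]); for each x the four entries sum to 1
    ("S", 0, "NP", 0, "VP", 0): 0.1, ("S", 0, "NP", 0, "VP", 1): 0.2,
    ("S", 0, "NP", 1, "VP", 0): 0.3, ("S", 0, "NP", 1, "VP", 1): 0.4,
    ("S", 1, "NP", 0, "VP", 0): 0.25, ("S", 1, "NP", 0, "VP", 1): 0.25,
    ("S", 1, "NP", 1, "VP", 0): 0.25, ("S", 1, "NP", 1, "VP", 1): 0.25,
}


def inside(node):
    """Return [P(subtree below node | label[x]) for x in range(H)]."""
    label = node[0]
    if len(node) == 2:  # preterminal directly above a word
        return [LEX.get((label, x, node[1]), 0.0) for x in range(H)]
    left, right = node[1], node[2]
    b_left, b_right = inside(left), inside(right)
    # Sum out the children's latent values y and z for each parent value x.
    return [
        sum(BIN.get((label, x, left[0], y, right[0], z), 0.0)
            * b_left[y] * b_right[z]
            for y in range(H) for z in range(H))
        for x in range(H)
    ]


def tree_prob(tree):
    """Probability of the unannotated tree, marginalizing all latent values."""
    b = inside(tree)
    return sum(ROOT.get((tree[0], x), 0.0) * b[x] for x in range(H))


print(tree_prob(TREE))
```

Scoring a single given tree this way costs time linear in the number of nodes (times a factor polynomial in `H`). The NP-hardness mentioned in the abstract concerns a different problem: finding the best unannotated tree for a sentence.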

References (16)
Khalil Sima'an. Computational complexity of probabilistic disambiguation. Grammars, vol. 5, pp. 125-151 (2002). 10.1023/A:1016340700671
Mark Johnson. PCFG models of linguistic tree representations. Computational Linguistics, vol. 24, pp. 613-632 (1998).
Libin Shen. Nondeterministic LTAG Derivation Tree Extraction. Proceedings of the 7th International Workshop on Tree Adjoining Grammar and Related Formalisms, pp. 199-203 (2004).
Joshua Goodman. Probabilistic Feature Grammars. International Workshop/Conference on Parsing Technologies, pp. 89-100 (2000). 10.1007/978-94-015-9470-7_4
Joshua Goodman. Efficient Algorithms for Parsing the DOP Model. Empirical Methods in Natural Language Processing (1996).
David Chiang, Daniel M. Bikel. Recovering latent information in treebanks. Proceedings of the 19th International Conference on Computational Linguistics, pp. 1-7 (2002). 10.3115/1072228.1072354
Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. Computational Linguistics, vol. 29, pp. 589-637 (2003). 10.1162/089120103322753356
Dan Klein, Christopher D. Manning. Accurate unlexicalized parsing. Proceedings of the 41st Annual Meeting on Association for Computational Linguistics (ACL '03), pp. 423-430 (2003). 10.3115/1075096.1075150
Tommi Jaakkola, Brendan J. Frey, Relu Patrascu, Jodi Moran. Sequentially Fitting "Inclusive" Trees for Inference in Noisy-OR Networks. Neural Information Processing Systems, vol. 13, pp. 493-499 (2000).
James Henderson. Inducing history representations for broad coverage statistical parsing. Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (NAACL '03), pp. 24-31 (2003). 10.3115/1073445.1073459