A fully Bayesian approach to unsupervised part-of-speech tagging

Authors: Sharon Goldwater, Tom Griffiths

DOI:

Keywords: Trigram, Discriminative model, Bayesian probability, Machine learning, Prior probability, Natural language, Computer science, Pattern recognition, Artificial intelligence, Unsupervised learning, Hidden Markov model, Generative model

Abstract: Unsupervised learning of linguistic structure is a difficult problem. A common approach is to define a generative model and maximize the probability of the hidden structure given the observed data. Typically, this is done using maximum-likelihood estimation (MLE) of the model parameters. We show using part-of-speech tagging that a fully Bayesian approach can greatly improve performance. Rather than estimating a single set of parameters, the Bayesian approach integrates over all possible parameter values. This difference ensures that the learned structure will have high probability over a range of possible parameters, and permits the use of priors favoring the sparse distributions that are typical of natural language. Our model has the structure of a standard trigram HMM, yet its accuracy is closer to that of a state-of-the-art discriminative model (Smith and Eisner, 2005), up to 14 percentage points better than MLE. We find improvements both when training from data alone, and when using a tag dictionary.
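The abstract's contrast between MLE and integrating over all parameter values is typically realized with collapsed Gibbs sampling, in which each tag is resampled from its Dirichlet-multinomial predictive distribution given all other current tags, with the HMM parameters integrated out. The sketch below is a minimal illustration under assumptions of our own (a bigram rather than trigram HMM, simplified sentence-boundary handling, and made-up function and variable names), not the paper's actual implementation.

import numpy as np

def gibbs_sweep(tags, words, trans, emit, K, V, alpha, beta, rng):
    """One collapsed Gibbs sweep over tag assignments (illustrative sketch).

    tags  : array of current tag assignments (values in 0..K-1)
    words : array of word indices (values in 0..V-1)
    trans : (K+1, K+1) transition counts; index K is the sentence boundary
    emit  : (K, V) emission counts
    alpha, beta : symmetric Dirichlet hyperparameters (small values -> sparse)
    """
    n = len(words)
    S = K + 1  # number of transition outcomes: K tags plus the boundary symbol
    for i in range(n):
        w = words[i]
        prev = tags[i - 1] if i > 0 else K
        nxt = tags[i + 1] if i + 1 < n else K
        old = tags[i]
        # Remove the current assignment from the count tables.
        trans[prev, old] -= 1
        trans[old, nxt] -= 1
        emit[old, w] -= 1
        # Dirichlet-multinomial predictive probability for each candidate tag,
        # integrating out the HMM parameters rather than estimating them.
        p = np.empty(K)
        for t in range(K):
            p_in = (trans[prev, t] + alpha) / (trans[prev].sum() + S * alpha)
            # Correction terms account for the transition (prev -> t) that the
            # candidate itself adds before (t -> nxt) is generated.
            num_extra = 1.0 if (prev == t and t == nxt) else 0.0
            den_extra = 1.0 if prev == t else 0.0
            p_out = (trans[t, nxt] + num_extra + alpha) / (trans[t].sum() + den_extra + S * alpha)
            p_emit = (emit[t, w] + beta) / (emit[t].sum() + V * beta)
            p[t] = p_in * p_out * p_emit
        # Sample a new tag and add it back into the counts.
        new = rng.choice(K, p=p / p.sum())
        tags[i] = new
        trans[prev, new] += 1
        trans[new, nxt] += 1
        emit[new, w] += 1
    return tags

With small alpha and beta, these predictive probabilities favor reusing tag-tag and tag-word pairs that already have high counts, which is how priors favoring sparse distributions bias the sampler toward the skewed distributions typical of natural language.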

References (15)
Q.I. Wang, D. Schuurmans, Improved estimation for unsupervised part-of-speech tagging. International Conference on Natural Language Processing and Knowledge Engineering, pp. 219-224 (2005), 10.1109/NLPKE.2005.1598738
Stuart Geman, Donald Geman, Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-6, pp. 721-741 (1984), 10.1109/TPAMI.1984.4767596
Aria Haghighi, Dan Klein, Prototype-Driven Learning for Sequence Models. Human Language Technology Conference (HLT-NAACL), pp. 320-327 (2006), 10.3115/1220835.1220876
Bernard Merialdo, Tagging English text with a probabilistic model. Computational Linguistics, vol. 20, pp. 155-171 (1994)
Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, Jenifer C. Lai, Class-based n-gram models of natural language. Computational Linguistics, vol. 18, pp. 467-479 (1992), 10.5555/176313.176316
Dan Klein, Christopher D. Manning, A generative constituent-context model for improved grammar induction. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL '02), pp. 128-135 (2002), 10.3115/1073083.1073106
Markov Chain Monte Carlo in Practice. Technometrics, vol. 39, p. 338 (1997), 10.1201/B14835
E. Brill, M. Pop, Unsupervised Learning of Disambiguation Rules for Part-of-Speech Tagging. In Natural Language Processing Using Very Large Corpora, pp. 27-42 (1999), 10.1007/978-94-017-2390-9_3
David J. C. MacKay, Linda C. Bauman Peto, A hierarchical Dirichlet language model. Natural Language Engineering, vol. 1, pp. 289-308 (1995), 10.1017/S1351324900000218
Noah A. Smith, Jason Eisner, Contrastive Estimation: Training Log-Linear Models on Unlabeled Data. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pp. 354-362 (2005), 10.3115/1219840.1219884