Stem-based maximum entropy language models for inflectional languages.

Authors: Vassilios Digalakis, Dimitris Oikonomidis

Abstract: In this work we build language models using three different training methods: n-gram, class-based and maximum entropy models. The main issue is the use of stem information to cope with the very large number of distinct words in an inflectional language, like Greek. We compare the models on both perplexity and word error rate. We also examine thoroughly the differences on specific subsets of words.

References (9)
Vassilios Digalakis, Vassilios Diakoloukas, Nikos Tsourakis, Dimitris Pratsolis, Dimitris Oikonomidis, Christos Vosnidis, Nikos Chatzichrisafis. Large vocabulary continuous speech recognition in Greek: corpus and an automatic dictation system. Conference of the International Speech Communication Association (2003).
Jun Wu, Sanjeev Khudanpur. Building a topic-dependent maximum entropy model for very large corpora. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 777-780 (2002). DOI: 10.1109/ICASSP.2002.5743833
Hermann Ney, Ute Essen, Reinhard Kneser. On structuring probabilistic dependences in stochastic language modelling. Computer Speech & Language, vol. 8, pp. 1-38 (1994). DOI: 10.1006/CSLA.1994.1001
Vincent J. Della Pietra, Adam L. Berger, Stephen A. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, vol. 22, pp. 39-71 (1996). DOI: 10.5555/234285.234289
I.H. Witten, T.C. Bell. The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, vol. 37, pp. 1085-1094 (1991). DOI: 10.1109/18.87000
D.J. Kershaw, L. Lamel, D.A. Leeuwen, D. Pye, A.J. Robinson, H.J.M. Steeneken, P.C. Woodland, S.J. Young, M. Adda-Dekker, X. Aubert, C. Dugast, J.L. Gauvain. Multilingual large vocabulary speech recognition: the European SQALE project. Computer Speech & Language, vol. 11, pp. 73-89 (1997). DOI: 10.1006/CSLA.1996.0023
S. Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, pp. 400-401 (1987). DOI: 10.1109/TASSP.1987.1165125
Joshua T. Goodman, Stanley F. Chen. An Empirical Study of Smoothing Techniques for Language Modeling. arXiv: Computation and Language (1996).