Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams

Authors: S. Deligne, F. Bimbot

DOI: 10.1109/ICASSP.1995.479391

Abstract: The multigram model assumes that language can be described as the output of a memoryless source that emits variable-length sequences of words. The estimation of the model parameters can be formulated as a maximum likelihood estimation problem from incomplete data. We show that estimates of the model parameters can be computed through an iterative expectation-maximization algorithm, and we describe a forward-backward procedure for its implementation. We report the results of a systematic evaluation of multigrams for language modeling on the ATIS database. The objective performance measure is the test set perplexity. Our results show that multigrams outperform conventional n-grams for this task.
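
As an illustration of the memoryless-source assumption, the likelihood of a word string can be sketched as a sum, over every way of cutting the string into sequences of bounded length, of the product of the sequence probabilities. The Python sketch below computes that sum with a forward recursion, which is one half of a forward-backward pass. It is a minimal sketch, not the authors' implementation: the function name `multigram_likelihood`, the `max_len` parameter, and the toy probability table are all assumptions made for this example.

```python
from typing import Dict, Sequence, Tuple


def multigram_likelihood(
    words: Sequence[str],
    probs: Dict[Tuple[str, ...], float],
    max_len: int,
) -> float:
    """Sum the probabilities of all segmentations of `words`.

    Each segment is a sequence of at most `max_len` words, and segments
    are treated as independent draws (memoryless source).
    """
    # alpha[t] holds the total probability of all segmentations of words[:t].
    alpha = [0.0] * (len(words) + 1)
    alpha[0] = 1.0  # the empty prefix has exactly one (empty) segmentation
    for t in range(1, len(words) + 1):
        for k in range(1, min(max_len, t) + 1):
            segment = tuple(words[t - k:t])  # last segment covers k words
            alpha[t] += alpha[t - k] * probs.get(segment, 0.0)
    return alpha[len(words)]


# Hypothetical toy probabilities, invented for illustration only.
toy_probs = {("a",): 0.5, ("b",): 0.25, ("a", "b"): 0.25}
likelihood = multigram_likelihood(["a", "b"], toy_probs, max_len=2)
print(likelihood)
```

With the toy table, the string "a b" has two segmentations, [a][b] and [a b], so the recursion adds 0.5 × 0.25 to 0.25. An EM re-estimation step would additionally need the matching backward quantities, which this sketch leaves out.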

References (7)
T. Kuhn, H. Niemann, E.G. Schukat-Talamazzini, Ergodic hidden Markov models and polygrams for language modeling. International Conference on Acoustics, Speech, and Signal Processing. pp. 357-360 (1994), 10.1109/ICASSP.1994.389282
A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society: Series B (Methodological). vol. 39, pp. 1-22 (1977), 10.1111/J.2517-6161.1977.TB01600.X
Lynette Hirschman, Multi-site data collection for a spoken language corpus. Proceedings of the workshop on Speech and Natural Language - HLT '91. pp. 7-14 (1992), 10.3115/1075527.1075531
F. Jelinek, Self-organized language modeling for speech recognition. Morgan Kaufmann Publishers Inc. pp. 450-506 (1990), 10.1016/B978-0-08-051584-7.50045-0