Stochastic Language Models for Automatic Acquisition of Lexicons from Printed Bilingual Dictionaries

Authors: Tapas Kanungo, Song Mao

Abstract: Electronic bilingual lexicons are crucial for machine translation, cross-lingual information retrieval, and speech recognition. For low-density languages, however, the availability of electronic bilingual lexicons is questionable. One solution is to acquire them from printed bilingual dictionaries. While manual data entry is a possibility, automatic acquisition of lexicons from scanned images of dictionaries would expedite the prototyping process for cross-language systems. Printed dictionaries have a logical model that defines the syntax of dictionary entries, i.e., the order of the entry, its part of speech, its pronunciation, and its definition. In this article we propose an algorithm to automatically extract lexicon entries based on stochastic language models. We demonstrate the algorithm on a Chinese-English dictionary. This work can be easily used for extracting other tabular structures such as telephone books, catalogs, etc.
