Stochastic Language Models for Automatic Acquisition of Lexicons from Printed Bilingual Dictionaries

Authors: Tapas Kanungo, Song Mao

Abstract: Electronic bilingual lexicons are crucial for machine translation, cross-lingual information retrieval and speech recognition. For low-density languages, however, the availability of electronic lexicons is questionable. One solution is to acquire lexicons from printed dictionaries. While manual data entry is a possibility, automatic acquisition from scanned images of dictionaries would expedite the prototyping process for cross-language systems. Printed dictionaries have a logical model that defines the syntax of dictionary entries, i.e. the order of the entry, its part of speech, pronunciation and definition. In this article we propose an algorithm to automatically extract lexicons based on stochastic language models. We demonstrate the algorithm on a Chinese-English dictionary. This work can be easily used for extracting other tabular structures like telephone books, catalogs, etc.
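To make the idea of a stochastic model over the syntax of a dictionary entry concrete, here is a minimal sketch in Python. It is not the algorithm from the article: it assumes a small hidden Markov model whose states follow the entry order named in the abstract (headword, part of speech, pronunciation, definition) and decodes the most likely labeling with the Viterbi algorithm. The state names, the token classes, the `token_class` heuristic and every probability are hypothetical values chosen by hand for illustration.

```python
import math

STATES = ["HEADWORD", "POS", "PRON", "DEF"]

# Hypothetical, hand-picked probabilities (not taken from the article).
START = {"HEADWORD": 1.0}
TRANS = {
    "HEADWORD": {"HEADWORD": 0.3, "POS": 0.7},
    "POS": {"PRON": 1.0},
    "PRON": {"PRON": 0.2, "DEF": 0.8},
    "DEF": {"DEF": 1.0},
}
EMIT = {
    "HEADWORD": {"cjk": 0.9, "latin": 0.1},
    "POS": {"abbrev": 0.9, "latin": 0.1},
    "PRON": {"bracketed": 0.8, "latin": 0.2},
    "DEF": {"latin": 0.85, "abbrev": 0.1, "cjk": 0.05},
}


def token_class(token):
    """Map an OCR token to a coarse class (an assumed feature set)."""
    if any("\u4e00" <= ch <= "\u9fff" for ch in token):
        return "cjk"
    if token.startswith("[") and token.endswith("]"):
        return "bracketed"
    if token.endswith(".") and len(token) <= 5:
        return "abbrev"
    return "latin"


def _log(p):
    # Log-probability; zero probability maps to negative infinity.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens):
    """Return (token, label) pairs for the most likely state sequence."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    # Each column maps state -> (best log score, previous state).
    best = [{
        s: (_log(START.get(s, 0)) + _log(EMIT[s].get(obs[0], 0)), None)
        for s in STATES
    }]
    for o in obs[1:]:
        prev = best[-1]
        col = {}
        for s in STATES:
            score, arg = max(
                (prev[p][0] + _log(TRANS[p].get(s, 0)), p) for p in STATES
            )
            col[s] = (score + _log(EMIT[s].get(o, 0)), arg)
        best.append(col)
    # Backtrack from the best final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for col in reversed(best[1:]):
        state = col[state][1]
        path.append(state)
    path.reverse()
    return list(zip(tokens, path))


if __name__ == "__main__":
    entry = ["你好", "int.", "[ni3 hao3]", "hello", "hi"]
    for token, label in viterbi(entry):
        print(label, token)
```

In a real system the probabilities would be estimated from labeled entries rather than set by hand, and the observations would come from OCR output with layout and font features, which this sketch leaves out.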
