Authors: Tapas Kanungo, Song Mao
DOI:
Keywords:
Abstract: Electronic bilingual lexicons are crucial for machine translation, cross-lingual information retrieval and speech recognition. For low-density languages, however, the availability of electronic lexicons is questionable. One solution is to acquire lexicons from printed dictionaries. While manual data entry is a possibility, automatic acquisition of lexicons from scanned images of printed dictionaries would expedite the prototyping process of cross-language systems. Printed dictionaries have a logical model that defines the syntax of dictionary entries, i.e. the order of the fields in an entry: the headword, its part of speech, its pronunciation and its definition. In this article we propose an algorithm that automatically extracts lexicons from printed dictionaries based on stochastic language models. We demonstrate the algorithm on a Chinese-English dictionary. This work can easily be applied to extracting other tabular structures such as telephone books, catalogs, etc.
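To illustrate how a stochastic model of entry syntax can drive extraction, here is a minimal sketch that is not the authors' algorithm. It assumes the OCR output of one entry is already tokenized, treats the entry fields as hidden states of a small hidden Markov model, and recovers the most likely field labelling with Viterbi decoding. The field names (`HEAD`, `PRON`, `POS`, `DEF`), the coarse token classes, the part-of-speech tag set and all probabilities are hypothetical, hand-set values chosen only for illustration. They are not estimated from any dictionary.

```python
import math

# Hypothetical field labels for one dictionary entry.
STATES = ["HEAD", "PRON", "POS", "DEF"]
# Hypothetical part-of-speech abbreviations as they might appear in print.
POS_TAGS = {"n.", "v.", "adj.", "adv."}

# Assumed transition probabilities encoding the entry syntax:
# headword -> pronunciation -> part of speech -> definition.
TRANS = {
    "START": {"HEAD": 1.0},
    "HEAD": {"HEAD": 0.2, "PRON": 0.8},
    "PRON": {"PRON": 0.3, "POS": 0.7},
    "POS": {"DEF": 1.0},
    "DEF": {"DEF": 1.0},
}

# Assumed emission probabilities over coarse token classes.
EMIT = {
    "HEAD": {"han": 0.9, "latin": 0.05, "tag": 0.05},
    "PRON": {"latin": 0.9, "han": 0.05, "tag": 0.05},
    "POS": {"tag": 0.9, "latin": 0.1},
    "DEF": {"latin": 0.9, "tag": 0.05, "han": 0.05},
}


def logp(p):
    """Log-probability, with impossible events mapped to -inf."""
    return math.log(p) if p > 0 else -math.inf


def token_class(tok):
    """Map a token to a coarse class: CJK characters, POS tag, or Latin text."""
    if any("\u4e00" <= c <= "\u9fff" for c in tok):
        return "han"
    if tok in POS_TAGS:
        return "tag"
    return "latin"


def viterbi(tokens):
    """Return the most likely field label for each token of one entry."""
    if not tokens:
        return []
    obs = [token_class(t) for t in tokens]
    # best[i][s] = (best log-probability ending in state s at i, previous state)
    best = [{s: (logp(TRANS["START"].get(s, 0)) + logp(EMIT[s].get(obs[0], 0)), None)
             for s in STATES}]
    for i in range(1, len(obs)):
        column = {}
        for s in STATES:
            score, prev = max(
                (best[i - 1][p][0] + logp(TRANS[p].get(s, 0)), p) for p in STATES
            )
            column[s] = (score + logp(EMIT[s].get(obs[i], 0)), prev)
        best.append(column)
    # Backtrack from the best final state.
    path = [max(STATES, key=lambda s: best[-1][s][0])]
    for i in range(len(obs) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


def segment(tokens):
    """Group the tokens of one entry into labelled fields."""
    fields = {}
    for tok, label in zip(tokens, viterbi(tokens)):
        fields.setdefault(label, []).append(tok)
    return {label: " ".join(toks) for label, toks in fields.items()}
```

For the entry tokens `["中国", "zhōng", "guó", "n.", "China"]`, `segment` groups the two pinyin syllables into a single `PRON` field because the assumed transitions allow the pronunciation state to repeat before the part-of-speech tag. In a real system the probabilities would be learned from labelled entries, and the token classes would come from OCR features such as font, script and position rather than from the simple character test used here.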