Authors: V. Margner, M. Pechwitz
DOI: 10.1109/ICDAR.2001.953967
Keywords:
Abstract: A system for the automatic generation of synthetic databases for the development or evaluation of Arabic word or text recognition systems (Arabic OCR) is presented. The proposed system works without any scanning of printed paper. First, a text is typeset using a standard typesetting system. Second, a noise-free bitmap of the document and the corresponding ground truth (GT) are generated automatically. Finally, an image distortion can be superimposed on the character images to simulate the real-world noise expected in the intended application. All necessary modules are presented together with some examples. Solutions are suggested for special problems caused by specific features of Arabic, such as printing from right to left, the many diacritical points, the variation in character height, and changes in position relative to the writing line. The data set was used to train and test a recognition system based on a hidden Markov model (HMM), which was originally developed for German cursive script words. Recognition results for different data sets are given.
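A practical point of the pipeline above is that the ground truth is produced together with the clean bitmap and is unaffected by the later distortion step. The following is a minimal Python sketch of that last stage only, assuming a binary bitmap stored as nested lists. It uses independent random pixel flips as a hypothetical stand-in for the system's actual noise model, which the abstract does not specify; `Sample`, `flip_noise`, and `degrade` are illustrative names, not taken from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    """One synthetic sample: a binary word image plus its ground truth."""
    bitmap: list[list[int]]  # rows of pixels, 1 = ink, 0 = background
    ground_truth: str        # transcription in logical (reading) order


def flip_noise(bitmap: list[list[int]], p: float, seed: int = 0) -> list[list[int]]:
    """Return a new bitmap in which each pixel is flipped with probability p.

    Hypothetical, deliberately simple degradation model (assumption); the
    original system's distortion model is not reproduced here.
    """
    rng = random.Random(seed)  # seeded so a distorted data set is reproducible
    return [[1 - px if rng.random() < p else px for px in row] for row in bitmap]


def degrade(sample: Sample, p: float, seed: int = 0) -> Sample:
    """Distort only the image; the ground truth is copied over unchanged."""
    return Sample(flip_noise(sample.bitmap, p, seed), sample.ground_truth)


if __name__ == "__main__":
    clean = Sample([[0, 1, 1, 0], [1, 1, 1, 1]], "كتاب")
    noisy = degrade(clean, p=0.1, seed=42)
    print(noisy.ground_truth == clean.ground_truth)  # GT survives distortion
```

Because the clean sample is never modified, one typeset text can be degraded at several noise levels (different `p` or `seed` values) to build multiple training or test sets that share the same ground truth.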