Synthetic data for Arabic OCR system development

Authors: V. Margner, M. Pechwitz

DOI: 10.1109/ICDAR.2001.953967

Abstract: A system for the automatic generation of synthetic databases for the development or evaluation of Arabic word or text recognition systems (Arabic OCR) is presented. The proposed system works without any scanning of printed paper. Firstly, Arabic text has to be typeset using a standard typesetting system. Secondly, a noise-free bitmap of the document and the corresponding ground truth (GT) are automatically generated. Finally, an image distortion can be superimposed on the character or word image to simulate the expected real-world noise of the intended application. All necessary modules are presented together with some examples. Solutions are suggested for special problems caused by specific features of Arabic, such as printing from right to left, many diacritical points, variation in the height of characters, and changes in the relative position to the writing line. The synthetic data set was used to train and test a recognition system based on a hidden Markov model (HMM), which was originally developed for German cursive script, on Arabic printed words. Recognition results for different synthetic data sets are presented.
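
The last two stages of the pipeline described in the abstract (a noise-free bitmap paired with its ground truth, then a superimposed distortion) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not specify the distortion model, so independent pixel flipping is used as a simple stand-in, and the names `render_clean_bitmap`, `distort`, `make_sample`, the toy `GLYPH` pattern, and the label string are all hypothetical. The typesetting stage is replaced by a hand-written 5x5 pattern.

```python
import random


def render_clean_bitmap(rows):
    """Convert rows of '#' (ink) and '.' (background) into a 0/1 bitmap."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def distort(bitmap, flip_prob, seed=0):
    """Return a copy of the bitmap with each pixel flipped independently.

    flip_prob is the per-pixel flip probability. This is an assumed,
    deliberately simple noise model, not the one used in the paper.
    """
    rng = random.Random(seed)  # seeded so the synthetic set is reproducible
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in bitmap
    ]


def make_sample(rows, label, flip_prob, seed=0):
    """Bundle the clean image, its distorted version, and the ground truth."""
    clean = render_clean_bitmap(rows)
    return {
        "clean": clean,
        "image": distort(clean, flip_prob, seed),
        "gt": label,  # ground truth is known by construction, no manual labeling
    }


# Hypothetical stand-in for a typeset word image.
GLYPH = [
    "..#..",
    ".#.#.",
    "#####",
    "#...#",
    "#...#",
]

sample = make_sample(GLYPH, label="sample-word", flip_prob=0.1, seed=42)
```

Because the image is generated rather than scanned, the ground truth accompanies every sample automatically, and varying `flip_prob` yields data sets of different difficulty from the same clean source.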
