Synthetic data for Arabic OCR system development

Authors: V. Margner, M. Pechwitz

DOI: 10.1109/ICDAR.2001.953967

Abstract: A system for the automatic generation of synthetic databases for the development or evaluation of Arabic word or text recognition systems (Arabic OCR) is presented. The proposed system works without any scanning of printed paper. First, the Arabic text has to be typeset using a standard typesetting system. Second, a noise-free bitmap of the document and the corresponding ground truth (GT) are automatically generated. Finally, an image distortion can be superimposed on the character or word image to simulate the real-world noise expected in the intended application. All necessary modules are presented together with some examples. Ways of handling the special problems caused by specific features of Arabic, such as printing from right to left, many diacritical points, variation in the height of characters, and changes in the relative position to the writing line, are suggested. The synthetic data set was used to train and test a recognition system based on a hidden Markov model (HMM), which was originally developed for German cursive script, on Arabic printed words. Recognition results for different data sets are presented.
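The third step of the pipeline described in the abstract (superimposing a distortion on a noise-free bitmap while keeping its ground truth) can be illustrated with a minimal sketch. It is not the authors' distortion model: it assumes a binary bitmap and independent random pixel flips as a stand-in for real-world noise. The function names, the `flip_prob` parameter, the toy glyph, and the placeholder label are all hypothetical.

```python
import random

def render_clean_bitmap(rows):
    """Turn rows of '#'/'.' characters into a 0/1 bitmap (1 = ink).

    Stand-in for the typesetting and rendering steps, which would
    normally produce the noise-free image.
    """
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]

def add_flip_noise(bitmap, flip_prob, seed=0):
    """Return a copy of the bitmap with each pixel flipped independently
    with probability flip_prob (assumed noise model, not the paper's)."""
    rng = random.Random(seed)  # seeded so the synthetic set is reproducible
    return [
        [1 - px if rng.random() < flip_prob else px for px in row]
        for row in bitmap
    ]

def make_sample(rows, label, flip_prob, seed=0):
    """Pair a distorted image with the unchanged ground-truth label."""
    clean = render_clean_bitmap(rows)
    noisy = add_flip_noise(clean, flip_prob, seed)
    return {"image": noisy, "ground_truth": label}

# Toy glyph standing in for a typeset word image.
glyph = [
    ".####.",
    "#....#",
    "#....#",
    ".####.",
]

sample = make_sample(glyph, label="placeholder-label", flip_prob=0.05, seed=42)
for row in sample["image"]:
    print("".join("#" if px else "." for px in row))
print(sample["ground_truth"])
```

Because the label is attached before the distortion is applied, the ground truth stays exact however strong the noise is, which is the main advantage of generating the data synthetically rather than scanning printed pages.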
