Authors: Atallah AL-Shatnawi, Khairuddin Omar
DOI:
Keywords: Natural language processing, Preprocessor, Word (computer architecture), Normalization (image processing), Artificial intelligence, Feature extraction, Imaginary line, Character (computing), Segmentation, Computer science, Pattern recognition, Baseline (configuration management)
Abstract: Preprocessing is the most important stage in an Arabic OCR system; it directly affects the reliability and efficiency of the segmentation and feature extraction stages. It is worth mentioning that Arabic is written cursively and that its characters have between two and four shapes. An Arabic word is likely to consist of two or more characters connected along an imaginary line called the baseline. Detecting the baseline is one of the main tasks of the preprocessing stage, and the baseline can be used for both skew normalization and character segmentation. This paper aims to provide a comprehensive review of the methods researchers have proposed for detecting the baseline. These methods are categorized into four groups: (a) methods based on horizontal projection, (b) the skeleton method, (c) contour tracing, and (d) the principal component analysis method. Each of these methods has its own advantages and drawbacks.
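The abstract names horizontal projection and principal component analysis among the four categories without describing them. The following is a minimal sketch in Python of textbook versions of these two ideas, not the authors' code. It assumes a binarized image stored as a list of rows, with 1 for ink and 0 for background. The projection variant takes the row containing the most ink pixels as the baseline. The PCA variant estimates the baseline's orientation, which is what skew normalization needs, from the principal axis of the ink pixel coordinates. The function names and input representation are hypothetical.

```python
import math
from typing import List

# Hypothetical input format: binary image as a list of rows, 1 = ink, 0 = background.
Image = List[List[int]]


def horizontal_projection(img: Image) -> List[int]:
    """Horizontal projection profile: number of ink pixels in each row."""
    return [sum(row) for row in img]


def baseline_by_projection(img: Image) -> int:
    """Row index with the highest horizontal projection.

    Ties go to the first (topmost) peak row; this tie-break is an assumption.
    Raises ValueError on an empty image.
    """
    profile = horizontal_projection(img)
    return profile.index(max(profile))


def baseline_angle_by_pca(img: Image) -> float:
    """Orientation in degrees of the principal axis of the ink pixels.

    Row indices grow downward, so a positive angle means the text line
    descends from left to right in image coordinates.
    """
    pts = [(x, y) for y, row in enumerate(img) for x, v in enumerate(row) if v]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two ink pixels")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    # Entries of the 2x2 covariance matrix of the pixel coordinates.
    sxx = sum((x - mx) ** 2 for x, _ in pts) / n
    syy = sum((y - my) ** 2 for _, y in pts) / n
    sxy = sum((x - mx) * (y - my) for x, y in pts) / n
    # Closed-form angle of the leading eigenvector of [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.degrees(theta)


if __name__ == "__main__":
    # Toy "word": ascender strokes above a dense connecting row (row 2).
    sample = [
        [0, 0, 1, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 1, 0],
        [1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 0, 1, 0, 0],
    ]
    print(horizontal_projection(sample))
    print(baseline_by_projection(sample))
    print(baseline_angle_by_pca(sample))
```

On real scans the projection peak is usually taken after skew correction. A tilted line spreads its connecting strokes over several rows, which flattens the peak. This is one reason an orientation estimate such as the PCA angle is useful as a first step.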