Open Vocabulary Arabic Handwriting Recognition Using Morphological Decomposition

Authors: Mahdi Hamdani, Amr El-Desoky Mousa, Hermann Ney

DOI: 10.1109/ICDAR.2013.63

Abstract: Language Models (LMs) are a very important component of large and open vocabulary recognition systems. This paper presents an open-vocabulary approach for Arabic handwriting recognition. The proposed approach decomposes words based on morphological analysis. The vocabulary is a combination of words and sub-words obtained by the decomposition process. Out Of Vocabulary (OOV) words can be recognized by combining different elements from the lexicon. The recognition system is based on Hidden Markov Models (HMMs) with position and context dependent character models. An n-gram LM trained on the decomposed text is used along with the HMMs during the search. The approach is evaluated using two datasets, and its use leads to a significant improvement in system performance. Two types of experiments, for two different tasks, are conducted in this work. The proposed approach yields an absolute improvement of up to 1% in Word Error Rate (WER) on the constrained task and keeps the same performance as the baseline system on the unconstrained one.
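The idea that an OOV word can be recognized by combining lexicon elements can be illustrated with a minimal sketch. It is not the authors' implementation: the `+` marker convention (trailing `+` for a prefix, leading `+` for a suffix), the function names, and the toy transliterated strings are all assumptions made for illustration.

```python
# Toy illustration only; the "+" markers and all names here are hypothetical.

def recombine(tokens):
    """Join decomposed units (prefix+, stem, +suffix) back into full words."""
    words = []
    glue_next = False  # True when the previous unit was a prefix
    for tok in tokens:
        attach_left = tok.startswith("+")   # suffix: attaches to previous unit
        attach_right = tok.endswith("+")    # prefix: attaches to next unit
        core = tok.strip("+")
        if (glue_next or attach_left) and words:
            words[-1] += core
        else:
            words.append(core)
        glue_next = attach_right
    return words


def is_coverable(word, prefixes, stems, suffixes):
    """Check whether word = [prefix] + stem + [suffix] using lexicon units."""
    for p in [""] + sorted(prefixes):
        if not word.startswith(p):
            continue
        rest = word[len(p):]
        for s in [""] + sorted(suffixes):
            if not rest.endswith(s):
                continue
            stem = rest[:len(rest) - len(s)]
            if stem in stems:
                return True
    return False


# Assumed toy lexicon: full-word stems plus sub-word units.
stems = {"ktb", "byt"}
prefixes = {"wa", "Al"}
suffixes = {"hA"}

decoded = ["wa+", "ktb", "+hA", "fy", "Al+", "byt"]
print(recombine(decoded))
print(is_coverable("waktbhA", prefixes, stems, suffixes))
```

Here `waktbhA` is not itself a lexicon entry, yet it can be assembled from a prefix, a stem, and a suffix that are, which is the sense in which a mixed word and sub-word vocabulary opens up the recognizable word set.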

References (11)
Philippe Dreuw, David Rybach, Georg Heigold, Hermann Ney. RWTH OCR: A Large Vocabulary Optical Character Recognition System for Arabic Scripts. Springer, London, pp. 215-254 (2012). DOI: 10.1007/978-1-4471-4072-6_9
Andreas Stolcke. SRILM – An Extensible Language Modeling Toolkit. Conference of the International Speech Communication Association (2002).
Sabri A. Mahmoud, Irfan Ahmad, Mohammad Alshayeb, Wasfi G. Al-Khatib, Mohammad Tanvir Parvez, Gernot A. Fink, Volker Margner, Haikal El Abed. KHATT: Arabic Offline Handwritten Text Database. International Conference on Frontiers in Handwriting Recognition, pp. 449-454 (2012). DOI: 10.1109/ICFHR.2012.224
David Rybach, Christian Gollan, Ralf Schlüter, Hermann Ney, Amr El-Desoky. Investigating the Use of Morphological Decomposition and Diacritization for Improving Arabic LVCSR. Conference of the International Speech Communication Association, pp. 2679-2682 (2009).
A. Brakensiek, J. Rottland, G. Rigoll. Handwritten address recognition with open vocabulary using character n-grams. International Conference on Frontiers in Handwriting Recognition, pp. 357-362 (2002). DOI: 10.1109/IWFHR.2002.1030936
Patrick Doetsch, Mahdi Hamdani, Hermann Ney, Adria Gimenez, Jesus Andres-Ferrer, Alfons Juan. Comparison of Bernoulli and Gaussian HMMs Using a Vertical Repositioning Technique for Off-Line Handwriting Recognition. International Conference on Frontiers in Handwriting Recognition, pp. 3-7 (2012). DOI: 10.1109/ICFHR.2012.194
Slim Kanoun, Adel M Alimi, Yves Lecourtier. Natural Language Morphology Integration in Off-Line Arabic Optical Text Recognition. Systems, Man, and Cybernetics, vol. 41, pp. 579-590 (2011). DOI: 10.1109/TSMCB.2010.2072990
I. Bazzi, R. Schwartz, J. Makhoul. An omnifont open-vocabulary OCR system for English and Arabic. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, pp. 495-504 (1999). DOI: 10.1109/34.771314
Sungho Ryu, Jin Hyung Kim. Learning the lexicon from raw texts for open-vocabulary Korean word recognition. International Conference on Document Analysis and Recognition, pp. 202-206 (2003). DOI: 10.1109/ICDAR.2003.1227659
A. L. Koerich, R. Sabourin, C. Y. Suen. Large vocabulary off-line handwriting recognition: A survey. Pattern Analysis and Applications, vol. 6, pp. 97-121 (2003). DOI: 10.1007/S10044-002-0169-3