KPTI: Katib's Pashto Text Imagebase and Deep Learning Benchmark

Authors: Riaz Ahmad, M. Zeshan Afzal, S. Faisal Rashid, Marcus Liwicki, Thomas Breuel

DOI: 10.1109/ICFHR.2016.0090

Abstract: This paper presents the first Pashto text image database for scientific research, and thereby the first dataset with complete handwritten and printed text line images, which ultimately covers all alphabets of the Arabic and Persian languages. Languages like Pashto, written in a complex way by calligraphers, still require a mature Optical Character Recognition (OCR) system. Although 50 million people use this language for both oral and written communication, no significant effort has been devoted to the recognition of Pashto script. A real dataset of 17,015 images of text lines is introduced. The images are acquired by scanning hand-scribed Pashto books. Further, in this work, we evaluate the performance of deep learning based models, namely Bidirectional and Multi-Dimensional Long Short Term Memory (BLSTM and MDLSTM) networks, on Pashto texts and provide a baseline character error rate of 9.22%.
