Training Strategies for OCR Systems for Historical Documents

Authors: Jiří Martínek, Ladislav Lenc, Pavel Král

DOI: 10.1007/978-3-030-19823-7_30

Abstract: This paper presents an overview of training strategies for optical character recognition of historical documents. The main issue is the lack of annotated data and its quality. We summarize several ways of preparing synthetic data. The goal of this paper is to show and compare possibilities for training a convolutional recurrent neural network classifier using synthetic data in combination with a real dataset.
