Training Strategies for OCR Systems for Historical Documents

Authors: Jiří Martínek, Ladislav Lenc, Pavel Král

DOI: 10.1007/978-3-030-19823-7_30

Abstract: This paper presents an overview of training strategies for optical character recognition of historical documents. The main issue is the lack of annotated data and its quality. We summarize several ways of synthetic data preparation. The goal of this paper is to show and compare possibilities for training a convolutional recurrent neural network classifier using synthetic data in combination with a real dataset.

