Layout Detection and Table Recognition – Recent Challenges in Digitizing Historical Documents and Handwritten Tabular Data

Authors: Constantin Lehenmeier, Manuel Burghardt, Bernadette Mischka

DOI: 10.1007/978-3-030-54956-5_17

Abstract: In this paper, we discuss the computer-aided processing of handwritten tabular records of historical weather data. The observationes meteorologicae, which are housed by the Regensburg University Library, are one of the oldest collections of weather data in Europe. Starting in 1771, meteorological data was consistently documented in a standardized form over almost 60 years by several writers. The tabular structure, as well as the unconstrained textual layout of comments and the use of historical characters, pose various challenges for layout and text recognition. We present a customized strategy to digitize tabular and handwritten data by combining various state-of-the-art methods for OCR processing to fit the collection. Since the recognition of historical documents still poses major challenges, we provide lessons learned from experimental testing during the first project stages. Our results show that deep learning methods can be used for text recognition and layout detection. However, they are less efficient for the recognition of tabular structures. Furthermore, a tailored approach had to be developed for the historical meteorological characters during the manual creation of ground truth data. The customized system achieved an accuracy rate of 82% for the text recognition of the heterogeneous handwriting and 87% for the layout recognition of the tables.
