Ground Truth Model, Tool, and Dataset for Layout Analysis of Historical Documents

Authors: Kai Chen, Mathias Seuret, Hao Wei, Marcus Liwicki, Jean Hennebert

DOI: 10.1117/12.2075858

Abstract: In this paper, we propose a new dataset and a ground-truthing methodology for layout analysis of historical documents with complex layouts. The dataset is based on a generic model for ground-truth presentation of the complex layout structure of historical documents. For the purpose of extracting the document contents uniformly, our model defines five types of regions of interest: page, text block, text line, decoration, and comment. Unconstrained polygons are used to outline the regions. A performance metric is proposed in order to evaluate various page segmentation methods based on this model. We have analysed four state-of-the-art ground-truthing tools: TRUVIZ, GEDI, WebGT, and Aletheia. From this analysis, we conceptualized and developed Divadia, a new tool that overcomes some of the drawbacks of these tools, targeting the simplicity and the efficiency of the layout ground-truthing process on historical document images. With Divadia, we have created a new public dataset. This dataset contains 120 pages from three historical document image collections of different styles and is made freely available to the scientific community for historical document layout analysis research.
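As a rough illustration of the kind of model the abstract describes (five region types, each outlined by an unconstrained polygon), the following Python sketch may help. It is not the authors' data format or the Divadia implementation: the names `RegionType`, `Region`, and `area` are hypothetical, and the shoelace area computation is only an example of what a polygon outline makes possible, not the performance metric proposed in the paper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple

# A polygon vertex in pixel coordinates (x, y). Hypothetical convention.
Point = Tuple[float, float]


class RegionType(Enum):
    """The five region types named in the abstract (identifiers are illustrative)."""
    PAGE = "page"
    TEXT_BLOCK = "text block"
    TEXT_LINE = "text line"
    DECORATION = "decoration"
    COMMENT = "comment"


@dataclass
class Region:
    """One ground-truth region: a type plus an unconstrained polygon outline."""
    kind: RegionType
    outline: List[Point]  # vertices in drawing order; any simple polygon is allowed

    def area(self) -> float:
        """Area of the outline via the shoelace formula (assumes a simple polygon)."""
        n = len(self.outline)
        twice_area = 0.0
        for i in range(n):
            x1, y1 = self.outline[i]
            x2, y2 = self.outline[(i + 1) % n]  # wrap around to close the polygon
            twice_area += x1 * y2 - x2 * y1
        return abs(twice_area) / 2.0


# Made-up example regions: a rectangular page and a five-sided comment region.
page = Region(RegionType.PAGE, [(0, 0), (100, 0), (100, 150), (0, 150)])
comment = Region(
    RegionType.COMMENT,
    [(10, 10), (40, 10), (40, 30), (25, 45), (10, 30)],
)
```

Because the outline is a free polygon rather than a bounding box, a region such as `comment` above can follow an irregular shape, which is the motivation the abstract gives for unconstrained polygons.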

References (15)
SUS: A 'Quick and Dirty' Usability Scale. Usability Evaluation in Industry, pp. 207-212 (1996). DOI: 10.1201/9781498710411-35
Micheal Baechler, Rolf Ingold. Medieval manuscript layout model. Proceedings of the 10th ACM Symposium on Document Engineering (DocEng '10), pp. 275-278 (2010). DOI: 10.1145/1860559.1860622
Rafi Cohen, Abedelkadir Asi, Klara Kedem, Jihad El-Sana, Itshak Dinstein. Robust text and drawing segmentation algorithm for historical documents. Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing, pp. 110-117 (2013). DOI: 10.1145/2501115.2501117
C. Clausner, S. Pletschacher, A. Antonacopoulos. Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments. International Conference on Document Analysis and Recognition, pp. 48-52 (2011). DOI: 10.1109/ICDAR.2011.19
Andreas Fischer, Andreas Keller, Volkmar Frinken, Horst Bunke. Lexicon-free handwritten word spotting using character HMMs. Pattern Recognition Letters, vol. 33, pp. 934-942 (2012). DOI: 10.1016/J.PATREC.2011.09.009
A. Antonacopoulos, C. Clausner, C. Papadopoulos, S. Pletschacher. Historical Document Layout Analysis Competition. International Conference on Document Analysis and Recognition, pp. 1516-1520 (2011). DOI: 10.1109/ICDAR.2011.301
Chulapong Panichkriangkrai, Liang Li, Kozaburo Hachimura. Character segmentation and retrieval for learning support system of Japanese historical books. Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing, pp. 118-122 (2013). DOI: 10.1145/2501115.2501129
Maroua Mehri, Petra Gomez-Krämer, Pierre Héroux, Alain Boucher, Rémy Mullot. Texture feature evaluation for segmentation of historical document images. Proceedings of the 2nd International Workshop on Historical Document Imaging and Processing, pp. 102-109 (2013). DOI: 10.1145/2501115.2501121
Ofer Biller, Abedelkadir Asi, Klara Kedem, Jihad El-Sana, Itshak Dinstein. WebGT: An Interactive Web-Based System for Historical Document Ground Truth Generation. International Conference on Document Analysis and Recognition, pp. 305-308 (2013). DOI: 10.1109/ICDAR.2013.68
Andreas Fischer, Emanuel Indermühle, Horst Bunke, Gabriel Viehhauser, Michael Stolz. Ground truth creation for handwriting recognition in historical documents. Document Analysis Systems, pp. 3-10 (2010). DOI: 10.1145/1815330.1815331