Authors: Kai Chen, Mathias Seuret, Hao Wei, Marcus Liwicki, Jean Hennebert
DOI: 10.1117/12.2075858
Keywords:
Abstract: In this paper, we propose a new dataset and a ground-truthing methodology for layout analysis of historical documents with complex layouts. The dataset is based on a generic model for the ground-truth presentation of the structure of historical documents. To extract document contents uniformly, our model defines five types of regions of interest: page, text block, line, decoration, and comment. Unconstrained polygons are used to outline the regions. A performance metric is proposed to evaluate various page segmentation methods based on this model. We have analysed four state-of-the-art ground-truthing tools: TRUVIZ, GEDI, WebGT, and Aletheia. From this analysis, we conceptualized and developed Divadia, a new tool that overcomes some drawbacks of these tools, targeting simplicity and efficiency in the layout ground-truthing process for document images. With Divadia, we created a public dataset. This dataset contains 120 pages from three image collections of different styles and is made freely available to the scientific community for research.
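The region model in the abstract (five region types, each outlined by an unconstrained polygon) can be pictured as a small data structure. The sketch below is only an illustration under stated assumptions: the class names `RegionType`, `Region`, and `PageGroundTruth` and the file name are hypothetical and do not describe Divadia's actual storage format. It computes region area with the shoelace formula, which works for any simple polygon rather than only rectangles. The paper's performance metric is not specified here, so the sketch does not attempt to implement it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates


class RegionType(Enum):
    # The five region types named in the abstract.
    PAGE = "page"
    TEXT_BLOCK = "text block"
    LINE = "line"
    DECORATION = "decoration"
    COMMENT = "comment"


@dataclass
class Region:
    # Hypothetical record: one labelled region outlined by an unconstrained polygon.
    region_type: RegionType
    polygon: List[Point]

    def __post_init__(self) -> None:
        if len(self.polygon) < 3:
            raise ValueError("a polygon needs at least three vertices")

    def area(self) -> float:
        # Shoelace formula; valid for any simple (non-self-intersecting) polygon.
        n = len(self.polygon)
        s = 0.0
        for i in range(n):
            x1, y1 = self.polygon[i]
            x2, y2 = self.polygon[(i + 1) % n]
            s += x1 * y2 - x2 * y1
        return abs(s) / 2.0


@dataclass
class PageGroundTruth:
    # Hypothetical container for all regions annotated on one page image.
    image_name: str
    regions: List[Region] = field(default_factory=list)

    def regions_of(self, region_type: RegionType) -> List[Region]:
        return [r for r in self.regions if r.region_type is region_type]


# Usage: an L-shaped decoration that a bounding box could not outline exactly.
page = PageGroundTruth("page_001.jpg")
deco = Region(RegionType.DECORATION,
              [(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)])
page.regions.append(deco)
print(deco.area())  # 6.0
```

The L-shaped example shows why unconstrained polygons matter: a rectangular box around the same decoration would cover 12 square units, twice the region's true area.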