Authors: Reinhold Huber-Mörk, Alexander Schindler
DOI: 10.1007/978-3-642-41939-3_27
Keywords:
Abstract: We consider the task of content-based analysis and categorization in large-scale historical book scanning projects. Mixed content, deprecated language, noise, and unexpected distortions suggest an image-based approach. Keypoint extractors combined with a bag-of-features approach are applied to scanned text documents. In order to incorporate spatial information into the bag-of-features approach, we compared three methods for spatial verification. An approach based on comparing statistical properties of local features, such as size, orientation, and scale, showed comparable quality while being computationally much more efficient. Cluster analysis delivers groups of pages characterized by common properties; in particular, duplicated pages are detected with high reliability.