An Image Based Approach for Content Analysis in Document Collections

Authors: Reinhold Huber-Mörk, Alexander Schindler

DOI: 10.1007/978-3-642-41939-3_27


Abstract: We consider the task of content-based analysis and categorization in large-scale historical book scanning projects. Mixed content, deprecated language, noise and unexpected distortions suggest an image-based approach. Keypoint extractors combined with a bag-of-features approach are applied to scanned text documents. In order to incorporate spatial information into the bag of features, we investigate three methods for spatial verification. An approach based on comparing statistical properties of local features, such as size, orientation and scale, showed comparable quality while being computationally much more efficient. Cluster analysis delivers groups of pages characterized by common properties; in particular, duplicated pages are detected with high reliability.
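The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. It assumes keypoints have already been extracted (for example with SIFT) as hypothetical `(descriptor, scale, orientation)` tuples and that a visual-word codebook was learned beforehand; all function names and thresholds are illustrative, not the authors' implementation. The statistical check votes on orientation differences and log-scale ratios of keypoint pairs sharing a visual word, in the spirit of weak geometric consistency, and the final grouping uses simple thresholding with connected components as a stand-in for the paper's cluster analysis.

```python
import math
from collections import Counter, defaultdict

# Hypothetical keypoint format: (descriptor, scale, orientation_in_radians).
# Descriptors are short float tuples here; real SIFT descriptors are 128-D.

def nearest_word(descriptor, codebook):
    """Index of the closest codebook vector (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[i])))

def bof_histogram(keypoints, codebook):
    """L1-normalised bag-of-features histogram over visual words."""
    counts = Counter(nearest_word(d, codebook) for d, _, _ in keypoints)
    total = sum(counts.values())
    return [counts[i] / total if total else 0.0 for i in range(len(codebook))]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two L1-normalised histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def consistency_score(kps_a, kps_b, codebook, orient_bins=8, scale_bins=8):
    """Fraction of same-word keypoint pairs that fall into the dominant
    (orientation difference, log2 scale ratio) bin. Values near 1 suggest
    one consistent similarity transform, as expected for a duplicated page."""
    words_b = defaultdict(list)
    for d, s, o in kps_b:
        words_b[nearest_word(d, codebook)].append((s, o))
    votes = Counter()
    for d, s, o in kps_a:
        for s2, o2 in words_b.get(nearest_word(d, codebook), []):
            d_orient = (o2 - o) % (2 * math.pi)
            o_bin = int(d_orient / (2 * math.pi) * orient_bins) % orient_bins
            # Assumption: log2 scale ratios lie in [-4, 4]; outliers are clamped.
            log_ratio = math.log2(s2 / s)
            s_bin = min(scale_bins - 1, max(0, int((log_ratio + 4) / 8 * scale_bins)))
            votes[(o_bin, s_bin)] += 1
    total = sum(votes.values())
    return max(votes.values()) / total if total else 0.0

def group_duplicates(pages, codebook, threshold=0.8):
    """Union pages whose histogram similarity and consistency score both
    reach a (hypothetical) threshold; return groups of page indices."""
    parent = list(range(len(pages)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    hists = [bof_histogram(p, codebook) for p in pages]
    for i in range(len(pages)):
        for j in range(i + 1, len(pages)):
            if (histogram_intersection(hists[i], hists[j]) >= threshold
                    and consistency_score(pages[i], pages[j], codebook) >= threshold):
                parent[find(i)] = find(j)
    groups = defaultdict(list)
    for i in range(len(pages)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

For example, a page and a copy of it with every keypoint scale doubled produce identical histograms, and all their same-word pairs vote for a single bin, so they end up in one group, while a page dominated by a single visual word stays separate. Counting votes per matched pair is far cheaper than fitting a geometric model for every page pair, which is the kind of trade-off the abstract describes.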




Similar articles (2)

Constructing Scalable Data-Flows on Hadoop with Legacy Components

Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis
