Machine Learning for Document Structure Recognition

Authors: Gerhard Paaß, Iuliu Konya

Abstract: The backbone of the information age is digital information, which may be searched, accessed, and transferred instantaneously. Therefore the digitization of paper documents is extremely interesting. This chapter describes approaches for document structure recognition. These approaches detect the hierarchy of physical components in images of documents, such as pages, paragraphs, and figures, and transform it into a hierarchy of logical components, such as titles, authors, and sections. This structural information improves readability and is useful for indexing and retrieving the information contained in documents. First we present a rule-based system that segments the document image and estimates the logical role of the resulting zones. It has been used extensively for processing newspaper collections, showing world-class performance. In the second part we introduce several machine learning approaches that exploit large numbers of interrelated features. They can be adapted to geometrical models of the document structure, which may be set up as a linear sequence or a general graph. These advanced models require far more computational resources but show better performance than simpler alternatives and might be used in the future.
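To make the "linear sequence" setting concrete, here is a minimal sketch, not the chapter's method. It assumes the text zones of a page have already been segmented and put into reading order. It also assumes each zone is labeled with a logical role by a linear-chain model in the spirit of a linear-chain CRF, decoded with the Viterbi algorithm. The labels (`title`, `author`, `body`), the features (`font_size`, `bold`, `n_words`), and all weights are hypothetical and set by hand for illustration. A real system would learn the weights from annotated pages.

```python
from typing import Dict, List, Sequence

# Hypothetical logical roles for text zones (illustration only).
LABELS = ["title", "author", "body"]
# Hypothetical per-zone features: font size in points, bold flag (0/1), word count.
FEATURES = ("font_size", "bold", "n_words")

# Hypothetical hand-set emission weights; a trained model would learn these.
EMISSION_WEIGHTS: Dict[str, Dict[str, float]] = {
    "title":  {"bias": -10.0, "font_size": 0.5, "bold": 2.0, "n_words": 0.0},
    "author": {"bias": 2.0, "font_size": 0.0, "bold": -1.0, "n_words": -0.5},
    "body":   {"bias": -2.0, "font_size": 0.0, "bold": -1.0, "n_words": 0.25},
}

# Hypothetical transition scores between consecutive zones in reading order.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "title":  {"title": -2.0, "author": 1.0, "body": 0.0},
    "author": {"title": -2.0, "author": 0.0, "body": 1.0},
    "body":   {"title": -2.0, "author": -2.0, "body": 1.0},
}


def emission(label: str, zone: Dict[str, float]) -> float:
    """Linear score of assigning `label` to one zone: bias + sum(weight * feature)."""
    w = EMISSION_WEIGHTS[label]
    return w["bias"] + sum(w[f] * zone[f] for f in FEATURES)


def viterbi(zones: Sequence[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain model."""
    if not zones:
        return []
    # best[i][y]: best total score of any labeling of zones[0..i] that ends in y.
    best: List[Dict[str, float]] = [{y: emission(y, zones[0]) for y in LABELS}]
    back: List[Dict[str, str]] = []  # back[i-1][y]: best predecessor of y at step i
    for zone in zones[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in LABELS:
            # Pick the predecessor maximizing previous score + transition score.
            prev = max(LABELS, key=lambda p, y=y: best[-1][p] + TRANSITIONS[p][y])
            scores[y] = best[-1][prev] + TRANSITIONS[prev][y] + emission(y, zone)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best-scoring final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Usage: a heading, a short author line, then two long body paragraphs.
example_zones = [
    {"font_size": 20, "bold": 1, "n_words": 8},
    {"font_size": 11, "bold": 0, "n_words": 3},
    {"font_size": 10, "bold": 0, "n_words": 40},
    {"font_size": 10, "bold": 0, "n_words": 35},
]
print(viterbi(example_zones))  # ['title', 'author', 'body', 'body']
```

The transition scores let context help each zone's label. A short line that directly follows a title is pushed toward `author`, even though its own features are only weakly distinctive. Replacing the chain with a general graph over neighboring zones would need approximate inference instead of Viterbi. This is one reason such models cost more computation.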
