Authors: Gerhard Paaß, Iuliu Konya
DOI: 10.1007/978-3-642-22613-7_12
Keywords:
Abstract: The backbone of the information age is digital information, which may be searched, accessed, and transferred instantaneously. Therefore the digitization of paper documents is extremely interesting. This chapter describes approaches for document structure recognition, which detects the hierarchy of physical components in images of documents, such as pages, paragraphs, and figures, and transforms this into a hierarchy of logical components, such as titles, authors, and sections. This structural information improves readability and is useful for indexing and retrieving the information contained in documents. First we present a rule-based system for segmenting the document image and estimating the logical role of these zones. It is extensively used for processing newspaper collections, showing world-class performance. In the second part we introduce several machine learning approaches exploring large numbers of interrelated features. They can be adapted to geometrical models of the document structure, which may be set up as a linear sequence or a general graph. These advanced models require far more computational resources but show better performance than simpler alternatives and might be used in the future.