Authors: Leon Todoran, Marco Aiello, Christof Monz, Marcel Worring
DOI: 10.1117/12.410827
Keywords:
Abstract: We present a fully implemented system based on generic document knowledge for detecting the logical structure of documents for which only general layout information is assumed. In particular, we focus on the reading order. Our system integrates components based on computer vision, artificial intelligence, and natural language processing techniques. The prominent feature of our framework is its ability to handle documents from heterogeneous collections. The system has been evaluated on a standard collection of documents to measure the quality of the reading order detection. Experimental results for each component and for the system as a whole are presented and discussed in detail. The performance is promising, especially when considering the diversity of the collection.