Authors: Liangcai Gao, Zhi Tang, Xiaoyan Lin, Yongtao Wang
DOI:
Keywords:
Abstract: The primary information units in a newspaper are its articles. Reconstructing articles from newspapers, which includes article aggregation and reading order recovery, is known to be a challenging task because of the complexity of multi-article page layouts. In this paper, we propose a novel approach to article reconstruction using a bipartite graph framework. It models the complex relationships between text blocks as one-to-one correspondences and accomplishes reconstruction by finding the optimal match on the graph. During the optimization process, various sources of information, including geometric layout, linguistic and semantic content, and a deeply mined model, are used to deal with a wide range of layouts. Moreover, unlike existing methods, our method performs the two sub-tasks in reverse order: it first detects the reading order and then uses it to aggregate the text blocks that belong to the same article. Experimental results on 3312 pages containing 23184 articles demonstrate that our method outperforms state-of-the-art methods for article reconstruction. In addition, the method has been adopted in several large-scale digitalization projects.
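To make the "order first, then aggregate" idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. Each text block appears once on the left of a bipartite graph (as a predecessor) and once on the right (as a successor), and `score[i][j]` is a hypothetical, hard-coded value for "block j is read directly after block i" (in the paper such evidence comes from geometric layout, linguistic and semantic content, and a mined model). The threshold `end_score`, the helper names `best_successor_matching`, `has_cycle`, and `articles_from_matching`, and the use of exhaustive search are all illustrative assumptions. Exhaustive search only works for a handful of blocks; a practical system would use an efficient assignment algorithm such as the Hungarian method and handle cycles separately.

```python
from itertools import product
from typing import List, Optional, Sequence


def has_cycle(succ: Sequence[Optional[int]]) -> bool:
    """True if following successor links from some block returns to that block."""
    n = len(succ)
    for start in range(n):
        node, steps = succ[start], 0
        while node is not None and steps <= n:
            if node == start:
                return True
            node = succ[node]
            steps += 1
    return False


def best_successor_matching(score: Sequence[Sequence[float]],
                            end_score: float) -> List[Optional[int]]:
    """Pick succ[i] (next block after block i, or None) maximizing the total score.

    Hypothetical sketch: every block has at most one successor and at most one
    predecessor (a one-to-one matching), ending an article earns `end_score`,
    and cyclic solutions are rejected. Exhaustive search, tiny inputs only.
    """
    n = len(score)
    options = [[None] + [j for j in range(n) if j != i] for i in range(n)]
    best: List[Optional[int]] = [None] * n
    best_total = float("-inf")
    for succ in product(*options):
        targets = [j for j in succ if j is not None]
        if len(targets) != len(set(targets)):
            continue  # two blocks share a successor: not one-to-one
        if has_cycle(succ):
            continue  # a reading order cannot loop back on itself
        total = sum(end_score if j is None else score[i][j]
                    for i, j in enumerate(succ))
        if total > best_total:
            best, best_total = list(succ), total
    return best


def articles_from_matching(succ: Sequence[Optional[int]]) -> List[List[int]]:
    """Aggregate blocks into articles by walking each chain from its first block."""
    has_pred = {j for j in succ if j is not None}
    articles = []
    for start in range(len(succ)):
        if start in has_pred:
            continue  # not the first block of an article
        chain: List[int] = []
        node: Optional[int] = start
        while node is not None:
            chain.append(node)
            node = succ[node]
        articles.append(chain)
    return articles


if __name__ == "__main__":
    # Hypothetical scores: blocks 0 -> 2 -> 4 form one article, 1 -> 3 another.
    scores = [[0.1] * 5 for _ in range(5)]
    for i, j in [(0, 2), (2, 4), (1, 3)]:
        scores[i][j] = 0.9
    succ = best_successor_matching(scores, end_score=0.5)
    print(succ)                          # [2, 3, 4, None, None]
    print(articles_from_matching(succ))  # [[0, 2, 4], [1, 3]]
```

The reading order is decided first (the matching), and article aggregation then falls out of the chains of successor links, mirroring the reversed sub-task order described in the abstract.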