Stochastic attributed K-d tree modeling of technical paper title pages

Authors: S. Mao, A. Rosenfeld, T. Kanungo

DOI: 10.1109/ICIP.2003.1247016

Keywords: Hidden Markov model, Hierarchy (mathematics), Information retrieval, Search engine indexing, Structure (mathematical logic), Title page, Computer science, Image retrieval, Tree (data structure), k-d tree

Abstract: Structural information about a document is essential for structured query processing, indexing, and retrieval. A document page can be partitioned into a hierarchy of homogeneous regions such as columns, paragraphs, etc.; these regions are called physical components, and they define the physical layout of the page. In this paper we develop a class of models for the physical layouts of technical paper title pages. We model the physical layout of a page using hidden semi-Markov models for the directional projections of page regions, and a stochastic attributed K-d tree grammar for the 2D hierarchical structure of these regions. We use the models to generate sets of synthetic title page images in three distinctive styles, which we use in controlled experiments on page physical layout analysis.
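The idea of generating a page layout from a stochastic K-d tree can be illustrated with a minimal sketch. This is not the authors' grammar: the split probability `p_split`, the cut range, the depth limit, and the `Node` class are all hypothetical choices made for illustration, and the hidden semi-Markov projection models are not represented at all. The sketch only shows the general pattern of recursively cutting a page rectangle with axis-aligned splits, alternating the cut direction at each level and drawing the cut position at random.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A rectangular page region; internal nodes record the cut axis."""
    x0: float
    y0: float
    x1: float
    y1: float
    axis: Optional[str] = None            # "x" or "y" for internal nodes
    children: Optional[List["Node"]] = None


def generate(x0, y0, x1, y1, rng, depth=0, max_depth=3, p_split=0.8):
    """Recursively split a rectangle (hypothetical parameters throughout)."""
    node = Node(x0, y0, x1, y1)
    # Stop at the depth limit, or at random with probability 1 - p_split.
    if depth >= max_depth or rng.random() > p_split:
        return node
    # Alternate cut direction: horizontal cuts at even depth, vertical at odd.
    axis = "y" if depth % 2 == 0 else "x"
    frac = rng.uniform(0.3, 0.7)          # assumed range for the cut position
    if axis == "y":
        cut = y0 + frac * (y1 - y0)
        kids = [
            generate(x0, y0, x1, cut, rng, depth + 1, max_depth, p_split),
            generate(x0, cut, x1, y1, rng, depth + 1, max_depth, p_split),
        ]
    else:
        cut = x0 + frac * (x1 - x0)
        kids = [
            generate(x0, y0, cut, y1, rng, depth + 1, max_depth, p_split),
            generate(cut, y0, x1, y1, rng, depth + 1, max_depth, p_split),
        ]
    node.axis = axis
    node.children = kids
    return node


def leaves(node):
    """Return the leaf regions, i.e. the physical components of the layout."""
    if node.children is None:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def area(node):
    return (node.x1 - node.x0) * (node.y1 - node.y0)
```

Calling `generate(0.0, 0.0, 8.5, 11.0, random.Random(0))` yields one random layout for a letter-size page; the leaves tile the page exactly, so their areas sum to the page area. A model closer to the one described in the abstract would attach attributes (such as region type) to each node and learn the split distributions from data rather than fixing them by hand.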

References (6)
Taku A. Tokuyasu, "Turbo recognition: decoding page layout," ACM/IEEE Joint Conference on Digital Libraries, pp. 475- (2001). DOI: 10.1145/379437.379810
Taku A. Tokuyasu, Philip A. Chou, "Turbo recognition: a statistical approach to layout analysis," Document Recognition and Retrieval, vol. 4307, pp. 123-129 (2000). DOI: 10.1117/12.410829
G.E. Kopec, P.A. Chou, "Document image decoding using Markov source models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 602-617 (1994). DOI: 10.1109/34.295905
M. Krishnamoorthy, G. Nagy, S. Seth, M. Viswanathan, "Syntactic segmentation and labeling of digitized pages from technical journals," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 15, pp. 737-747 (1993). DOI: 10.1109/34.221173
Tapas Kanungo, Robert M. Haralick, Ihsin Phillips, "Nonlinear global and local document degradation models," International Journal of Imaging Systems and Technology, vol. 5, pp. 220-230 (1994). DOI: 10.1002/IMA.1850050305
David R. Cox, Hilton D. Miller, The Theory of Stochastic Processes (1965).