Using stochastic syntactic analysis for extracting a logical structure from a document image

Authors: Y. Tateisi, N. Itoh

DOI: 10.1109/ICPR.1994.576951

Abstract: A method of stochastic syntactic analysis is applied to extracting the logical structure of a printed document from its physical layout and keywords indicating components. The document is parsed as a sentence consisting of text lines and graphic objects according to a regular grammar with attributes. By using stochastic analysis, the parser can retain possible results in order of their probability; thus, if ambiguity occurs, it selects an optimal result more appropriately than deterministic systems. A mark-up system applying the method was constructed, and 87% of the components of manuals and 82% of those of technical papers are correctly marked up. The rate improved to 89% when the second candidates were considered, showing the advantage of the authors' approach over the deterministic approach.
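The idea of retaining parse results in order of probability can be illustrated with a small sketch. This is not the authors' parser: it stands in a plain probabilistic finite-state model (a stochastic regular grammar without attributes) and an N-best Viterbi-style search. The component names, line features, and all probabilities below are hypothetical values chosen for illustration only.

```python
import heapq
import math

START = "START"

# Hypothetical transition probabilities between logical components.
# Each row plays the role of the stochastic productions of one nonterminal.
TRANS = {
    START: {"title": 1.0},
    "title": {"author": 0.7, "heading": 0.3},
    "author": {"heading": 1.0},
    "heading": {"paragraph": 1.0},
    "paragraph": {"paragraph": 0.6, "heading": 0.4},
}

# Hypothetical probability that a component appears as a line with a
# given physical-layout feature.
EMIT = {
    "title": {"large_centered": 0.9, "bold_short": 0.1},
    "author": {"small_centered": 0.8, "bold_short": 0.2},
    "heading": {"bold_short": 0.8, "plain": 0.2},
    "paragraph": {"plain": 0.9, "bold_short": 0.1},
}


def n_best_parses(lines, n=2):
    """Return up to n labelings of `lines`, most probable first.

    Each result is a (probability, labels) pair. Keeping the n best
    partial paths per state at every step is what lets the search
    report ranked alternatives instead of a single answer.
    """
    hyps = {START: [(0.0, [])]}  # state -> list of (log_prob, labels)
    for obs in lines:
        candidates = {}
        for state, paths in hyps.items():
            for nxt, p_trans in TRANS.get(state, {}).items():
                p_emit = EMIT[nxt].get(obs, 0.0)
                if p_emit == 0.0:
                    continue  # this component cannot produce this line
                step = math.log(p_trans) + math.log(p_emit)
                for log_p, labels in paths:
                    candidates.setdefault(nxt, []).append(
                        (log_p + step, labels + [nxt])
                    )
        # Prune to the n most probable partial paths ending in each state.
        hyps = {
            s: heapq.nlargest(n, c, key=lambda h: h[0])
            for s, c in candidates.items()
        }
    finals = [h for paths in hyps.values() for h in paths]
    best = heapq.nlargest(n, finals, key=lambda h: h[0])
    return [(math.exp(log_p), labels) for log_p, labels in best]


if __name__ == "__main__":
    page = ["large_centered", "bold_short", "plain", "bold_short", "plain"]
    for prob, labels in n_best_parses(page, n=2):
        print(f"{prob:.6f}", labels)
```

In this toy model the second line is ambiguous, since a short bold line could be an author line or a heading. The search keeps both readings alive and ranks the complete labelings at the end, so a second candidate is available if the first turns out to be wrong.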
