Authors: Y. Tateisi, N. Itoh
Keywords:
Abstract: A method of stochastic syntactic analysis is applied to extracting the logical structure of a printed document from its physical layout and from keywords indicating logical components. The document is parsed as a sentence consisting of text lines and graphic objects, according to a stochastic regular grammar with attributes. By using stochastic syntactic analysis, the parser can retain possible results in order of their probability; thus, if ambiguity occurs, it selects an optimal result more appropriately than deterministic systems. A mark-up system applying the method was constructed, and 87% of the logical components of manuals and 82% of those of technical papers are correctly marked up. The rate improved to 89% when the second candidates were considered, showing the advantage of the authors' approach over the deterministic approach.
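
The idea of keeping parse results ranked by probability can be illustrated with a small sketch. This is not the authors' system: the grammar, the component names (`title`, `author`, `heading`, `paragraph`), the layout features (`centered`, `bold`, `plain`), and all probabilities below are hypothetical, and attributes are omitted. The sketch treats a stochastic regular grammar as a probabilistic finite automaton and keeps the k most probable labelings per state (a k-best Viterbi search).

```python
import heapq

# Hypothetical toy grammar. A rule "A -> feature B" is split into a
# transition probability P(B | A) and an emission probability P(feature | B).
START = "START"
TRANSITIONS = {
    "START":     {"title": 1.0},
    "title":     {"author": 0.7, "heading": 0.3},
    "author":    {"heading": 1.0},
    "heading":   {"paragraph": 1.0},
    "paragraph": {"paragraph": 0.6, "heading": 0.4},
}
EMISSIONS = {
    "title":     {"centered": 0.9, "bold": 0.1},
    "author":    {"centered": 0.8, "plain": 0.2},
    "heading":   {"bold": 0.7, "centered": 0.2, "plain": 0.1},
    "paragraph": {"plain": 0.9, "bold": 0.1},
}


def k_best_parses(lines, k=2):
    """Return up to k (probability, labels) pairs, most probable first.

    `lines` is a sequence of layout features, one per text line.
    """
    # beams[state] holds at most k hypotheses that end in that state.
    beams = {START: [(1.0, [])]}
    for feature in lines:
        candidates = {}
        for state, hyps in beams.items():
            for nxt, p_trans in TRANSITIONS.get(state, {}).items():
                p_emit = EMISSIONS[nxt].get(feature, 0.0)
                if p_emit == 0.0:
                    continue  # this component cannot produce the feature
                for prob, labels in hyps:
                    candidates.setdefault(nxt, []).append(
                        (prob * p_trans * p_emit, labels + [nxt])
                    )
        # Prune to the k most probable hypotheses per state.
        beams = {
            s: heapq.nlargest(k, h, key=lambda x: x[0])
            for s, h in candidates.items()
        }
    final = [h for hyps in beams.values() for h in hyps]
    return heapq.nlargest(k, final, key=lambda x: x[0])


if __name__ == "__main__":
    for prob, labels in k_best_parses(["centered", "centered", "bold", "plain"]):
        print(f"{prob:.6f}", labels)
```

For the feature sequence `centered, centered, bold, plain`, the top candidate labels the lines as title, author, heading, paragraph, while the second candidate reads the second line as a heading. A deterministic parser would commit to one reading, whereas the ranked list keeps the alternative available for later correction.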