Bootstrapping structured page segmentation


Abstract: In this paper, we present an approach to the bootstrap learning of a page segmentation model. The idea evolves from attempts to segment dictionaries that often have a consistent structure, and is extended to more general structured documents. In cases of highly regular structure, the layout can be learned from examples of only a few pages. The system is first trained using a small number of samples, and a larger test set is processed based on the training result. After making corrections to a selected subset of the test set, these corrected samples are combined with the original training samples to generate new samples. The newly created samples are used to retrain the system, refine the learned features, and resegment the test set. This procedure is applied iteratively until the learned parameters are stable. Using this approach, we do not need to initially provide a large set of training samples. We have applied this approach to many structured documents such as dictionaries, phone books, and spoken language transcripts, and obtained satisfactory performance.
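The iterative procedure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `bootstrap_train`, the callbacks `train`, `segment`, and `correct`, the batch size, and the stability tolerance are all hypothetical, and the "model" is reduced to a single number (an assumed indentation estimate) so that the loop is easy to follow.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple


def bootstrap_train(
    seed_samples: Sequence[float],
    test_pages: Sequence[float],
    train: Callable[[List[float]], float],
    segment: Callable[[float, float], bool],
    correct: Callable[[float, bool], float],
    batch_size: int = 2,
    tol: float = 0.5,
    max_rounds: int = 20,
) -> Tuple[float, List[float]]:
    """Hypothetical bootstrap loop: train, segment, correct a subset, retrain."""
    samples = list(seed_samples)
    params = train(samples)            # initial training on a few samples
    remaining = list(test_pages)       # larger, not yet corrected test set

    for _ in range(max_rounds):
        if not remaining:
            break
        # Segment the whole remaining test set with the current parameters.
        results = [segment(params, page) for page in remaining]
        # Correct only a selected subset (here simply the first batch).
        batch, batch_results = remaining[:batch_size], results[:batch_size]
        remaining = remaining[batch_size:]
        samples += [correct(p, r) for p, r in zip(batch, batch_results)]
        # Retrain on the original plus the corrected samples.
        new_params = train(samples)
        stable = abs(new_params - params) < tol
        params = new_params
        if stable:                     # stop once the parameters settle
            break
    return params, samples


# Toy stand-ins (assumptions): a page is one observed indent value, training
# averages the samples, and correction returns the true indent of the page.
params, samples = bootstrap_train(
    seed_samples=[10.0],
    test_pages=[12.0, 12.0, 12.0, 12.0],
    train=mean,
    segment=lambda threshold, page: page >= threshold,
    correct=lambda page, predicted: page,
)
```

In this toy run the estimate moves from the single seed value toward the test pages and the loop stops after the second round, when the change between rounds falls below `tol`. A real system would replace the three callbacks with an actual segmenter, a feature learner, and a manual correction step.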


