Distinguishing mathematics notation from English text using computational geometry

Authors: D.M. Drake, H.S. Baird

DOI: 10.1109/ICDAR.2005.89


Abstract: A trainable method is described for distinguishing between mathematics notation and natural language (here, English) in images of textlines, using computational geometry methods only, with no assistance from symbol recognition. The input to our method is a "neighbor graph" extracted from a bilevel image of an isolated textline by the method of Kise et al. (1998): this is a pruned form of the Delaunay triangulation of the set of locations of black connected components. Our method first attempts to classify each vertex and, separately, each edge of the neighbor graph as belonging to math or English; then these results are combined to yield a classification of the entire textline. All three classifiers are automatically trainable. Features were selected semi-manually from a large number of candidates in a process driven by training data; this stage is potentially fully automatable. In experiments on textline images scanned from books and generated synthetically, the methodology converged over iterations to a classifier error rate of less than one percent.
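The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the brute-force Delaunay construction, the length-based pruning in `neighbor_graph` (a simple stand-in for the method of Kise et al.), the averaging rule in `classify_textline`, and all function and parameter names are assumptions made for illustration. Points stand for connected-component locations, and the per-vertex and per-edge scores are assumed to come from classifiers trained elsewhere.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None if a, b, c are collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points):
    """Brute-force Delaunay: keep triangles whose circumcircle holds no other point.

    Far too slow for whole pages, but adequate for the few dozen components
    of a single textline.
    """
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for n, (px, py) in enumerate(points) if n not in (i, j, k)):
            edges.update({(i, j), (j, k), (i, k)})
    return edges


def neighbor_graph(points, max_len):
    """Prune Delaunay edges longer than max_len (assumed, simplistic pruning rule)."""
    def length(i, j):
        (x1, y1), (x2, y2) = points[i], points[j]
        return ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
    return {(i, j) for i, j in delaunay_edges(points) if length(i, j) <= max_len}


def classify_textline(vertex_scores, edge_scores, threshold=0.5):
    """Combine per-vertex and per-edge math scores in [0, 1] by averaging (assumed rule)."""
    scores = list(vertex_scores) + list(edge_scores)
    return "math" if sum(scores) / len(scores) > threshold else "english"
```

For example, `neighbor_graph(points, max_len)` returns the set of index pairs that survive pruning, and `classify_textline` turns the scores attached to those vertices and edges into a single label for the textline.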


References (10)

Masayuki Okamoto, Akira Miyazawa. An Experimental Implementation of a Document Recognition System for Papers Containing Mathematical Expressions. Springer Berlin Heidelberg, pp. 36-53 (1992). DOI: 10.1007/978-3-642-77281-8_2

David G. Stork, Richard O. Duda, Peter E. Hart. Pattern Classification (2nd ed.) (1999).

Yue Lu, Zhe Wang, Chew Lim Tan. Word Grouping in Document Images Based on Voronoi Tessellation. Document Analysis Systems VI, vol. 3163, pp. 147-157 (2004). DOI: 10.1007/978-3-540-28640-0_14

Masakazu Suzuki, Toshihiro Kanahori, Nobuyuki Ohtake, Katsuhito Yamaguchi. An Integrated OCR Software for Mathematical Documents and Its Output with Accessibility. Lecture Notes in Computer Science, pp. 648-655 (2004). DOI: 10.1007/978-3-540-27817-7_97

Henry S. Baird. Background Structure in Document Images. International Journal of Pattern Recognition and Artificial Intelligence, vol. 8, pp. 1013-1030 (1994). DOI: 10.1142/S0218001494000516

Richard J. Fateman. How to Find Mathematics on a Scanned Page. Document Recognition and Retrieval, vol. 3967, pp. 98-109 (1999). DOI: 10.1117/12.373482

Koichi Kise, Akinori Sato, Motoi Iwata. Segmentation of Page Images Using the Area Voronoi Diagram. Computer Vision and Image Understanding, vol. 70, pp. 370-382 (1998). DOI: 10.1006/CVIU.1998.0684

D.J. Ittner, H.S. Baird. Language-Free Layout Analysis. Proceedings of 2nd International Conference on Document Analysis and Recognition (ICDAR '93), pp. 336-340 (1993). DOI: 10.1109/ICDAR.1993.395720

S.P. Chowdhury, S. Mandal, A.K. Das, B. Chanda. Automated Segmentation of Math-Zones from Document Images. International Conference on Document Analysis and Recognition, pp. 755-759 (2003). DOI: 10.1109/ICDAR.2003.1227763

David G. Stork, Richard O. Duda, Peter E. Hart. Pattern Classification (1973).
