Extracting Words and Multi-Part Symbols in Graphics Rich Documents

Authors: Mark Burge, Gladys Monagan

DOI: 10.1007/3-540-60298-4_310

Abstract: We present an algorithm for the extraction and grouping of multipart symbols, dashed lines, and character strings from line drawings. The image undergoes a lossless raster-to-vector conversion, creating as its vector representation an undirected graph, the so-called run graph. Next, the elements of the graph are extracted and classified probabilistically, based upon their geometric features, using a decision tree. An area Voronoi tessellation of the members of the sets is constructed, from which a neighborhood graph is derived that is guaranteed to be minimal and complete. The neighborhood graph is then traversed to group the various elements as input for the different recognition modules. No a priori font or other domain-specific information is required for the grouping, and no special geometrical relationships among the elements are assumed. Results are presented with example images taken from those used by our Swiss cadastral map understanding system.
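The grouping step described in the abstract (area Voronoi tessellation, derived neighborhood graph, traversal into groups) can be illustrated with a minimal sketch. This is not the authors' implementation: it approximates the area Voronoi tessellation on a pixel grid with a multi-source breadth-first search, and it merges neighbors using a simple gap threshold. The function names, the `max_gap` parameter, and the toy input are all assumptions made for illustration; the paper itself states that no special geometrical relationships are assumed, so its real grouping criteria differ from this threshold.

```python
from collections import deque

def area_voronoi_labels(components, width, height):
    """Label each grid cell with the id of its nearest component
    (multi-source BFS, city-block distance; a discrete approximation)."""
    labels = [[None] * width for _ in range(height)]
    queue = deque()
    for cid, pixels in components.items():
        for x, y in pixels:
            labels[y][x] = cid
            queue.append((x, y))
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height and labels[ny][nx] is None:
                labels[ny][nx] = labels[y][x]
                queue.append((nx, ny))
    return labels

def neighborhood_graph(labels):
    """Components are neighbors when their Voronoi regions share a cell border."""
    height, width = len(labels), len(labels[0])
    edges = set()
    for y in range(height):
        for x in range(width):
            a = labels[y][x]
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < width and ny < height and labels[ny][nx] != a:
                    b = labels[ny][nx]
                    edges.add((min(a, b), max(a, b)))
    return edges

def gap(p, q):
    """Smallest city-block distance between two pixel sets."""
    return min(abs(x1 - x2) + abs(y1 - y2) for x1, y1 in p for x2, y2 in q)

def group(components, edges, max_gap):
    """Traverse the neighborhood graph, joining neighbors at most max_gap apart.
    The threshold is a hypothetical stand-in for the paper's grouping rules."""
    adjacency = {cid: set() for cid in components}
    for a, b in edges:
        if gap(components[a], components[b]) <= max_gap:
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, groups = set(), []
    for start in sorted(components):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:
            node = stack.pop()
            members.append(node)
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(members))
    return groups

# Toy input: three closely spaced "characters" and one distant symbol.
components = {0: {(0, 1)}, 1: {(2, 1)}, 2: {(4, 1)}, 3: {(10, 1)}}
labels = area_voronoi_labels(components, width=12, height=3)
edges = neighborhood_graph(labels)
groups = group(components, edges, max_gap=3)
```

In this toy case only adjacent components become neighbors, so the traversal never has to compare the first character with the distant symbol directly. That is the practical benefit of deriving the neighborhood graph from the tessellation instead of testing all pairs.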

References (7)
Liqiu Meng, Toward the Automatic Digitization of Map Text. Mustererkennung 1991, 13. DAGM-Symposium, pp. 361-366 (1991). DOI: 10.1007/978-3-662-08896-8_47
Mark J. Burge, Gladys Monagan, Using the Voronoi tessellation for grouping words and multipart symbols in documents. SPIE's 1995 International Symposium on Optical Science, Engineering, and Instrumentation, vol. 2573, pp. 116-124 (1995). DOI: 10.1117/12.216407
Atsuyuki Okabe, Barry Boots, Kokichi Sugihara, Nearest neighbourhood operations with generalized Voronoi diagrams: a review. International Journal of Geographic Information Systems, vol. 8, pp. 43-71 (1994). DOI: 10.1080/02693799408901986
Friedrich M. Wahl, Kwan Y. Wong, Richard G. Casey, Block segmentation and text extraction in mixed text/image documents. Computer Graphics and Image Processing, vol. 20, pp. 375-390 (1982). DOI: 10.1016/0146-664X(82)90059-4
A. Nakamura, O. Shiku, M. Anegawa, C. Nakamura, H. Kuroda, A method for recognizing character strings from maps using linguistic knowledge. International Conference on Document Analysis and Recognition, pp. 561-564 (1993). DOI: 10.1109/ICDAR.1993.395673
L.A. Fletcher, R. Kasturi, A robust algorithm for text string separation from mixed text/graphics images. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 910-918 (1988). DOI: 10.1109/34.9112
L. Boatto, V. Consorti, M. Del Buono, S. Di Zenzo, V. Eramo, A. Esposito, F. Melcarne, M. Meucci, A. Morelli, M. Mosciatti, S. Scarci, M. Tucci, An interpretation system for land register maps. IEEE Computer, vol. 25, pp. 25-33 (1992). DOI: 10.1109/2.144437