Text string extraction within mixed-mode documents

Authors: F. Hones, J. Lichter

DOI: 10.1109/ICDAR.1993.395652

Keywords:

Abstract: Digitized images of printed documents typically consist of a mixture of text, graphics, and image elements. For proper processing and efficient representation, these elements have to be separated. For most applications it is sufficient to separate between text and non-text, because text captures most of the information. The authors describe the implementation and performance of a robust algorithm for text string extraction which is completely independent from text orientation and may deal with text in various font styles and sizes. Text objects nested in non-text areas and inverse printing can also be analyzed. It should be mentioned that no recognition of individual characters is performed. The classification is only based on rough features.

References (8)
J. Patrick Bixler, Tracking text in mixed-mode documents. Proceedings of the ACM Conference on Document Processing Systems, pp. 177-185 (1988), 10.1145/62506.62541
Dacheng Wang, Sargur N. Srihari, Classification of newspaper image blocks using texture analysis. Computer Vision, Graphics, and Image Processing, vol. 47, pp. 327-352 (1989), 10.1016/0734-189X(89)90116-3
Friedrich M. Wahl, Kwan Y. Wong, Richard G. Casey, Block segmentation and text extraction in mixed text/image documents. Computer Graphics and Image Processing, vol. 20, pp. 375-390 (1982), 10.1016/0146-664X(82)90059-4
E. Mandler, M.F. Oberlander, One-pass encoding of connected components in multivalued images. International Conference on Pattern Recognition, pp. 64-69 (1990), 10.1109/ICPR.1990.119331
L.A. Fletcher, R. Kasturi, A robust algorithm for text string separation from mixed text/graphics images. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 910-918 (1988), 10.1109/34.9112
Rainer Zimmer, Frank Hönes, Separation of textual and non-textual information within mixed-mode documents. Journal of Machine Vision and Applications, pp. 71-74 (1992)
J.L. Fisher, S.C. Hinds, D.P. D'Amato, A rule-based system for document image segmentation. International Conference on Pattern Recognition, pp. 567-572 (1990), 10.1109/ICPR.1990.118166
Lawrence O'Gorman, The document spectrum for bottom-up page layout analysis. World Scientific, pp. 270-279 (1993), 10.1142/9789812797919_0021