Authors: F. Hones, J. Lichter
DOI: 10.1109/ICDAR.1993.395652
Keywords:
Abstract: Digitized images of printed documents typically consist of a mixture of text, graphics, and image elements. For proper processing and efficient representation, these elements have to be separated. For most applications it is sufficient to separate text from non-text, because the text captures the information. The authors describe the implementation and performance of a robust algorithm for text string extraction, which is completely independent of text orientation and can deal with text in various font styles and sizes. Text objects nested in non-text areas and inverse printing can also be analyzed. It should be mentioned that no recognition of individual characters is performed; the classification is based only on rough features.