System and method for identifying text-based SPAM in rasterized images

Author: Evgeny P. Smirnov

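Abstract: A system, method and computer program product for identifying spam in an image, including: (a) identifying a plurality of contours in the image, the contours corresponding to probable symbols; (b) ignoring contours that are too small or too large; (c) identifying text lines based on the remaining contours; (d) parsing the text lines into words; (e) filtering out words that are too short or too long from the identified text lines; (f) filtering out text lines that are too short; (g) verifying that the image contains text by comparing the number of pixels of the symbol color within the contours to the total number of pixels, and verifying that there is at least one text line after filtration; and (h) if the image contains text, rendering a spam/no-spam verdict based on the contour representation of the text that remains after step (f).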
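The filtering pipeline in steps (b) through (g) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation. Contours are assumed to be already extracted (step (a)) and represented as a hypothetical `Contour` bounding box carrying a count of symbol-colored pixels. Every threshold, the vertical-centre line grouping, and the gap-based word split are illustrative choices, and the spam classifier of step (h) is not shown: the sketch only returns the surviving text lines that such a classifier would consume.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical representation: the abstract does not specify a data structure.
@dataclass
class Contour:
    x: int    # left edge of the bounding box
    y: int    # top edge of the bounding box
    w: int    # width in pixels
    h: int    # height in pixels
    ink: int  # pixels of the symbol color inside the box


# Hypothetical thresholds; the source gives no concrete values.
MIN_H, MAX_H = 6, 60                 # step (b): plausible symbol heights
MIN_WORD_LEN, MAX_WORD_LEN = 2, 20   # step (e): symbols per word
MIN_LINE_WORDS = 2                   # step (f): words per line
MIN_INK, MAX_INK = 0.1, 0.9          # step (g): symbol-pixel fraction

Word = List[Contour]
Line = List[Word]


def filter_contours(contours: List[Contour]) -> List[Contour]:
    """Step (b): ignore contours that are too small or too large."""
    return [c for c in contours if MIN_H <= c.h <= MAX_H]


def group_lines(contours: List[Contour]) -> List[List[Contour]]:
    """Step (c): greedily join contours whose vertical centre falls within
    half the height of the first contour of an existing line."""
    lines: List[List[Contour]] = []
    for c in sorted(contours, key=lambda c: (c.y + c.h / 2, c.x)):
        cy = c.y + c.h / 2
        for line in lines:
            ref = line[0]
            if abs((ref.y + ref.h / 2) - cy) <= ref.h / 2:
                line.append(c)
                break
        else:
            lines.append([c])
    # Order symbols left to right within each line.
    return [sorted(line, key=lambda c: c.x) for line in lines]


def split_words(line: List[Contour], gap_factor: float = 0.5) -> List[Word]:
    """Step (d): start a new word when the horizontal gap between adjacent
    symbols exceeds gap_factor times the mean symbol height."""
    mean_h = sum(c.h for c in line) / len(line)
    words: List[Word] = []
    current: Word = [line[0]]
    for prev, c in zip(line, line[1:]):
        if c.x - (prev.x + prev.w) > gap_factor * mean_h:
            words.append(current)
            current = [c]
        else:
            current.append(c)
    words.append(current)
    return words


def find_text_lines(contours: List[Contour]) -> Optional[List[Line]]:
    """Steps (b)-(g): return the filtered text lines, or None if the image
    does not appear to contain text."""
    kept: List[Line] = []
    for line in group_lines(filter_contours(contours)):
        # Step (e): drop words that are too short or too long.
        words = [w for w in split_words(line)
                 if MIN_WORD_LEN <= len(w) <= MAX_WORD_LEN]
        # Step (f): drop lines that are too short.
        if len(words) >= MIN_LINE_WORDS:
            kept.append(words)
    # Step (g): require at least one line after filtration...
    if not kept:
        return None
    # ...and a plausible ratio of symbol-colored pixels to box pixels.
    symbols = [c for line in kept for word in line for c in word]
    area = sum(c.w * c.h for c in symbols)
    if area == 0:
        return None
    ratio = sum(c.ink for c in symbols) / area
    return kept if MIN_INK <= ratio <= MAX_INK else None
```

Grouping by bounding boxes keeps the sketch free of image libraries; in practice the contours of step (a) would come from a connected-component or contour-tracing pass over the binarized image.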