System and method for identifying pictures in documents

Authors: Laurent Denoue, Francine Chen, Patrick Chiu

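Abstract: A system and method to identify pictures in documents. An image representing a page of a document is received. The image is analyzed to identify text objects in the page. A masked image is generated by masking out regions of the image including the text objects. Groups of pixels in the masked image are identified, wherein a respective group of pixels corresponds to at least one picture in the page. When there are one or more groups of pixels, at least one picture is identified for the page based on the one or more groups of pixels. Metadata tags for the at least one picture are stored, wherein a respective metadata tag includes information about a bounding box for the respective picture.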
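The pipeline summarized in the abstract (mask out text, group the remaining pixels, record a bounding box per group) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the page is already binarized into a grid of 0/1 values and that text boxes come from some upstream text detector. The function names, the `min_pixels` noise threshold, the 4-connected flood fill, and the metadata tag layout are all hypothetical choices made for illustration.

```python
from collections import deque

def mask_text(page, text_boxes):
    """Return a copy of the page with each text box cleared to background (0).

    Boxes are (top, left, bottom, right), inclusive; assumed to come from a
    separate text-detection step that is not shown here.
    """
    masked = [row[:] for row in page]
    for top, left, bottom, right in text_boxes:
        for y in range(top, bottom + 1):
            for x in range(left, right + 1):
                masked[y][x] = 0
    return masked

def find_picture_boxes(masked, min_pixels=4):
    """Group 4-connected foreground pixels; return one bounding box per group."""
    height, width = len(masked), len(masked[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if masked[y0][x0] == 0 or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one group of pixels.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            top, left, bottom, right, count = y0, x0, y0, x0, 0
            while queue:
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and masked[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Hypothetical noise filter: ignore very small groups.
            if count >= min_pixels:
                boxes.append((top, left, bottom, right))
    return boxes

def tag_pictures(page, text_boxes):
    """Build one metadata tag (a dict, by assumption) per detected picture."""
    masked = mask_text(page, text_boxes)
    return [{"picture_id": i, "bounding_box": box}
            for i, box in enumerate(find_picture_boxes(masked))]

# Toy page: a text line in row 0 and a 2x3 "picture" in rows 2-3.
page = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
tags = tag_pictures(page, text_boxes=[(0, 0, 0, 3)])
print(tags)
```

On the toy page, the text line is masked away and only the block in rows 2-3 survives as a group, so a single tag with its bounding box is produced. A real page image would need binarization and a much larger noise threshold before this kind of grouping is useful.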

References (19)
Trevor Darrell, Kristen Grauman, Pyramid match kernel and related techniques (2007)
Robert S. Cooperman, System for document layout analysis (1996)
Kathrin Berkner, Edward L. Schwartz, Christophe Marle, SmartNails: Display and image dependent thumbnails, Document Recognition and Retrieval, vol. 5296, pp. 54-65 (2003), 10.1117/12.523666
Dan S. Bloomberg, Francine R. Chen, Extraction of text-related features for condensing image documents, Document Recognition III, vol. 2660, pp. 72-88 (1996), 10.1117/12.234726
Susan E. Hauser, Daniel X. Le, George R. Thoma, Automated zone correction in bitmapped document images, Document Recognition and Retrieval, vol. 3967, pp. 248-258 (1999), 10.1117/12.373499
L.A. Fletcher, R. Kasturi, A robust algorithm for text string separation from mixed text/graphics images, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 10, pp. 910-918 (1988), 10.1109/34.9112
Jianbo Shi, J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, pp. 888-905 (2000), 10.1109/34.868688
S. Chen, S. Mao, G. Thoma, Simultaneous Layout Style and Logical Entity Recognition in a Heterogeneous Collection of Documents, International Conference on Document Analysis and Recognition, vol. 1, pp. 118-122 (2007), 10.1109/ICDAR.2007.4378687