Text line extraction in graphical documents using background and foreground information

Authors: Partha Pratim Roy, Umapada Pal, Josep Lladós

DOI: 10.1007/S10032-011-0167-3

Abstract: In graphical documents (e.g., maps, engineering drawings), artistic documents, etc., the text lines are annotated in multiple orientations or in a curvilinear way to illustrate different locations or symbols. For optical character recognition of such documents, individual text lines need to be extracted from them. In this paper, we propose a novel method to segment such text lines; the method is based on the foreground and background information of the text components. To effectively utilize the background information, a water reservoir concept is used here. In the proposed scheme, at first, individual components are detected and grouped into character clusters in a hierarchical way using size and positional information. Next, the clusters are extended on two extreme sides to determine potential candidate regions. Finally, with the help of these candidate regions, individual lines are extracted. The experimental results are presented on different datasets of graphical documents, camera-based warped documents, noisy images containing seals, etc. The results demonstrate that our approach is robust and invariant to the size and orientation of the text lines present in the document.
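
The component-grouping step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Box` representation, the function name `group_components`, and the thresholds `size_ratio` and `gap_factor` are all assumptions made for illustration, and the sketch performs a single grouping pass rather than the hierarchical scheme the paper describes. It merges connected-component bounding boxes that are similar in size and close in position, using a union-find structure.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass(frozen=True)
class Box:
    """Bounding box of one connected component (hypothetical representation)."""
    x: float
    y: float
    w: float
    h: float

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)

    @property
    def size(self):
        # Use the longer side so the measure does not depend on orientation.
        return max(self.w, self.h)


def group_components(boxes, size_ratio=2.0, gap_factor=1.5):
    """Cluster boxes that are similar in size and close in position.

    size_ratio and gap_factor are illustrative thresholds, not values
    taken from the paper.
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(boxes)), 2):
        a, b = boxes[i], boxes[j]
        similar = max(a.size, b.size) <= size_ratio * min(a.size, b.size)
        dist = hypot(a.center[0] - b.center[0], a.center[1] - b.center[1])
        near = dist <= gap_factor * (a.size + b.size) / 2
        if similar and near:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(boxes)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    boxes = [
        Box(0, 0, 10, 10),      # three character-sized boxes in a row
        Box(12, 0, 10, 10),
        Box(24, 0, 10, 10),
        Box(200, 200, 10, 10),  # isolated character-sized box
        Box(0, 0, 100, 100),    # large graphical component
    ]
    print(group_components(boxes))
```

Because membership is transitive, the first and third boxes end up in the same cluster even though they are not directly within the distance threshold of each other. Measuring distance between centers rather than along a fixed axis keeps the grouping independent of text orientation, in the spirit of the multi-oriented setting the abstract describes.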
