Visual language processing (VLP) of ancient manuscripts: Converting collections to windows on the past

Authors: Mohamed Cheriet, Reza Farrahi Moghaddam, Rachid Hedjam

DOI: 10.1109/IEEEGCC.2013.6705813

Abstract: Ancient manuscripts constitute a primary carrier of cultural heritage globally, and they are currently being intensively digitized all over the world to ensure their preservation and, ultimately, the wide accessibility of their content. Critical to this research process are the legibility of the documents in image form and access to their live texts. Several state-of-the-art methods and approaches have been proposed and developed to address the challenges associated with processing these manuscripts. However, the huge amount of data involved, together with the high cost and scarcity of human expert feedback and reference data, calls for the development of fundamental approaches that encompass all of these aspects in an objective and tractable manner. In this paper, we propose one such approach: a novel framework for the computational pattern analysis of ancient manuscripts that is data-driven, multilevel, self-sustaining, and learning-based, and that takes advantage of the large quantities of unprocessed data available. Unlike many approaches, which fast-forward to feature vectors, our innovative approach represents a new perspective on the task: it starts from ground zero of the problem, namely the definition of the objects. In addition, it leverages data-driven mining of the relations among the objects to discover hidden but persistent links between them. The problem is addressed at three main levels. At the lowest level, that of the images, the framework tackles the automatic enhancement and restoration of document images using spatial, spectral, sparse, and graph-based representations of the visual objects. At the second level, that of transliteration, directed graphical models such as HMMs, undirected random fields, and spatial models are used to extract the text of the manuscript, which reduces the dependency on human experts. Finally, at the highest level, that of the network (from patches to words to writers), the framework involves the search for "social networks" linking the objects. By considering this approach under the umbrella of Visual Language Processing (VLP), we hope it will be further enriched by the community, in the form of insights contributed by various …
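The abstract names directed graphical models such as HMMs for the transliteration level but gives no details of the model. As a generic illustration only, and not the authors' method, the following sketch shows Viterbi decoding in a discrete HMM: given a sequence of observation indices, it recovers the most likely sequence of hidden states. The function name `viterbi`, the reading of observations as quantized glyph features and of hidden states as transliterated symbols, and the toy two-state parameters are all hypothetical assumptions made for this example.

```python
import numpy as np


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete HMM.

    Hypothetical illustration: obs could be quantized visual features of
    glyphs, and hidden states could be the transliterated symbols.

    obs     : sequence of observation indices
    start_p : (S,) initial state probabilities
    trans_p : (S, S) transition probabilities, trans_p[i, j] = P(j | i)
    emit_p  : (S, O) emission probabilities, emit_p[s, o] = P(o | s)
    """
    # Work in log space so long sequences do not underflow.
    log_start = np.log(start_p)
    log_trans = np.log(trans_p)
    log_emit = np.log(emit_p)

    T, S = len(obs), len(start_p)
    score = np.empty((T, S))            # best log-prob of a path ending in state s at time t
    back = np.zeros((T, S), dtype=int)  # best predecessor state, used for backtracking

    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, T):
        # cand[i, j]: log-prob of reaching state j at time t from state i at time t-1
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]

    # Start from the best final state and follow the stored predecessors.
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    # Toy parameters (hypothetical): two "sticky" states, each favouring one observation.
    start = np.array([0.5, 0.5])
    trans = np.array([[0.9, 0.1], [0.1, 0.9]])
    emit = np.array([[0.8, 0.2], [0.2, 0.8]])
    print(viterbi([0, 0, 1, 1], start, trans, emit))  # prints [0, 0, 1, 1]
```

Log probabilities are used because multiplying many small probabilities over a long line of text would underflow to zero. In a real transliteration system, the states, emissions, and transitions would be learned from data rather than fixed by hand as in this toy model.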
