Utilisation du Web pour la reconnaissance de mots manuscrits hors vocabulaire.

Authors: Laurence Likforman-Sulem, Chafic Mokbel, Adrian Popescu, Cristina Oprean

DOI:

Keywords: Handwriting recognition, Word (computer architecture), Sequence, Natural language processing, Web resource, Computer science, Decoding methods, Artificial intelligence

Abstract: Handwriting recognition systems rely on predefined classifiers. Small and static dictionaries are usually exploited to obtain high in-vocabulary (IV) accuracy at the expense of coverage. Thus, out-of-vocabulary (OOV) words cannot be handled efficiently. To improve OOV recognition while keeping the IV dictionary small, we introduce a multi-step approach that exploits Web resources. After an initial IV-OOV classification, external resources are used to create sequence-adapted dynamic dictionaries. A final CTC-based decoding is performed over the dynamic dictionary to determine the most probable word for the sequence. We validate our approach with experiments conducted on the RIMES dataset. Results show that improvements are obtained compared to standard handwriting recognition.

Author keywords (translated from French): handwriting recognition, dynamic dictionaries, Wikipedia
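
The multi-step approach described in the abstract (IV-OOV classification, construction of a dynamic dictionary from external resources, final decoding over that dictionary) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the confidence threshold `iv_threshold`, the edit-distance cutoff `max_distance`, and the use of Levenshtein distance as a stand-in for CTC-based decoding are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def build_dynamic_dictionary(hypothesis, web_words, max_distance=2):
    """Keep only the external words close to the first-pass hypothesis.

    `web_words` stands for any word list gathered from Web resources;
    `max_distance` is a hypothetical cutoff.
    """
    return sorted(w for w in web_words
                  if levenshtein(hypothesis, w) <= max_distance)


def recognize(hypothesis, confidence, static_dictionary, web_words,
              iv_threshold=0.8):
    """Hypothetical three-step pipeline mirroring the abstract's outline."""
    # Step 1 (assumed rule): treat a confident, in-dictionary result as IV.
    if confidence >= iv_threshold and hypothesis in static_dictionary:
        return hypothesis
    # Step 2: build a dictionary adapted to this particular sequence.
    candidates = build_dynamic_dictionary(hypothesis, web_words)
    if not candidates:
        return hypothesis
    # Step 3: stand-in for the final decoding; a real system would rescore
    # the candidates with the recognizer instead of using edit distance.
    return min(candidates, key=lambda w: (levenshtein(hypothesis, w), w))


if __name__ == "__main__":
    static = {"bonjour", "merci"}
    web = {"wikipedia", "wikimedia", "paris"}
    print(recognize("wikipdia", 0.3, static, web))
    print(recognize("merci", 0.95, static, web))
```

In this sketch the dynamic dictionary is rebuilt for each input, which keeps the static dictionary small while still letting a low-confidence hypothesis be matched against a much larger external vocabulary.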

References (22)
Fang Zheng, Hui Sun, Mingxing Xu, Guoliang Zhang. Using Word Confidence Measure for OOV Words Detection in a Spontaneous Spoken Dialog System. Conference of the International Speech Communication Association (2003).
Maximilian Bisani, Hermann Ney. Open vocabulary speech recognition with flat hybrid models. Conference of the International Speech Communication Association, pp. 725-728 (2005).
Abhinav Sethy, Frederick Jelinek, Mark Dredze, Carolina Parada. A spoken term detection framework for recovering out-of-vocabulary words using the web. Conference of the International Speech Communication Association, pp. 1269-1272 (2010).
Stanislas Oger, Georges Linarès, Vladimir Popescu. Using the World Wide Web for Learning New Words in Continuous Speech Recognition Tasks: Two Case Studies. SPECOM'2009 (2009).
James Glass, Issam Bazzi. Modelling out-of-vocabulary words for robust speech recognition. Conference of the International Speech Communication Association, pp. 401-404 (2002).
Robert Sabourin, Marisa Emika Morita. Automatic recognition of handwritten dates on Brazilian bank cheques. École de technologie supérieure (2003).
V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, vol. 10, pp. 707-710 (1966).
Anja Brakensiek, Daniel Willett, Gerhard Rigoll. Unlimited Vocabulary Script Recognition Using Character N-Grams. Mustererkennung 2000, 22. DAGM-Symposium, pp. 436-443 (2000). DOI: 10.1007/978-3-642-59802-9_55
Mahdi Hamdani, Amr El-Desoky Mousa, Hermann Ney. Open Vocabulary Arabic Handwriting Recognition Using Morphological Decomposition. International Conference on Document Analysis and Recognition, pp. 280-284 (2013). DOI: 10.1109/ICDAR.2013.63
Lokendra Shastri, Thomas Fontaine. Recognizing handwritten digit strings using modular spatio-temporal connectionist networks. Connection Science, vol. 7, pp. 211-246 (1995). DOI: 10.1080/09540099550039237