Authors: Laurence Likforman-Sulem, Chafic Mokbel, Adrian Popescu, Cristina Oprean
DOI:
Keywords: Handwriting recognition, Word (computer architecture), Sequence, Natural language processing, Web resource, Computer science, Decoding methods, Artificial intelligence
Abstract: Handwriting recognition systems rely on predefined classifiers. Small, static dictionaries are usually exploited to obtain high in-vocabulary (IV) accuracy at the expense of coverage, so out-of-vocabulary (OOV) words cannot be handled efficiently. To improve OOV recognition while keeping the IV dictionary small, we introduce a multi-step approach that exploits Web resources. After an initial IV-OOV classification, external resources are used to create sequence-adapted dynamic dictionaries. A final CTC-based decoding is performed over the dynamic dictionary to determine the most probable word for the sequence. We validate our approach with experiments conducted on the RIMES dataset. Results show that improvements are obtained compared to standard handwriting recognition. Keywords: handwriting recognition, dynamic dictionaries, Wikipedia
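The final step described in the abstract, CTC-based decoding over a dictionary, can be illustrated with a minimal sketch. It assumes the recognizer has already produced per-frame log-probabilities over an alphabet with a blank symbol, and that the dynamic dictionary is given as a plain list of candidate words; the IV-OOV classification and the Web lookup are not shown. The function names, the `alphabet` string, and the blank index are hypothetical choices made for illustration, and the paper's actual decoder may differ.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) for a few values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_prob(log_probs, label_ids, blank=0):
    """Log-probability of a label sequence under the CTC forward algorithm.

    log_probs: list of T frames, each a list of per-symbol log-probabilities.
    label_ids: symbol indices of the candidate word (no blanks).
    """
    # Extended sequence: blank, l1, blank, l2, ..., blank
    ext = [blank]
    for s in label_ids:
        ext += [s, blank]
    size = len(ext)

    # Initialisation: a path may start on the first blank or the first label.
    alpha = [NEG_INF] * size
    alpha[0] = log_probs[0][ext[0]]
    if size > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new = [NEG_INF] * size
        for s in range(size):
            cands = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one position
            # Skip over a blank only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    if size == 1:
        return alpha[0]
    return logsumexp(alpha[-1], alpha[-2])


def decode_with_dictionary(log_probs, dictionary, alphabet, blank=0):
    """Return the dictionary word with the highest CTC score, or None."""
    index = {ch: i for i, ch in enumerate(alphabet)}
    best_word, best_score = None, NEG_INF
    for word in dictionary:
        if any(ch not in index for ch in word):
            continue  # word uses a symbol the recognizer cannot emit
        score = ctc_log_prob(log_probs, [index[ch] for ch in word], blank)
        if score > best_score:
            best_word, best_score = word, score
    return best_word
```

Scoring every candidate exhaustively is only practical because a sequence-adapted dictionary is small; a large static lexicon would normally call for a prefix-tree search instead.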