Method and system for diacritizing Arabic text

Authors: Hamdy S Mubarak, Kareem Mohamed Darwish, Ahmed Abdelali, Hassan Sajjad, Younes Samih

DOI:

Keywords:

Abstract: The disclosed method and system automatically diacritize written Arabic text for applications that need to verbalize it. The method may comprise converting a written sentence into a word sequence and identifying a target source word. A context window is then repeatedly overlaid on the word sequence and shifted across a plurality of positions, with each position selecting the subset of the word sequence contained within the window. A diacritized form of the target source word may be generated for each of these subsets. A final diacritized form of the target source word is then selected from the plurality of diacritized forms by a voting scheme, which may pick the form generated most often or may rely on a system of weighting factors.
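The sliding-window voting described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `pick_diacritized_form`, the `diacritize` callback (a stand-in for whatever model produces diacritized forms for a window), the optional `weight` callback, and the placeholder tokens are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical interfaces, not taken from the source:
# a diacritizer maps a window of words to one diacritized form per word,
# and a weight maps (offset of target in window, window size) to a vote weight.
Diacritizer = Callable[[Sequence[str]], List[str]]
Weight = Callable[[int, int], float]


def pick_diacritized_form(
    words: Sequence[str],
    target: int,
    window_size: int,
    diacritize: Diacritizer,
    weight: Optional[Weight] = None,
) -> str:
    """Vote over every window position that contains words[target]."""
    if not 0 <= target < len(words):
        raise IndexError("target must index into words")

    # Clamp the window so it always fits inside the sentence.
    size = max(1, min(window_size, len(words)))
    first = max(0, target - size + 1)       # leftmost start covering target
    last = min(target, len(words) - size)   # rightmost start covering target

    votes: Dict[str, float] = defaultdict(float)
    for start in range(first, last + 1):
        window = words[start:start + size]
        forms = diacritize(window)          # one form per word in the window
        offset = target - start             # position of target in the window
        # Unweighted voting counts each window once (majority vote);
        # a weight callback turns this into a weighted vote.
        votes[forms[offset]] += 1.0 if weight is None else weight(offset, size)

    # Ties resolve to the form that was generated first.
    return max(votes, key=lambda form: votes[form])


def toy_model(window: Sequence[str]) -> List[str]:
    """Toy stand-in for a real diacritizer: the output depends on context."""
    tag = "A" if "w0" in window else "B"
    return [f"{word}/{tag}" for word in window]


sentence = ["w0", "w1", "w2", "w3", "w4"]  # placeholder tokens, not Arabic
majority = pick_diacritized_form(sentence, 2, 3, toy_model)
weighted = pick_diacritized_form(
    sentence, 2, 3, toy_model,
    weight=lambda offset, size: 3.0 if offset == size - 1 else 1.0,
)
```

With the toy model, the three windows covering `w2` disagree, so the unweighted call returns the form produced most often, while the weighted call favors the window in which the target is the last word. Passing the weighting as a callback keeps both voting variants named in the abstract behind a single function.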
