An Approach to Normalization of Dai Text for Speech Synthesis

Author: 烛梅 伍 (Zhumei Wu)

DOI: 10.12677/CSA.2016.67051

Keywords: Natural language processing; Artificial intelligence; Computer science

Abstract: With the aim of developing a Dai speech synthesis system, this paper focuses on normalizing numbers and special characters in Dai text. Both are non-standard words (NSWs) in Dai text, and the main goal of text normalization is to represent the pronunciation of non-standard words with standard words. The normalization process has four steps: non-standard word recognition, ambiguity judgment, disambiguation, and conversion of non-standard words into standard words. The paper recognizes non-standard words with a method that combines rules and context keywords, uses regular expressions to determine each word's ambiguity type, and applies conversion rules to disambiguate the word and determine its correct Dai pronunciation. Experimental results show that the proposed text normalization method reaches an accuracy of 94.6%, fully meeting the needs of front-end text analysis in a Dai text-to-speech system, and that it has good application value for natural language processing.
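The four-step pipeline described in the abstract (recognition, ambiguity judgment, disambiguation, conversion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. The abstract does not give the actual rules, keyword lists, or Dai readings, so every reading (`DIGITS`, `UNITS`, `SPECIAL`, `HOUR`, `MINUTE`) is a hypothetical English placeholder for a Dai-script word. The context keywords in `DIGIT_KEYWORDS` are invented, the digit-plus-unit reading in `read_cardinal` is an assumed scheme, and all function names are illustrative.

```python
import re

# All readings below are HYPOTHETICAL English placeholders standing in for
# Dai-script words; the paper's actual mapping tables are not reproduced here.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
UNITS = [(1000, "thousand"), (100, "hundred"), (10, "ten")]  # place-value words
SPECIAL = {"%": "percent", "+": "plus", "&": "and"}          # special characters
HOUR, MINUTE = "hour", "minute"

# Hypothetical context keywords: a number right after one of these is read
# digit by digit (e.g. a phone or room number) instead of as a quantity.
DIGIT_KEYWORDS = {"tel", "phone", "room"}

# Step 1 (recognition): a token is a candidate NSW if it has a digit or special char.
NSW_PATTERN = re.compile(r"[0-9%+&]")

# Step 2 (ambiguity judgment): ordered regular expressions assign an NSW type.
TYPE_PATTERNS = [
    ("time", re.compile(r"([0-9]{1,2}):([0-9]{2})")),
    ("percent", re.compile(r"([0-9]+)%")),
    ("number", re.compile(r"[0-9]+")),
    ("special", re.compile(r"[%+&]")),
]


def read_digits(s: str) -> str:
    """Read an ASCII digit string one digit at a time."""
    return " ".join(DIGITS[int(c)] for c in s)


def read_cardinal(n: int) -> str:
    """Read 0..9999 as digit word + place-value word, skipping zero places."""
    if n == 0:
        return DIGITS[0]
    if n >= 10000:
        return read_digits(str(n))  # outside the demo's range: fall back
    words = []
    for value, unit in UNITS:
        q, n = divmod(n, value)
        if q:
            words += [DIGITS[q], unit]
    if n:
        words.append(DIGITS[n])
    return " ".join(words)


def classify(token: str):
    """Return (type, match) for the first pattern that matches the whole token."""
    for kind, pattern in TYPE_PATTERNS:
        m = pattern.fullmatch(token)
        if m:
            return kind, m
    return "unknown", None


def fallback(token: str) -> str:
    """Read ASCII digits and known special characters one by one; keep the rest."""
    return " ".join(DIGITS[int(c)] if "0" <= c <= "9" else SPECIAL.get(c, c)
                    for c in token)


def convert(token: str, prev: str) -> str:
    """Steps 3-4: disambiguate the NSW and rewrite it as standard words."""
    kind, m = classify(token)
    if kind == "number":
        # A context keyword or a leading zero selects the digit-by-digit reading.
        if prev.lower() in DIGIT_KEYWORDS or (len(token) > 1 and token[0] == "0"):
            return read_digits(token)
        return read_cardinal(int(token))
    if kind == "percent":
        return f"{read_cardinal(int(m.group(1)))} {SPECIAL['%']}"
    if kind == "time":
        h, mi = int(m.group(1)), int(m.group(2))
        if h < 24 and mi < 60:
            parts = [read_cardinal(h), HOUR]
            if mi:
                parts += [read_cardinal(mi), MINUTE]
            return " ".join(parts)
    if kind == "special":
        return SPECIAL[token]
    return fallback(token)  # unrecognized pattern or invalid time


def normalize(text: str) -> str:
    """Run recognition, then type judgment, disambiguation and conversion."""
    out, prev = [], ""
    for token in text.split():
        out.append(convert(token, prev) if NSW_PATTERN.search(token) else token)
        prev = token
    return " ".join(out)


print(normalize("tel 2016 and 2016 items"))
# prints: tel two zero one six and two thousand one ten six items
```

The two occurrences of `2016` show the role of context keywords: the same digit string is read digit by digit after `tel` but as a quantity elsewhere. Keeping the regular expressions in an ordered list means the more specific time and percentage patterns are tried before the generic number pattern.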
