Authors: Bing Xiang, Kham Nguyen, Long Nguyen, R. Schwartz, J. Makhoul
DOI: 10.1109/ICASSP.2006.1660214
Keywords:
Abstract: In this paper, we present a novel approach for morphological decomposition in large vocabulary Arabic speech recognition. It achieved a low out-of-vocabulary (OOV) rate as well as high recognition accuracy in a state-of-the-art broadcast news transcription system. In this approach, the compound words are decomposed into stems and affixes in both the language-model and acoustic training data. The decomposed units in the recognition output are re-joined before scoring. Four algorithms are experimented with and compared in this work. The best system achieved a 1.9% absolute reduction (9.8% relative) in word error rate (WER) when compared to the 64K-word baseline. The performance of this system is also comparable to that of a 300K-word system trained on normal words. In the meantime, it is much faster in terms of speed and needs less memory than systems with vocabularies larger than 64K.
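The decompose-then-rejoin pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not one of the four algorithms compared in the paper: the affix lists, the romanized example words, the `+` boundary marker, and the minimum stem length are all assumptions made only for illustration.

```python
# Hypothetical affix inventories (romanized), listed longest-first.
# These are illustrative assumptions, not the lists used in the paper.
PREFIXES = ("wAl", "Al", "w", "b", "l")
SUFFIXES = ("hm", "hA", "h")
MIN_STEM = 2  # assumed minimum number of characters left in the stem


def decompose(word):
    """Split a word into at most one 'prefix+', a stem, and one '+suffix'."""
    tokens = []
    stem = word
    for p in PREFIXES:
        if stem.startswith(p) and len(stem) - len(p) >= MIN_STEM:
            tokens.append(p + "+")  # trailing '+' marks a prefix
            stem = stem[len(p):]
            break
    suffix = None
    for s in SUFFIXES:
        if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM:
            suffix = "+" + s  # leading '+' marks a suffix
            stem = stem[:-len(s)]
            break
    tokens.append(stem)
    if suffix:
        tokens.append(suffix)
    return tokens


def rejoin(tokens):
    """Glue marked affixes back onto their stems before scoring."""
    words, pending = [], ""
    for tok in tokens:
        if tok.endswith("+"):
            pending += tok[:-1]  # hold the prefix for the next stem
        elif tok.startswith("+") and words and not pending:
            words[-1] += tok[1:]  # attach the suffix to the previous word
        else:
            words.append(pending + tok.lstrip("+"))
            pending = ""
    if pending:  # dangling prefix at the end of the output
        words.append(pending)
    return words


sentence = ["wAlktAb", "ktAbhm", "bktAbhA"]
units = [u for w in sentence for u in decompose(w)]
restored = rejoin(units)
```

In this sketch the decomposed `units` stand in for the vocabulary that the language model and recognizer would operate on, while `restored` is the word sequence that would be compared against the reference when computing WER.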