Authors: Yang Yang, Anusha Lalitha, Jinwon Lee, Chris Lott
DOI: 10.1109/ICASSP.2019.8682157
Keywords: Noise (video), Pronunciation, Set (abstract data type), Acoustic model, Pipeline (computing), Computer science, Stress (linguistics), Speech recognition, Grammar
Abstract: This paper proposes a novel pipeline for automatic grammar augmentation that provides a significant improvement in voice command recognition accuracy for systems with a small-footprint acoustic model (AM). The improvement is achieved by augmenting the user-defined voice command set, also called the grammar set, with alternate grammar expressions. For a given grammar set, a set of potential grammar expressions (the candidate set) is constructed from an AM-specific statistical pronunciation dictionary that captures the consistent patterns and errors in the decoding of the AM induced by variations in pronunciation, pitch, tempo, accent, ambiguous spellings, and noise conditions. Using this candidate set, greedy-optimization-based and cross-entropy-method (CEM) based algorithms are considered to search for an augmented grammar set with improved recognition accuracy, utilizing a command-specific dataset. Our experiments show that the proposed pipeline, along with the search algorithms, significantly reduces the mis-detection and mis-classification rates without increasing the false-alarm rate. Experiments also demonstrate the superior performance of the CEM method over greedy-based algorithms.
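
As a rough illustration of the cross-entropy-method search mentioned in the abstract, the sketch below treats augmentation as choosing a subset of candidate expressions. It is not the authors' implementation: the function `cem_subset_search`, its parameters, the example expressions, and the toy objective `toy_score` are all hypothetical. In the paper's setting the objective would instead be recognition accuracy measured on a command-specific dataset.

```python
import random

def cem_subset_search(candidates, score, iterations=30, samples=50,
                      elite_frac=0.2, smoothing=0.7, seed=0):
    """Search for a high-scoring subset of `candidates` with a basic CEM loop.

    Each candidate has an independent inclusion probability. Every iteration
    samples subsets, keeps the top-scoring ("elite") ones, and moves the
    probabilities toward the elite inclusion frequencies.
    """
    rng = random.Random(seed)
    probs = [0.5] * len(candidates)           # start with no preference
    best_subset = frozenset()
    best_score = score(best_subset)           # baseline: no augmentation
    n_elite = max(1, int(samples * elite_frac))

    for _ in range(iterations):
        population = []
        for _ in range(samples):
            mask = [rng.random() < p for p in probs]
            subset = frozenset(c for c, keep in zip(candidates, mask) if keep)
            population.append((score(subset), mask, subset))
        population.sort(key=lambda item: item[0], reverse=True)
        elites = population[:n_elite]

        # Track the best subset seen so far (strict improvement only).
        if elites[0][0] > best_score:
            best_score, best_subset = elites[0][0], elites[0][2]

        # Smoothed update of inclusion probabilities from the elite samples.
        for i in range(len(candidates)):
            elite_mean = sum(mask[i] for _, mask, _ in elites) / n_elite
            probs[i] = smoothing * elite_mean + (1 - smoothing) * probs[i]

    return best_subset, best_score


# Hypothetical alternates for the command "play music" (assumed, not from the paper).
CANDIDATES = ["pray music", "play musik", "pay music", "lay music", "play mu sick"]
HELPFUL = {"pray music", "play musik"}   # assumed to recover missed detections
HARMFUL = {"pay music"}                  # assumed to collide with another command

def toy_score(subset):
    # Stand-in objective: reward helpful alternates, penalize harmful ones,
    # and add a small cost per expression to discourage needless growth.
    return len(subset & HELPFUL) - 2 * len(subset & HARMFUL) - 0.1 * len(subset)

if __name__ == "__main__":
    best, best_value = cem_subset_search(CANDIDATES, toy_score)
    print(sorted(best), best_value)
```

A greedy alternative would add one candidate at a time, keeping whichever addition improves the score most. The CEM loop samples whole subsets instead, which lets it evaluate combinations of expressions jointly.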