Authors: Mengzhe Chen, Qingqing Zhang, Jielin Pan, Yonghong Yan
Keywords:
Abstract: In current DNN/HMM hybrid systems, the DNN models are trained with 1-of-V targets, which are obtained by Viterbi-based forced alignment. The states are viewed as unrelated and isolated. In fact, some phonemes are acoustically similar. For Chinese in particular, a tonal language, the number of similar pairs is quadrupled. To add similarity information between states into model training, correlation-generated targets are investigated in DNN modeling. For each frame, besides the target state from forced alignment, other states similar to this state are assigned nonzero values. The similarity degrees are measured by calculating correlation. In this paper, different methods of generating the correlation matrix are investigated, and the details of implementation with correlation-generated targets are described. On a Mandarin conversational speech recognition task in the customer-service domain, experiments showed that the system based on correlation-generated targets achieved consistent improvements across different amounts of training data.
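
The idea of replacing 1-of-V targets with correlation-generated targets can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (`pearson`, `correlation_matrix`, `soft_targets`), the use of Pearson correlation between hypothetical per-state mean feature vectors, the similarity threshold, and the normalization of each target row so that it sums to 1. The paper investigates several ways of generating the correlation matrix, and they may differ from this one.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def correlation_matrix(state_vectors):
    """Pairwise correlation between per-state vectors (assumed: state means)."""
    return [[pearson(u, v) for v in state_vectors] for u in state_vectors]


def soft_targets(alignment, corr, threshold=0.5):
    """Build one target row per frame from the forced-alignment state index.

    The aligned state always keeps its own value; other states keep their
    correlation only if it reaches the (assumed) threshold. Each row is then
    normalized to sum to 1 (also an assumption of this sketch).
    """
    targets = []
    for s in alignment:
        row = [c if (j == s or c >= threshold) else 0.0
               for j, c in enumerate(corr[s])]
        total = sum(row)
        targets.append([c / total for c in row])
    return targets


# Toy data: three hypothetical states; states 0 and 1 are acoustically close.
state_vectors = [[1.0, 2.0, 3.0], [1.1, 2.1, 2.9], [3.0, 1.0, 2.0]]
corr = correlation_matrix(state_vectors)

# Forced alignment assigns frame 0 to state 0 and frame 1 to state 2.
targets = soft_targets([0, 2], corr)
```

In this toy setup, the frame aligned to state 0 also receives a nonzero target for the similar state 1, while the frame aligned to state 2 keeps a plain 1-of-V target because no other state passes the threshold.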