Authors: Ryan Price, Ken-ichi Iso, Koichi Shinoda
Keywords:
Abstract: Deep neural networks (DNNs) used for acoustic modeling in speech recognition often have a very large number of output units corresponding to context-dependent (CD) triphone HMM states. The amount of data available for speaker adaptation is limited, so the majority of these CD states may not be observed during adaptation. In this case, the posterior probabilities of the unseen states are only pushed towards zero by the adapted DNN, and its ability to predict them can be degraded relative to the speaker-independent network. We address this problem by appending an additional output layer which maps the original set of classes to a smaller set of phonetic classes (e.g. monophones), thereby reducing the number of classes left unseen in the adaptation data. Adaptation proceeds by backpropagation of errors from the new layer, which is disregarded at recognition time when the original output layer over CD states is used. We demonstrate the benefits of this approach in adaptation experiments on a Japanese voice search task, obtaining a 5.03% reduction in character error rate with approximately 60 seconds of adaptation data.
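To make the adaptation scheme described in the abstract concrete, the sketch below illustrates one plausible reading of it: a fixed many-to-one mapping collapses CD-state posteriors into monophone posteriors, adaptation backpropagates the monophone-level error into the original network, and decoding later uses only the original CD-state outputs. The class names, network sizes, hyperparameters, and the placement of the mapping after the softmax are assumptions for illustration, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class CDStateDNN(nn.Module):
    """Hypothetical DNN acoustic model with a large CD-state output layer."""
    def __init__(self, feat_dim, hidden_dim, num_cd_states):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        self.cd_output = nn.Linear(hidden_dim, num_cd_states)

    def forward(self, x):
        return self.cd_output(self.hidden(x))  # CD-state logits


def build_mapping(cd_to_mono, num_cd_states, num_monophones):
    """Fixed 0/1 matrix that sums CD-state posteriors into monophone posteriors."""
    mapping = torch.zeros(num_cd_states, num_monophones)
    for cd_state, mono in enumerate(cd_to_mono):
        mapping[cd_state, mono] = 1.0
    return mapping


def adapt(model, mapping, adaptation_loader, lr=1e-4, epochs=3):
    """Adapt the network using errors backpropagated from the appended
    monophone layer; the mapping itself stays fixed."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for feats, mono_targets in adaptation_loader:
            cd_post = torch.softmax(model(feats), dim=-1)  # CD-state posteriors
            mono_post = cd_post @ mapping                  # collapse to monophones
            loss = nn.functional.nll_loss(torch.log(mono_post + 1e-10), mono_targets)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# At recognition time the appended monophone layer is discarded and the
# adapted model's original CD-state outputs are used for decoding.
```

Because every CD state shares the monophone targets of its base phone, even adaptation data that covers only a small fraction of the CD states still provides a gradient for all of them, which is the effect the abstract attributes to the additional layer.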