Authors: Yu Tsao, Chin-Hui Lee
DOI: 10.1109/ASRU.2007.4430087
Keywords: Discriminative model, Gaussian process, Hidden Markov model, Gaussian, Artificial intelligence, Cluster analysis, Speech recognition, Speaker recognition, Pattern recognition, Affine transformation, Computer science, Digit recognition
Abstract: Recently, an ensemble speaker and speaking environment modeling (ESSEM) approach to characterizing unknown testing environments was studied for robust speech recognition. Each environment is modeled by a super-vector consisting of the entire set of mean vectors from all Gaussian densities of the HMMs for a particular environment. The super-vector for a new environment is then obtained by an affine transformation on these super-vectors. In this paper, we propose a minimum classification error training procedure to obtain discriminative ensemble elements, and a clustering technique to achieve refined ensemble structures. We test these two extensions to ESSEM on Aurora2. In a per-utterance unsupervised adaptation mode, we achieved an average WER of 4.99% over the 0 dB to 20 dB conditions, compared with 5.51% for the ML-trained gender-dependent baseline. To our knowledge, this represents the best result reported in the literature on the Aurora2 connected digit recognition task.
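The super-vector construction and affine combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy two-Gaussian environments, and the fixed weights and bias are all hypothetical. In the paper the transformation is estimated from the test utterance, and that estimation step is not shown here.

```python
from typing import List, Sequence

Vector = List[float]


def build_supervector(mean_vectors: Sequence[Sequence[float]]) -> Vector:
    """Concatenate all Gaussian mean vectors of one environment's HMM set."""
    return [x for mean in mean_vectors for x in mean]


def affine_combine(supervectors: Sequence[Vector],
                   weights: Sequence[float],
                   bias: Vector) -> Vector:
    """Return sum_p weights[p] * supervectors[p] + bias.

    Assumption: the affine transformation is a weighted sum of the
    ensemble super-vectors plus a bias vector. The weights and bias
    are supplied by the caller rather than estimated from data.
    """
    if len(supervectors) != len(weights):
        raise ValueError("need exactly one weight per super-vector")
    if any(len(s) != len(bias) for s in supervectors):
        raise ValueError("all super-vectors must match the bias dimension")
    result = list(bias)
    for w, s in zip(weights, supervectors):
        for i, x in enumerate(s):
            result[i] += w * x
    return result


# Toy ensemble: two environments, each with two 2-dimensional Gaussian means.
env_a = build_supervector([[1.0, 2.0], [3.0, 4.0]])
env_b = build_supervector([[2.0, 2.0], [2.0, 2.0]])

# Hypothetical weights and bias for an unseen test environment.
new_env = affine_combine([env_a, env_b],
                         weights=[0.5, 0.5],
                         bias=[0.0, 0.0, 0.0, 0.0])
print(new_env)
```

With equal weights and a zero bias, the estimated super-vector is simply the element-wise average of the two environment super-vectors.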