Authors: Yun Lei, Aaron Lawson, Mitchell McLaren, Luciana Ferrer, Nicolas Scheffer
DOI:
Keywords:
Abstract: This paper proposes two novel frontends for robust language identification (LID) using a convolutional neural network (CNN) trained for automatic speech recognition (ASR). In the CNN/i-vector frontend, the CNN is used to obtain the posterior probabilities for i-vector training and extraction instead of a universal background model (UBM). The CNN/posterior frontend is somewhat similar to a phonetic system in that the occupation counts of (tied) triphone states (senones) given by the CNN are used for classification. They are compressed to a low-dimensional vector using probabilistic principal component analysis (PPCA). Evaluated on heavily degraded data, the proposed frontends provide significant improvements of up to 50% on average equal error rate compared to a UBM/i-vector baseline. Moreover, the two frontends are complementary and give gains of up to 20% relative to the best single system when combined.
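
As a rough illustration of the CNN/posterior idea summarized in the abstract, the sketch below sums frame-level senone posteriors into per-utterance occupation counts and compresses them with PPCA. It is only a sketch under stated assumptions, not the authors' implementation: the posteriors are random stand-ins for CNN outputs, the log of the normalized counts is an assumed feature transform, and the sizes (20 senones, 5 latent dimensions, 200 utterances) and all function names are hypothetical. The PPCA fit uses the standard closed-form maximum-likelihood solution.

```python
import numpy as np


def occupation_counts(posteriors):
    """Sum frame-level senone posteriors (frames x senones) over time."""
    return posteriors.sum(axis=0)


def fit_ppca(X, q):
    """Closed-form maximum-likelihood PPCA with q latent dimensions."""
    mu = X.mean(axis=0)
    Xc = X - mu
    cov = Xc.T @ Xc / X.shape[0]
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # re-sort to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    sigma2 = eigvals[q:].mean()                  # noise variance: mean of discarded eigenvalues
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = eigvecs[:, :q] * scale                   # loading matrix (d x q)
    return mu, W, sigma2


def ppca_transform(X, mu, W, sigma2):
    """Posterior mean of the latent vector: M^{-1} W^T (x - mu)."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T


# Synthetic stand-in for CNN senone posteriors (assumption: no real CNN here).
rng = np.random.default_rng(0)
n_utts, n_senones, latent_dim = 200, 20, 5

feats = np.empty((n_utts, n_senones))
for i in range(n_utts):
    n_frames = int(rng.integers(50, 100))
    post = rng.dirichlet(np.ones(n_senones), size=n_frames)  # each row sums to 1
    counts = occupation_counts(post)
    feats[i] = np.log(counts / counts.sum())    # assumed transform before PPCA

mu, W, sigma2 = fit_ppca(feats, latent_dim)
Z = ppca_transform(feats, mu, W, sigma2)        # low-dimensional vectors for a classifier
print(Z.shape)
```

In a real system the rows of `post` would come from the ASR-trained CNN, and `Z` would be passed to a language classifier.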