Application of Convolutional Neural Networks to Language Identification in Noisy Conditions

Authors: Yun Lei, Aaron Lawson, Mitchell McLaren, Luciana Ferrer, Nicolas Scheffer

Abstract: This paper proposes two novel frontends for robust language identification (LID) using a convolutional neural network (CNN) trained for automatic speech recognition (ASR). In the CNN/i-vector frontend, the CNN is used to obtain the posterior probabilities for i-vector training and extraction instead of a universal background model (UBM). The CNN/posterior frontend is somewhat similar to a phonetic system in that the occupation counts of (tied) triphone states (senones) given by the CNN are used for classification. They are compressed to a low-dimensional vector using probabilistic principal component analysis (PPCA). Evaluated on heavily degraded data, the proposed frontends provide significant improvements of up to 50% in average equal error rate compared to a UBM/i-vector baseline. Moreover, the proposed frontends are complementary and give gains of up to 20% relative to the best single system when combined.
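
Both frontends start from per-frame class posteriors, so a small sketch may help make the idea concrete. The abstract gives no formulas; the code below assumes the standard zeroth- and first-order sufficient statistics used in i-vector systems, with the posteriors coming from either a UBM or a senone classifier. The function name `sufficient_stats` and its arguments are hypothetical, and the i-vector extraction and PPCA compression steps are not shown.

```python
from typing import List, Sequence, Tuple


def sufficient_stats(
    posteriors: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zeroth- and first-order statistics over an utterance.

    posteriors: T rows, each holding K class posteriors for one frame
                (assumed here to be senone posteriors from a CNN, or
                Gaussian posteriors from a UBM).
    features:   T rows, each a D-dimensional acoustic feature vector.

    Returns (counts, first_order) where counts[k] is the occupation
    count of class k and first_order[k] is the posterior-weighted sum
    of the feature vectors for class k.
    """
    if len(posteriors) != len(features):
        raise ValueError("posteriors and features must have the same number of frames")

    num_classes = len(posteriors[0]) if posteriors else 0
    dim = len(features[0]) if features else 0

    counts = [0.0] * num_classes
    first_order = [[0.0] * dim for _ in range(num_classes)]

    for gamma, x in zip(posteriors, features):
        for k, g in enumerate(gamma):
            counts[k] += g  # zeroth-order statistic (occupation count)
            for d, value in enumerate(x):
                first_order[k][d] += g * value  # first-order statistic

    return counts, first_order


# Toy example: 2 frames, 2 classes, 2-dimensional features.
toy_posteriors = [[0.5, 0.5], [1.0, 0.0]]
toy_features = [[2.0, 4.0], [1.0, 1.0]]
toy_counts, toy_first = sufficient_stats(toy_posteriors, toy_features)
```

Under this reading, a CNN/i-vector style frontend would pass both statistics to i-vector training and extraction, while a CNN/posterior style frontend would keep only the occupation counts and compress them to a low-dimensional vector.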
