Multilingual MLP features for low-resource LVCSR systems

Authors: Samuel Thomas, Sriram Ganapathy, Hynek Hermansky

DOI: 10.1109/ICASSP.2012.6288862

Keywords: Low resource, Speech recognition, Natural language processing, Feature extraction, Vocabulary, Computer science, Perceptron, Artificial intelligence, Training set

Abstract: We introduce a new approach to training multilayer perceptrons (MLPs) for large vocabulary continuous speech recognition (LVCSR) in languages that have only a few hours of annotated in-domain data (for example, 1 hour of data). In our approach, large amounts of out-of-domain data from multiple languages are used to train multilingual MLP systems without dealing with the different phoneme sets of these languages. Features extracted from these MLP systems are used to train LVCSR systems in the low-resource language, similar to the Tandem approach. In our experiments, the proposed features provide a relative improvement of about 30% in a low-resource setting with only one hour of data.
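
As a rough illustration of the Tandem-style step mentioned in the abstract, the sketch below turns MLP phoneme posteriors into decorrelated features by taking logarithms and applying PCA, which is the usual post-processing in the Tandem approach (Hermansky, Ellis and Sharma, 2000). It is a minimal sketch under assumptions, not the authors' pipeline: the function name tandem_features, the random stand-in posteriors, the 40-class output, and the 25-dimensional projection are all hypothetical, and the abstract does not say how the multilingual MLPs handle the differing phoneme sets, so that part is not modeled here.

```python
import numpy as np


def tandem_features(posteriors, n_components, eps=1e-10):
    """Tandem-style post-processing (hypothetical helper, not from the paper).

    posteriors: array of shape (frames, classes), each row summing to 1.
    Returns an array of shape (frames, n_components).
    """
    # The log makes the skewed posterior distribution closer to Gaussian.
    log_post = np.log(posteriors + eps)
    centered = log_post - log_post.mean(axis=0)

    # PCA via eigendecomposition of the covariance matrix.
    # Simplification: the transform is estimated on the same data it is
    # applied to; a real system would estimate it on training data only.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    return centered @ eigvecs[:, order]


# Stand-in for MLP outputs: softmax over random scores (assumed sizes).
rng = np.random.default_rng(0)
scores = rng.normal(size=(500, 40))
posteriors = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

features = tandem_features(posteriors, n_components=25)
```

In a Tandem system, features of this kind would then be used as input to a conventional HMM-based recognizer, either alone or appended to standard acoustic features.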

References (11)
Qifeng Zhu, Barry Y. Chen, Nelson Morgan. Learning long-term temporal features in LVCSR using neural networks. Conference of the International Speech Communication Association (2004).
Frantisek Grezl, Martin Karafiat, Stanislav Kontar, Jan Cernocky. Probabilistic and Bottle-Neck Features for LVCSR of Meetings. International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 757-760 (2007). DOI: 10.1109/ICASSP.2007.367023
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky. Modulation frequency features for phoneme recognition in noisy speech. Journal of the Acoustical Society of America, vol. 125 (2009). DOI: 10.1121/1.3040022
Hynek Hermansky. Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America, vol. 87, pp. 1738-1752 (1990). DOI: 10.1121/1.399423
Fabio Valente, Hynek Hermansky. Combination of Acoustic Classifiers Based on Dempster-Shafer Theory of Evidence. International Conference on Acoustics, Speech, and Signal Processing, vol. 4, pp. 1129-1132 (2007). DOI: 10.1109/ICASSP.2007.367273
J. Park, F. Diehl, M.J.F. Gales, M. Tomalin, P.C. Woodland. Training and adapting MLP features for Arabic speech recognition. International Conference on Acoustics, Speech, and Signal Processing, pp. 4461-4464 (2009). DOI: 10.1109/ICASSP.2009.4960620
Lukas Burget, Petr Schwarz, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra Goel, Martin Karafiat, Daniel Povey, Ariya Rastrow, Richard C. Rose, Samuel Thomas. Multilingual acoustic modeling for speech recognition based on subspace Gaussian Mixture Models. International Conference on Acoustics, Speech, and Signal Processing, pp. 4334-4337 (2010). DOI: 10.1109/ICASSP.2010.5495646
Hui Lin, Li Deng, Dong Yu, Yi-fan Gong, Alex Acero, Chin-Hui Lee. A study on multilingual acoustic modeling for large vocabulary ASR. International Conference on Acoustics, Speech, and Signal Processing, pp. 4333-4336 (2009). DOI: 10.1109/ICASSP.2009.4960588
H. Hermansky, D.P.W. Ellis, S. Sharma. Tandem connectionist feature extraction for conventional HMM systems. International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1635-1638 (2000). DOI: 10.1109/ICASSP.2000.862024
J.J. Godfrey, E.C. Holliman, J. McDaniel. SWITCHBOARD: telephone speech corpus for research and development. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 517-520 (1992). DOI: 10.1109/ICASSP.1992.225858