Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI.

Authors: Daniel Povey, Vijayaditya Peddinti, Daniel Galvez, Pegah Ghahremani, Vimal Manohar

DOI: 10.21437/INTERSPEECH.2016-595

Abstract: … The basic premise of this paper is to do MMI training directly on the GPU, without lattices, … We don’t give any equations here, because MMI training is well known (e.g., see [4]). We …

References (14)
Françoise Beaufays, Andrew W. Senior, Hasim Sak. Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition. arXiv: Neural and Evolutionary Computing (2014).
Hasim Sak, Andrew Senior, Kanishka Rao, Ozan Irsoy, Alex Graves, Françoise Beaufays, Johan Schalkwyk. Learning acoustic frame labeling for speech recognition with recurrent neural networks. International Conference on Acoustics, Speech, and Signal Processing, pp. 4280-4284 (2015). DOI: 10.1109/ICASSP.2015.7178778
Hang Su, Gang Li, Dong Yu, Frank Seide. Error back propagation for sequence training of Context-Dependent Deep Networks for conversational speech transcription. International Conference on Acoustics, Speech, and Signal Processing, pp. 6664-6668 (2013). DOI: 10.1109/ICASSP.2013.6638951
George Saon, Hagen Soltau, David Nahamoo, Michael Picheny. Speaker adaptation of neural network acoustic models using i-vectors. IEEE Automatic Speech Recognition and Understanding Workshop, pp. 55-59 (2013). DOI: 10.1109/ASRU.2013.6707705
S.F. Chen, B. Kingsbury, Lidia Mangu, D. Povey, G. Saon, H. Soltau, G. Zweig. Advances in speech transcription at IBM under the DARPA EARS program. IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, pp. 1596-1608 (2006). DOI: 10.1109/TASL.2006.879814
Alex Graves, Santiago Fernández, Faustino Gomez, Jürgen Schmidhuber. Connectionist temporal classification. Proceedings of the 23rd International Conference on Machine Learning - ICML '06, pp. 369-376 (2006). DOI: 10.1145/1143844.1143891
Lukáš Burget, Arnab Ghoshal, Karel Veselý, Daniel Povey. Sequence-discriminative training of deep neural networks. Conference of the International Speech Communication Association, pp. 2345-2349 (2013).
Daniel Povey, Mirko Hannemann, Gilles Boulianne, Lukas Burget, Arnab Ghoshal, Milos Janda, Martin Karafiat, Stefan Kombrink, Petr Motlicek, Yanmin Qian, Korbinian Riedhammer, Karel Vesely, Ngoc Thang Vu. Generating exact lattices in the WFST framework. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4213-4216 (2012). DOI: 10.1109/ICASSP.2012.6288848
Paul Deléglise, Yannick Estève, Anthony Rousseau. TED-LIUM: an Automatic Speech Recognition dedicated corpus. Language Resources and Evaluation, pp. 125-129 (2012).
Abdel-rahman Mohamed, Frank Seide, Dong Yu, Jasha Droppo, Andreas Stolcke, Geoffrey Zweig, Gerald Penn. Deep bi-directional recurrent networks over spectral windows. IEEE Automatic Speech Recognition and Understanding Workshop, pp. 78-83 (2015). DOI: 10.1109/ASRU.2015.7404777