Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training

Authors: Erik McDermott, Shinji Watanabe, Atsushi Nakamura


Keywords: Hidden Markov model, Computer science, Discriminative model, Hinge loss, Support vector machine, Speech recognition

Abstract: Using the central observation that margin-based weighted classification error (modeled using Minimum Phone Error (MPE)) corresponds to the derivative, with respect to the margin term, of margin-based hinge loss (modeled using Maximum Mutual Information (MMI)), this article subsumes and extends margin-based MPE and MMI within a broader framework in which the objective function is an integral over a range of margin values. Applying the Fundamental Theorem of Calculus, this integral is easily evaluated using finite differences of MMI functionals; lattice-based training with the new criterion can then be carried out using differences of MMI gradients. Experimental results comparing the new criterion with margin-based MMI and MCE on the Corpus of Spontaneous Japanese and the MIT OpenCourseWare/MIT-World corpus are presented.

1. Introduction

The field of discriminative training for speech recognition has witnessed considerable activity in recent years. The appeal of minimizing phone or word error rather than string error has motivated a transition from well-known string-level methods [1][2] to error-weighted approaches such as MPE [3][4]. More recently, there has been a surge of proposals for "large margin" approaches to hidden Markov model (HMM) design, such as "large-margin HMM" [5], "soft margin estimation" [6], and incrementally shifted margins [7]. Sha and Saul [8] made the important proposal that a fine-grained error measure, the Hamming distance between candidate recognition strings, itself be directly incorporated into HMM-based learning. It turns out that a margin that multiplies such an error measure can easily be brought into lattice-based HMM training as well, simply by adding a margin-scaled local frame/phone/word error to the lattice arc log-likelihoods during Forward-Backward computation [9][10][11]. This approach links the original use of the margin in the context of machine learning (e.g. Support Vector Machines (SVMs)) to "tried-and-tested" frameworks for large-scale discriminative training, with well-understood optimization on large-scale ASR tasks. Benefits in performance on such tasks have been reported for both margin-based MMI and margin-based MPE, though it appears that the relative gains for MMI are larger than those for MPE [10][11]. Aiming at leveraging the benefits of the margin concept in the context of MPE-style error-weighted HMM training, this article presents a unification of margin-based training based on a novel concept:
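The identity summarized in the abstract can be checked numerically with a few lines of NumPy. The sketch below is not the authors' lattice implementation: a tiny, invented N-best list stands in for a lattice, and all log-scores and phone-error counts are made up for illustration. It evaluates a margin-based MMI functional at two margin values and confirms that their difference matches the integral of the MPE-style expected error over that margin range, as the Fundamental Theorem of Calculus argument states.

    # Numerical sketch (not the authors' implementation) of the paper's central
    # identity: the derivative of a margin-based MMI functional with respect to
    # the margin is an MPE-style expected error, so the integral of that error
    # over a margin range collapses, by the Fundamental Theorem of Calculus, to
    # a difference of two MMI values. A small N-best list stands in for the
    # lattice; scores and error counts are invented for illustration.

    import numpy as np

    # Hypothetical N-best list for one utterance: combined log-scores and
    # phone-error counts against the reference (index 0 = reference, 0 errors).
    log_scores = np.array([-10.0, -10.5, -11.2, -12.0])
    errors     = np.array([  0.0,   2.0,   3.0,   5.0])

    def mmi_functional(rho):
        """Margin-based MMI value: log posterior of the reference when every
        competitor's score is boosted by rho times its error count."""
        boosted = log_scores + rho * errors
        return log_scores[0] - np.logaddexp.reduce(boosted)

    def expected_error(rho):
        """MPE-style loss: expected error under the margin-boosted posterior;
        analytically this equals -d/d(rho) of mmi_functional(rho)."""
        boosted = log_scores + rho * errors
        posterior = np.exp(boosted - np.logaddexp.reduce(boosted))
        return float(posterior @ errors)

    rho_lo, rho_hi = 0.0, 1.0

    # Left-hand side: integrate the MPE-style loss over the margin range
    # (trapezoidal rule on a fine grid).
    rhos = np.linspace(rho_lo, rho_hi, 2001)
    vals = np.array([expected_error(r) for r in rhos])
    integral = float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(rhos)))

    # Right-hand side: a simple difference of the MMI functional at the two
    # margin values; no integration over the margin is required.
    difference = mmi_functional(rho_lo) - mmi_functional(rho_hi)

    print(integral, difference)  # the two values agree to high precision

In the paper's setting the same differencing is applied to lattice-based MMI functionals and their gradients rather than to an N-best list, but the margin-space relationship being exercised is the same.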

References (13)
George Saon, Daniel Povey, Penalty function maximization for large margin HMM training, Conference of the International Speech Communication Association, pp. 920-923, 2008.
Erik McDermott, Atsushi Nakamura, String and Lattice based Discriminative Training for the Corpus of Spontaneous Japanese Lecture Transcription Task, Conference of the International Speech Communication Association, pp. 2081-2084, 2007.
Georg Heigold, Thomas Deselaers, Ralf Schlüter, Hermann Ney, Modified MMI/MPE: a direct evaluation of the margin in speech recognition, Proceedings of the 25th International Conference on Machine Learning (ICML '08), pp. 384-391, 2008. DOI: 10.1145/1390156.1390205
Jinyu Li, Zhi-Jie Yan, Chin-Hui Lee, Ren-Hua Wang, A study on soft margin estimation for LVCSR, IEEE Automatic Speech Recognition and Understanding Workshop, pp. 268-271, 2007. DOI: 10.1109/ASRU.2007.4430122
Daniel Povey, Dimitri Kanevsky, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Karthik Visweswariah, Boosted MMI for model and feature-space discriminative training, International Conference on Acoustics, Speech, and Signal Processing, pp. 4057-4060, 2008. DOI: 10.1109/ICASSP.2008.4518545
Atsushi Nakamura, Erik McDermott, Shinji Watanabe, Shigeru Katagiri, A unified view for discriminative objective functions based on negative exponential of difference measure between strings, International Conference on Acoustics, Speech, and Signal Processing, pp. 1633-1636, 2009. DOI: 10.1109/ICASSP.2009.4959913
Xinwei Li, Hui Jiang, Chaojun Liu, Large margin HMMs for speech recognition, International Conference on Acoustics, Speech, and Signal Processing, vol. 5, pp. 513-516, 2005. DOI: 10.1109/ICASSP.2005.1416353
Fei Sha, Lawrence K. Saul, Large Margin Hidden Markov Models for Automatic Speech Recognition, Neural Information Processing Systems, vol. 19, pp. 1249-1256, 2006.
D. Povey, P.C. Woodland, Minimum Phone Error and I-smoothing for improved discriminative training, International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 105-108, 2002. DOI: 10.1109/ICASSP.2002.5743665
Ralf Schlüter, Hermann Ney, Lars Haferkamp, Wolfgang Macherey, Investigations on Error Minimizing Training Criteria for Discriminative Training in Automatic Speech Recognition, Conference of the International Speech Communication Association, pp. 2133-2136, 2005.