ACOUSTIC DATA-DRIVEN GRAPHEME-TO-PHONEME CONVERSION IN THE PROBABILISTIC LEXICAL MODELING FRAMEWORK

Authors: Marzieh Razavi, Ramya Rasipuram, Mathew Magimai.-Doss

DOI: 10.1016/J.SPECOM.2016.03.003

Abstract: One of the primary steps in building automatic speech recognition (ASR) and text-to-speech systems is the development of a phonemic lexicon that provides a mapping between each word and its pronunciation as a sequence of phonemes. Phoneme lexicons can be developed by humans through the use of linguistic knowledge; however, this would be a costly and time-consuming task. To facilitate this process, grapheme-to-phoneme conversion (G2P) techniques are used in which, given an initial phoneme lexicon, the relationship between graphemes and phonemes is learned through data-driven methods. This article presents a novel G2P formalism which learns the grapheme-to-phoneme relationship through acoustic data and potentially relaxes the need for an initial phoneme lexicon in the target language. The formalism involves a training part followed by an inference part. In the training part, the grapheme-to-phoneme relationship is captured in a probabilistic lexical modeling framework. In this framework, a hidden Markov model (HMM) is trained in which each HMM state representing a grapheme is parameterized by a categorical distribution of phonemes. Then, in the inference part, given the orthographic transcription of the word and the learned HMM, the most probable sequence of phonemes is inferred. In this article, we show that the recently proposed acoustic G2P approach in the Kullback-Leibler divergence-based HMM (KL-HMM) framework is a particular case of this formalism. We then benchmark the approach against two popular G2P approaches, namely the joint multigram approach and the decision tree-based approach. Our experimental studies on English and French show that, despite relatively poor performance at the pronunciation level, the performance of the proposed approach is not significantly different from that of state-of-the-art G2P methods at the ASR level. © 2016 Elsevier B.V. All rights reserved.
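The inference step described in the abstract can be illustrated with a deliberately simplified sketch. It assumes each grapheme state already carries a learned categorical distribution over phonemes, and it replaces the sequence-level inference with a greedy per-state argmax, so it is not the authors' method. The names `GRAPHEME_STATES` and `infer_phonemes`, the phoneme labels, and all probabilities are hypothetical and made up for illustration.

```python
from typing import Dict, List

# Hypothetical toy parameters: one categorical distribution over phonemes
# per grapheme state. The numbers are invented for illustration only and
# are not taken from the article.
GRAPHEME_STATES: Dict[str, Dict[str, float]] = {
    "c": {"k": 0.7, "s": 0.2, "sil": 0.1},
    "a": {"ae": 0.6, "ey": 0.3, "ah": 0.1},
    "t": {"t": 0.9, "d": 0.1},
}


def infer_phonemes(word: str, states: Dict[str, Dict[str, float]]) -> List[str]:
    """Greedy stand-in for inference: take the most probable phoneme of each
    grapheme state and merge consecutive repeats of the same phoneme."""
    phonemes: List[str] = []
    for grapheme in word:
        dist = states[grapheme]           # categorical distribution of this state
        best = max(dist, key=dist.get)    # most probable phoneme for this state
        if not phonemes or phonemes[-1] != best:
            phonemes.append(best)
    return phonemes


print(infer_phonemes("cat", GRAPHEME_STATES))  # ['k', 'ae', 't']
```

A greedy argmax ignores any dependency between neighbouring states; it is used here only to keep the example short.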
