Multilingual Speech-to-Speech Translation System: VoiceTra

Authors: Shigeki Matsuda, Xinhui Hu, Yoshinori Shiga, Hideki Kashioka, Chiori Hori

DOI: 10.1109/MDM.2013.99

Keywords: Computer science, Speech technology, Language translation, Artificial intelligence, Speech recognition, Speech synthesis, Natural language processing, Chinese speech synthesis, Language model, VoxForge, Speech corpus, Speech analytics

Abstract: This study presents an overview of VoiceTra, which was developed by NICT and released as the world's first network-based multilingual speech-to-speech translation system for smartphones, and describes in detail its speech recognition, translation, and synthesis with regard to field experiments. We show the effects of updates that use data collected from the experiments to improve our acoustic and language models.
