Analysis and Recognition of NAM Speech Using HMM Distances and Visual Information

Authors: Panikos Heracleous, Viet-Anh Tran, Takayuki Nagai, Kiyohiro Shikano

DOI: 10.1109/TASL.2009.2037398

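Abstract: Non-audible murmur (NAM) is an unvoiced speech signal that can be received through the body tissue with the use of special acoustic sensors (i.e., NAM microphones) attached behind the talker's ear. The authors had previously reported experimental results for NAM recognition using a stethoscopic and a silicon NAM microphone. Using a small amount of training data from a single speaker and adaptation approaches, 93.9% word accuracy was achieved for a 20k Japanese vocabulary dictation task. In this paper, further analysis of NAM speech is made using distance measures between hidden Markov models (HMMs). It has been shown that, owing to the reduced spectral space of NAM speech, the HMM distances are also reduced when compared with those of normal speech. In the case of Japanese vowels and fricatives, the distance measures in NAM speech follow the same relative inter-phoneme relationship as in normal speech, without significant differences. However, significant differences have been found in the case of Japanese plosives. More specifically, in NAM speech the distances between voiced/unvoiced consonant pairs articulated in the same place drastically decreased. As a result, the inter-phoneme relationship changed significantly compared with that in normal speech, causing a substantial decrease in recognition accuracy. A speaker-dependent phoneme recognition experiment was conducted and obtained 81.5% phoneme correct, showing a relationship between HMM distance measures and phoneme accuracy. In a NAM microphone, body transmission and the loss of lip radiation act as a low-pass filter. As a result, higher frequency components are attenuated in the NAM signal. Because of the spectral reduction, NAM's unvoiced nature, and the type of articulation, NAM sounds become similar, causing a larger number of confusions than in normal speech. Yet many of those sounds are visually different on the face/mouth/lips, and the integration of visual information increases their discrimination. As a result, recognition accuracy increases as well. In this article, visual information extracted from the talkers' facial movements is fused with NAM speech. The experimental results reveal an improvement of 10.5% on average in recognition accuracy when fused NAM speech and facial information were used, as compared with using only NAM speech.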
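
The abstract does not state which HMM distance measure is used, so the following is only a minimal sketch of the general idea, under stated assumptions. It stands in a symmetric Kullback-Leibler divergence between two single diagonal-covariance Gaussians for a distance between HMM states, and it models the "reduced spectral space" crudely by scaling all feature means toward zero. The function names, the scaling factor, and the feature values are hypothetical and are not taken from the paper.

```python
import math


def kl_diag_gauss(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetric divergence: KL(p || q) + KL(q || p)."""
    return (kl_diag_gauss(mean_p, var_p, mean_q, var_q)
            + kl_diag_gauss(mean_q, var_q, mean_p, var_p))


def shrink(mean, factor):
    """Assumption: a reduced spectral space is modeled by scaling the means."""
    return [factor * m for m in mean]


# Hypothetical 3-dimensional state means for two phoneme models.
mean_a = [1.0, -2.0, 0.5]
mean_b = [-1.0, 1.0, 2.0]
var = [1.0, 1.0, 1.0]

# Distance in the original ("normal speech") feature space.
d_normal = symmetric_kl(mean_a, var, mean_b, var)

# Distance after shrinking both models into a smaller region ("NAM-like").
d_reduced = symmetric_kl(shrink(mean_a, 0.5), var, shrink(mean_b, 0.5), var)

print(d_normal, d_reduced)
```

In this toy setting the two models move closer together when the feature space shrinks, so their distance drops, which is the qualitative effect the abstract reports for NAM speech. A real system would compare full HMMs with Gaussian mixture states rather than single Gaussians.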
