Authors: Panikos Heracleous, Viet-Anh Tran, Takayuki Nagai, Kiyohiro Shikano
DOI: 10.1109/TASL.2009.2037398
Keywords:
Abstract: Non-audible murmur (NAM) is an unvoiced speech signal that can be received through the body tissue with the use of special acoustic sensors (i.e., NAM microphones) attached behind the talker's ear. The authors had previously reported experimental results for NAM recognition using a stethoscopic and a silicon NAM microphone. Using a small amount of training data from a single speaker and adaptation approaches, 93.9% word accuracy was achieved for a 20 k Japanese vocabulary dictation task. In this paper, further analysis of NAM speech is made using distance measures between hidden Markov models (HMMs). It has been shown that, owing to the reduced spectral space of NAM speech, the HMM distances are also reduced when compared with those of normal speech. In the case of vowels and fricatives, the distance measures in NAM speech follow the same relative inter-phoneme relationship as in normal speech, without significant differences. However, significant differences have been found in the case of plosives. More specifically, in NAM speech the distances between voiced/unvoiced consonant pairs articulated in the same place drastically decreased. As a result, the inter-phoneme relationship changed significantly compared with normal speech, causing a substantial decrease in recognition accuracy. A speaker-dependent phoneme recognition experiment was conducted and obtained 81.5% phoneme correct. In a NAM microphone, body transmission and the loss of lip radiation act as a low-pass filter, so higher frequency components are attenuated in the NAM signal. Because of the spectral reduction, NAM's unvoiced nature, and the type of articulation, NAM sounds become similar, causing a larger number of confusions than in normal speech. Yet many of those sounds are visually different on the face/mouth/lips, and the integration of visual information increases their discrimination; recognition accuracy increases as well. In this article, visual information extracted from the talkers' facial movements is fused with NAM speech. The results reveal an improvement of 10.5% on average when fused NAM speech and facial information were used, compared with using only NAM speech.
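
Note: The abstract does not say which HMM distance measure is used. As a rough illustration only, the sketch below assumes one simple choice: each phoneme HMM has the same number of states, each state is a single diagonal-covariance Gaussian, and the model distance is the state-by-state average of the symmetric Kullback-Leibler divergence. The function names and the toy numbers are hypothetical and are not taken from the paper. The sketch shows how compressing the feature space (here, scaling all means toward the origin) shrinks the distance between two models, which is the qualitative effect the abstract describes for NAM speech.

```python
import math


def symmetric_kl_diag(mean_p, var_p, mean_q, var_q):
    """Symmetric KL divergence between two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        diff_sq = (mp - mq) ** 2
        kl_pq = 0.5 * (math.log(vq / vp) + (vp + diff_sq) / vq - 1.0)
        kl_qp = 0.5 * (math.log(vp / vq) + (vq + diff_sq) / vp - 1.0)
        total += kl_pq + kl_qp
    return total


def hmm_distance(states_a, states_b):
    """Average state-by-state divergence between two HMMs.

    Each model is a list of (mean, variance) pairs, one per state.
    This is an assumed, simplified measure, not the paper's definition.
    """
    if len(states_a) != len(states_b):
        raise ValueError("models must have the same number of states")
    per_state = [
        symmetric_kl_diag(ma, va, mb, vb)
        for (ma, va), (mb, vb) in zip(states_a, states_b)
    ]
    return sum(per_state) / len(per_state)


def compress(states, factor):
    """Scale every state mean by `factor`, mimicking a reduced spectral space."""
    return [([factor * m for m in mean], var) for mean, var in states]


# Toy two-state models with made-up parameters (hypothetical values).
model_a = [([0.0, 0.0], [1.0, 1.0]), ([1.0, 1.0], [1.0, 1.0])]
model_b = [([2.0, 0.0], [1.0, 1.0]), ([1.0, 3.0], [1.0, 1.0])]

normal_distance = hmm_distance(model_a, model_b)
reduced_distance = hmm_distance(compress(model_a, 0.5), compress(model_b, 0.5))
print(normal_distance, reduced_distance)
```

With equal unit variances the divergence reduces to the squared distance between means, so halving the means cuts the model distance to a quarter in this toy case. Real acoustic models would use mixtures and transition probabilities, which this sketch ignores.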