Author: Harish Arsikere
DOI:
Keywords:
Abstract: Author(s): Arsikere, Harish | Advisor(s): Alwan, Abeer. The subglottal system comprises the trachea, bronchi and their accompanying airways. Its configuration changes very little compared to that of the supraglottal vocal tract, as a result of which its acoustic properties are relatively more stationary and speaker specific. In this dissertation, our knowledge of subglottal acoustics, most importantly the subglottal resonances (SGRs), is leveraged to develop novel solutions for three problems that involve using or estimating speaker-specific characteristics: (1) body height estimation, (2) speaker normalization for automatic speech recognition (ASR), and (3) speaker identification (SID) and verification (SV). The focus is on scenarios where purely statistical methods may be sub-optimal owing to limited and/or noisy data.
Simultaneous recordings of speech and subglottal acoustics are collected (using a microphone and an accelerometer, respectively) from native American English speakers (50 adults and 43 children) and 6 adult bilingual speakers of Mexican Spanish (first language) and American English. The data are analyzed to understand the relationships between SGRs, vocal-tract resonances (formants), and language. Results indicate that phonological vowel features (tongue height and backness) can be characterized via measures of formants relative to SGRs, and that SGRs correlate well with body height, practically independent of language and phonetic content. Based on these findings, algorithms are developed for the estimation of SGRs from speech signals (i.e., without accelerometer information). The algorithms are found to be effective for both adults and children, in quiet and noisy environments; their performance is equally good for English/Spanish speakers, and does not degrade much with limited data.
Predictive models (in conjunction with SGR estimation algorithms) are used to develop an approach for speech-based height estimation of adult speakers. The method is comparable in performance to existing data-driven techniques, but requires less training data, offers better generalization, and is robust to noise. In the context of ASR, SGRs are used for speaker normalization via piece-wise linear frequency warping. On a digit-recognition task, the method achieves lower word error rates than conventional vocal-tract length normalization, in clean and noisy environments. The benefit is particularly significant for young children (6-8 years old) and short utterances (1 or 2 words).
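For SID and SV (with adults' speech), an algorithm is developed for deriving features that are more informative (than SGRs) with regard to speaker discriminability. When combined with Mel-frequency cepstral coefficients (conventional features for SID and SV), the proposed features provide improvements, especially with short test utterances (5-10 seconds in duration).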