On the role of subglottal acoustics in height estimation, and speech and speaker recognition

Author: Harish Arsikere

Abstract: Author(s): Arsikere, Harish | Advisor(s): Alwan, Abeer. The subglottal system comprises the trachea, bronchi and their accompanying airways. Its configuration changes very little compared to that of the supraglottal vocal tract, as a result of which its acoustic properties are relatively more stationary and speaker specific. In this dissertation, our knowledge of subglottal acoustics, most importantly subglottal resonances (SGRs), is leveraged to develop novel solutions to three problems that involve using or estimating speaker-specific characteristics: (1) body height estimation, (2) speaker normalization for automatic speech recognition (ASR), and (3) speaker identification (SID) and verification (SV). The focus is on scenarios where purely statistical methods may be sub-optimal owing to limited and/or noisy data.
Simultaneous recordings of speech and subglottal acoustics are collected (using a microphone and an accelerometer, respectively) from native speakers of American English (50 adults and 43 children) and 6 adult bilingual speakers of Mexican Spanish (first language) and English. The data are analyzed to understand the relationships between SGRs, vocal-tract resonances (formants), height, and language. Results indicate that phonological vowel features (tongue height and backness) can be characterized via measures of formants and SGRs, and that SGRs correlate well with height, practically independent of language and phonetic content. Based on these findings, algorithms are developed for the estimation of SGRs from speech signals (i.e., without accelerometer information). The algorithms are found to be effective for both adults and children, in quiet and noisy environments; their performance is equally good for English/Spanish speakers, and does not degrade much with limited data.
Predictive models (in conjunction with the SGR estimation algorithms) are used to develop an approach to speech-based height estimation of speakers. The method is comparable to existing data-driven techniques, but requires less training data, offers better generalization, and is more robust to noise. In the context of ASR, the SGR estimation algorithms are used for speaker normalization via piece-wise linear frequency warping. On a digit-recognition task, the approach achieves lower word error rates than conventional vocal-tract length normalization, in clean and noisy environments. The benefit is particularly significant for young speakers (6-8 years old) and short utterances (1 or 2 words).
For SID and SV (with adults' speech), an algorithm is developed for deriving features that are more informative (than SGRs) with regard to speaker discriminability. When combined with Mel-frequency cepstral coefficients (conventional features for SID and SV), these features provide significant improvements, especially with limited test data (5-10 seconds in duration).
