Learning speech models for mobile device users

Authors: Leonard Henry Grokop, Vidya Narayanan

DOI:

Keywords:

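Abstract: Techniques are provided to recognize a speaker's voice. In one embodiment, received audio data may be separated into a plurality of signals. For each signal, the signal may be associated with value/s for one or more features (e.g., Mel-Frequency Cepstral coefficients). The received data may be clustered (e.g., by clustering features associated with the signals). A predominate voice cluster may be identified and associated with a user. A speech model (e.g., a Gaussian Mixture Model or Hidden Markov Model) may be trained based on data associated with the predominate cluster. A received audio signal may then be processed using the speech model to, e.g.: determine who was speaking; determine whether the user was speaking; determine whether anyone was speaking; and/or determine what words were said. A context of the device or the user may then be inferred based at least partly on the processed signal.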
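
The pipeline in the abstract (features, clustering, predominate cluster, model training, scoring) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: synthetic 4-dimensional vectors stand in for MFCC features, plain k-means with a farthest-first initialization stands in for the unspecified clustering step, and a single diagonal Gaussian stands in for the Gaussian Mixture Model. All function and variable names are hypothetical.

```python
import numpy as np


def farthest_first_init(features, k):
    """Pick k starting centroids: the first frame, then repeatedly the frame
    farthest from every centroid chosen so far (deterministic)."""
    centroids = [features[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(features[dists.argmax()])
    return np.array(centroids)


def kmeans(features, k, iters=50):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    centroids = farthest_first_init(features, k)
    for _ in range(iters):
        dists = np.linalg.norm(
            features[:, None, :] - centroids[None, :, :], axis=2
        )
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j)
            else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def avg_log_likelihood(frames, mean, var):
    """Mean per-frame log-likelihood under a diagonal Gaussian."""
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mean) ** 2 / var).sum(axis=1)
    return ll.mean()


# Synthetic stand-ins for per-frame features (assumption: real input would be
# MFCC vectors extracted from audio captured by the device).
rng = np.random.default_rng(0)
user_frames = rng.normal(0.0, 1.0, size=(300, 4))     # device owner, most frames
other_a = rng.normal(12.0, 1.0, size=(100, 4))        # another speaker
other_b = rng.normal(-12.0, 1.0, size=(80, 4))        # a third speaker
features = np.vstack([user_frames, other_a, other_b])

# Cluster the frames and take the largest cluster as the predominate voice.
k = 3
labels, _ = kmeans(features, k)
counts = np.bincount(labels, minlength=k)
predominate = counts.argmax()

# "Train" the stand-in speech model on the predominate cluster only.
cluster_frames = features[labels == predominate]
model_mean = cluster_frames.mean(axis=0)
model_var = cluster_frames.var(axis=0) + 1e-6  # small floor avoids division by zero

# Score new audio: frames from the user should fit the model far better.
user_score = avg_log_likelihood(rng.normal(0.0, 1.0, size=(50, 4)), model_mean, model_var)
other_score = avg_log_likelihood(rng.normal(12.0, 1.0, size=(50, 4)), model_mean, model_var)
print("user is the likelier speaker:", user_score > other_score)
```

In practice a threshold on the score would decide whether the user was speaking; choosing that threshold, and the number of clusters, is outside what the abstract specifies.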

References (8)
Nissanka Arachchige Bodhi Priyantha, Amy K. Karlson, Alice Jane B. Brush, Hong Lu, Jie Liu, Energy-efficient unobtrusive identification of a speaker (2011)
Raquel Tato, Thomas Kemp, Silke Goronzy, Ralf Kompe, Yin Hay Lam, Krzysztof Marasek, Apparatus and method for automatic dissection of segmented audio signals (2004)
Chao-Shih Huang, Background learning of speaker voices, Journal of the Acoustical Society of America, vol. 122, pp. 33- (2002), 10.1121/1.2756508
Raquel Tato, Silke Goronzy, Ralf Kompe, Yin Lam, Thomas Kemp, Krzysztof Marasek, Apparatus and method for classifying an audio signal (2004)
Masayuki Yamada, Yasuhiro Komori, Toshiaki Fukada, Segment set creating method and apparatus (2005)
Herbert Gish, William R. Belfield, Self-organizing speech recognition for information extraction (2004)