Authors: Leonard Henry Grokop, Vidya Narayanan
DOI:
Keywords:
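Abstract: Techniques are provided to recognize a speaker's voice. In one embodiment, received audio data may be separated into a plurality of signals. For each signal, the signal may be associated with value/s for one or more features (e.g., Mel-Frequency Cepstral Coefficients). The received data may be clustered (e.g., by clustering features associated with the signals). A predominate voice cluster may be identified and associated with a user. A speech model (e.g., a Gaussian Mixture Model or Hidden Markov Model) may be trained based on data associated with the predominate cluster. A received audio signal may then be processed using the speech model to, e.g.: determine who was speaking; determine whether the user was speaking; determine whether anyone was speaking; and/or determine what words were said. A context of the device or the user may then be inferred based at least partly on the processed signal.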
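
The pipeline described in the abstract (features per signal, clustering, choosing the predominate cluster, training a model on it, then scoring new audio) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the method claimed in the patent: the 2-D toy vectors stand in for real Mel-frequency cepstral features, plain k-means stands in for whatever clustering is used, a single diagonal Gaussian stands in for a full Gaussian Mixture Model, and all function names and the threshold value are hypothetical.

```python
import math
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=10):
    """Plain k-means; returns one cluster label per point."""
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def predominate_cluster(points, labels):
    """Return the points in the largest cluster (assumed to be the user's voice)."""
    top_label, _ = Counter(labels).most_common(1)[0]
    return [p for p, lab in zip(points, labels) if lab == top_label]


def fit_gaussian(points, var_floor=1e-3):
    """Fit a diagonal Gaussian: a one-component simplification of a GMM."""
    n = len(points)
    columns = list(zip(*points))
    means = [sum(col) / n for col in columns]
    variances = [
        max(sum((x - m) ** 2 for x in col) / n, var_floor)
        for col, m in zip(columns, means)
    ]
    return means, variances


def log_likelihood(x, model):
    """Log-density of feature vector x under the diagonal Gaussian."""
    means, variances = model
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, means, variances)
    )


def is_user(x, model, threshold=-10.0):
    """Decide whether x sounds like the user; the threshold is hypothetical."""
    return log_likelihood(x, model) > threshold


# Toy data: 30 frames near the origin (the user), 10 frames far away (someone else).
user_frames = [[0.5 * math.sin(i), 0.5 * math.cos(i)] for i in range(30)]
other_frames = [[8.0 + 0.5 * math.sin(i), 8.0 + 0.5 * math.cos(i)] for i in range(10)]
frames = user_frames + other_frames

labels = kmeans(frames, k=2)
user_model = fit_gaussian(predominate_cluster(frames, labels))

print(is_user([0.1, -0.1], user_model))  # frame close to the user's cluster
print(is_user([8.0, 8.0], user_model))   # frame close to the other speaker
```

Treating the largest cluster as the user's voice mirrors the abstract's "predominate voice cluster" idea; a practical system would use real acoustic features and a multi-component model rather than this single-Gaussian stand-in.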