System and Method for Multi-User GPU-Accelerated Speech Recognition Engine for Client-Server Architectures

Authors: Jungsuk Kim, Ian Richard Lane

Keywords: Audio mining, Acoustic model, Speech recognition, Client–server model, Multi-user, Speech analytics, Scalability, Voice activity detection, Language model, Real-time computing, Computer science

Abstract: Disclosed herein is a GPU-accelerated speech recognition engine optimized for faster-than-real-time speech recognition on a scalable server-client heterogeneous CPU-GPU architecture, which is specifically designed to decode multiple users simultaneously in real time. In order to efficiently support multiple real-time users, a “producer/consumer” design pattern is applied to decouple processes that run at different rates so that they can be handled at the same time. Furthermore, the recognition process is divided into multiple consumers to maximize hardware utilization. As a result, the platform architecture is able to process more than 45 audio streams with an average latency of less than 0.3 seconds using one-million-word vocabulary language models.
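
The producer/consumer decoupling described in the abstract can be illustrated with a minimal CPU-only sketch. It is not the patented implementation: the function names (`producer`, `feature_consumer`, `decode_consumer`, `run_pipeline`), the queue size, and the "feature" and "decode" stand-ins are all hypothetical, and no GPU work is performed. It only shows how bounded queues let stages that run at different rates serve several user streams at once, with the recognition work split across more than one consumer. Each audio chunk is assumed to be a non-empty list of numbers.

```python
import queue
import threading

SENTINEL = None  # marks the end of input for a stage


def producer(streams, out_q):
    """Interleave chunks from several user streams onto one bounded queue."""
    longest = max((len(chunks) for chunks in streams.values()), default=0)
    for chunk_idx in range(longest):
        for user, chunks in streams.items():
            if chunk_idx < len(chunks):
                out_q.put((user, chunks[chunk_idx]))  # blocks when the queue is full
    out_q.put(SENTINEL)


def feature_consumer(in_q, out_q):
    """Hypothetical stand-in for feature extraction: the mean of each chunk."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)  # propagate shutdown to the next stage
            break
        user, samples = item
        out_q.put((user, sum(samples) / len(samples)))


def decode_consumer(in_q, results):
    """Hypothetical stand-in for decoding: collect features per user in order."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        user, feature = item
        results.setdefault(user, []).append(feature)


def run_pipeline(streams, queue_size=4):
    """Run the three stages concurrently and return per-user results."""
    audio_q = queue.Queue(maxsize=queue_size)
    feature_q = queue.Queue(maxsize=queue_size)
    results = {}
    threads = [
        threading.Thread(target=producer, args=(streams, audio_q)),
        threading.Thread(target=feature_consumer, args=(audio_q, feature_q)),
        threading.Thread(target=decode_consumer, args=(feature_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    demo = {"user_a": [[1, 3], [5, 7]], "user_b": [[2, 2]]}
    print(run_pipeline(demo))
```

Because each stage here has a single thread, per-user chunk order is preserved; the bounded queues apply back-pressure so a fast producer cannot outrun a slower consumer without limit.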
