Authors: Jungsuk Kim, Ian Richard Lane
DOI:
Keywords: Audio mining, Acoustic model, Speech recognition, Client–server model, Multi-user, Speech analytics, Scalability, Voice activity detection, Language model, Real-time computing, Computer science
Abstract: Disclosed herein is a GPU-accelerated speech recognition engine optimized for faster-than-real-time speech recognition on a scalable server-client heterogeneous CPU-GPU architecture, which is specifically designed to decode multiple users simultaneously in real time. To support real-time users efficiently, a "producer/consumer" design pattern is applied to decouple processes that run at different rates so that they can be handled at the same time. Furthermore, the recognition process is divided into multiple consumers to maximize hardware utilization. As a result, the platform architecture is able to process more than 45 audio streams with an average latency of less than 0.3 seconds using one-million-word vocabulary language models.
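
The producer/consumer decoupling described in the abstract can be illustrated with a minimal CPU-only sketch. It is not the authors' engine: the function names, the two-stage split (feature extraction followed by decoding), the queue size, and the toy "energy" feature are all assumptions made for illustration, and no GPU work is performed. Each client stream acts as a producer that pushes audio chunks into a bounded queue, and each processing stage is a separate consumer thread, so stages running at different rates do not block one another except through queue back-pressure.

```python
import queue
import threading

SENTINEL = None  # marks the end of input for a stage (illustrative convention)


def producer(stream_id, chunks, out_q):
    """Push (stream_id, chunk) pairs for one client stream, in arrival order."""
    for chunk in chunks:
        out_q.put((stream_id, chunk))  # blocks if the queue is full (back-pressure)


def feature_consumer(in_q, out_q):
    """Hypothetical first stage: reduce each chunk to its mean energy."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)  # propagate shutdown to the next stage
            break
        stream_id, chunk = item
        energy = sum(s * s for s in chunk) / len(chunk)
        out_q.put((stream_id, energy))


def decode_consumer(in_q, results, lock):
    """Hypothetical second stage: collect per-stream outputs (stand-in for decoding)."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        stream_id, energy = item
        with lock:
            results.setdefault(stream_id, []).append(energy)


def run_pipeline(streams):
    """Run all streams through the two consumer stages and return per-stream results."""
    audio_q = queue.Queue(maxsize=64)  # bounded queues decouple stage rates
    feat_q = queue.Queue(maxsize=64)
    results = {}
    lock = threading.Lock()

    feat_t = threading.Thread(target=feature_consumer, args=(audio_q, feat_q))
    dec_t = threading.Thread(target=decode_consumer, args=(feat_q, results, lock))
    feat_t.start()
    dec_t.start()

    producers = [
        threading.Thread(target=producer, args=(sid, chunks, audio_q))
        for sid, chunks in streams.items()
    ]
    for p in producers:
        p.start()
    for p in producers:
        p.join()

    audio_q.put(SENTINEL)  # all producers are done; shut the stages down in order
    feat_t.join()
    dec_t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline({"user_a": [[1, 1], [2, 2]], "user_b": [[3, 3]]}))
```

Because each stage here has a single consumer thread and the queues are FIFO, chunks from a given stream are processed in the order they were produced. A real system would presumably batch work from many streams per GPU call, which this sketch does not attempt.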