End-to-end text-dependent speaker verification

Authors: Georg Heigold, Ignacio Moreno, Samy Bengio, Noam Shazeer

DOI: 10.1109/ICASSP.2016.7472652


Abstract: In this paper we present a data-driven, integrated approach to speaker verification, which maps a test utterance and a few reference utterances directly to a single score for verification and jointly optimizes the system's components using the same evaluation protocol and metric as at test time. Such an approach will result in simple and efficient systems, requiring little domain-specific knowledge and making few model assumptions. We implement the idea by formulating the problem as a single neural network architecture, including the estimation of a speaker model on only a few utterances, and evaluate it on our internal "Ok Google" benchmark for text-dependent speaker verification. The proposed approach appears to be very effective for big data applications like ours that require highly accurate, easy-to-maintain systems with a small footprint.
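
The mapping described in the abstract, from one test utterance plus a few reference utterances to a single verification score, can be illustrated with a minimal sketch. The abstract does not give the details, so everything specific here is an assumption made for illustration: utterances are taken to be already encoded as fixed-length embedding vectors, the speaker model is assumed to be the average of the reference embeddings, and the score is assumed to be a logistic function of the cosine similarity. The names `average`, `cosine`, `verification_score`, `end_to_end_loss`, and the parameters `w` and `b` are hypothetical.

```python
import math


def average(vectors):
    """Element-wise mean of equal-length vectors (assumed speaker model)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def verification_score(test_embedding, reference_embeddings, w=1.0, b=0.0):
    """Map one test embedding and a few reference embeddings to one score.

    w and b are hypothetical scale and offset parameters; in a trained
    system they would be learned together with the embedding network.
    """
    speaker_model = average(reference_embeddings)
    similarity = cosine(test_embedding, speaker_model)
    # Logistic function turns the similarity into an accept probability.
    return 1.0 / (1.0 + math.exp(-(w * similarity + b)))


def end_to_end_loss(score, same_speaker):
    """Cross-entropy on the accept/reject decision for one trial."""
    probability = score if same_speaker else 1.0 - score
    return -math.log(probability)


if __name__ == "__main__":
    references = [[1.0, 0.0], [1.0, 0.0]]
    print(verification_score([1.0, 0.0], references))  # matching direction
    print(verification_score([0.0, 1.0], references))  # orthogonal direction
```

Because the loss is defined on the final accept/reject score, gradients in a real implementation would flow back through the scoring step, the speaker model, and the embedding network together, which is the sense in which the components are optimized jointly.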
