Evaluation of the Vulnerability of Speaker Verification to Synthetic Speech

Authors: Junichi Yamagishi, P.L. De Leon, M. Pucher

Abstract: In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis have renewed interest in it. We use an HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker through adaptation of a background model, and a GMM-UBM-based SV system. Using 283 speakers from the Wall Street Journal (WSJ) corpus, our SV system has a 0.4% equal error rate (EER). When the system is tested with synthetic speech generated from speaker models derived from the WSJ corpus, 90% of the matched claims are accepted. This result suggests a possible vulnerability of SV systems to synthetic speech. In order to detect synthetic speech prior to recognition, we investigate the use of an automatic speech recognizer (ASR), the dynamic time warping (DTW) distance of mel-frequency cepstral coefficients (MFCC), and the previously proposed average inter-frame difference of log-likelihood (IFDLL). Overall, while SV systems have impressive accuracy, even with the proposed detector, high-quality synthetic speech can lead to an unacceptably high acceptance rate of synthetic speakers.
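
The two headline figures in the abstract, the EER and the acceptance rate of synthetic-speech claims, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names `equal_error_rate` and `acceptance_rate` are hypothetical, the scores are made-up toy values, and the EER is approximated by scanning observed scores for the threshold where the false-acceptance and false-rejection rates are closest.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    A claim is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where the false-acceptance
    rate and false-rejection rate are closest to each other.
    """
    genuine = [float(s) for s in genuine_scores]
    impostor = [float(s) for s in impostor_scores]
    best_gap, best_eer, best_threshold = None, None, None
    for t in sorted(set(genuine + impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # true speakers rejected
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (far + frr) / 2.0, t
    return best_eer, best_threshold


def acceptance_rate(scores, threshold):
    """Fraction of claims whose score meets or exceeds the threshold."""
    return sum(float(s) >= threshold for s in scores) / len(scores)


# Toy scores (assumed values for illustration only, not data from the paper).
genuine = [2.0, 3.0, 4.0, 5.0]      # true-speaker trials
impostor = [-1.0, 0.0, 1.0, 2.5]    # human impostor trials
synthetic = [3.0, 2.6, 1.0, 4.0]    # synthetic-speech claims for matched speakers

eer, threshold = equal_error_rate(genuine, impostor)
synthetic_accept = acceptance_rate(synthetic, threshold)
```

With the threshold fixed at the EER operating point found on human trials, `acceptance_rate` applied to synthetic-speech scores gives the kind of figure the abstract reports as the share of matched claims accepted.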
