Authors: Junichi Yamagishi, P.L. De Leon, M. Pucher
DOI:
Keywords:
Abstract: In this paper, we evaluate the vulnerability of a speaker verification (SV) system to synthetic speech. Although this problem was first examined over a decade ago, dramatic improvements in both SV and speech synthesis have renewed interest in this problem. We use an HMM-based speech synthesizer, which creates synthetic speech for a targeted speaker through adaptation of a background model, and a GMM-UBM-based SV system. Using 283 speakers from the Wall Street Journal (WSJ) corpus, our SV system has a 0.4% EER. When the system is tested with synthetic speech generated from speaker models derived from the WSJ corpus, 90% of the matched claims are accepted. This result suggests a possible vulnerability in SV systems to synthetic speech. In order to detect synthetic speech prior to recognition, we investigate the use of an automatic speech recognizer (ASR), dynamic time warping (DTW) distance of mel-frequency cepstral coefficients (MFCC), and the previously proposed average inter-frame difference of log-likelihood (IFDLL). Overall, while SV systems have impressive accuracy, even with the proposed detector, high-quality synthetic speech can lead to an unacceptably high acceptance rate of synthetic speakers.
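
The abstract names a DTW distance over MFCC frames as one of the detection cues but does not say how it is configured. As an illustration only, the following is a minimal sketch of textbook dynamic time warping between two sequences of feature vectors. The function name `dtw_distance`, the Euclidean frame distance, and the unconstrained step pattern are assumptions made for this example and are not taken from the paper; the short numeric lists stand in for real MFCC frames.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of equal-dimension vectors.

    Hypothetical helper for illustration; not the detector used in the paper.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local distance: Euclidean distance between two frames.
            d = math.dist(a[i - 1], b[j - 1])
            # Allowed steps: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy one-dimensional "frames"; real MFCC vectors would have more coefficients.
reference = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(reference, stretched))
```

Because DTW allows one frame to align with several frames of the other sequence, a time-stretched copy of an utterance yields a low accumulated cost, which is why the measure is insensitive to differences in speaking rate.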