Automatic versus human speaker verification: The case of voice mimicry

Authors: Rosa González Hautamäki, Tomi Kinnunen, Ville Hautamäki, Anne-Maria Laukkanen

DOI: 10.1016/J.SPECOM.2015.05.002

Abstract: In this work, we compare the performance of three modern speaker verification systems and non-expert human listeners in the presence of voice mimicry. Our goal is to gain insights on how vulnerable speaker verification systems are to mimicry attack and to compare it to the performance of human listeners. We study both a traditional Gaussian mixture model-universal background model (GMM-UBM) and an i-vector based classifier with cosine scoring and probabilistic linear discriminant analysis (PLDA) scoring. For the studied material in the Finnish language, the mimicry attack slightly decreased the equal error rate (EER) for GMM-UBM from 10.83 to 10.31, while for the i-vector systems the EER increased from 6.80 to 13.76 and from 4.36 to 7.38. The performance of the human listening panel shows that imitated speech increases the difficulty of the speaker verification task. It is even more difficult to recognize a person who is intentionally concealing his or her identity. For Impersonator A, the average listener made 8 errors in 34 trials, while the automatic systems had 6 errors on the same set. For Impersonator B, the average listener made 7 errors in 28 trials, while the automatic systems made 9 errors. A statistical analysis of the listener performance was also conducted. We found a statistically significant association, with p = 0.00019 and R² = 0.59, between listener accuracy and self-reported factors only when familiar voices were present in the test.
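The error counts quoted in the abstract are easier to compare as rates. The following is a minimal sketch that converts them to percentages; the dictionary layout and the helper name `error_rate` are illustrative assumptions, not taken from the paper. These are raw error rates over the listed trials, not the EER figures reported for the automatic systems.

```python
# Error and trial counts as quoted in the abstract.
# The structure and key names are hypothetical, chosen for illustration only.
counts = {
    "Impersonator A": {"trials": 34, "listener_errors": 8, "automatic_errors": 6},
    "Impersonator B": {"trials": 28, "listener_errors": 7, "automatic_errors": 9},
}


def error_rate(errors, trials):
    """Return the error rate as a percentage of trials."""
    return 100.0 * errors / trials


rates = {}
for name, c in counts.items():
    rates[name] = {
        "listener": error_rate(c["listener_errors"], c["trials"]),
        "automatic": error_rate(c["automatic_errors"], c["trials"]),
    }
    print(
        f"{name}: listener {rates[name]['listener']:.1f}%, "
        f"automatic {rates[name]['automatic']:.1f}%"
    )
```

Under these counts, the average listener errs on roughly 23.5% of the Impersonator A trials against 17.6% for the automatic systems, while for Impersonator B the order reverses (25.0% against about 32.1%).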
