Author: Lara Lynn Stoll
DOI:
Keywords: Distance measures, False alarm, Population, Speaker diarisation, Speaker recognition, Formant, Precision and recall, Support vector machine, Speech recognition, Computer science
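Abstract: The task of automatic speaker recognition, wherein a system verifies or determines a speaker's identity using a sample of speech, has been studied for a few decades. In that time, a great deal of progress has been made in improving the accuracy of the system's decisions, through the use of more successful machine learning algorithms and the application of channel compensation techniques and other methodologies aimed at addressing sources of errors such as noise or data mismatch. In general, errors can be expected to have one or more causes, involving both intrinsic and extrinsic factors. Extrinsic factors correspond to external influences, including reverberation, noise, and channel or microphone effects. Intrinsic factors relate inherently to the speaker himself, and include sex, age, dialect, accent, emotion, speaking style, and other voice characteristics. This dissertation focuses on the relatively unexplored issue of the dependence of system errors on intrinsic speaker characteristics. In particular, I investigate the phenomenon that some speakers within a given population have a tendency to cause a large proportion of errors, and I explore ways of finding such speakers.

There are two main components to this thesis. In the first, I establish the dependence of system performance on speakers, building upon and expanding previous work demonstrating the existence of speakers with tendencies to cause false alarm or false rejection errors. To this end, I explore two different data sets: one is an older collection of conversational telephone speech, and the other is a more recent collection of conversational speech recorded over a variety of channels, including the telephone as well as various types of microphones. Furthermore, in addition to considering a traditional speaker recognition approach, for the second data set I utilize the outputs of a more contemporary approach that is better able to handle variations in channel. The results of this analysis repeatedly show variations in behavior across speakers, for both true speaker and impostor speaker cases. Variation occurs at the level of speech utterances, where a given speaker's performance can depend on which of his utterances is used, and at the speaker level, where some speakers have overall tendencies to cause false rejection or false alarm errors. Additionally, lamb-ish speaker behavior (where the speaker tends to produce false alarms as the target) is correlated with wolf-ish behavior (where the speaker tends to produce false alarms as the impostor). On the more recent data set, 50% of the false rejection and false alarm errors are caused by only 15-25% of the speakers.

The second component of this thesis investigates a straightforward approach to predicting which speakers will be difficult for a system to correctly recognize. I use a variety of features to calculate feature statistics, which are then used to compute a measure of similarity between speaker pairs. By ranking these similarity measures for a set of impostor speaker pairs, I determine those speaker pairs that are easy for a system to distinguish and those that are difficult to distinguish. A simple distance measure could successfully select both easy- and difficult-to-distinguish speaker pairs, as evaluated by differences in detection cost and false alarm probability across a number of systems. Of those tested, the best feature-measure for finding the most and least difficult-to-distinguish speaker pairs was the Euclidean distance between vectors of the mean first, second, and third formant frequencies. Even greater success was attained with the Kullback-Leibler (KL) divergence between speaker-specific GMMs. An examination of the smallest and biggest distances (as computed by the KL divergence) revealed that individual speakers consistently fall among the most (or least) difficult-to-distinguish speaker pairs.

I also develop an approach for finding those speakers who will be difficult for a system, using feature statistics calculated over regions of speech. A support vector machine (SVM) classifier is trained using the feature statistics as examples, in order to determine whether a speaker is difficult as a target or as an impostor. The resulting precision and recall were 0.8 for difficult impostor speaker detection and 0.7 for difficult target speaker detection. Depending on the application, the detection threshold can be tuned to improve precision, recall, or specificity to suit the needs of the particular task. The same approach can be taken for single conversation sides, rather than sets of conversation sides corresponding to a speaker, since the input can be any set of speech samples.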
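
The pair-ranking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names (`mean_formants`, `rank_speaker_pairs`), the speaker labels, and the formant values are all hypothetical, and the sketch assumes that per-frame F1, F2, and F3 estimates in Hz are already available from some formant tracker. It also assumes that a smaller Euclidean distance between mean formant vectors marks a more similar, and therefore likely harder to distinguish, speaker pair.

```python
import math
from itertools import combinations


def mean_formants(frames):
    """Average F1, F2, F3 (Hz) over one speaker's frames."""
    n = len(frames)
    return tuple(sum(frame[i] for frame in frames) / n for i in range(3))


def rank_speaker_pairs(formant_frames):
    """Return (distance, speaker_a, speaker_b) tuples, smallest distance first.

    Assumption: pairs near the top of the list are the candidates for
    "difficult to distinguish"; pairs near the bottom are the easy ones.
    """
    means = {spk: mean_formants(frames) for spk, frames in formant_frames.items()}
    scored = [
        (math.dist(means[a], means[b]), a, b)
        for a, b in combinations(sorted(means), 2)
    ]
    return sorted(scored)


# Made-up formant measurements for three hypothetical speakers.
formant_frames = {
    "spk_a": [(500.0, 1500.0, 2500.0), (520.0, 1480.0, 2520.0)],
    "spk_b": [(510.0, 1490.0, 2510.0), (510.0, 1510.0, 2490.0)],
    "spk_c": [(700.0, 1200.0, 2900.0), (720.0, 1180.0, 2880.0)],
}

ranked = rank_speaker_pairs(formant_frames)
for distance, a, b in ranked:
    print(f"{a} vs {b}: {distance:.1f} Hz")
```

With this toy data the pair `spk_a`/`spk_b` ranks first because the two mean formant vectors nearly coincide. Swapping in a different similarity measure, such as the KL divergence between speaker-specific GMMs mentioned in the abstract, would only require replacing the `math.dist` call with another pairwise scoring function.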