Authors: Andrew Senior, Ignacio Lopez-Moreno
DOI: 10.1109/ICASSP.2014.6853591
Keywords: Speaker recognition, Artificial neural network, Artificial intelligence, Speaker diarisation, Normalization (statistics), Backpropagation, Speech recognition, I-vector, Pattern recognition, Computer science
Abstract: We propose providing additional utterance-level features as inputs to a deep neural network (DNN) to facilitate speaker, channel and background normalization. Modifications of the basic algorithm are developed which result in significant reductions in word error rates (WERs). The algorithms are shown to combine well with speaker adaptation by backpropagation, resulting in a 9% relative WER reduction. We address implementation for a streaming task.
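
The idea of feeding utterance-level features to a DNN alongside frame-level inputs can be illustrated with a minimal sketch. It assumes the utterance-level vector (for example an i-vector, as the keywords suggest) is simply repeated and concatenated onto every acoustic frame of the same utterance. The function name `append_utterance_features`, the dimensions (40 and 100), and the random data are hypothetical choices for illustration, not details taken from the paper.

```python
import numpy as np


def append_utterance_features(frames, utt_vector):
    """Concatenate one utterance-level vector onto every frame of an utterance.

    frames:     array of shape (num_frames, frame_dim), per-frame acoustic features
    utt_vector: array of shape (utt_dim,), a single vector for the whole utterance
    Returns an array of shape (num_frames, frame_dim + utt_dim).
    """
    frames = np.asarray(frames, dtype=np.float32)
    utt_vector = np.asarray(utt_vector, dtype=np.float32)
    if frames.ndim != 2 or utt_vector.ndim != 1:
        raise ValueError("expected 2-D frames and a 1-D utterance vector")
    # Repeat the same utterance-level vector for each frame (no copy until concatenate).
    tiled = np.broadcast_to(utt_vector, (frames.shape[0], utt_vector.shape[0]))
    return np.concatenate([frames, tiled], axis=1)


# Hypothetical sizes: 200 frames of 40-dim features, one 100-dim utterance vector.
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 40))
utt_vector = rng.standard_normal(100)

dnn_input = append_utterance_features(frames, utt_vector)
print(dnn_input.shape)  # (200, 140)
```

Every row of `dnn_input` carries the same trailing 100 values, so a network trained on such inputs could in principle use them to normalize for speaker, channel, or background conditions. How the paper's modified algorithms and its streaming implementation differ from this plain concatenation is not specified in the abstract.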