Multi-speaker/speaker-independent architectures for the multi-state time delay neural network

Authors: H. Hild, A. Waibel

DOI: 10.1109/ICASSP.1993.319284

Keywords: Pattern recognition, Network architecture, Entropy (information theory), Artificial intelligence, Speech recognition, Hidden Markov model, Time delay neural network, Multi state, Artificial neural network, Unsupervised learning, Computer science

Abstract: The authors present an improved multi-state time delay neural network (MS-TDNN) for speaker-independent, connected letter recognition which outperforms a hidden Markov model (HMM) based system (SPHINX) and previous MS-TDNNs. They also explore new architectures with internal speaker models. Four different architectures, characterized by an increasing number of speaker-specific parameters, are introduced. The speaker-specific parameters can be adjusted by automatic speaker identification or by speaker adaptation, allowing tuning-in to a speaker. Both methods lead to significant improvements over the straightforward speaker-independent architecture. Even unsupervised adaptation (where the speech is unlabeled) works well.

References (9)
M.Y. Hwang, X. Huang, Subphonetic modeling with Markov states-Senone. International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 33-36 (1992), 10.1109/ICASSP.1992.225979
O. Schmidbauer, J. Tebelskis, An LVQ based reference model for speaker-adaptive speech recognition. [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 441-444 (1992), 10.1109/ICASSP.1992.225877
J.B. Hampshire, A.H. Waibel, The Meta-Pi network: connectionist rapid adaptation for high-performance multi-speaker phoneme recognition. International Conference on Acoustics, Speech, and Signal Processing, pp. 165-168 (1990), 10.1109/ICASSP.1990.115564
A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K.J. Lang, Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, pp. 393-404 (1989), 10.1109/29.21701
Hermann Hild, Alex Waibel, Connected Letter Recognition with a Multi-State Time Delay Neural Network. Neural Information Processing Systems, vol. 5, pp. 712-719 (1992)
P. Haffner, M. Franzini, A. Waibel, Integrating time alignment and neural networks for high performance continuous speech recognition. [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing, pp. 105-108 (1991), 10.1109/ICASSP.1991.150289
John S. Bridle, Stephen J. Cox, RecNorm: Simultaneous Normalisation and Classification applied to Speech Recognition. Neural Information Processing Systems, vol. 3, pp. 234-240 (1990)
J.B. Hampshire, A.H. Waibel, A novel objective function for improved phoneme recognition using time-delay neural networks. IEEE Transactions on Neural Networks, vol. 1, pp. 216-228 (1990), 10.1109/72.80233
A. Waibel, H. Sawai, K. Shikano, Consonant recognition by modular construction of large phonemic time-delay neural networks. International Conference on Acoustics, Speech, and Signal Processing, pp. 405-408 (1989), 10.1109/ICASSP.1989.266376