A neural architecture for computing acoustic-phonetic invariants

Author: E. Tsiang

DOI: 10.1109/ICASSP.1998.675463

Keywords: Training set, Feature extraction, Context (language use), Generalization, Computer science, Representation (mathematics), Net (mathematics), Artificial intelligence, Wavelet transform, Pattern recognition, Invariant (mathematics), Set (abstract data type)

Abstract: The proposed neural architecture consists of an analytic lower net and a synthetic upper net. This paper focuses on the upper net. The lower net performs a 2D multiresolution wavelet decomposition of an initial spectral representation to yield a multichannel representation of local frequency modulations at multiple scales. From this representation, the upper net synthesizes increasingly complex features, resulting in a set of acoustic observables at the top layer with multiscale context dependence. The upper net also provides for invariance under shifts and under dilatations of tone intervals and time intervals, by building these transformations into the architecture. Application to the recognition of gross and fine phonetic categories from continuous speech of diverse speakers shows that it provides high accuracy and strong generalization from modest amounts of training data.
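
The 2D multiresolution decomposition mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the Haar wavelet, the function names `haar_step_2d` and `multiresolution`, and the layout of the input (rows as frequency channels, columns as time frames) are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def haar_step_2d(x):
    """One level of a 2D Haar decomposition (hypothetical helper).

    Assumes x has an even number of rows (frequency) and columns (time).
    """
    # Pairwise averages and differences along the frequency axis (rows).
    lo_f = (x[0::2, :] + x[1::2, :]) / 2.0
    hi_f = (x[0::2, :] - x[1::2, :]) / 2.0
    # Repeat along the time axis (columns) on both frequency bands.
    approx = (lo_f[:, 0::2] + lo_f[:, 1::2]) / 2.0   # coarse spectrum
    d_time = (lo_f[:, 0::2] - lo_f[:, 1::2]) / 2.0   # change along time
    d_freq = (hi_f[:, 0::2] + hi_f[:, 1::2]) / 2.0   # change along frequency
    d_diag = (hi_f[:, 0::2] - hi_f[:, 1::2]) / 2.0   # joint change
    return approx, (d_time, d_freq, d_diag)


def multiresolution(spec, levels):
    """Apply haar_step_2d repeatedly, collecting detail channels per scale."""
    approx = np.asarray(spec, dtype=float)
    details = []
    for _ in range(levels):
        approx, d = haar_step_2d(approx)
        details.append(d)
    return approx, details


# Toy "spectrogram": energy rises linearly with time in every channel.
ramp = np.tile(np.arange(8.0), (8, 1))
coarse, detail_channels = multiresolution(ramp, levels=2)
```

Each entry of `detail_channels` holds three channels for one scale, which is one simple way to obtain a multichannel, multiscale description of local spectral change. For the toy ramp, only the time-direction channel is nonzero, because the input does not vary across frequency.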

References (12)
C. E. Schreiner, J. R. Mendelson, Functional topography of cat primary auditory cortex: distribution of integrated excitation. Journal of Neurophysiology, vol. 64, pp. 1442-1459 (1990), 10.1152/JN.1990.64.5.1442
Etienne Barnard, David Casasent, Shift invariance and the neocognitron. Neural Networks, vol. 3, pp. 403-410 (1990), 10.1016/0893-6080(90)90023-E
John Laver, Principles of phonetics (1994)
Steve Young, A review of large-vocabulary continuous-speech recognition. IEEE Signal Processing Magazine, vol. 13, pp. 45- (1996), 10.1109/79.536824
Kevin J. Lang, Alex H. Waibel, Geoffrey E. Hinton, A time-delay neural network architecture for isolated word recognition. Neural Networks, vol. 3, pp. 23-43 (1990), 10.1016/0893-6080(90)90044-L
A.J. Robinson, An application of recurrent nets to phone probability estimation. IEEE Transactions on Neural Networks, vol. 5, pp. 298-305 (1994), 10.1109/72.279192
E.Y.L. Tsiang, Multiresolution elementary tonotopic features for speech perception. Proceedings of International Conference on Neural Networks (ICNN'97), vol. 1, pp. 575-579 (1997), 10.1109/ICNN.1997.611733
N. Morgan, H. Bourlard, Continuous speech recognition. IEEE Signal Processing Magazine, vol. 12, pp. 24-42 (1995), 10.1109/79.382443