Discriminative Input Stream Combination for Conditional Random Field Phone Recognition

Authors: I. Heintz, E. Fosler-Lussier, C. Brew

DOI: 10.1109/TASL.2009.2022204

Abstract: In recent studies, we and others have found that conditional random fields (CRFs) can be effectively used to perform phone classification and recognition tasks by combining non-Gaussian distributed representations of acoustic input. In previous work by I. Heintz ("Latent phonetic analysis: Use of singular value decomposition to determine features for CRF phone recognition," Proc. ICASSP, pp. 4541-4544, 2008), we experimented with phonological feature posterior estimators within a CRF framework; treating posterior estimates as terms in a "phoneme information retrieval" task allowed more effective use of multiple streams than directly feeding these streams to the recognizer. In this paper, we examine some of the design choices in our previous work and extend our results to up to six streams. We concentrate on feature design, rather than feature selection, to find the best way to introduce features into a log-linear model. We improve upon our previous work by using several different dimensionality reduction techniques (SVD, PARAFAC2, KLT); dimensionality reduction followed by a nonlinear transform provided by a multilayer perceptron provides a significant gain in accuracy on the TIMIT phone recognition task.

References (47)
Hynek Hermansky, S. R. Mahadeva Prasanna, MRASTA and PLP in automatic speech recognition. Conference of the International Speech Communication Association. pp. 1166-1169 (2007)
Eric Fosler-Lussier, Prateeti Mohapatra, Investigations into phonological attribute classifier representations for CRF phone recognition. Conference of the International Speech Communication Association. pp. 2558-2561 (2008)
Eric Fosler-Lussier, Jeremy Morris, Combining phonetic attributes using conditional random fields. Conference of the International Speech Communication Association. (2006)
S. Weerakone, P.J. Turner, Speech recognition software. Dental Update. vol. 28, pp. 450-456 (2001), DOI: 10.12968/DENU.2001.28.9.450
Eric Fosler-Lussier, Yu Wang, Integrating phonetic boundary discrimination explicitly into HMM systems. Conference of the International Speech Communication Association. (2006)
Nelson Morgan, Naghmeh Nikki Mirghafori, A multiband approach to automatic speech recognition. pp. 153-153 (1998)
Andreas Stolcke, Nelson Morgan, Qifeng Zhu, Barry Y. Chen, On using MLP features in LVCSR. Conference of the International Speech Communication Association. (2004)
Raymond E. Slyh, Brian M. Ore, Score fusion for articulatory feature detection. Conference of the International Speech Communication Association. pp. 1845-1848 (2007)
Ralf Schlüter, Hermann Ney, András Zolnay, Feature combination using linear discriminant analysis and its pitfalls. Conference of the International Speech Communication Association. (2006)