Authors: I. Heintz, E. Fosler-Lussier, C. Brew
DOI: 10.1109/TASL.2009.2022204
Keywords:
Abstract: In recent studies, we and others have found that conditional random fields (CRFs) can be effectively used to perform phone classification and recognition tasks by combining non-Gaussian distributed representations of acoustic input. In previous work (I. Heintz, "Latent phonetic analysis: Use of singular value decomposition to determine features for CRF phone recognition," Proc. ICASSP, pp. 4541-4544, 2008), we experimented with phonological feature posterior estimators within a CRF framework; treating the posterior estimates as terms in a "phoneme information retrieval" task allowed more effective use of multiple streams than directly feeding these to the recognizer. In this paper, we examine some of the design choices in our previous work and extend the results to up to six streams. We concentrate on feature design, rather than feature selection, to find the best way of introducing features into a log-linear model. We improve upon our previous work with several different dimensionality reduction techniques (SVD, PARAFAC2, KLT), and find that reduction followed by a nonlinear transform provided by a multilayer perceptron provides a significant gain in accuracy on the TIMIT task.