Authors: James R. Glass, Jane W. Chang
DOI:
Keywords:
Abstract: Recently, we have developed a probabilistic framework for segment-based speech recognition that represents the speech signal as a network of segments and associated feature vectors [2]. Although, in general, each path through the network does not traverse all segments, we argued that each path must account for all feature vectors in the network. We then demonstrated an efficient search algorithm that uses a single additional model to account for segments that are not traversed. In this paper, we present two new extensions to our framework. First, we replace our acoustic segmentation algorithm with “segmentation by recognition,” a probabilistic algorithm that can combine multiple contextual constraints towards hypothesizing only the most likely segments. Second, we generalize our framework to “near-miss modeling” and describe a search algorithm that can efficiently use multiple models to enforce contextual constraints across all segments in a network. We report experiments in phonetic recognition on the TIMIT corpus in which we achieve a diphone context-dependent error rate of 26.6% on the NIST core test set over 39 classes. This is a 12.8% reduction in error rate from our best previously reported result.