Author: Andreas Stolcke
DOI:
Keywords:
Abstract: Language modeling, especially for spontaneous speech, often suffers from a mismatch of utterance segmentations between training and test conditions. In particular, training uses linguistically-based segments, whereas testing occurs on acoustically determined segments, resulting in degraded performance. We present an N-best rescoring algorithm that removes the effect of segmentation mismatch. Furthermore, we show that explicit language modeling of hidden linguistic segment boundaries is improved by including turn-boundary events in the model.

1. THE SEGMENTATION PROBLEM IN LANGUAGE MODELING

One of the problems encountered in speech recognition is that the input consists of continuous, long waveforms. Because current recognizers prefer short waveform segments, both for best performance and to limit computational resources, conversation-length waveforms are typically pre-segmented using simple acoustic criteria, such as the locations of pauses and turn switches. This creates several problems for language modeling:

- The segmentation used (including its parameters) influences the statistics embodied in the language model (LM), creating a potential mismatch between training and test sets. Strictly speaking, one would have to resegment the data, recreate the word-level transcriptions, and retrain the LM every time the segmentation process is modified.

- Acoustic segmentation yields units that are not linguistically coherent, and hence sub-optimal for language modeling. Research [10] shows that N-gram LMs based on complete linguistic segments give lower perplexity than those based only on acoustic segmentations. Work reported in [12] showed that word error rate can be reduced simply by resegmenting the test data with the same segmentation used in training.

- Explicit modeling of phenomena such as disfluencies also requires linguistic (as opposed to acoustic) segmentations [15]. Similarly, sophisticated models of syntactic structure assume sentences as their input [12].

The following excerpt from the Switchboard corpus [2] illustrates these discrepancies. Linguistic segment boundaries are indicated by //; a subset of these corresponds to acoustic segment boundaries.

B: Worried they’re going to get enough attention?
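The rescoring idea described in the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm: it assumes a toy bigram table (`BIGRAM`, with made-up probabilities) in which `"<s>"` is a hidden linguistic segment-boundary event, marginalizes over all boundary placements by brute force, and selects the N-best hypothesis with the best combined acoustic and LM log score.

```python
import math
from itertools import product

# Toy bigram table (hypothetical probabilities, for illustration only).
# "<s>" is a hidden linguistic segment-boundary event.
BIGRAM = {
    ("worried", "<s>"): 0.10,
    ("<s>", "worried"): 0.20,
}
FALLBACK = 1e-4  # probability assigned to unseen bigrams

def logp(prev: str, word: str) -> float:
    return math.log(BIGRAM.get((prev, word), FALLBACK))

def hidden_boundary_logprob(words: list[str]) -> float:
    """Log P(words), summed over all placements of "<s>" between words
    (brute force; a real system would use dynamic programming)."""
    logps = []
    for mask in product([0, 1], repeat=len(words) - 1):
        seq = [words[0]]
        for word, boundary in zip(words[1:], mask):
            if boundary:
                seq.append("<s>")  # insert a hidden boundary here
            seq.append(word)
        logps.append(sum(logp(a, b) for a, b in zip(seq, seq[1:])))
    m = max(logps)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in logps))

def rescore(nbest: list[dict], lm_weight: float = 1.0) -> dict:
    """Pick the hypothesis maximizing acoustic + weighted LM log score."""
    return max(nbest, key=lambda h: h["acoustic"]
               + lm_weight * hidden_boundary_logprob(h["words"]))
```

Because the boundary events are summed out, each hypothesis is scored on its words alone, independent of where the acoustic pre-segmentation happened to cut the waveform.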