Author: Martin Wolf
DOI:
Keywords: Microphone, Speech recognition, Signal processing, Word error rate, Feature extraction, Noise (signal processing), Engineering, Reverberation, Feature vector, Speaker recognition
Abstract: If speech is acquired by a close-talking microphone in a controlled and noise-free environment, current state-of-the-art recognition systems often show an acceptable error rate. The use of close-talking microphones, however, may be too restrictive in many applications. Alternatively, distant-talking microphones, placed several meters from the speaker, may be used. Such a setup is less intrusive, since the speaker does not have to wear any microphone, but Automatic Speech Recognition (ASR) performance is strongly affected by noise and reverberation. The thesis is focused on ASR applications in a room environment, where reverberation is the dominant source of distortion, and considers both single- and multi-microphone setups. If speech is recorded in parallel by several microphones arbitrarily located in the room, the degree of distortion may vary from one channel to another. The difference among the signal quality of each recording may be even more evident if those microphones have different characteristics: some are hanging on the walls, others are standing on the table, or others are built into the personal communication devices of the people present in the room. In a scenario like that, the ASR system may benefit if the signal with the highest quality is used for recognition. To find such a signal, what is commonly referred to as Channel Selection (CS), several techniques have been proposed, which are discussed in detail in this thesis. In fact, CS aims to rank the signals according to their quality from the ASR perspective. To create such a ranking, a measure is needed that either estimates the intrinsic quality of a given signal or how well it fits the acoustic models of the recognition system. In this thesis we provide an overview of the CS measures presented in the literature so far, and compare them experimentally. Several new techniques are introduced that surpass the former ones in terms of recognition accuracy and/or computational efficiency. A combination of different CS measures is also proposed to further increase the recognition accuracy, or to reduce the computational load without any significant performance loss. Besides, CS may be used together with other robust ASR techniques, and the recognition improvements are cumulative up to some extent. An online real-time version of the channel selection method based on the variance of the speech sub-band envelopes, which was developed in this thesis, was designed and implemented in a smart room environment. When evaluated in experiments with real distant-talking microphone recordings and with moving speakers, a significant recognition performance improvement is observed.
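Another contribution of this thesis, which does not require multiple microphones, was developed in cooperation with colleagues from the chair of Multimedia Communications and Signal Processing at the University of Erlangen-Nuremberg, Erlangen, Germany. It deals with the problem of feature extraction within REMOS (REverberation MOdeling for Speech recognition), a generic framework for robust distant-talking speech recognition. In this framework, the use of conventional methods to obtain decorrelated feature vector coefficients, like the discrete cosine transform, is constrained by the inner optimization problem of REMOS, which may become unsolvable in a reasonable time. A feature extraction method based on frequency filtering is proposed to avoid this problem.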