Channel selection and reverberation-robust automatic speech recognition

Author: Martin Wolf

Keywords: Microphone, Speech recognition, Signal processing, Word error rate, Feature extraction, Noise (signal processing), Engineering, Reverberation, Feature vector, Speaker recognition

Abstract: If speech is acquired by a close-talking microphone in a controlled and noise-free environment, current state-of-the-art recognition systems often show an acceptable error rate. The use of close-talking microphones, however, may be too restrictive in many applications. Alternatively, distant-talking microphones, often placed several meters from the speaker, may be used. Such a setup is less intrusive, since the speaker does not have to wear any microphone, but Automatic Speech Recognition (ASR) performance is strongly affected by noise and reverberation. This thesis is focused on ASR applications in a room environment, where reverberation is the dominant source of distortion, and considers both single- and multi-microphone setups. If speech is recorded in parallel by several microphones arbitrarily located in the room, the degree of distortion may vary from one channel to another. The difference among the signal quality of each recording may be even more evident if those microphones have different characteristics: some are hanging on the walls, others standing on the table, or others built into the personal communication devices of the people present in the room. In a scenario like that, the ASR system may benefit strongly if the signal with the highest quality is used for recognition. To find such a signal, what is commonly referred to as Channel Selection (CS), several techniques have been proposed, which are discussed in detail in this thesis. In fact, CS aims to rank the signals according to their quality from the ASR perspective. To create such a ranking, a measure is needed that estimates either the intrinsic quality of a given signal or how well it fits the acoustic models of the recognition system. In this thesis we provide an overview of the CS measures presented in the literature so far, and compare them experimentally. Several new techniques are introduced that surpass the former techniques in terms of recognition accuracy and/or computational efficiency. A combination of different CS measures is also proposed to further increase the recognition accuracy, or to reduce the computational load without any significant performance loss. Besides, we show that CS may be used together with other robust ASR techniques, and that the recognition improvements are cumulative up to some extent. An online real-time version of the channel selection method based on the variance of the speech sub-band envelopes, which was developed in this thesis, was designed and implemented in a smart room environment. When evaluated in experiments with real distant-talking microphone recordings and with moving speakers, a significant recognition performance improvement is observed.
Another contribution of this thesis, which does not require multiple microphones, was developed in cooperation with colleagues from the chair of Multimedia Communications and Signal Processing at the University of Erlangen-Nuremberg, Erlangen, Germany. It deals with the problem of feature extraction within REMOS (REverberation MOdeling for Speech recognition), which is a generic framework for robust distant-talking speech recognition. In this framework, the use of conventional methods to obtain decorrelated feature vector coefficients, like the discrete cosine transform, is constrained by the inner optimization problem of REMOS, which may become unsolvable in a reasonable time. A new feature extraction method based on frequency filtering was proposed to avoid this problem.
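
The abstract names a channel selection measure based on the variance of sub-band envelopes but does not spell it out. The following is only a minimal sketch of how such a ranking might look, not the thesis's implementation. Everything specific in it is an assumption made for illustration: the uniform sub-bands (a mel filterbank may be more typical), the compression exponent of 0.3, the per-band normalization across channels, and the hypothetical function names. The underlying intuition is that reverberation smooths the sub-band energy envelopes, so a channel with higher envelope variance is presumed less reverberant.

```python
import numpy as np

def subband_log_energies(signal, n_fft=512, hop=160, n_bands=20):
    """Log energy per frame and sub-band (uniform bands: an assumption)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (frames, bins)
    bands = np.array_split(power, n_bands, axis=1)        # uniform pooling
    energies = np.stack([b.sum(axis=1) for b in bands], axis=1)
    return np.log(energies + 1e-12)                       # (frames, bands)

def envelope_variance(signal, compress=0.3):
    """Variance over time of the compressed, gain-normalized envelopes."""
    log_e = subband_log_energies(signal)
    # Subtracting the log-domain mean removes each band's overall gain.
    log_e = log_e - log_e.mean(axis=0, keepdims=True)
    envelope = np.exp(compress * log_e)                   # compressed envelope
    return envelope.var(axis=0)                           # (bands,)

def rank_channels(signals):
    """Return channel indices sorted best-first, plus the per-channel scores."""
    ev = np.stack([envelope_variance(s) for s in signals])  # (channels, bands)
    ev_norm = ev / ev.max(axis=0, keepdims=True)             # normalize per band
    scores = ev_norm.mean(axis=1)                            # average over bands
    return np.argsort(scores)[::-1], scores
```

Calling `rank_channels` with a list of equally long, time-aligned recordings returns the channel order and the scores; under the assumptions above, the first index would be the channel passed on to the recognizer.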
