ENHANCEMENT OF SPEECH INTELLIGIBILITY USING SPEECH TRANSIENTS EXTRACTED BY A WAVELET PACKET-BASED REAL-TIME ALGORITHM

Author: Daniel Motlotle Rasetshwane

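Abstract: Studies have shown that transient speech, which is associated with consonants, transitions between consonants and vowels, and transitions within some vowels, is an important cue for identifying and discriminating speech sounds. However, compared to the relatively steady-state vowel segments of speech, transient speech has much lower energy and is thus easily masked by background noise. Emphasis of transient speech can improve the intelligibility of speech in background noise, but methods to demonstrate this improvement have either identified transient speech manually or proposed algorithms that cannot be implemented to run in real-time. We developed an algorithm to automatically extract transient speech in real-time. The algorithm involves the use of a function, which we term the transitivity function, to characterize the rate of change of the wavelet coefficients of a wavelet packet transform representation of a speech signal. The transitivity function is large and positive when the signal is changing rapidly and small when the signal is in steady state. Two different definitions of the transitivity function, one based on the short-time energy and the other on Mel-frequency cepstral coefficients (MFCC), were evaluated experimentally, and the MFCC-based transitivity function produced better results. The extracted transient speech signal is used to create modified speech by combining it with the original speech. To facilitate comparison of our transient and modified speech to speech processed using methods proposed by other researchers to emphasize transients, we developed three indices. The indices are used to characterize the extent to which a speech modification/processing method emphasizes (1) a particular region of speech, (2) consonants relative to vowels, and (3) the onsets and offsets of formants compared to the steady formant. These indices are very useful because they quantify differences in speech signals that are difficult to show using spectrograms, spectra and time-domain waveforms. The transient extraction algorithm includes parameters which, when varied, influence the intelligibility of the extracted transient speech. The best values for these parameters were selected using psycho-acoustic testing. Measurements of speech intelligibility in background noise using psycho-acoustic testing showed that modified speech was more intelligible than original speech, especially at high noise levels (-20 and -15 dB). The incorporation of a method that identifies and boosts unvoiced speech into the algorithm does not result in additional intelligibility improvements.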
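
The abstract does not give the exact definition of the transitivity function, so the following is only a minimal sketch of the general idea of an energy-based variant. It assumes a Haar wavelet packet tree, non-overlapping frames, and a transitivity value equal to the summed absolute frame-to-frame change in short-time energy across subbands. All function names and parameters (`haar_step`, `wavelet_packet`, `frame_energies`, `transitivity`, `depth`, `frame_len`) are hypothetical and are not taken from the thesis.

```python
import math


def haar_step(x):
    """One Haar analysis step: return (approximation, detail), each half length."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def wavelet_packet(x, depth):
    """Full Haar wavelet packet tree; returns the 2**depth leaf subbands."""
    bands = [list(x)]
    for _ in range(depth):
        # Split every band (not only the approximation) at each level.
        bands = [half for band in bands for half in haar_step(band)]
    return bands


def frame_energies(band, frame_len):
    """Short-time energy over non-overlapping frames of one subband."""
    return [
        sum(v * v for v in band[i:i + frame_len])
        for i in range(0, len(band) - frame_len + 1, frame_len)
    ]


def transitivity(x, depth=2, frame_len=8):
    """Assumed stand-in for the transitivity function (not the thesis's definition).

    Returns one non-negative value per pair of adjacent frames: the absolute
    change in short-time energy, summed over all wavelet packet subbands.
    """
    energies = [frame_energies(b, frame_len) for b in wavelet_packet(x, depth)]
    n_frames = min(len(e) for e in energies)
    return [
        sum(abs(e[t] - e[t - 1]) for e in energies)
        for t in range(1, n_frames)
    ]


# Toy signal: 128 samples of silence followed by 128 samples of a steady tone.
signal = [0.0] * 128 + [math.sin(2 * math.pi * n / 8) for n in range(128)]
values = transitivity(signal)
onset_frame = values.index(max(values))  # frame pair that spans the onset
```

On this toy signal the function is near zero inside the silent and steady-tone regions and peaks at the frame boundary where the tone starts, which mirrors the behavior the abstract describes. An MFCC-based variant, which the abstract reports as giving better results, would replace the per-band energies with cepstral coefficients.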
