Acoustic modeling based on the MDL principle for speech recognition.

Authors: Koichi Shinoda, Takao Watanabe

DOI:

Keywords:

Abstract: Koichi Shinoda and Takao Watanabe, NEC Corporation, 4-1-1 Miyazaki, Miyamae-ku, Kawasaki 216, JAPAN, {shinoda,watanabe}@hum.cl.nec.co.jp

ABSTRACT

Recently, context-dependent phone units, such as triphones, have been used to model subword units in speech recognition based on Hidden Markov Models (HMMs). While most methods employ clustering of the HMM parameters (e.g., subword clustering, state clustering, etc.) to control the HMM size so as to avoid poor recognition accuracy due to an insufficiency of training data, none of them provide any effective criterion for the optimal degree of clustering that should be performed. This paper proposes a method in which state clustering is accomplished by way of phonetic decision trees and the MDL criterion is used to optimize the degree of clustering. Large-vocabulary Japanese recognition experiments show that the models obtained by this method achieved the highest accuracy among models of various sizes obtained with conventional approaches.

1. INTRODUCTION

Over the past few years, extensive studies have been carried out on speaker-independent speech recognition using continuous density HMMs. It is well known that, in these systems, the use of context-dependent (CD) phone models instead of context-independent (CI) phone models (monophones) improves recognition accuracy [1-7].

Since the number of CD phone models is usually much larger than that of CI phone models, using CD models better captures variations in speech data. However, the amount of available training data is likely to be insufficient to support the large number of CD models. It is often impractical to prepare a sufficient amount of training data. Furthermore, the frequency with which each CD phone appears in the training data differs substantially over the set of CD phones; in most cases, the frequencies of some CD phones are small, or those phones do not appear in the training data at all, even if a large amount of data is provided. This often causes serious degradation in recognition performance. Most systems cluster model parameters to try to alleviate part of this problem.

Various clustering methods have been developed for this purpose. First, there are several choices as to the level at which clustering is carried out; K.F. Lee et al. [1], for example, cluster subword units, Hwang et al. [2] use state clustering, and Digalakis et al. [3] cluster mixture components of HMMs with Gaussian-mixture observation densities. Second, there are several methods to select the acoustically similar units to be clustered. Some use only the acoustic characteristics of the data, merging units in a bottom-up manner [4, 2, 3]. The other methods, in addition, utilize a priori knowledge of phonetic similarities between units, mostly represented as decision trees [1, 5, 6, 7]. In the latter, models are split in a top-down manner.

In these methods, it is important to properly measure the similarities between units utilizing the training data in order to select the units to be clustered. One successful approach is the approach based on the maximum-likelihood (ML) criterion (e.g., [7]). In the following, for simplicity, the splitting method (top-down clustering) is explained, though a similar explanation is also applicable to the merging method (bottom-up clustering). In the ML approach, the increase in likelihood is calculated for each unit set, and the unit set that has the largest increase is selected to be split.

However, the ML approach has one drawback. In most cases, the likelihood becomes larger as the number of units becomes larger. In the final stage of splitting, the models are almost identical to the models without clustering. Therefore, the ML approach requires an external parameter to stop the clustering. Most methods use a limit or a threshold on the increase in likelihood or on the number of units. These thresholds need to be optimized through a series of recognition experiments on test data or by a cross-validation method. These optimization processes are computationally expensive, need more data, and have no strong theoretical justification.

In this paper, we propose a new method in which a minimum description length (MDL) criterion, instead of the ML criterion, is used for clustering. The MDL approach [9] is an information-theoretic criterion for selecting a probabilistic model with appropriate complexity for a given amount of data. The MDL criterion is used not only for selecting the unit set to split, but also for deciding whether to stop splitting. No external parameter is needed. We apply this criterion to clustering with a phonetic decision tree.

2. MDL CRITERION

The MDL criterion [9] has been proven to be effective in selecting the optimal model from among various probabilistic models. It selects the model with the minimum description length for the given data as the optimal model. When a set of models $\{1, \ldots, i, \ldots, I\}$ is given, the description length, $l_i(x^N)$, of the data $x^N = \{x_1, \ldots, x_N\}$, together with the underlying model $i$, is given by

$$l_i(x^N) = -\log P_{\hat{\theta}^{(i)}}(x^N) + \frac{\alpha_i}{2} \log N + \log I \qquad (1)$$

where $\alpha_i$ is the dimensionality (the number of free parameters) of model $i$, and $\hat{\theta}^{(i)}$ is the maximum likelihood estimate of the parameters $\theta^{(i)} = (\theta^{(i)}_1, \ldots, \theta^{(i)}_{\alpha_i})$ of model $i$.

The first term in (1) is the code length for the data $x^N$ when model $i$ is used as a probabilistic model.
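As a rough illustration of how the description length in (1) trades data fit against model size, the following Python sketch evaluates it for a few candidate models and picks the smallest. The function names, the candidate labels, and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper's experiments. The sketch assumes the negative log-likelihood of each candidate under its maximum likelihood parameters has already been computed elsewhere.

```python
import math


def description_length(neg_log_likelihood, num_free_params, num_samples, num_models):
    """Description length of Eq. (1): data code length plus model cost.

    neg_log_likelihood: -log P(x^N) under the ML estimate of model i
    num_free_params:    dimensionality (number of free parameters) of model i
    num_samples:        N, the number of data points
    num_models:         I, the number of candidate models
    """
    model_cost = 0.5 * num_free_params * math.log(num_samples)
    return neg_log_likelihood + model_cost + math.log(num_models)


def select_model(candidates, num_samples):
    """Return the (name, description_length) pair with the minimum length.

    candidates: list of (name, neg_log_likelihood, num_free_params) tuples.
    """
    num_models = len(candidates)
    scored = [
        (name, description_length(nll, k, num_samples, num_models))
        for name, nll, k in candidates
    ]
    return min(scored, key=lambda pair: pair[1])


# Hypothetical candidates: larger models fit better but pay a larger penalty.
example_candidates = [
    ("small", 5200.0, 10),
    ("medium", 5000.0, 40),
    ("large", 4980.0, 160),
]
best_name, best_length = select_model(example_candidates, num_samples=1000)
print(best_name, round(best_length, 1))
```

With these made-up numbers the largest model has the highest likelihood, yet the mid-sized one has the shortest description length, because the second term grows with the number of free parameters. The last term, log I, is the same for every candidate and does not affect which model is selected.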

References (1)
R. Schwartz, Y. Chow, S. Roucos, M. Krasner, J. Makhoul, "Improved hidden Markov modeling of phonemes for continuous speech recognition," International Conference on Acoustics, Speech, and Signal Processing, vol. 9, pp. 21-24, 1984. DOI: 10.1109/ICASSP.1984.1172751