Ameliorated language modelling for lecture speech recognition of Indian English

Authors: Disha Kaur Phull, G Bharadwaja Kumar

DOI: 10.1007/S12046-018-0976-X

Keywords: Perplexity; Word error rate; Transcription (software); Language model; Speech recognition; Sphinx; Indian English; Out of vocabulary; Computer science; Language modelling

Abstract: A growing body of research addresses the automatic transcription of lectures, which contain a great deal of information and knowledge that could be helpful to educational systems and institutes. In large-vocabulary speech recognition, the language model plays a paramount role in reducing the huge search space. However, language modelling is very brittle when moving from one domain to another or from read to spontaneous speech. Lecture speech recognition also has some characteristics of spontaneous speech, so it is challenging to build a language model for this task. In this paper, a judicious approach is depicted for adapting the language model so that it is in close proximity to the topic spoken. The evaluation compares the proposed approach with existing language models such as CMU Sphinx, Gigaword and HUB-4. We observed that the proposed approach outperforms the existing models in terms of word error rate, perplexity and out-of-vocabulary rate. Analysis shows that the presented two-phase approach resulted in an average decrease of approximately 14% in one of these rates, with another of these measures decreased by half on average.
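
The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions (word-level edit distance for word error rate, the share of test tokens missing from the vocabulary for out-of-vocabulary rate, and the inverse geometric mean of per-word probabilities for perplexity). The function names and sample inputs are hypothetical and are not taken from the paper, whose exact evaluation setup is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def oov_rate(tokens: list[str], vocabulary: set[str]) -> float:
    """Fraction of test tokens that do not appear in the vocabulary."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    return sum(1 for token in tokens if token not in vocabulary) / len(tokens)


def perplexity(log2_probs: list[float]) -> float:
    """Perplexity from per-word base-2 log probabilities assigned by a model."""
    if not log2_probs:
        raise ValueError("log2_probs must not be empty")
    return 2 ** (-sum(log2_probs) / len(log2_probs))
```

For example, comparing the hypothetical reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" involves one substitution and one deletion over six reference words. Lower values are better for all three measures.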
