Analyzing various topic modeling approaches for Domain-Specific language model

Authors: Disha Kaur Phull, G. Bharadwaja Kumar

DOI: 10.1109/NETACT.2017.8076743

Keywords: Topic model; Semantics; Language model; Computer science; Latent semantic indexing; Modeling language; Natural language processing; Pachinko allocation; Hierarchical Dirichlet process; Latent Dirichlet allocation; Domain-specific language; Artificial intelligence; Machine translation; Search engine indexing

Abstract: In recent times, topic modeling approaches for adaptive language models have been extensively explored in Natural Language Processing applications such as machine translation, speech recognition, etc. A language model is extremely fragile in adapting towards the required domain, so it needs to be channeled to a particular area or domain to produce optimal results. This creates the need to investigate various topic modeling approaches which are used to infer knowledge from large corpora. In this paper, we analyze the mileage of techniques that include Latent Semantic Indexing, Latent Dirichlet Allocation and the Hierarchical Dirichlet Process. In this process, a baseline model is dynamically adapted to different topics and the results of these three approaches are analyzed.
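The abstract names three topic modeling techniques (LSI, LDA, HDP) used to infer topics from a large corpus before adapting a baseline language model. The sketch below illustrates how these three models could be trained side by side on a tokenized corpus; it is a minimal illustration only, assuming the gensim toolkit and a toy corpus, neither of which is specified in the paper.

# Minimal sketch (assumption: gensim toolkit; the paper does not name one).
# Trains the three topic models discussed in the abstract -- LSI, LDA and HDP --
# on a small tokenized corpus so their inferred topics can be compared.
from gensim import corpora, models

# Toy tokenized documents standing in for a large domain corpus.
texts = [
    ["speech", "recognition", "acoustic", "model"],
    ["machine", "translation", "parallel", "corpus"],
    ["topic", "model", "dirichlet", "allocation"],
    ["language", "model", "perplexity", "adaptation"],
]

dictionary = corpora.Dictionary(texts)                   # word <-> id mapping
bow_corpus = [dictionary.doc2bow(doc) for doc in texts]  # bag-of-words vectors

# Latent Semantic Indexing: SVD over the term-document matrix.
lsi = models.LsiModel(bow_corpus, id2word=dictionary, num_topics=2)

# Latent Dirichlet Allocation: probabilistic topic mixtures per document.
lda = models.LdaModel(bow_corpus, id2word=dictionary, num_topics=2, passes=10)

# Hierarchical Dirichlet Process: the number of topics is inferred from the data.
hdp = models.HdpModel(bow_corpus, id2word=dictionary)

for name, model in [("LSI", lsi), ("LDA", lda), ("HDP", hdp)]:
    print(name, model.show_topics(num_topics=2, num_words=4))

In a setting like the paper's, the topics inferred by each model would then drive the dynamic adaptation of the baseline language model, and the resulting models would be compared across the three approaches.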

References (23)
Sadaoki Furui, Koji Iwano, Koichi Shinoda, Haruo Yokota, Hiroki Yamazaki, Dynamic language model adaptation using presentation slides for lecture speech recognition. Conference of the International Speech Communication Association, pp. 2349-2352, (2007)
Ronald Baecker, Gerald Penn, Cosmin Munteanu, Web-based language modelling for automatic lecture transcription. Conference of the International Speech Communication Association, pp. 2353-2356, (2007)
Abhinav Sethy, Panayiotis G. Georgiou, Shrikanth S. Narayanan, Building topic specific language models from webdata using competitive models. Conference of the International Speech Communication Association, pp. 1293-1296, (2005)
Scott Novotney, Richard Schwartz, Sanjeev Khudanpur, Getting more from automatic transcripts for semi-supervised language modeling. Computer Speech & Language, vol. 36, pp. 93-109, (2016), 10.1016/J.CSL.2015.08.007
Thierry Murgue, Colin de la Higuera, Distances between Distributions: Comparing Language Models. Lecture Notes in Computer Science, pp. 269-277, (2004), 10.1007/978-3-540-27868-9_28
Andreas Stolcke, SRILM – An Extensible Language Modeling Toolkit. Conference of the International Speech Communication Association, (2002)
David M. Blei, Andrew Y. Ng, Michael I. Jordan, Latent Dirichlet Allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022, (2003), 10.5555/944919.944937
Tomáš Brychcín, Miloslav Konopík, Latent semantics in language models. Computer Speech & Language, vol. 33, pp. 88-108, (2015), 10.1016/J.CSL.2015.01.004
Antonio Toral, Pavel Pecina, Longyue Wang, Josef van Genabith, Linguistically-augmented perplexity-based data selection for language models. Computer Speech & Language, vol. 32, pp. 11-26, (2015), 10.1016/J.CSL.2014.10.002
Thomas K. Landauer, Latent Semantic Analysis. Encyclopedia of Cognitive Science, (2006), 10.1002/0470018860.S00561