Domain-specific Noisy Query Correction using Linguistic Network Community Detection

Author: Sangameshwar Patil

DOI: 10.1145/3366424.3382731

Keywords: Spelling, Information retrieval, Short Message Service, Focus (computing), Social network analysis, Domain (software engineering), Set (abstract data type), Task (computing), Computer science, Search engine

Abstract: Noisy queries pose an important challenge for retrieving relevant search results. The importance of query correction increases with the growing use of hand-held devices and technologies such as SMS and tweets to access information. The task is further complicated for domain-specific search engines, where the amount of query logs may be significantly smaller than for general-purpose engines. In this paper, we propose using a community detection technique from social network analysis for spelling correction of a set of noisy SMS messages. We focus on identifying questions from Frequently Asked Questions (FAQ) of different domains that match incoming queries. Experimental validation shows that the proposed CD-Speller method performs better than Hunspell, a popular industry-strength tool.
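The abstract does not describe how CD-Speller works internally, so the following is only a minimal sketch of the general idea of grouping noisy tokens with domain vocabulary through community detection on a word-similarity graph. Everything in it is an assumption made for illustration: the `difflib` similarity measure, the 0.7 threshold, the toy vocabulary, and the use of a simple deterministic label propagation as a stand-in for whichever community detection algorithm the paper actually uses.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a, b):
    """String similarity in [0, 1]; an assumed stand-in for the paper's measure."""
    return SequenceMatcher(None, a, b).ratio()


def build_graph(words, threshold=0.7):
    """Connect every pair of words whose similarity reaches the threshold."""
    graph = {w: {} for w in words}
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            s = similarity(a, b)
            if s >= threshold:
                graph[a][b] = s
                graph[b][a] = s
    return graph


def label_propagation(graph, max_rounds=20):
    """Deterministic weighted label propagation (hypothetical stand-in)."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            if not graph[node]:
                continue  # isolated node keeps its own label
            votes = defaultdict(float)
            for neighbour, weight in graph[node].items():
                votes[labels[neighbour]] += weight
            # Ties are broken by taking the alphabetically first label.
            best = max(sorted(votes), key=votes.get)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def correct_token(token, vocabulary, labels):
    """Map a token to the most similar vocabulary word in its community."""
    if token in vocabulary or token not in labels:
        return token
    candidates = [v for v in vocabulary if labels[v] == labels[token]]
    if not candidates:
        return token
    return max(candidates, key=lambda v: similarity(token, v))


def correct_query(query, vocabulary, labels):
    """Correct each whitespace-separated token of a query."""
    return " ".join(correct_token(t, vocabulary, labels) for t in query.split())


# Toy data, invented for this example only.
vocabulary = ["balance", "account", "transfer"]
noisy_tokens = ["balnce", "blance", "acount", "trnsfer"]

graph = build_graph(vocabulary + noisy_tokens)
labels = label_propagation(graph)
corrected = correct_query("blance trnsfer", vocabulary, labels)
```

In this toy setup each misspelling ends up in the same community as the vocabulary word it resembles, so correction reduces to picking the closest in-community vocabulary word. Tokens that never appeared in the graph are returned unchanged.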

References (4)
Dan Jurafsky, James H. Martin, Speech and Language Processing (1999)
Monojit Choudhury, Rahul Saraf, Vijit Jain, Animesh Mukherjee, Sudeshna Sarkar, Anupam Basu, Investigation and modeling of the structure of texting language. Analytics for Noisy Unstructured Text Data, vol. 10, pp. 157-174 (2007), 10.1007/S10032-007-0054-0
Govind Kothari, Sumit Negi, Tanveer A. Faruquie, Venkatesan T. Chakaravarthy, L. Venkata Subramaniam, SMS based Interface for FAQ Retrieval. International Joint Conference on Natural Language Processing, pp. 852-860 (2009), 10.3115/1690219.1690266
Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, Etienne Lefebvre, Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, pp. 10008- (2008), 10.1088/1742-5468/2008/10/P10008