Semi-supervised learning of semantic classes for query understanding: from the web and for the web

Authors: Ye-Yi Wang, Raphael Hoffmann, Xiao Li, Jakub Szymanski

DOI: 10.1145/1645953.1645961

Abstract: Understanding intents from search queries can improve a user's search experience and boost a site's advertising profits. Query tagging via statistical sequential labeling models has been shown to perform well, but annotating the training set for supervised learning requires substantial human effort. Domain-specific knowledge, such as semantic class lexicons, reduces the amount of manual annotation needed, but much effort is still required to maintain these lexicons as topics evolve over time. This paper investigates semi-supervised learning algorithms that leverage structured data (HTML lists) from the Web to automatically generate semantic-class lexicons, which are used to improve query tagging performance, even with far less training data. We focus our study on understanding the correct objectives for lexicon learning, which are crucial to the success of query tagging. Prior work on lexicon acquisition has largely focused on precision, but we show that precision is not important if the lexicons are used for query tagging. A more adequate criterion should emphasize a trade-off between maximizing the recall of semantic class instances in the data and minimizing confusability. This ensures that similar levels of precision and recall are observed on both the training and test sets, and hence prevents over-fitting to the lexicon features. Experimental results on retail product queries show that enhancing a query tagger with lexicons learned with this objective reduces word-level tagging errors by up to 25% compared to a baseline that does not use any lexicon features. In contrast, lexicons obtained through a precision-centric learning algorithm even degrade the performance of the tagger compared to the baseline. Furthermore, the proposed method outperforms one in which the semantic class lexicons have been extracted from a database.
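The trade-off between recall and confusability named in the abstract can be illustrated with a small sketch. The abstract does not define either quantity formally, so everything here is an assumption made for illustration: the function name `lexicon_stats`, the token-level matching, and the use of "share of lexicon matches that carry a different label" as a stand-in for confusability. It is not the paper's objective function.

```python
from typing import Iterable, Set, Tuple


def lexicon_stats(
    tagged_tokens: Iterable[Tuple[str, str]],
    lexicon: Set[str],
    target: str,
) -> Tuple[float, float]:
    """Return (recall, confusion_rate) of `lexicon` for the class `target`.

    recall         : share of tokens labelled `target` that the lexicon covers.
    confusion_rate : share of lexicon matches whose label is NOT `target`
                     (a hypothetical proxy for confusability).
    `lexicon` is expected to hold lowercase entries.
    """
    tagged = list(tagged_tokens)
    class_total = sum(1 for _, label in tagged if label == target)
    # Tokens in the annotated data that the lexicon fires on.
    hits = [(tok, label) for tok, label in tagged if tok.lower() in lexicon]
    correct = sum(1 for _, label in hits if label == target)
    recall = correct / class_total if class_total else 0.0
    confusion_rate = (len(hits) - correct) / len(hits) if hits else 0.0
    return recall, confusion_rate


# Toy annotated query tokens (invented for this example).
data = [
    ("canon", "Brand"),
    ("camera", "Type"),
    ("apple", "Brand"),
    ("apple", "Other"),
    ("sony", "Brand"),
]
recall, confusion_rate = lexicon_stats(data, {"canon", "apple"}, "Brand")
```

On this toy data the lexicon covers two of the three `Brand` tokens, and one of its three matches carries a different label. Adding an ambiguous entry such as "apple" raises recall but also raises the confusion rate, which is the tension the abstract describes.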
