Authors: Ye-Yi Wang, Raphael Hoffmann, Xiao Li, Jakub Szymanski
Keywords:
Abstract: Understanding intents from search queries can improve a user's experience and boost a site's advertising profits. Query tagging via statistical sequential labeling models has been shown to perform well, but annotating the training set for supervised learning requires substantial human effort. Domain-specific knowledge, such as semantic class lexicons, reduces the amount of needed manual annotations, but much effort is still required to maintain these lexicons as search topics evolve over time. This paper investigates semi-supervised learning algorithms that leverage structured data (HTML lists) from the Web to automatically generate semantic-class lexicons, which are used to improve query tagging performance, even with far less training data. We focus our study on understanding the correct objectives for the semi-supervised lexicon learning algorithms that are crucial for the success of query tagging. Prior work on lexicon acquisition has largely focused on the precision of the lexicons, but we show that precision is not important if the lexicons are used for query tagging. A more adequate criterion should emphasize a trade-off between maximizing the recall of semantic class instances in the data and minimizing the confusability. This ensures that similar levels of precision and recall are observed on both the training set and the test set, and hence prevents over-fitting the lexicon features. Experimental results on retail product queries show that enhancing a query tagger with lexicons learned with this objective reduces word-level tagging errors by up to 25% compared to a baseline tagger that does not use any lexicon features. In contrast, lexicons obtained through a precision-centric learning algorithm even degrade the performance of a tagger compared to the baseline. Furthermore, the proposed method outperforms one in which the semantic class lexicons have been extracted from a database.