Authors: Yafeng Ren, Jiayuan Deng, Donghong Ji
DOI: 10.1007/978-3-319-48740-3_2
Keywords:
Abstract: Twitter messages are written in an informal style, which hinders many information retrieval and natural language processing applications. Existing normalization systems have two major drawbacks. The first is that these methods largely require large-scale annotated training data. The second is that they assume a nonstandard token is recovered to one standard word. However, there are nonstandard tokens that should be recovered to two or more words, so the problem remains highly challenging. To address the above issues, we propose an unsupervised normalization system based on context similarity. The proposed system does not require any annotated data. Meanwhile, a nonstandard token will be recovered to one or more standard words. Results show that the proposed approach achieves state-of-the-art performance.