Automatically generated spam detection based on sentence-level topic information

Authors: Yoshihiko Suhara, Hiroyuki Toda, Shuichi Nishioka, Seiji Susaki

Abstract: Spammers use a wide range of content generation techniques to create low-quality pages, known as spam, to achieve their goals. We argue that such spam must be tackled using features. In this paper, we propose novel sentence-level diversity features based on a probabilistic topic model. We combine them with other features to build a classifier. Our experiments show that our method outperforms conventional methods.

Link: www2013.org

References (19)

Christopher D. Manning, Hinrich Schütze. Foundations of Statistical Natural Language Processing (1999).

Chris Biemann, Martin Riedl. Sweeping through the Topic Space: Bad luck? Roll again! Proceedings of the Joint Workshop on Unsupervised and Semi-Supervised Learning in NLP, pp. 19-27 (2012).

Zoltán Gyöngyi, Hector Garcia-Molina, Jan Pedersen. Combating web spam with TrustRank. Very Large Data Bases, pp. 576-587 (2004). DOI: 10.1016/B978-012088469-8.50052-8

Chris Biemann, Martin Riedl. TopicTiling: A Text Segmentation Algorithm based on LDA. Meeting of the Association for Computational Linguistics, pp. 37-42 (2012).

David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, vol. 3, pp. 993-1022 (2003). DOI: 10.5555/944919.944937

Enrique Vallés, Paolo Rosso. Detection of near-duplicate user generated contents. Proceedings of the 3rd international workshop on Search and mining user-generated contents - SMUC '11, pp. 27-34 (2011). DOI: 10.1145/2065023.2065031

T. L. Griffiths, M. Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, vol. 101, pp. 5228-5235 (2004). DOI: 10.1073/PNAS.0307752101

Juan Martinez-Romo, Lourdes Araujo. Web spam identification through language model analysis. Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web - AIRWeb '09, pp. 21-28 (2009). DOI: 10.1145/1531914.1531920

István Bíró, Dávid Siklósi, Jácint Szabó, András A. Benczúr. Linked latent Dirichlet allocation in web spam filtering. Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web - AIRWeb '09, pp. 37-40 (2009). DOI: 10.1145/1531914.1531922

Yohan Jo, Alice H. Oh. Aspect and sentiment unification model for online review analysis. Web Search and Data Mining, pp. 815-824 (2011). DOI: 10.1145/1935826.1935932
