Filtering Offensive Language in Online Communities using Grammatical Relations

Authors: Sencun Zhu, Zhi Xu

Abstract: Offensive language has become a serious threat to the health of both online communities and their users. For a community, the spread of offensive language undermines its reputation, drives users away, and can even directly affect its growth. For users, viewing offensive language has a negative influence on their mental health, especially for children and youth. When offensive language is detected in a user message, the question arises of how it should be removed, i.e., the offensive language filtering problem. Manual filtering is known to produce the best results, but it is costly in time and labor and thus cannot be widely applied. In this paper, we analyze offensive language in text messages posted in online communities and propose a new automatic, sentence-level filtering approach that semantically removes offensive language by utilizing the grammatical relations among words. Compared with existing automatic approaches, the proposed approach produces results much closer to manual filtering. To demonstrate our work, we created a dataset by manually filtering over 11,000 comments from the YouTube website. Experiments on this dataset show over 90% agreement in filtered results between the proposed approach and manual filtering. Moreover, the overhead of applying the proposed approach is reasonable, making it practical for adoption in real-life applications.
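The abstract describes the method only at a high level: removing offensive words by following grammatical relations, rather than masking words in isolation. The sketch below illustrates that general idea and is not the authors' algorithm. Several parts are assumptions made for illustration.

- The tiny lexicon OFFENSIVE is hypothetical.
- The set MODIFIER_RELS is a hypothetical choice of relations.
- The helper names Dep, subtree and filter_sentence are hypothetical.
- The dependencies are hand-annotated with Stanford typed-dependency names rather than produced by a parser.
- The rule "drop the whole sentence if the offensive word is a root or core argument" is a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical offensive-word lexicon (a real system would use a curated dictionary).
OFFENSIVE = {"stupid", "idiot", "damn"}

# Hypothetical choice of modifier relations (Stanford typed-dependency names)
# that can be pruned without breaking the rest of the sentence.
MODIFIER_RELS = {"amod", "advmod", "nn"}


@dataclass(frozen=True)
class Dep:
    rel: str   # relation name, e.g. "amod"
    head: int  # index of the governing token (-1 for the root)
    dep: int   # index of the dependent token


def subtree(deps, idx):
    """Return idx plus every token transitively attached below it."""
    out, stack = set(), [idx]
    while stack:
        i = stack.pop()
        if i in out:
            continue
        out.add(i)
        stack.extend(d.dep for d in deps if d.head == i)
    return out


def filter_sentence(tokens, deps):
    """Return the filtered sentence, or "" if it cannot be salvaged."""
    relation_of = {d.dep: d.rel for d in deps}
    drop = set()
    for i, tok in enumerate(tokens):
        if tok.lower() not in OFFENSIVE:
            continue
        if relation_of.get(i) in MODIFIER_RELS:
            # The offensive word only modifies another word: prune it
            # together with anything that grammatically depends on it.
            drop |= subtree(deps, i)
        else:
            # The offensive word is the root or a core argument (nsubj, dobj, ...):
            # deleting it would leave a broken sentence, so drop the sentence (assumption).
            return ""
    return " ".join(t for i, t in enumerate(tokens) if i not in drop)


if __name__ == "__main__":
    # "that really stupid song is great", dependencies annotated by hand
    toks = ["that", "really", "stupid", "song", "is", "great"]
    deps = [
        Dep("det", 3, 0),
        Dep("advmod", 2, 1),
        Dep("amod", 3, 2),
        Dep("nsubj", 5, 3),
        Dep("cop", 5, 4),
        Dep("root", -1, 5),
    ]
    print(filter_sentence(toks, deps))  # that song is great
```

The grammatical relations make the difference here. A word-level masker would leave "that really *** song is great", with the dangling intensifier "really". Because "really" depends on "stupid" through advmod, the sketch prunes both words, giving a more natural sentence.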
