Authors: Momchil Hardalov, Ivan Koychev, Preslav Nakov
DOI: 10.1007/978-3-319-44748-3_17
Keywords: Information retrieval, Scale (social sciences), World Wide Web, Punctuation, Feature set, Pronoun, Fake news, Social media, Capitalization, Credibility, Computer science
Abstract: We study the problem of finding fake online news. This is an important problem, as news of questionable credibility has recently been proliferating in social media at an alarming scale. As this is an understudied problem, especially for languages other than English, we first collect and release to the research community three new balanced credible vs. fake news datasets derived from four sources. We then propose a language-independent approach for automatically distinguishing credible from fake news, based on a rich feature set. In particular, we use linguistic (n-gram), credibility-related (capitalization, punctuation, pronoun use, sentiment polarity), and semantic (embeddings and DBPedia data) features. Our experiments on different test sets show that our model can distinguish credible from fake news with very high accuracy.
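As a rough illustration of what the credibility-related features named in the abstract could look like, the following is a minimal sketch and not the authors' implementation. The function name `credibility_features`, the feature names, the normalisation choices, and the short English-only pronoun list are all assumptions made for this example; the paper describes a language-independent approach, so a real pronoun lexicon would depend on the target language. Sentiment polarity is left out because it needs an external lexicon.

```python
import re
import string

# Hypothetical, English-only pronoun list used purely for illustration.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them"}


def credibility_features(text: str) -> dict:
    """Compute a few surface features (capitalization, punctuation, pronouns)."""
    tokens = re.findall(r"\w+", text)
    n_tokens = max(len(tokens), 1)  # avoid division by zero on empty input
    n_chars = max(len(text), 1)

    # Tokens written entirely in upper case (longer than one character).
    all_caps = sum(1 for t in tokens if len(t) > 1 and t.isupper())
    # ASCII punctuation characters anywhere in the text.
    punct = sum(1 for ch in text if ch in string.punctuation)
    # Tokens that match the illustrative pronoun list, case-insensitively.
    pronouns = sum(1 for t in tokens if t.lower() in PRONOUNS)

    return {
        "all_caps_ratio": all_caps / n_tokens,
        "punct_ratio": punct / n_chars,
        "exclamations": text.count("!"),
        "pronoun_ratio": pronouns / n_tokens,
    }


if __name__ == "__main__":
    print(credibility_features("SHOCKING news! You will not believe it!"))
```

In a full pipeline, a feature dictionary like this would typically be concatenated with the n-gram and semantic features before being passed to a classifier; how the authors combine them is not stated in the abstract.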