Automatic identification and removal of low quality online information

Authors: Calton Pu, Steve Webb

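Abstract: The advent of the Internet has generated a proliferation of online information-rich environments, which provide information consumers with an unprecedented amount of freely available information. However, the openness of these environments has also made them vulnerable to a new class of attacks called Denial of Information (DoI) attacks. Attackers launch these attacks by deliberately inserting low quality information into information-rich environments to promote that information or to deny access to high quality information. These attacks directly threaten the usefulness and dependability of online information-rich environments, and as a result, an important research question is how to automatically identify and remove this low quality information from these environments. The first contribution of this thesis is a set of techniques for automatically recognizing and countering various forms of DoI attacks in email systems. We develop a new DoI attack based on camouflaged messages, and we show that spam producers and information consumers are entrenched in a spam arms race. To break free of this arms race, we propose two solutions. One solution involves refining the statistical learning process by associating disproportionate weights to spam and legitimate features, and the other solution leverages the existence of non-textual email features (e.g., URLs) to make the classification process more resilient against attacks. The second contribution of this thesis is a framework for collecting, analyzing, and classifying examples of DoI attacks in the World Wide Web. We propose a fully automatic Web spam collection technique and use it to create the Webb Spam Corpus, a first-of-its-kind, large-scale, publicly available Web spam data set. Then, we perform the first large-scale characterization of Web spam using content and HTTP session analysis. Next, we present a lightweight, predictive approach to Web spam classification that relies exclusively on HTTP session information. The final contribution of this thesis is a collection of techniques that detect and help prevent DoI attacks within social environments. First, we provide detailed descriptions of each of these attacks. Then, we propose a novel technique for capturing examples of social spam, and we use our collected data to perform the first characterization of social spammers and their behaviors.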
