Learning techniques for automatic email message tagging

Author: Tony Tam

Abstract: Automatic organization of email messages is still a challenge in machine learning. The problem of "email overload", coined in 1998 by Whittaker et al., is presently affecting enterprise and power users. This thesis addresses the automatic organization of email messages, proposing a solution based on supervised learning algorithms that automatically labels email messages with tags. We approach tagging using previously created user folders as tags and a top-N ranking of the classifier output. Learning techniques are reviewed, and the different fields of an email message are analyzed for their suitability for classification. Special attention is given to the textual fields (subject and body), studying and testing different text representations, feature selection methods and several classification algorithms. The participant fields are also evaluated, and we work with the vector-space model and a graph representation. The different fields are combined for classification using the combination technique Majority Voting. Experiments are done on a subset of the Enron Corpus and on a private data set from the Institute for Systems and Technologies of Information, Control and Communication (INSTICC). The data sets are extensively analyzed in order to understand the characteristics of the data. The evaluation of the system, using accuracy, shows great promise, with the experimental results presenting a significant improvement over related works.
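
The combination step described in the abstract can be illustrated with a minimal sketch. It assumes that each field (subject, body, participants) has its own classifier returning a score per folder, that each classifier casts one vote for its best folder, and that the top-N folders by vote count are proposed as tags. The abstract does not specify how voting and ranking interact in the thesis, so the function names, the tie-breaking rule and the example scores below are all hypothetical.

```python
from collections import Counter


def top_n(scores, n):
    """Return the n highest-scoring tags, best first (ties broken by tag name)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:n]]


def majority_vote(field_scores, n=3):
    """Let each field classifier vote for its best tag, then rank tags by votes.

    Assumption: one vote per field; the thesis may weight or rank differently.
    """
    votes = Counter()
    for scores in field_scores.values():
        votes[top_n(scores, 1)[0]] += 1
    return top_n(votes, n)


# Hypothetical per-field classifier outputs for one message (folder -> score).
field_scores = {
    "subject": {"projects": 0.7, "travel": 0.2, "hr": 0.1},
    "body": {"projects": 0.5, "travel": 0.4, "hr": 0.1},
    "participants": {"travel": 0.6, "projects": 0.3, "hr": 0.1},
}

suggested = majority_vote(field_scores, n=2)
print(suggested)
```

In this sketch a folder that receives no vote is never suggested, so fewer than N tags may be returned. Ranking by summed scores instead of votes would be an alternative design, but that is a different combination technique from majority voting.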

References (57)
Mário A. T. Figueiredo, Artur J. Ferreira. Feature Transformation and Reduction for Text Classification. Pattern Recognition in Information Systems, pp. 72-81 (2010).
István Hegedűs, Richárd Farkas, Gábor Berend, András Kárpáti, Balázs Krich. Automatic free-text-tagging of online news archives. European Conference on Artificial Intelligence, pp. 529-534 (2010).
William W. Cohen. Learning Rules that Classify E-Mail (1996).
Yiming Yang, Bryan Klimt. Introducing the Enron Corpus. Conference on Email and Anti-Spam (2004).
V. Bellotti, S. Whittaker, P. Moody. Revisiting and reinventing email. Taylor & Francis (2005).
John Lafferty, Kamal Nigam, Andrew McCallum. Using Maximum Entropy for Text Classification (1999).
Robert Tibshirani, Trevor Hastie, Jerome H. Friedman. The Elements of Statistical Learning (2001).
Francisco Escolano, Pablo Suau, Boyan Bonev. Information Theory in Computer Vision and Pattern Recognition (2009).
Shih-Wen Ke, Chris Bowerman, Michael Oakes. PERC: A Personal Email Classifier. Lecture Notes in Computer Science, pp. 460-463 (2006). DOI: 10.1007/11735106_41