Author: Tony Tam
DOI:
Keywords:
Abstract: Automatic organization of email messages is still a challenge in machine learning. The problem of "email overload", a term coined in 1998 by Whittaker et al., presently affects enterprise and power users. This thesis addresses automatic email organization, proposing a solution based on supervised learning algorithms that automatically labels email messages with tags. We approach tagging by using previously created user folders as tags and a top-N ranking of the classifier output. Learning techniques are reviewed, and the different fields of an email message are analyzed for their suitability for classification. Special attention is given to the textual fields (subject and body), studying and testing different representations, feature selection methods, and several classification algorithms. The participant fields are also evaluated in this work, using the vector-space model and a graph representation. The resulting classifiers are combined using the Majority Voting combination technique. Experiments are done on a subset of the Enron Corpus and on a private data set from the Institute for Systems and Technologies of Information, Control and Communication (INSTICC). Both data sets are analyzed extensively in order to understand the characteristics of the data. The evaluation of the system, measured by accuracy, shows great promise, with experimental results presenting a significant improvement over related works.
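To make two ideas named in the abstract concrete, the following is a minimal sketch of a top-N ranking over classifier scores and of Majority Voting across several classifiers. It is an illustration under stated assumptions, not the thesis implementation: the function names `top_n` and `majority_vote`, the folder names, and all score values are hypothetical, and the abstract does not say how ties are broken or exactly which classifiers are combined.

```python
from collections import Counter


def top_n(scores, n):
    """Return the n tags with the highest scores, best first.

    `scores` maps a tag (folder name) to a classifier score.
    Ties are broken alphabetically (an assumption of this sketch).
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:n]]


def majority_vote(predictions):
    """Return the tag predicted by the most classifiers.

    Ties go to the tag that appears first in `predictions`
    (again an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    best = max(counts.values())
    for tag in predictions:
        if counts[tag] == best:
            return tag


# Hypothetical scores from three per-field classifiers for one message.
subject_scores = {"projects": 0.6, "travel": 0.3, "hr": 0.1}
body_scores = {"projects": 0.5, "hr": 0.4, "travel": 0.1}
participant_scores = {"travel": 0.7, "projects": 0.2, "hr": 0.1}

# Top-N ranking: suggest the two most likely folders from one classifier.
suggested = top_n(subject_scores, 2)

# Majority Voting: each classifier casts one vote for its best tag.
votes = [top_n(s, 1)[0] for s in (subject_scores, body_scores, participant_scores)]
winner = majority_vote(votes)

print(suggested)
print(winner)
```

With a top-N ranking, a suggestion counts as correct when the true folder appears anywhere in the returned list, which is one common way to read accuracy for a tagging system of this kind.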