Authors: Sarah Jane Delany, Pádraig Cunningham
DOI: 10.1007/978-3-540-28631-8_11
Keywords: Computer science, Case base, Redundancy (engineering), Electronic mail, Machine learning, Concept drift, Artificial intelligence
Abstract: Because of the volume of spam email and its evolving nature, any deployed Machine Learning-based spam filtering system will need to have procedures for case-base maintenance. Key to this will be procedures to edit the case-base to remove noise and eliminate redundancy. In this paper we present a two stage process to do this. We present a new noise reduction algorithm called Blame-Based Noise Reduction that removes cases that are observed to cause misclassification. We also present an algorithm called Conservative Redundancy Reduction that is much less aggressive than the state-of-the-art alternatives and has significantly better generalisation performance in this domain. These new techniques are evaluated against the alternatives in the literature on four datasets of 1000 emails each (50% spam and 50% non-spam).