Adding Semantics to Email Clustering

Authors: Hua Li, Dou Shen, Benyu Zhang, Zheng Chen, Qiang Yang

DOI: 10.1109/ICDM.2006.16

Abstract: This paper presents a novel algorithm to cluster emails according to their contents and the sentence styles of their subject lines. In our algorithm, natural language processing techniques and frequent itemset mining techniques are utilized to automatically generate meaningful generalized sentence patterns (GSPs) from the subjects of emails. Then we put forward a novel unsupervised approach which treats GSPs as pseudo class labels and conducts email clustering in a supervised manner, although no human labeling is involved. Our proposed algorithm is not only expected to improve the clustering performance, it can also provide meaningful descriptions of the resulting clusters by the GSPs. Experimental results on an open dataset (the Enron email dataset) and a personal email dataset collected by ourselves demonstrate that the proposed algorithm outperforms the K-means algorithm in terms of the popular measurement F1. Furthermore, the cluster naming readability is improved by 68.5% on the personal email dataset.
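
The pipeline outlined in the abstract (mine frequent patterns from subject lines, treat them as pseudo class labels, then assign emails in a supervised fashion) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes plain frequent word sets in place of the NLP-generalized GSPs, and it substitutes a simple nearest-centroid assignment with cosine similarity for the supervised step. All function names, the `min_support` and `max_size` parameters, and the sample emails are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercase, whitespace-split, keep purely alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]


def mine_patterns(subjects, min_support=2, max_size=2):
    # Count every word set of size 1..max_size per subject line and keep
    # those that occur in at least min_support subjects (brute force).
    counts = Counter()
    for subject in subjects:
        words = sorted(set(tokenize(subject)))
        for size in range(1, max_size + 1):
            for combo in combinations(words, size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def pseudo_label(subject, patterns):
    # Pick the matching pattern that is longest, then most frequent;
    # remaining ties are broken deterministically by the sorted words.
    words = set(tokenize(subject))
    matches = [p for p in patterns if p <= words]
    if not matches:
        return None
    return max(matches, key=lambda p: (len(p), patterns[p], sorted(p)))


def cosine(a, b):
    # Cosine similarity between two word-count vectors (Counters).
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(emails, min_support=2):
    # emails: list of (subject, body) pairs.
    patterns = mine_patterns([s for s, _ in emails], min_support)
    # Build one word-count centroid per pseudo label from matching emails.
    centroids = {}
    for subject, body in emails:
        label = pseudo_label(subject, patterns)
        if label is not None:
            centroids.setdefault(label, Counter()).update(
                tokenize(subject + " " + body))
    # Assign every email, labeled or not, to its most similar centroid.
    assignments = []
    for subject, body in emails:
        vec = Counter(tokenize(subject + " " + body))
        best = max(centroids,
                   key=lambda l: (cosine(vec, centroids[l]), sorted(l)),
                   default=None)
        assignments.append(best)
    return assignments


# Hypothetical sample data, for illustration only.
emails = [
    ("meeting agenda monday", "please review the agenda before the meeting"),
    ("meeting agenda friday", "agenda attached for the meeting"),
    ("invoice payment due", "the invoice payment is due soon"),
    ("invoice payment reminder", "reminder about the invoice payment"),
    ("hello", "see the invoice and payment details"),
]
labels = cluster(emails)
for (subject, _), label in zip(emails, labels):
    print(subject, "->", sorted(label) if label else None)
```

Because each cluster is identified by the pattern that seeded it, the pattern doubles as a readable cluster name, which mirrors the cluster-description benefit the abstract mentions. The last sample email has no frequent pattern in its subject and is placed purely by content similarity.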

