Frequent term-based text clustering

Authors: Florian Beil, Martin Ester, Xiaowei Xu

DOI: 10.1145/775047.775110

Abstract: Text clustering methods can be used to structure large sets of text or hypertext documents. The well-known methods of text clustering, however, do not really address the special problems of text clustering: very high dimensionality of the data, very large size of the databases, and understandability of the cluster description. In this paper, we introduce a novel approach which uses frequent item (term) sets for text clustering. Such frequent sets can be efficiently discovered using algorithms for association rule mining. To cluster based on frequent term sets, we measure the mutual overlap of frequent sets with respect to the sets of supporting documents. We present two algorithms for frequent term-based text clustering: FTC, which creates flat clusterings, and HFTC, for hierarchical clustering. An experimental evaluation on classical text documents as well as on web documents demonstrates that the proposed algorithms obtain clusterings of comparable quality significantly more efficiently than state-of-the-art text clustering algorithms. Furthermore, our methods provide an understandable description of the discovered clusters by their frequent term sets.
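The abstract only outlines how FTC works. As a minimal sketch, assuming each document is already tokenized into a set of terms, the Python below shows the two steps the abstract names: level-wise (Apriori-style) mining of frequent term sets, then greedy selection of the candidate cluster whose supporting documents overlap least with other candidates. The overlap score used here is an entropy-style measure, the sum over covered documents of ln(f_j)/f_j, where f_j counts the remaining candidates that cover document j. Treat that formula, the tie-breaking rule, the `max_size` cap, and the catch-all cluster for leftover documents as illustrative assumptions rather than the paper's exact specification; the function names `frequent_term_sets` and `ftc` are hypothetical.

```python
from itertools import combinations
from math import log


def frequent_term_sets(docs, min_support, max_size=2):
    """Apriori-style mining: map each frequent term set to the indices of the
    documents that contain it (its cover). `max_size` is an illustrative cap."""
    min_count = min_support * len(docs)
    current = [frozenset([t]) for t in sorted({t for d in docs for t in d})]
    result = {}
    k = 1
    while current and k <= max_size:
        frequent = {}
        for cand in current:
            cover = frozenset(i for i, d in enumerate(docs) if cand <= d)
            if len(cover) >= min_count:
                frequent[cand] = cover
        result.update(frequent)
        # Join frequent k-sets into (k+1)-sets; keep those whose k-subsets are all frequent.
        nxt = set()
        for a, b in combinations(list(frequent), 2):
            u = a | b
            if len(u) == k + 1 and all(frozenset(s) in frequent for s in combinations(u, k)):
                nxt.add(u)
        current = sorted(nxt, key=sorted)  # sorted for deterministic order
        k += 1
    return result


def ftc(docs, min_support, max_size=2):
    """Greedy flat clustering: repeatedly pick the candidate with the lowest
    (assumed) entropy overlap until every document is covered."""
    covers = {ts: set(c) for ts, c in frequent_term_sets(docs, min_support, max_size).items()}
    remaining = set(range(len(docs)))
    clusters = []
    while remaining:
        live = {ts: c for ts, c in covers.items() if c}  # candidates still covering documents
        if not live:
            # Hypothetical catch-all cluster (empty label) for uncovered documents.
            clusters.append((frozenset(), sorted(remaining)))
            break
        f = {}  # f[j]: number of live candidates covering document j
        for c in live.values():
            for j in c:
                f[j] = f.get(j, 0) + 1

        def overlap(ts):
            # -(1/f) * ln(1/f) == ln(f) / f; documents covered only once add 0.
            return sum(log(f[j]) / f[j] for j in live[ts])

        # Ties broken by larger cover, then alphabetically (an arbitrary choice).
        best = min(live, key=lambda ts: (overlap(ts), -len(live[ts]), sorted(ts)))
        chosen = covers.pop(best)
        clusters.append((best, sorted(chosen)))
        remaining -= chosen
        for c in covers.values():  # clustered documents leave every other candidate
            c -= chosen
    return clusters


docs = [
    {"apple", "fruit", "juice"}, {"apple", "fruit", "pie"}, {"fruit", "juice", "orange"},
    {"car", "engine", "wheel"}, {"car", "engine", "oil"}, {"car", "wheel", "tire"},
]
clusters = ftc(docs, min_support=0.3)
for label, members in clusters:
    print(sorted(label), members)
```

Each cluster is labelled by the term set that defines it, which is the kind of understandable cluster description the abstract refers to; documents are removed from all other candidates once assigned, so the resulting clustering is non-overlapping.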
