Criterion functions for document clustering

Authors: George Karypis, Ying Zhao, Ding-Zhu Du

Abstract: Fast and high-quality document clustering algorithms play an important role in providing intuitive navigation and browsing mechanisms by organizing large amounts of information into a small number of meaningful clusters. In this thesis, we focus on a class of clustering algorithms that treat the clustering problem as an optimization process which seeks to maximize or minimize a particular criterion function defined over the entire clustering solution. We present a comprehensive study of the desirable characteristics and feasibility of various criterion functions under different clustering requirements raised in real-world applications. In particular, we study seven global criterion functions for clustering document datasets, three of which are introduced by us. The first part of the thesis consists of a detailed experimental evaluation using 15 datasets and partitional clustering approaches, followed by a theoretical analysis of the criterion functions. Our analysis shows that criterion functions that are more robust to differences in cluster tightness and that produce more balanced clusters tend to perform well. Our new criterion functions are among the ones achieving the best overall results. We further discuss how criterion functions can be used in hierarchical and soft clustering solutions. We evaluate six partitional and nine agglomerative methods on twelve datasets. A new class of agglomerative algorithms, the constrained agglomerative algorithm, is also proposed and achieves the best results. For four criterion functions, we derive their soft-clustering extensions, evaluate them experimentally, and analyze their characteristics. Finally, we extend the criterion functions to incorporate prior knowledge of natural topics existing in the datasets. Specifically, we define topic-driven clustering, which organizes a document collection according to a given set of topics. We propose topic-driven schemes that consider the similarity between documents and topics and the relationship among the documents themselves simultaneously. Our results show that the proposed schemes are efficient and effective with topic prototypes of different levels of specificity.
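The idea of clustering as optimization of a criterion function defined over the entire solution can be illustrated with a minimal sketch. The abstract does not spell out the seven functions, so the criterion here is an assumption chosen for illustration: the sum, over clusters, of the length of each cluster's composite (summed) unit vector, which grows when documents in a cluster point in similar directions. The names `normalize`, `criterion`, and `refine`, the toy term vectors, and the greedy move-one-document refinement are all hypothetical and are not taken from the thesis.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so that dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def criterion(docs, labels, k):
    """Assumed criterion: sum over clusters of the norm of the composite vector.

    It is evaluated over the whole clustering solution; higher is better.
    """
    dim = len(docs[0])
    composites = [[0.0] * dim for _ in range(k)]
    for doc, label in zip(docs, labels):
        for j, x in enumerate(doc):
            composites[label][j] += x
    return sum(math.sqrt(sum(x * x for x in c)) for c in composites)

def refine(docs, labels, k, max_passes=20):
    """Greedy refinement: move one document at a time if the move raises the criterion."""
    labels = list(labels)
    best = criterion(docs, labels, k)
    for _ in range(max_passes):
        moved = False
        for i in range(len(docs)):
            original = labels[i]
            best_label = original
            for candidate in range(k):
                if candidate == original:
                    continue
                labels[i] = candidate
                value = criterion(docs, labels, k)
                if value > best + 1e-12:
                    best, best_label = value, candidate
            labels[i] = best_label  # keep the best move found, or restore the original
            moved = moved or best_label != original
        if not moved:
            break  # local optimum: no single move improves the criterion
    return labels, best

# Toy term-frequency vectors (made up): two documents per underlying topic.
docs = [normalize(v) for v in ([3, 1, 0], [4, 1, 0], [0, 1, 3], [0, 1, 4])]
initial_labels = [0, 1, 0, 1]  # deliberately poor starting partition
initial_value = criterion(docs, initial_labels, 2)
final_labels, final_value = refine(docs, initial_labels, 2)
```

On this toy input the refinement groups the first two documents together and the last two together. Recomputing the criterion from scratch for every trial move keeps the sketch short; a practical implementation would update the composite vectors incrementally.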
