Joint cluster analysis of attribute and relationship data without a-priori specification of the number of clusters

Authors: Flavia Moser, Rong Ge, Martin Ester

DOI: 10.1145/1281192.1281248

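Abstract: In many applications, attribute and relationship data are available, carrying complementary information about real world entities. In such cases, a joint analysis of both types of data can yield more accurate results than classical clustering algorithms that either use only attribute data or only relationship (graph) data. The Connected k-Center (CkC) has been proposed as the first joint cluster analysis model to discover k clusters which are cohesive on both attribute and relationship data. However, it is well-known that prior knowledge on the number of clusters is often unavailable in applications such as community identification and hotspot analysis. In this paper, we introduce and formalize the problem of discovering an a-priori unspecified number of clusters in the context of joint cluster analysis of attribute and relationship data, called the Connected X Clusters (CXC) problem. True clusters are assumed to be compact and distinctive from their neighboring clusters in terms of attribute data and to be internally connected in terms of relationship data. Different from classical attribute-based clustering methods, the neighborhood of clusters is not defined in terms of attribute data but in terms of relationship data. To efficiently solve the CXC problem, we present JointClust, an algorithm which adopts a dynamic two-phase approach. In the first phase, we find so-called cluster atoms. We provide a probability analysis for this phase, which gives us a probabilistic guarantee that each true cluster is represented by at least one of the initial cluster atoms. In the second phase, these cluster atoms are merged in a bottom-up manner, resulting in a dendrogram. The final clustering is determined by our objective function. Our experimental evaluation on several real datasets demonstrates that JointClust indeed discovers meaningful and accurate clusterings without requiring the user to specify the number of clusters.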
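The two-phase idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' JointClust implementation; it only shows one way to grow connected atoms and then merge them bottom-up while keeping every cluster connected in the graph. The function names `grow_atoms` and `merge_bottom_up`, the explicit seed list, the nearest-seed growth rule, and the centroid-distance merge criterion are all assumptions made for illustration. The abstract does not specify the objective function, so the sketch returns every dendrogram level and leaves the choice of the final clustering open.

```python
import heapq
import math
from itertools import combinations


def grow_atoms(points, adjacency, seeds):
    """Phase 1 (sketch): grow one connected atom per seed node.

    Assumption: a node joins the atom that reaches it through a graph edge
    with the smallest attribute distance to that atom's seed. Nodes are
    assumed to be distinct, orderable ids (for example integers); nodes that
    are unreachable from every seed stay unassigned.
    """
    label = {seed: atom_id for atom_id, seed in enumerate(seeds)}
    heap = []

    def push_neighbors(node):
        atom_id = label[node]
        seed_point = points[seeds[atom_id]]
        for nb in adjacency[node]:
            if nb not in label:
                dist = math.dist(points[nb], seed_point)
                heapq.heappush(heap, (dist, nb, atom_id))

    for seed in seeds:
        push_neighbors(seed)
    while heap:
        _, node, atom_id = heapq.heappop(heap)
        if node in label:
            continue  # already claimed by a closer atom
        label[node] = atom_id
        push_neighbors(node)

    atoms = [set() for _ in seeds]
    for node, atom_id in label.items():
        atoms[atom_id].add(node)
    return atoms


def centroid(points, cluster):
    """Mean attribute vector of a cluster."""
    dim = len(points[next(iter(cluster))])
    return tuple(sum(points[n][d] for n in cluster) / len(cluster) for d in range(dim))


def are_adjacent(a, b, adjacency):
    """True if at least one graph edge links cluster a to cluster b."""
    return any(nb in b for n in a for nb in adjacency[n])


def merge_bottom_up(points, adjacency, atoms):
    """Phase 2 (sketch): merge graph-adjacent clusters, closest centroids first.

    Returns the list of dendrogram levels, from the atoms to the coarsest
    clustering reachable through graph edges.
    """
    clusters = [set(a) for a in atoms]
    levels = [[set(c) for c in clusters]]
    while True:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            if are_adjacent(clusters[i], clusters[j], adjacency):
                d = math.dist(centroid(points, clusters[i]), centroid(points, clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best is None:
            break  # no adjacent pair left to merge
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        levels.append([set(c) for c in clusters])
    return levels


# Toy data (hypothetical): two tight groups joined by the single edge 2-3.
points = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (10, 10), 4: (10, 11), 5: (11, 10)}
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
atoms = grow_atoms(points, adjacency, seeds=[0, 2, 3, 5])
levels = merge_bottom_up(points, adjacency, atoms)
```

Because merges only happen across graph edges, every cluster at every level stays internally connected, which mirrors the connectivity assumption stated in the abstract. Restricting merge candidates to graph-adjacent clusters also reflects the statement that cluster neighborhood is defined by relationship data rather than attribute data.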
