Clustering by pattern similarity in large data sets

Authors: Haixun Wang, Wei Wang, Jiong Yang, Philip S. Yu

Keywords: Clustering high-dimensional data, Closeness, Fuzzy clustering, Artificial intelligence, Data mining, Similarity (network science), Euclidean distance, Computer science, Pattern recognition, Collaborative filtering, Set (abstract data type), Data set, Cluster analysis

Abstract: Clustering is the process of grouping a set of objects into classes of similar objects. Although definitions of similarity vary from one clustering model to another, in most of these models the concept of similarity is based on distances, e.g., Euclidean distance or cosine distance. In other words, similar objects are required to have close values on at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the pCluster model we proposed, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitude of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, which captures not only the closeness of values of certain leading indicators but also the closeness of (purchasing, browsing, etc.) patterns exhibited by the customers. Our paper introduces an effective algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its effectiveness.
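
The contrast the abstract draws between distance-based similarity and pattern-based similarity can be illustrated with a minimal Python sketch. The abstract does not give a formula, so the coherence measure below is an assumption: it takes two objects to be coherent on a set of dimensions when their per-dimension differences stay within a tolerance `delta`. The function names, the sample "gene" profiles, and the value of `delta` are all hypothetical, and this is not the detection algorithm the paper introduces.

```python
import math


def euclidean(x, y):
    """Ordinary Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pattern_spread(x, y, dims):
    """Spread of the per-dimension differences x[d] - y[d] over dims.

    Assumed coherence measure (not taken from the abstract): a spread of 0
    means the two profiles differ by a constant shift on those dimensions,
    i.e. they rise and fall in lockstep.
    """
    diffs = [x[d] - y[d] for d in dims]
    return max(diffs) - min(diffs)


def is_coherent(x, y, dims, delta):
    """True if the two profiles follow the same pattern within tolerance delta."""
    return pattern_spread(x, y, dims) <= delta


# Hypothetical expression levels under four stimuli.
gene_a = [10.0, 40.0, 25.0, 70.0]
gene_b = [110.0, 141.0, 124.0, 170.0]  # roughly gene_a shifted up by 100
gene_c = [12.0, 38.0, 60.0, 15.0]      # close in magnitude, different shape

all_dims = [0, 1, 2, 3]
delta = 5.0  # hypothetical tolerance

# gene_c is nearer to gene_a in Euclidean terms, but only gene_b shares its pattern.
print(euclidean(gene_a, gene_b) > euclidean(gene_a, gene_c))
print(is_coherent(gene_a, gene_b, all_dims, delta))
print(is_coherent(gene_a, gene_c, all_dims, delta))

# Coherence can hold on a subset of dimensions even when it fails on all of them.
print(is_coherent(gene_a, gene_c, [0, 1], delta))
```

In this toy data, a distance-based model would group `gene_a` with `gene_c`, while the pattern-based view groups `gene_a` with `gene_b`, which is the situation the microarray example in the abstract describes.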


References (18)

George M. Church, Yizong Cheng, Biclustering of Expression Data. Intelligent Systems in Molecular Biology, vol. 8, pp. 93-103 (2000)

Ryszard S. Michalski, Robert E. Stepp, Learning from Observation: Conceptual Clustering. Machine Learning, pp. 331-363 (1983), 10.1007/978-3-662-12405-5_11

Raymond T. Ng, Jiawei Han, Efficient and Effective Clustering Methods for Spatial Data Mining. Very Large Data Bases, pp. 144-155 (1994)

Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft, When Is "Nearest Neighbor" Meaningful? International Conference on Database Theory, pp. 217-235 (1999), 10.1007/3-540-49257-7_15

Hans-Peter Kriegel, Martin Ester, Jörg Sander, Xiaowei Xu, A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. Knowledge Discovery and Data Mining, pp. 226-231 (1996)

Keinosuke Fukunaga, Introduction to Statistical Pattern Recognition (2nd ed.). Academic Press Professional, Inc. (1990)

H. V. Jagadish, Raymond T. Ng, J. Madar, Semantic Compression and Pattern Extraction with Fascicles. Very Large Data Bases, pp. 186-198 (1999), 10.14288/1.0051612

Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan, Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD '98), vol. 27, pp. 94-105 (1998), 10.1145/276304.276314

Chun-Hung Cheng, Ada Waichee Fu, Yi Zhang, Entropy-Based Subspace Clustering for Mining Numerical Data. Knowledge Discovery and Data Mining, pp. 84-93 (1999), 10.1145/312129.312199

Charu C. Aggarwal, Joel L. Wolf, Philip S. Yu, Cecilia Procopiuc, Jong Soo Park, Fast Algorithms for Projected Clustering. International Conference on Management of Data, vol. 28, pp. 61-72 (1999), 10.1145/304181.304188
