Authors: Haixun Wang, Wei Wang, Jiong Yang, Philip S. Yu
Keywords: Clustering high-dimensional data, Closeness, Fuzzy clustering, Artificial intelligence, Data mining, Similarity (network science), Euclidean distance, Computer science, Pattern recognition, Collaborative filtering, Set (abstract data type), Data set, Cluster analysis
Abstract: Clustering is the process of grouping a set of objects into classes of similar objects. Although definitions of similarity vary from one clustering model to another, in most of these models the concept of similarity is based on distances, e.g., Euclidean distance or cosine distance. In other words, similar objects are required to have close values on at least a set of dimensions. In this paper, we explore a more general type of similarity. Under the proposed pCluster model, two objects are similar if they exhibit a coherent pattern on a subset of dimensions. For instance, in DNA microarray analysis, the expression levels of two genes may rise and fall synchronously in response to a set of environmental stimuli. Although the magnitudes of their expression levels may not be close, the patterns they exhibit can be very much alike. Discovery of such clusters of genes is essential in revealing significant connections in gene regulatory networks. E-commerce applications, such as collaborative filtering, can also benefit from the new model, which captures not only the closeness of values of certain leading indicators but also the closeness of the (purchasing, browsing, etc.) patterns exhibited by customers. Our paper introduces an effective algorithm to detect such clusters, and we perform tests on several real and synthetic data sets to show its effectiveness.
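The contrast the abstract draws between distance-based and pattern-based similarity can be illustrated with a minimal Python sketch. It assumes that "coherent pattern" means the offset between two objects stays nearly constant across the chosen dimensions; the abstract does not give a formula, so the helper names (`pattern_spread`, `is_coherent`), the tolerance `delta`, and the gene values are all hypothetical and chosen only for illustration. This is a brute-force check of the idea, not the detection algorithm the paper introduces.

```python
from itertools import combinations
from math import dist


def pattern_spread(x, y, dims):
    """Largest change in the offset x[d] - y[d] between any two dimensions in dims."""
    offsets = [x[d] - y[d] for d in dims]
    return max(offsets) - min(offsets)


def is_coherent(rows, dims, delta):
    """True if every pair of rows keeps a near-constant offset (within delta) on dims."""
    return all(pattern_spread(x, y, dims) <= delta
               for x, y in combinations(rows, 2))


# Hypothetical expression levels of three genes under five conditions.
gene_a = [10, 50, 30, 70, 20]
gene_b = [110, 150, 130, 170, 120]  # same shape as gene_a, shifted up by 100
gene_c = [40, 35, 60, 20, 45]       # follows the others only on conditions 0, 2, 4

all_dims = [0, 1, 2, 3, 4]

# Far apart in Euclidean distance, yet identical in pattern.
far_apart = dist(gene_a, gene_b)
same_pattern = is_coherent([gene_a, gene_b], all_dims, delta=5)

# gene_c breaks the pattern on the full set of conditions...
with_outlier = is_coherent([gene_a, gene_b, gene_c], all_dims, delta=5)
# ...but all three genes agree on a subset of conditions.
subset_pattern = is_coherent([gene_a, gene_b, gene_c], [0, 2, 4], delta=5)

print(far_apart, same_pattern, with_outlier, subset_pattern)
```

In this toy data, `gene_a` and `gene_b` would never be grouped by a distance-based model, while the offset check treats them as following the same pattern. The last call shows why restricting to a subset of dimensions matters: a third object can join the group once the conditions on which it deviates are left out.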