Representative clustering of uncertain data

Authors: Andreas Züfle, Tobias Emrich, Klaus Arthur Schmid, Nikos Mamoulis, Arthur Zimek

DOI: 10.1145/2623330.2623725

Abstract: This paper targets the problem of computing meaningful clusterings from uncertain data sets. Existing methods for clustering uncertain data compute a single clustering without any indication of its quality and reliability; thus, decisions based on their results are questionable. In this paper, we describe a framework, based on possible-worlds semantics; when applied to an uncertain dataset, it computes a set of representative clusterings, each of which has a probabilistic guarantee not to exceed some maximum distance to the ground truth clustering, i.e., the clustering of the actual (but unknown) data. Our framework can be combined with any existing clustering algorithm and is the first to provide quality guarantees about its result. In addition, our experimental evaluation shows that our representative clusterings have a much smaller deviation from the ground truth clustering than existing approaches, thus reducing the effect of uncertainty.
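The general idea in the abstract (sample possible worlds, cluster each one, then pick a clustering that is close to most of the others) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the toy 1-D gap-based clustering, the pair-counting distance (1 minus the Rand index), and the medoid selection are all assumptions made for illustration. The reported fraction is only an empirical estimate over the sampled worlds, not the probabilistic guarantee derived in the paper.

```python
import random
from itertools import combinations


def sample_world(uncertain_objects, rng):
    """Draw one possible world: pick one alternative value per uncertain object."""
    return [rng.choice(alternatives) for alternatives in uncertain_objects]


def threshold_clustering(points, eps):
    """Toy 1-D clustering (hypothetical stand-in for any clustering algorithm):
    sort the points and start a new cluster wherever the gap exceeds eps."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    labels = [0] * len(points)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if points[nxt] - points[prev] > eps:
            current += 1
        labels[nxt] = current
    return labels


def pair_distance(a, b):
    """Fraction of object pairs on which two clusterings disagree
    (equal to 1 minus the Rand index)."""
    pairs = list(combinations(range(len(a)), 2))
    disagree = sum((a[i] == a[j]) != (b[i] == b[j]) for i, j in pairs)
    return disagree / len(pairs)


def representative(clusterings, tau):
    """Return the index of the medoid clustering and the fraction of sampled
    clusterings whose distance to it is at most tau."""
    best, best_cost = 0, float("inf")
    for i, candidate in enumerate(clusterings):
        cost = sum(pair_distance(candidate, other) for other in clusterings)
        if cost < best_cost:
            best, best_cost = i, cost
    within = sum(pair_distance(clusterings[best], c) <= tau for c in clusterings)
    return best, within / len(clusterings)


# Each uncertain object is given as a list of equally likely alternative values
# (made-up data for illustration only).
rng = random.Random(0)
objects = [[0.0, 0.2], [0.5, 0.6], [5.0, 5.3], [5.5, 9.0]]
worlds = [sample_world(objects, rng) for _ in range(50)]
clusterings = [threshold_clustering(w, eps=1.0) for w in worlds]
idx, coverage = representative(clusterings, tau=0.2)
print(clusterings[idx], coverage)
```

Here `coverage` is the share of sampled worlds whose clustering lies within distance `tau` of the chosen representative. A low value suggests that a single clustering summarizes the uncertain data poorly, which is the situation in which returning several representatives becomes useful.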
