Multi-purpose exploratory mining of complex data

Author: Xiao He

Keywords: Machine learning, Succinct data structure, Cluster analysis, Feature vector, Data mining, Computer science, Molecule mining, Complex data type, Data type, Data stream mining, Artificial intelligence, Data structure

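Abstract: Due to the increasing power of data acquisition and data storage technologies, a large number of data sets with complex structure are collected in the era of data explosion. Instead of simple representations by low-dimensional numerical features, such data sources range from high-dimensional feature spaces to graph data describing relationships among objects. Many techniques exist in the literature for mining simple numerical data, but only a few approaches touch the increasing challenge of mining complex data, such as high-dimensional vectors of non-numerical data type, time series data, graphs, and multi-instance data where each object is represented by a finite set of feature vectors. Besides, there are many important data mining tasks for high-dimensional data, such as clustering, outlier detection, dimensionality reduction, similarity search, classification, prediction, and result interpretation. Many algorithms have been proposed to solve these tasks separately, although in some cases they are closely related. Detecting and exploiting the relationships among them is another important challenge. This thesis aims to solve these challenges in order to gain new knowledge from complex high-dimensional data. We propose several new algorithms combining different data mining tasks to acquire novel knowledge from complex high-dimensional data: ROCAT (Relevant Overlapping Subspace Clusters on Categorical Data) automatically detects the most relevant overlapping subspace clusters on categorical data. It integrates clustering, feature selection, and pattern mining without any input parameters in an information-theoretic way. The next algorithm, MSS (Multiple Subspace Selection), finds multiple low-dimensional subspaces for moderately high-dimensional data, each exhibiting an interesting cluster structure. For better interpretation of the results, MSS visualizes the clusters in multiple subspaces in a hierarchical way. SCMiner (Summarization-Compression Miner) focuses on bipartite graph data and integrates co-clustering, graph summarization, link prediction, and the discovery of the hidden structure of a bipartite graph on the basis of data compression. Finally, we propose a novel similarity measure for multi-instance data. The Probabilistic Integral Metric (PIM) is based on a probabilistic generative model requiring few assumptions. Experiments demonstrate the effectiveness and efficiency of PIM for similarity search (multi-instance data indexing with M-tree), explorative data analysis, and data mining (multi-instance classification). To sum up, we propose algorithms combining different data mining tasks for complex data with various data types and data structures to discover the novel knowledge hidden behind the complex data.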
