Author: Xiao He
DOI:
Keywords: Machine learning, Succinct data structure, Cluster analysis, Feature vector, Data mining, Computer science, Molecule mining, Complex data type, Data type, Data stream mining, Artificial intelligence, Data structure
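Abstract: Due to the increasing power of data acquisition and data storage technologies, a large number of data sets with complex structure are collected in the era of data explosion. Instead of simple representations by low-dimensional numerical features, such data sources range from high-dimensional feature spaces to graph data describing relationships among objects. Many techniques exist in the literature for mining simple numerical data, but only a few approaches touch the challenge of mining complex data, such as high-dimensional vectors of non-numerical data type, time series data, graphs, and multi-instance data where each object is represented by a finite set of feature vectors. Besides, there are many important data mining tasks for high-dimensional data, such as clustering, outlier detection, dimensionality reduction, similarity search, classification, prediction, and result interpretation. Many algorithms have been proposed to solve these tasks separately, although in some cases they are closely related. Detecting and exploiting the relationships among them is another important challenge. This thesis aims to solve these challenges in order to gain new knowledge from complex high-dimensional data. We propose several new algorithms combining different data mining tasks to acquire novel knowledge from complex high-dimensional data: ROCAT (Relevant Overlapping Subspace Clusters on Categorical Data) automatically detects the most relevant overlapping subspace clusters on categorical data. It integrates clustering, feature selection, and pattern mining without any input parameters in an information-theoretic way. The next algorithm, MSS (Multiple Subspace Selection), finds multiple low-dimensional subspaces for moderately high-dimensional data, each exhibiting an interesting cluster structure. For better interpretation of the results, MSS visualizes the clusters in multiple low-dimensional subspaces in a hierarchical way. SCMiner (Summarization-Compression Miner) focuses on bipartite graph data and integrates co-clustering, graph summarization, link prediction, and the discovery of the hidden structure of a bipartite graph on the basis of data compression. Finally, we propose a novel similarity measure for multi-instance data. The Probabilistic Integral Metric (PIM) is based on a probabilistic generative model requiring few assumptions. Experiments demonstrate the effectiveness and efficiency of PIM for similarity search (multi-instance data indexing with M-tree), explorative data analysis, and data mining (multi-instance classification). To sum up, we propose algorithms combining different data mining tasks for various types of complex data structures in order to discover the novel knowledge hidden behind the complex data.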