Clustering deep web databases semantically

Authors: Ling Song, Po Yan, Li Lian, Dongmei Zhang, Jun Ma

DOI: 10.5555/1786374.1786422

Keywords: Fuzzy clustering, Semantic similarity, k-means clustering, Database, Computer science, Vector space model, Cluster analysis, Data mining, Similarity (network science), Rand index, Cosine similarity

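Abstract: Deep Web database clustering is a key operation in organizing Deep Web resources. Traditionally, cosine similarity in the Vector Space Model (VSM) is used as the similarity computation. However, it cannot capture the semantic similarity between the contents of two databases. This paper discusses how to cluster Deep Web databases semantically. First, a fuzzy semantic measure is proposed, which integrates ontology and fuzzy set theory to compute the semantic similarity between the visible features of two Deep Web forms. Then a hybrid Particle Swarm Optimization (PSO) algorithm is provided for Deep Web database clustering. Finally, the clustering results are evaluated according to the Average Similarity of Document to the Cluster Centroid (ASDC) and the Rand Index (RI). Experiments show that: 1) the hybrid PSO approach has higher ASDC values than those based on the PSO and K-Means approaches, which means the hybrid PSO approach has higher intra-cluster similarity and the lowest inter-cluster similarity; 2) the clustering results based on fuzzy semantic similarity have higher ASDC and RI values than those based on cosine similarity, which reflects the conclusion that the fuzzy semantic similarity approach can explore latent semantics.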
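
The abstract names two standard measures: cosine similarity in the VSM (the baseline) and the Rand Index (one of the evaluation criteria). The following is a minimal Python sketch of their textbook definitions, not the authors' implementation. The function names and the toy term vectors are assumptions made for illustration, and the fuzzy semantic measure, the hybrid PSO algorithm, and ASDC are not reproduced because the abstract does not specify them.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two clusterings agree.

    A pair agrees when both clusterings place the two items together,
    or both place them apart.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical vocabulary: ["car", "automobile"].
# Two forms that use different words for the same concept share no terms.
form_a = [1, 0]
form_b = [0, 1]
print(cosine_similarity(form_a, form_b))

# Cluster label names are arbitrary, so a relabelled clustering agrees fully.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The toy vectors illustrate the limitation the abstract describes: two query forms that describe the same concept with different words have orthogonal term vectors, so their cosine similarity is zero regardless of meaning.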
