CluSandra: A Framework and Algorithm for Data Stream Cluster Analysis

Authors: Josh R, Eman M.

DOI: 10.14569/IJACSA.2011.021115

Keywords: Artificial intelligence, Algorithm, CURE data clustering algorithm, Knowledge extraction, Machine learning, Data mining, Computer science, Data stream mining, Clustering high-dimensional data, Cluster analysis, Data stream clustering, Data stream

Abstract: The clustering or partitioning of a dataset’s records into groups of similar records is an important aspect of knowledge discovery from datasets. A considerable amount of research has been applied to the identification of clusters in very large multi-dimensional and static datasets. However, the traditional clustering and/or pattern recognition algorithms that have resulted from this research are inefficient for clustering data streams. A data stream is a dynamic dataset that is characterized by a sequence of data records that evolves over time, has extremely fast arrival rates and is unbounded. Today, the world abounds with processes that generate high-speed evolving data streams. Examples include click streams, credit card transactions and sensor networks. The data stream’s inherent characteristics present an interesting set of time and space related challenges for clustering algorithms. In particular, processing time is severely constrained and clustering must be performed in a single pass over the incoming data. This paper presents both a clustering framework and algorithm that, combined, address these challenges and allow end-users to explore and gain knowledge from evolving data streams. Our approach includes the integration of open source products that are used to control the data stream and facilitate the harnessing of knowledge from the data stream. Experimental results of testing the framework with various data streams are also discussed.

References (16)
João Gama, Pedro Pereira Rodrigues. Learning from Data Streams. Encyclopedia of Data Warehousing and Mining, pp. 1137-1141 (2007). DOI: 10.4018/978-1-60566-010-3.CH176
Ryan Breidenbach, Craig Walls. Spring in Action (2004).
Mohamed Medhat Gaber, Joao Gama. Learning from Data Streams: Processing Techniques in Sensor Networks. Springer (2007).
Geoff Hulten, Laurie Spencer, Pedro Domingos. Mining time-changing data streams. Knowledge Discovery and Data Mining, pp. 97-106 (2001). DOI: 10.1145/502512.502529
Pedro Domingos, Geoff Hulten. Mining high-speed data streams. Knowledge Discovery and Data Mining, pp. 71-80 (2000). DOI: 10.1145/347090.347107
Tian Zhang, Raghu Ramakrishnan, Miron Livny. BIRCH: an efficient data clustering method for very large databases. International Conference on Management of Data, vol. 25, pp. 103-114 (1996). DOI: 10.1145/233269.233324
Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu. A framework for projected clustering of high dimensional data streams. Very Large Data Bases, pp. 852-863 (2004). DOI: 10.1016/B978-012088469-8.50075-9
Micheline Kamber, Jiawei Han, Jian Pei. Data Mining: Concepts and Techniques (2000).
Charu C. Aggarwal, Philip S. Yu, Jiawei Han, Jianyong Wang. A framework for clustering evolving data streams. Very Large Data Bases, pp. 81-92 (2003). DOI: 10.1016/B978-012722442-8/50016-1