Middleware for data mining applications on clusters and grids

Authors: Leonid Glimcher, Ruoming Jin, Gagan Agrawal

DOI: 10.1016/J.JPDC.2007.06.007

Keywords: Grid, Middleware (distributed applications), Information extraction, Data warehouse, Distributed computing, Data stream mining, Node (computer science), Middleware, Computer science, Data mining, Database, Transaction processing, Data retrieval

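Abstract: This paper gives an overview of two middleware systems that have been developed over the last 6 years to address the challenges involved in developing parallel and distributed implementations of data mining algorithms. FREERIDE (FRamework for Rapid Implementation of Data mining Engines) focuses on data mining in a cluster environment. FREERIDE is based on the observation that parallel versions of several well-known data mining techniques share a relatively similar structure, and can be parallelized by dividing the data instances (or records or transactions) among the nodes. The computation on each node involves reading the data instances in an arbitrary order, processing each data instance, and performing a local reduction. The reduction involves only commutative and associative operations, which means the result is independent of the order in which the data instances are processed. After the local reduction on each node, a global reduction is performed. This similarity in structure can be exploited by the middleware system to execute the data mining tasks efficiently in parallel, starting from a relatively high-level specification of the technique. To enable processing of data sets stored in remote data repositories, we have extended the FREERIDE middleware into FREERIDE-G (FRamework for Rapid Implementation of Data mining Engines in Grid). FREERIDE-G supports a high-level interface for developing data mining and scientific data processing applications that involve data stored in remote repositories. The added functionality in FREERIDE-G aims at abstracting the details of remote data retrieval, movements, and caching from application developers.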
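The local-then-global reduction structure described in the abstract can be illustrated with a minimal Python sketch. This is not FREERIDE's actual interface; the function names (`local_reduction`, `global_reduction`, `partition`, `mine_item_counts`), the item-counting task, and the sample transactions are all assumptions chosen for illustration. "Nodes" are simulated as list partitions in a single process.

```python
from collections import Counter
from functools import reduce
import random


def local_reduction(instances):
    """Fold one node's data instances into a reduction object (item counts)."""
    counts = Counter()
    for transaction in instances:
        # Count each item at most once per transaction.
        counts.update(set(transaction))
    return counts


def global_reduction(local_results):
    """Combine per-node reduction objects; Counter addition is commutative and associative."""
    return reduce(lambda a, b: a + b, local_results, Counter())


def partition(instances, num_nodes):
    """Divide the data instances among (simulated) nodes, round-robin."""
    return [instances[i::num_nodes] for i in range(num_nodes)]


def mine_item_counts(instances, num_nodes):
    """Run a local reduction per node, then one global reduction."""
    parts = partition(instances, num_nodes)
    return global_reduction(local_reduction(p) for p in parts)


# Hypothetical sample data: four market-basket transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "eggs"],
    ["milk", "eggs", "bread"],
    ["eggs"],
]

result = mine_item_counts(transactions, num_nodes=2)

# Because the reduction uses only commutative and associative operations,
# reordering the instances or changing the node count leaves the result unchanged.
shuffled = transactions[:]
random.Random(0).shuffle(shuffled)
assert mine_item_counts(shuffled, num_nodes=3) == result

print(dict(result))
```

The final assertion exercises the order-independence property the abstract relies on: since addition of counts is commutative and associative, any assignment of instances to nodes yields the same global result.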


