Authors: Leonid Glimcher, Ruoming Jin, Gagan Agrawal
DOI: 10.1016/J.JPDC.2007.06.007
Keywords: Grid, Middleware (distributed applications), Information extraction, Data warehouse, Distributed computing, Data stream mining, Node (computer science), Middleware, Computer science, Data mining, Database, Transaction processing, Data retrieval
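Abstract: This paper gives an overview of two middleware systems that have been developed over the last 6 years to address the challenges involved in developing parallel and distributed implementations of data mining algorithms. FREERIDE (FRamework for Rapid Implementation of Data mining Engines) focuses on data mining in a cluster environment. FREERIDE is based on the observation that parallel versions of several well-known data mining techniques share a relatively similar structure, and can be parallelized by dividing the data instances (or records or transactions) among the nodes. The computation on each node involves reading the data instances in an arbitrary order, processing each data instance, and performing a local reduction. The reduction involves only commutative and associative operations, which means the result is independent of the order in which the data instances are processed. After the local reduction on each node, a global reduction is performed. This similarity in structure can be exploited by the middleware system to execute data mining tasks efficiently in parallel, starting from a relatively high-level specification of the technique. To enable processing of data sets stored in remote data repositories, we have extended FREERIDE into FREERIDE-G (FRamework for Rapid Implementation of Data mining Engines in Grid). FREERIDE-G supports a high-level interface for developing data mining and scientific data processing applications that involve data stored in remote repositories. The added functionality in FREERIDE-G aims at abstracting the details of remote data retrieval, movements, and caching from application developers.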
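The local and global reduction structure described in the abstract can be illustrated with a small sketch. This is not the FREERIDE API: the function names (`partition`, `local_reduction`, `global_reduction`, `count_items`), the round-robin partitioning, and the choice of item counting as the mining task are all assumptions made for illustration, and the "nodes" are simulated sequentially in a single process. The sketch only shows why a commutative, associative reduction gives a result that does not depend on how instances are ordered or divided.

```python
from collections import Counter
from functools import reduce
import random


def partition(instances, num_nodes):
    """Divide data instances among nodes (round-robin is an assumed choice)."""
    return [instances[i::num_nodes] for i in range(num_nodes)]


def local_reduction(chunk):
    """Process each instance on one node, accumulating into a local reduction object."""
    counts = Counter()
    for transaction in chunk:
        for item in set(transaction):  # count each item once per transaction
            counts[item] += 1
    return counts


def global_reduction(local_results):
    """Merge per-node reduction objects; Counter addition is commutative and associative."""
    return reduce(lambda a, b: a + b, local_results, Counter())


def count_items(instances, num_nodes):
    """Run local reductions on every (simulated) node, then one global reduction."""
    chunks = partition(instances, num_nodes)
    return global_reduction([local_reduction(chunk) for chunk in chunks])


transactions = [
    ["milk", "bread"],
    ["bread", "eggs"],
    ["milk", "bread", "eggs"],
    ["milk"],
]
result = count_items(transactions, num_nodes=2)

# Reordering the instances and changing the node count leaves the result unchanged.
shuffled = transactions[:]
random.Random(0).shuffle(shuffled)
assert count_items(shuffled, num_nodes=3) == result
```

In a real cluster setting each call to `local_reduction` would run on a separate node and the merge would happen over the network; here the list comprehension stands in for that parallel step.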