A heterogeneous computing system for data mining workflows in multi-agent environments

Authors: Ping Luo, Kevin Lü, Rui Huang, Qing He, Zhongzhi Shi

DOI: 10.1111/J.1468-0394.2006.00408.X

Abstract: The computing-intensive data mining (DM) process calls for the support of a heterogeneous computing system, which consists of multiple computers with different configurations connected by a high-speed large-area network for increased computational power and resources. The DM process can be described as a multi-phase pipeline process, and in each phase there could be many optional methods. This makes the DM workflow very complex, and it can be modeled only by a directed acyclic graph (DAG). A heterogeneous computing system needs an effective and efficient scheduling framework, which orchestrates all the computing hardware to perform multiple competitive DM workflows. Motivated by the need for a practical solution to the scheduling problem for DM workflows, this paper proposes a dynamic DAG scheduling algorithm according to the characteristics of an execution time estimation model for DM jobs. Based on an approximate estimation of job execution time, the algorithm first maps DM jobs to machines in a decentralized and diligent (defined in this paper) manner. Then the performance of this initial mapping can be improved through job migrations when necessary. The scheduling heuristic used considers the factors of both the minimal completion time criterion and the critical path in a DAG. We implement this system in an established multi-agent system environment, in which the reuse of existing DM algorithms is achieved by encapsulating them into agents. The system evaluation and its usage in oil well logging analysis are also discussed.
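
To make the two factors named in the abstract concrete, the following is a minimal sketch of a generic list-scheduling heuristic, not the paper's algorithm. It assumes a centralized scheduler with known per-machine time estimates, orders jobs by critical-path length, and places each job on the machine that gives the earliest completion time. The decentralized "diligent" mapping and the job migrations described in the abstract are not modeled, and the function name `schedule`, the job names, and the machine names `A` and `B` are all hypothetical.

```python
from functools import lru_cache


def schedule(dag, est_time):
    """Map jobs to machines (illustrative heuristic, not the paper's method).

    dag:      {job: [successor jobs]}
    est_time: {job: {machine: estimated run time > 0}}
    Returns   {job: (machine, start, finish)}
    """
    machines = sorted({m for times in est_time.values() for m in times})

    @lru_cache(maxsize=None)
    def rank(job):
        # Critical-path length from this job to the end of the DAG,
        # using the job's mean estimated time across machines.
        mean = sum(est_time[job].values()) / len(est_time[job])
        return mean + max((rank(s) for s in dag.get(job, [])), default=0.0)

    # Invert the successor lists to get each job's predecessors.
    preds = {job: [] for job in est_time}
    for job, succs in dag.items():
        for s in succs:
            preds[s].append(job)

    free_at = {m: 0.0 for m in machines}  # when each machine becomes idle
    plan = {}
    # With positive run times, descending rank is a valid topological order.
    for job in sorted(est_time, key=rank, reverse=True):
        ready = max((plan[p][2] for p in preds[job]), default=0.0)
        # Minimal completion time criterion: pick the earliest finish.
        best = min(
            est_time[job],
            key=lambda m: (max(ready, free_at[m]) + est_time[job][m], m),
        )
        start = max(ready, free_at[best])
        finish = start + est_time[job][best]
        plan[job] = (best, start, finish)
        free_at[best] = finish
    return plan


# Hypothetical four-job DM workflow on two machines.
dag = {"clean": ["tree", "svm"], "tree": ["eval"], "svm": ["eval"], "eval": []}
est = {
    "clean": {"A": 2, "B": 4},
    "tree": {"A": 3, "B": 3},
    "svm": {"A": 6, "B": 5},
    "eval": {"A": 1, "B": 2},
}
plan = schedule(dag, est)
makespan = max(finish for _, _, finish in plan.values())
```

In this toy workflow the longer `svm` branch lies on the critical path, so it is placed before `tree`. A dynamic scheduler of the kind the abstract describes would additionally revise such a plan at run time as actual job times diverge from the estimates.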
