Practical Fault-Tolerance for Mobile Agents

Author: Kjetil Jacobsen

Abstract: The amount of computational resources available on the Internet is increasing. Effectively using these resources for distributed computations is challenging. An infrastructure called grids provides tools for structuring and deploying large-scale computations on the Internet. One of the key problems in grids is managing resources; approaches based on mobile agents are being advocated to solve this problem. However, to be widely adopted, such approaches must be robust towards failures in the grid environment, and thus require effective mechanisms for agent fault-tolerance. To gain insight into how distributed applications perform on the Internet, the dissertation investigates two master-worker algorithms, one based on group communication and one based on message flooding. Both algorithms are executed in simulations using traces. The results from running and evaluating the algorithms are used to infer requirements for our fault-tolerance approach. This dissertation then derives a fault-tolerant protocol for mobile agents. The protocol is rooted in the primary-backup approach, where a set of backups monitor the progress of the agent during the computation. The protocol allows the set of backups to be changed during the computation to adapt to the current network topology. The dissertation describes an implementation of the protocol on top of a mobile agent platform, and evaluates the performance to show that explicit management of backups can be beneficial to performance, and is applicable outside the scope of grid computations.
