Authors: K. Vaidyanathan, K.S. Trivedi
DOI: 10.1109/ISSRE.1999.809313
Keywords:
Abstract: Software systems are known to suffer from outages due to transient errors. Recently, the phenomenon of "software aging", in which the state of the software system degrades with time, has been reported (S. Garg et al., 1998). The primary causes of this degradation are the exhaustion of operating system resources, data corruption, and numerical error accumulation. This may eventually lead to performance degradation of the software, a crash/hang failure, or both. Earlier work in this area to detect aging and to estimate its effect on system resources did not take into account the system workload. In this paper, we propose a measurement-based model to estimate the rate of exhaustion of operating system resources both as a function of time and of the system workload state. A semi-Markov reward model is constructed based on workload and resource usage data collected from the UNIX operating system. We first identify different workload states using statistical cluster analysis and build a state-space model. Corresponding to each resource, a reward function is then defined for the model based on the rate of resource exhaustion in the different states. The model is then solved to obtain trends and the estimated exhaustion rates and time-to-exhaustion for the resources. With the help of this measure, proactive fault management techniques such as "software rejuvenation" (Y. Huang et al., 1995) may be employed to prevent unexpected outages.
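The modeling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code, and it assumes a simplified steady-state reading of a semi-Markov reward model: each workload state has a mean sojourn time and a constant resource-depletion rate, and the long-run depletion rate is the time-weighted average over states. All function names, the transition matrix, the sojourn times, the rates, and the units (seconds, KB) are hypothetical and chosen only for illustration.

```python
def embedded_stationary(P, iters=100_000, tol=1e-12):
    """Stationary distribution of the embedded (jump) chain with row-stochastic
    matrix P. Uses the 'lazy' update v <- 0.5*v + 0.5*v*P, which has the same
    stationary distribution as P but also converges for periodic chains."""
    n = len(P)
    v = [1.0 / n] * n
    for _ in range(iters):
        vP = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        nxt = [0.5 * a + 0.5 * b for a, b in zip(v, vP)]
        if max(abs(a - b) for a, b in zip(nxt, v)) < tol:
            return nxt
        v = nxt
    return v


def time_fractions(P, sojourn):
    """Long-run fraction of time spent in each workload state:
    pi_i = nu_i * h_i / sum_j(nu_j * h_j), where nu is the embedded-chain
    stationary distribution and h_i is the mean sojourn time of state i."""
    nu = embedded_stationary(P)
    weights = [n_i * h_i for n_i, h_i in zip(nu, sojourn)]
    total = sum(weights)
    return [w / total for w in weights]


def exhaustion_rate(P, sojourn, reward):
    """Expected long-run depletion rate: sum_i(pi_i * r_i), where r_i is the
    (assumed constant) depletion rate while in workload state i."""
    pi = time_fractions(P, sojourn)
    return sum(p * r for p, r in zip(pi, reward))


def time_to_exhaustion(available, rate):
    """Estimated time until the resource runs out at a constant rate."""
    return float("inf") if rate <= 0 else available / rate


# Hypothetical three-state workload model (all numbers are made up).
P = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
sojourn = [10.0, 20.0, 30.0]   # mean seconds spent in each state per visit
reward = [6.0, 3.0, 2.0]       # KB of free memory lost per second, per state

rate = exhaustion_rate(P, sojourn, reward)
print(rate, time_to_exhaustion(3000.0, rate))
```

With these made-up inputs the estimated depletion rate is roughly 3 KB per second, which would exhaust 3000 KB of free memory in roughly 1000 seconds. In the paper's setting the workload states would come from cluster analysis of measured data and the per-state rates from trend estimation; here they are simply supplied by hand.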